[
  {
    "path": ".gitignore",
    "content": "*.s\n*.o\n*.so\n.~\n*.swp\n*.out\n*.pdf\n*~\n"
  },
  {
    "path": "1/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n\n\nvoid GetChar() \n{\n    Look = getchar();\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\n\nint IsAlpha(char c)\n{\n    return (UPCASE(c) >= 'A') && (UPCASE(c) <= 'Z');\n} \n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\n\nchar GetName()\n{\n    char c = Look;\n\n    if( !IsAlpha(Look)) {\n        sprintf(tmp, \"Name\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return UPCASE(c);\n}\n\n\nchar GetNum()\n{\n    char c = Look;\n\n    if( !IsDigit(Look)) {\n        sprintf(tmp, \"Integer\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return c;\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    GetChar();\n}\n\n"
  },
  {
    "path": "1/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#define UPCASE(C) (~(1<<5) & (C))\n#define MAX_BUF 100\n\nstatic char tmp[MAX_BUF];\n\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\n\nint IsAlpha(char c);\nint IsDigit(char c);\n\nchar GetName();\nchar GetNum();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\n\n#endif\n"
  },
  {
    "path": "1/tutor1.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                            LET'S BUILD A COMPILER!\n\n                                       By\n\n                            Jack W. Crenshaw, Ph.D.\n\n                                  24 July 1988\n\n\n                              Part I: INTRODUCTION\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\n\nThis series of articles is a tutorial on the theory  and practice\nof  developing language parsers and compilers.    Before  we  are\nfinished,  we  will  have  covered  every   aspect   of  compiler\nconstruction, designed a new programming  language,  and  built a\nworking compiler.\n\nThough I am not a computer scientist by education (my Ph.D. is in\na different  field, Physics), I have been interested in compilers\nfor many years.  I have  bought  and tried to digest the contents\nof virtually every  book  on  the  subject ever written.  I don't\nmind  telling you that it was slow going.    Compiler  texts  are\nwritten for Computer  Science  majors, and are tough sledding for\nthe rest of us.  But over the years a bit of it began to seep in.\nWhat really caused it to jell was when I began  to  branch off on\nmy own and begin to try things on my own computer.  Now I plan to\nshare with you what I have  learned.    At the end of this series\nyou will by no means be  a  computer scientist, nor will you know\nall the esoterics of  compiler  theory.    I intend to completely\nignore the more theoretical  aspects  of  the  subject.  
What you\n_WILL_ know is all  the  practical aspects that one needs to know\nto build a working system.\n\nThis is a \"learn-by-doing\" series.  In the course of the series I\nwill be performing  experiments  on  a  computer.    You  will be\nexpected to follow along,  repeating  the  experiments that I do,\nand  performing  some  on your own.  I will be using Turbo Pascal\n4.0 on a PC  clone.   I will periodically insert examples written\nin TP.  These will be executable code, which you will be expected\nto copy into your own computer and run.  If you don't have a copy\nof  Turbo,  you  will be severely limited in how well you will be\nable to follow what's going on.  If you don't have a copy, I urge\nyou to get one.  After  all,  it's an excellent product, good for\nmany other uses!\n\nSome articles on compilers show you examples, or show you  (as in\nthe case of Small-C) a finished product, which you can  then copy\nand  use without a whole lot of understanding of how it works.  I\nhope to do much more  than  that.    I  hope to teach you HOW the\nthings get done,  so that you can go off on your own and not only\nreproduce what I have done, but improve on it.\n                              \nThis is admittedly an ambitious undertaking, and it won't be done\nin  one page.  I expect to do it in the course  of  a  number  of\narticles.    Each  article will cover a single aspect of compiler\ntheory,  and  will  pretty  much  stand  alone.   If  all  you're\ninterested in at a given time is one  aspect,  then  you  need to\nlook only at that one article.  Each article will be  uploaded as\nit  is complete, so you will have to wait for the last one before\nyou can consider yourself finished.  Please be patient.\n\n\n\nThe average text on  compiler  theory covers a lot of ground that\nwe won't be covering here.  
The typical sequence is:\n\n o An introductory chapter describing what a compiler is.\n\n o A chapter or two on syntax equations, using Backus-Naur Form\n   (BNF).\n\n o A chapter or two on lexical scanning, with emphasis on\n   deterministic and non-deterministic finite automata.\n\n o Several chapters on parsing theory, beginning with top-down\n   recursive descent, and ending with LALR parsers.\n\n o A chapter on intermediate languages, with emphasis on P-code\n   and similar reverse polish representations.\n\n o Many chapters on alternative ways to handle subroutines and\n   parameter passing, type declarations, and such.\n\n o A chapter toward the end on code generation, usually for some\n   imaginary CPU with a simple instruction set.  Most readers\n   (and in fact, most college classes) never make it this far.\n\n o A final chapter or two on optimization. This chapter often\n   goes unread, too.\n\n\nI'll  be taking a much different approach in  this  series.    To\nbegin  with,  I  won't dwell long on options.  I'll be giving you\n_A_ way that works.  If you want  to  explore  options,  well and\ngood ...  I  encourage  you  to do so ... but I'll be sticking to\nwhat I know.   I also will skip over most of the theory that puts\npeople  to  sleep.  Don't get me  wrong:  I  don't  belittle  the\ntheory, and it's vitally important  when it comes to dealing with\nthe more tricky  parts  of  a  given  language.  But I believe in\nputting first things first.    Here we'll be dealing with the 95%\nof compiler techniques that don't need a lot of theory to handle.\n\nI  also  will  discuss only one approach  to  parsing:  top-down,\nrecursive descent parsing, which is the  _ONLY_  technique that's\nat  all   amenable  to  hand-crafting  a  compiler.    
The  other\napproaches are only useful if you have a tool like YACC, and also\ndon't care how much memory space the final product uses.\n                              \nI  also take a page from the work of Ron Cain, the author of  the\noriginal Small C.  Whereas almost all other compiler authors have\nhistorically  used  an  intermediate  language  like  P-code  and\ndivided  the  compiler  into two parts (a front end that produces\nP-code,  and   a  back  end  that  processes  P-code  to  produce\nexecutable   object  code),  Ron  showed  us   that   it   is   a\nstraightforward  matter  to  make  a  compiler  directly  produce\nexecutable  object  code,  in  the  form  of  assembler  language\nstatements.  The code will _NOT_ be the world's tightest code ...\nproducing optimized code is  a  much  more  difficult job. But it\nwill work, and work reasonably well.  Just so that I  don't leave\nyou with the impression that our end product will be worthless, I\n_DO_ intend to show you how  to  \"soup up\" the compiler with some\noptimization.\n\n\n\nFinally, I'll be  using  some  tricks  that I've found to be most\nhelpful in letting  me  understand what's going on without wading\nthrough a lot of boiler plate.  Chief among these  is  the use of\nsingle-character tokens, with no embedded spaces,  for  the early\ndesign work.  I figure that  if  I  can get a parser to recognize\nand deal with I-T-L, I can  get  it  to do the same with IF-THEN-\nELSE.  And I can.  In the second \"lesson,\"   I'll  show  you just\nhow easy it  is  to  extend  a  simple parser to handle tokens of\narbitrary length.  As another  trick,  I  completely  ignore file\nI/O, figuring that  if  I  can  read source from the keyboard and\noutput object to the screen, I can also do it from/to disk files.\nExperience  has  proven  that  once  a   translator   is  working\ncorrectly, it's a  straightforward  matter to redirect the I/O to\nfiles.    
The last trick is that I make no attempt  to  do  error\ncorrection/recovery.   The   programs   we'll  be  building  will\nRECOGNIZE errors, and will not CRASH, but they  will  simply stop\non the first error ... just like good ol' Turbo does.  There will\nbe  other tricks that you'll see as you go. Most of them can't be\nfound in any compiler textbook, but they work.\n\nA word about style and efficiency.    As  you will see, I tend to\nwrite programs in  _VERY_  small, easily understood pieces.  None\nof the procedures we'll  be  working with will be more than about\n15-20 lines long.  I'm a fervent devotee  of  the  KISS  (Keep It\nSimple, Sidney) school of software development.  I  try  to never\ndo something tricky or  complex,  when  something simple will do.\nInefficient?  Perhaps, but you'll like the  results.    As  Brian\nKernighan has said,  FIRST  make  it  run, THEN make it run fast.\nIf, later on,  you want to go back and tighten up the code in one\nof  our products, you'll be able to do so, since the code will be\nquite understandable. If you  do  so, however, I urge you to wait\nuntil the program is doing everything you want it to.\n\nI  also  have  a  tendency  to  delay  building  a module until I\ndiscover that I need  it.    Trying  to anticipate every possible\nfuture contingency can  drive  you  crazy,  and  you'll generally\nguess wrong anyway.    In  this  modern day of screen editors and\nfast compilers, I don't hesitate to change a module when I feel I\nneed a more powerful one.  
Until then,  I'll  write  only  what I\nneed.\n\nOne final caveat: One of the principles we'll be sticking to here\nis that we don't  fool  around with P-code or imaginary CPUs, but\nthat we will start out on day one  producing  working, executable\nobject code, at least in the form of  assembler  language source.\nHowever, you may not  like  my  choice  of assembler language ...\nit's 68000 code, which is what works on my system (under SK*DOS).\nI  think  you'll  find, though, that the translation to any other\nCPU such as the 80x86 will  be  quite obvious, though, so I don't\nsee  a problem here.  In fact, I hope someone out there who knows\nthe '86 language better than I do will offer  us  the  equivalent\nobject code fragments as we need them.\n\n\nTHE CRADLE\n\nEvery program needs some boiler  plate  ...  I/O  routines, error\nmessage routines, etc.   The  programs we develop here will be no\nexceptions.    I've  tried to hold  this  stuff  to  an  absolute\nminimum, however, so that we  can  concentrate  on  the important\nstuff without losing it  among  the  trees.  The code given below\nrepresents about the minimum that we need to  get  anything done.\nIt consists of some I/O routines, an error-handling routine and a\nskeleton, null main program.   I  call  it  our  cradle.    As we\ndevelop other routines, we'll add them to the cradle, and add the\ncalls to them as we  need to.  Make a copy of the cradle and save\nit, because we'll be using it more than once.\n\nThere are many different ways to organize the scanning activities\nof  a  parser.   In Unix systems, authors tend to  use  getc  and\nungetc.  I've had very good luck with the  approach  shown  here,\nwhich is to use  a  single, global, lookahead character.  Part of\nthe initialization procedure  (the  only part, so far!) serves to\n\"prime  the  pump\"  by reading the first character from the input\nstream.  No other special  techniques are required with Turbo 4.0\n... 
each successive call to  GetChar will read the next character\nin the stream.\n\n\n{--------------------------------------------------------------}\nprogram Cradle;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look: char;              { Lookahead Character }\n                              \n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   if Look = x then GetChar\n   else Expected('''' + x + '''');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := upcase(c) in ['A'..'Z'];\nend;\n                              \n\n{--------------------------------------------------------------}\n\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: char;\nbegin\n   if not IsAlpha(Look) then Expected('Name');\n   GetName := UpCase(Look);\n   
GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: char;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   GetNum := Look;\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nbegin\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\nend.\n{--------------------------------------------------------------}\n\n\nThat's it for this introduction.  Copy the code above into TP and\ncompile it.  Make sure that it compiles and runs  correctly. Then\nproceed to the first lesson, which is on expression parsing.\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\n\n\n"
  },
  {
    "path": "10/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "10/cradle.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <stdbool.h>\n#include <string.h>\n\n#include \"cradle.h\"\n#include <malloc.h>\n\n#define MaxEntry 100\n#define MAX_SYMBOL_LENGTH 10\nstatic int LCount = 0;\nstatic char labelName[MAX_BUF];\nchar tmp[MAX_BUF];\n\n/*char ST[TABLE_SIZE];*/\nstatic int NEntry = 0;\nconst char *ST[MaxEntry];\nchar SType[MaxEntry];\n\n\n/* Keywords symbol table */\nconst char const *KWList[] = {\n    \"IF\",\n    \"ELSE\",\n    \"ENDIF\",\n    \"WHILE\",\n    \"ENDWHILE\",\n    \"VAR\",\n    \"BEGIN\",\n    \"END\",\n    \"PROGRAM\"\n};\nconst char KWCode[] = \"xilewevbep\";\nconst int KWNum = sizeof(KWList)/sizeof(*KWList);\n\nchar Token;             /* current token */\nchar Value[MAX_BUF];    /* string token of Look */\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    if (IsAlpha(c)) {\n        return (c & 0xDF);\n    } else {\n        return c;\n    }\n}\n\n/* Table Lookup\n * If the input string matches a table entry, return the entry index, else\n * return -1.\n * *n* is the size of the table */\nint Lookup(const char const *table[], const char *string, int n)\n{\n    int i;\n    bool found = false;\n\n    for (i = 0; i < n; ++i) {\n        if (strcmp(table[i], string) == 0) {\n            found = true;\n            break;\n        }\n    }\n    return found ? 
i : -1;\n}\n\n/* Add a new entry to symbol table */\nvoid AddEntry(char *symbol, char type)\n{\n    if (InTable(symbol)) {\n        sprintf(tmp, \"Duplicate Identifier %s\", symbol);\n        Abort(tmp);\n    }\n    if (NEntry == MaxEntry) {\n        Abort(\"Symbol Table Full\");\n    }\n\n    char *new_entry = (char *)malloc((strlen(symbol)+1)*sizeof(*new_entry));\n    if (new_entry == NULL) {\n        Abort(\"AddEntry: not enough memory allocating new_entry\");\n    }\n    strcpy(new_entry, symbol);\n    ST[NEntry] = new_entry;\n    SType[NEntry] = type;\n\n    NEntry++;\n}\n\n/* Get an Identifier and Scan it for keywords */\nvoid Scan()\n{\n    GetName();\n    int index = Lookup(KWList, Value, KWNum);\n    Token = KWCode[index+1];\n}\n\n/* Check that the current token string is str; report str (the keyword\n * that was expected) otherwise */\nvoid MatchString(char *str)\n{\n    if (strcmp(Value, str) != 0) {\n        sprintf(tmp, \"\\\"%s\\\"\", str);\n        Expected(tmp);\n    }\n}\n\nvoid GetChar()\n{\n    Look = getchar();\n    /* printf(\"Getchar: %c\\n\", Look); */\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    /* Callers may pass tmp itself, so format into a separate buffer:\n     * sprintf with overlapping source and destination is undefined. */\n    char msg[MAX_BUF + 16];\n\n    snprintf(msg, sizeof(msg), \"%s Expected\", s);\n    Abort(msg);\n}\n\n\nvoid Match(char x)\n{\n    NewLine();\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"'%c'\", x);\n        Expected(tmp);\n    }\n    SkipWhite();\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n}\n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nint IsMulop(char c)\n{\n    return (c == '*') || (c == '/');\n}\n\nint IsOrOp(char c)\n{\n    return strchr(\"|~\", c) != NULL;\n}\n\nint IsRelop(char c)\n{\n    return strchr(\"=#<>\", c) != NULL;\n}\n\nint IsWhite(char c)\n{\n    return strchr(\" \\t\\r\\n\", c) != NULL;\n}\n\nint IsAlNum(char c)\n{\n    return IsAlpha(c) || IsDigit(c);\n}\n\nvoid GetName()\n{\n    
NewLine();\n    if( !IsAlpha(Look)) {\n        Expected(\"Name\");\n    }\n\n    char *p = Value;\n    while(IsAlNum(Look)) {\n        /* leave room for the terminating '\\0' in Value */\n        if (p - Value >= MAX_BUF - 1) {\n            Abort(\"Identifier Too Long\");\n        }\n        *p++ = uppercase(Look);\n        GetChar();\n    }\n    *p = '\\0';\n    SkipWhite();\n}\n\n\nint GetNum()\n{\n    NewLine();\n    int value = 0;\n    if( !IsDigit(Look)) {\n        Expected(\"Integer\");\n    }\n\n    while (IsDigit(Look)) {\n        value = value * 10 + Look - '0';\n        GetChar();\n    }\n\n    SkipWhite();\n\n    return value;\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    LCount = 0;\n\n    InitTable();\n    GetChar();\n    Scan();\n    SkipWhite();\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < MaxEntry; i++) {\n        ST[i] = NULL;\n        SType[i] = ' ';\n    }\n\n}\n\nbool InTable(char *name)\n{\n    return Lookup(ST, name, NEntry) != -1;\n}\n\nchar *NewLabel()\n{\n    sprintf(labelName, \"L%02d\", LCount);\n    LCount ++;\n    return labelName;\n}\n\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n\nvoid SkipWhite()\n{\n    while (IsWhite(Look)) {\n        GetChar();\n    }\n}\n\n/* Skip over an End-of-Line */\nvoid NewLine()\n{\n    while(Look == '\\n') {\n        GetChar();\n        if (Look == '\\r') {\n            GetChar();\n        }\n        SkipWhite();\n    }\n}\n\n/* re-targetable routines */\nvoid Clear()\n{\n    EmitLn(\"xor %eax, %eax\");\n}\n\nvoid Negate()\n{\n    EmitLn(\"neg %eax\");\n}\n\nvoid LoadConst(int n)\n{\n    sprintf(tmp, \"movl $%d, %%eax\", n);\n    EmitLn(tmp);\n}\n\n/* Load a variable to primary register */\nvoid LoadVar(char *name)\n{\n    if (!InTable(name)) {\n        Undefined(name);\n    }\n    sprintf(tmp, \"movl %s, %%eax\", name);\n    EmitLn(tmp);\n}\n\n\n/* Push Primary onto stack */\nvoid Push()\n{\n    EmitLn(\"pushl %eax\");\n}\n\n/* Add Top of Stack to 
primary */\nvoid PopAdd()\n{\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* Subtract Primary from Top of Stack */\nvoid PopSub()\n{\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"neg %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* multiply top of stack by primary */\nvoid PopMul()\n{\n    EmitLn(\"imull (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* divide top of stack by primary */\nvoid PopDiv()\n{\n    /* for an expression like a/b we have eax=b and (%esp)=a\n     * but we need eax=a, and b on the stack\n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n    EmitLn(\"pushl %eax\");\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extension of eax into edx */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* store primary to variable */\nvoid Store(char *name)\n{\n    if (!InTable(name)) {\n        Undefined(name);\n    }\n    sprintf(tmp, \"movl %%eax, %s\", name);\n    EmitLn(tmp);\n}\n\nvoid Undefined(char *name)\n{\n    sprintf(tmp, \"Undefined Identifier: %s\", name);\n    Abort(tmp);\n}\n\n/* Complement the primary register */\nvoid NotIt()\n{\n    EmitLn(\"not %eax\");\n}\n\n/* AND top of Stack with primary */\nvoid PopAnd()\n{\n    EmitLn(\"and (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* OR top of Stack with primary */\nvoid PopOr()\n{\n    EmitLn(\"or (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* XOR top of Stack with primary */\nvoid PopXor()\n{\n    EmitLn(\"xor (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* Compare top of Stack with primary\n * The pop comes first because addl would overwrite the flags set by cmp. */\nvoid PopCompare()\n{\n    EmitLn(\"addl $4, %esp\");\n    EmitLn(\"cmp -4(%esp), %eax\");\n}\n\n/* set %eax if Compare was = */\nvoid SetEqual()\n{\n    EmitLn(\"sete %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was != */\nvoid SetNEqual()\n{\n    EmitLn(\"setne %al\");\n    EmitLn(\"movsx %al, 
%eax\");\n}\n\n/* set %eax if Compare was > */\nvoid SetGreater()\n{\n    EmitLn(\"setl %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was >= */\nvoid SetGreaterOrEqual()\n{\n    EmitLn(\"setle %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was < */\nvoid SetLess()\n{\n    EmitLn(\"setg %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was <= */\nvoid SetLessOrEqual()\n{\n    EmitLn(\"setge %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* Branch unconditional */\nvoid Branch(char *label)\n{\n    sprintf(tmp, \"jmp %s\", label);\n    EmitLn(tmp);\n}\n\n/* Branch False */\nvoid BranchFalse(char *label)\n{\n    EmitLn(\"test $1, %eax\");\n    sprintf(tmp, \"jz %s\", label);\n    EmitLn(tmp);\n}\n"
  },
  {
    "path": "10/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n#include <stdbool.h>\n\n#define MAX_BUF 100\n#define MaxEntry 100\nextern char tmp[MAX_BUF];\nextern const char *ST[];\nextern char SType[];\nextern char Token;\nextern char Value[MAX_BUF];\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\nvoid MatchString(char *str);\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\nint IsMulop(char c);\nint IsOrOp(char c);\nint IsRelop(char c);\nint IsWhite(char c);\nint IsAlNum(char c);\n\nvoid GetName();\nint GetNum();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\nbool InTable(char *name);\nvoid AddEntry(char *symbol, char type);\n\nchar *NewLabel();\nvoid PostLabel(char *label);\nvoid SkipWhite();\nvoid NewLine();\nvoid Scan();\n\n/* re-targetable routines */\nvoid Clear();\nvoid Negate();\nvoid LoadConst(int n);\nvoid LoadVar(char *name);\nvoid Push();\nvoid PopAdd();\nvoid PopSub();\nvoid PopMul();\nvoid PopDiv();\nvoid Store(char *name);\nvoid Undefined(char *name);\nvoid NotIt();\nvoid PopAnd();\nvoid PopOr();\nvoid PopXor();\nvoid PopCompare();\nvoid SetEqual();\nvoid SetNEqual();\nvoid SetGreater();\nvoid SetGreaterOrEqual();\nvoid SetLess();\nvoid SetLessOrEqual();\nvoid Branch(char *label);\nvoid BranchFalse(char *label);\n\n#endif\n"
  },
  {
    "path": "10/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <stdbool.h>\n\n#include \"cradle.h\"\n\n#ifdef DEBUG\n#define dprint(fmt, ...) printf(fmt, __VA_ARGS__);\n#else\n#define dprint(fmt, ...)\n#endif\n\n\nvoid Prog();\nvoid Prolog();\nvoid Epilog();\nvoid Header();\nvoid Main();\nvoid Decl();\nvoid TopDecls();\nvoid Alloc(char *);\nvoid Block();\nvoid Assignment();\n\nvoid Factor();\nvoid NegFactor();\nvoid Expression();\nvoid Subtract();\nvoid FirstTerm();\nvoid Term();\nvoid Term1();\nvoid Divide();\nvoid Multiply();\nvoid FirstFactor();\nvoid Add();\nvoid Equals();\nvoid NotEquals();\nvoid Less();\nvoid LessOrEqual();\nvoid Greater();\nvoid Relation();\nvoid NotFactor();\nvoid BoolTerm();\nvoid BoolOr();\nvoid BoolXor();\nvoid BoolExpression();\nvoid DoIf();\nvoid DoWhile();\n\nvoid Prog()\n{\n    MatchString(\"PROGRAM\");\n    Header();\n    TopDecls();\n    Main();\n    Match('.');\n}\n\nvoid Header()\n{\n    EmitLn(\".global _start\");\n}\n\nvoid Prolog()\n{\n    EmitLn(\".section .text\");\n    EmitLn(\"_start:\");\n}\n\nvoid Epilog()\n{\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n}\n\nvoid Main()\n{\n    MatchString(\"BEGIN\");\n    Prolog();\n    Block();\n    MatchString(\"END\");\n    Epilog();\n}\n\nvoid TopDecls()\n{\n    NewLine();\n    Scan();\n    while(Token != 'b') {\n        switch(Token) {\n            case 'v':\n                Decl();\n                break;\n            default:\n                sprintf(tmp, \"Unrecognized Keyword '%c'\", Look);\n                Abort(tmp);\n                break;\n        }\n        Scan();\n        NewLine();\n    }\n}\n\nvoid Decl()\n{\n    NewLine();\n    EmitLn(\".section .data\"); /* in case that the variable and function\n                                 declarations are mixed */\n    GetName();\n    Alloc(Value);\n    while(Look == ',') {\n        Match(',');\n        GetName();\n        Alloc(Value);\n        
NewLine();\n    }\n}\n\nvoid Alloc(char *name)\n{\n    if (InTable(name)) {\n        sprintf(tmp, \"Duplicate Variable Name: %s\", name);\n        Abort(tmp);\n    }\n    AddEntry(name, 'v');\n    sprintf(tmp, \"%s: .int \", name);\n    Emit(tmp);\n    if (Look == '=') {\n        Match('=');\n        if (Look == '-') {\n            Emit(\"-\");\n            Match('-');\n        } else {\n            Emit(\"\");\n        }\n        printf(\"%d\\n\", GetNum());\n    } else {\n        EmitLn(\"0\");\n    }\n}\n\n/* Parse and Translate a Block of Statements\n * <block> ::= ( <statement> )*\n * <statement> ::= <if> | <while> | <assignment>\n * */\nvoid Block()\n{\n    Scan();\n    NewLine();\n    while(strchr(\"el\", Token) == NULL) {\n        switch (Token) {\n            case 'i':\n                DoIf();\n                break;\n            case 'w':\n                DoWhile();\n                break;\n            default:\n                Assignment();\n                break;\n        }\n        Scan();\n        NewLine();\n    }\n}\n\nvoid Assignment()\n{\n    char name[MAX_BUF];\n    /* copy through \"%s\": Value must never be used as a format string */\n    sprintf(name, \"%s\", Value);\n    Match('=');\n    BoolExpression();\n    Store(name);\n}\n\nvoid Factor()\n{\n    if (Look == '(') {\n        Match('(');\n        BoolExpression();\n        Match(')');\n    } else if (IsAlpha(Look)) {\n        GetName();\n        LoadVar(Value);\n    } else {\n        LoadConst(GetNum());\n    }\n}\n\nvoid NegFactor()\n{\n    Match('-');\n    if (IsDigit(Look)) {\n        LoadConst(-GetNum());\n    } else {\n        Factor();\n        Negate();\n    }\n}\n\n/* Parse and Translate a Leading Factor */\nvoid FirstFactor()\n{\n    switch (Look) {\n        case '+':\n            Match('+');\n            Factor();\n            break;\n        case '-':\n            NegFactor();\n            break;\n        default:\n            Factor();\n    }\n}\n\nvoid Multiply()\n{\n    Match('*');\n    Factor();\n    PopMul();\n}\n\nvoid Divide()\n{\n    Match('/');\n    
Factor();\n    PopDiv();\n}\n\nvoid Term1()\n{\n    NewLine();\n    while(IsMulop(Look)) {\n        Push();\n        switch(Look) {\n            case '*':\n                Multiply();\n                break;\n            case '/':\n                Divide();\n                break;\n            default:\n                break;\n        }\n        NewLine();\n    }\n}\n\nvoid Term()\n{\n    Factor();\n    Term1();\n}\n\nvoid FirstTerm()\n{\n    FirstFactor();\n    Term1();\n}\n\nvoid Add()\n{\n    Match('+');\n    Term();\n    PopAdd();\n}\n\nvoid Subtract()\n{\n    Match('-');\n    Term();\n    PopSub();\n}\n\nvoid Expression()\n{\n    NewLine();\n    FirstTerm();\n    while(IsAddop(Look)) {\n        Push();\n        switch(Look) {\n            case '+':\n                Add();\n                break;\n            case '-':\n                Subtract();\n                break;\n            default:\n                break;\n        }\n        NewLine();\n    }\n}\n\n/* Recognize and Translate a Relational \"Equals\" */\nvoid Equals()\n{\n    Match('=');\n    Expression();\n    PopCompare();\n    SetEqual();\n}\n\n/* Recognize and Translate a Relational \"Not Equals\"\n * Reached from Relation() on '#' and from Less() on the '>' of \"<>\",\n * so accept whichever of the two is the current character. */\nvoid NotEquals()\n{\n    Match(Look == '#' ? '#' : '>');\n    Expression();\n    PopCompare();\n    SetNEqual();\n}\n\n/* Recognize and Translate a Relational \"Less Than\" */\nvoid Less()\n{\n    Match('<');\n    switch(Look) {\n        case '=':\n            LessOrEqual();\n            break;\n        case '>':\n            NotEquals();\n            break;\n        default:\n            Expression();\n            PopCompare();\n            SetLess();\n            break;\n    }\n}\n\n/* Recognize and Translate a Relational \"Less or Equal\" */\nvoid LessOrEqual()\n{\n    Match('=');\n    Expression();\n    PopCompare();\n    SetLessOrEqual();\n}\n\n/* Recognize and Translate a Relational \"Greater Than\" */\nvoid Greater()\n{\n    Match('>');\n    if (Look == '=') {\n        Match('=');\n        Expression();\n        
PopCompare();\n        SetGreaterOrEqual();\n    } else {\n        Expression();\n        PopCompare();\n        SetGreater();\n    }\n}\n\n/* Parse and Translate a Relation */\nvoid Relation()\n{\n    Expression();\n    if (IsRelop(Look)) {\n        Push();\n        switch (Look) {\n            case '=':\n                Equals();\n                break;\n            case '#':\n                NotEquals();\n                break;\n            case '<':\n                Less();\n                break;\n            case '>':\n                Greater();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\n/* Parse and Translate a Boolean Factor with Leading NOT */\nvoid NotFactor()\n{\n    if (Look == '!') {\n        Match('!');\n        Relation();\n        NotIt();\n    } else {\n        Relation();\n    }\n}\n\n/* Parse and Translate a Boolean Term \n * <bool_term> ::= <not_factor> ( and_op <not_factor> )*\n * */\nvoid BoolTerm()\n{\n    NewLine();\n    NotFactor();\n    while(Look == '&') {\n        Push();\n        Match('&');\n        NotFactor();\n        PopAnd();\n        NewLine();\n    }\n}\n\n/* Recognize and Translate a Boolean OR */\nvoid BoolOr()\n{\n    Match('|');\n    BoolTerm();\n    PopOr();\n}\n\n/* Recognize and Translate a Boolean XOR */\nvoid BoolXor()\n{\n    Match('~');\n    BoolTerm();\n    PopXor();\n}\n\n/* Parse and Translate a Boolean Expression \n * <bool_expression> ::= <bool_term> ( or_op <bool_term> )* */\nvoid BoolExpression()\n{\n    NewLine();\n    BoolTerm();\n    while(IsOrOp(Look)) {\n        Push();\n        switch(Look) {\n            case '|':\n                BoolOr();\n                break;\n            case '~':\n                BoolXor();\n                break;\n            default:\n                break;\n        }\n        NewLine();\n    }\n}\n\n/* Recognize and Translate an IF construct */\nvoid DoIf()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    sprintf(L1, 
NewLabel());\n    sprintf(L2, L1);\n    BoolExpression();\n    BranchFalse(L1);\n    Block();\n    if (Token == 'l') {\n        sprintf(L2, NewLabel());\n        Branch(L2);\n        PostLabel(L1);\n        Block();\n    }\n    PostLabel(L2);\n    MatchString(\"ENDIF\");\n}\n\nvoid DoWhile()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    sprintf(L1, NewLabel());\n    sprintf(L2, NewLabel());\n    PostLabel(L1);\n    BoolExpression();\n    BranchFalse(L2);\n    Block();\n    MatchString(\"ENDWHILE\");\n    Branch(L1);\n    PostLabel(L2);\n}\n\n\nint main()\n{\n    Init();\n    Prog();\n    if (Look != '\\n') {\n        Abort(\"Unexpected data after '.'\");\n    }\n    return 0;\n}\n"
  },
  {
    "path": "10/prog.txt",
    "content": "PROGRAM\nVAR xx,\nyy=1,\nzz=10\nBEGIN\n  WHILE yy <= zz\n    IF yy <> 5 \n      xx=xx+yy\n    ELSE\n      xx=xx+5\n    ENDIF\n  yy=yy+1\n  ENDWHILE\nEND.\n\n"
  },
  {
    "path": "10/tutor10.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           21 May 1989\n\n\n                    Part X: INTRODUCING \"TINY\"\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn the last installment, I showed you the general  idea  for  the\ntop-down development of  a  compiler.    I gave you the first few\nsteps  of  the process for compilers for  Pascal  and  C,  but  I\nstopped  far  short  of  pushing  it through to completion.   The\nreason was simple: if we're going to produce  a  real, functional\ncompiler  for  any  language, I'd rather  do  it  for  KISS,  the\nlanguage that I've been defining in this tutorial series.\n\nIn this installment, we're going to do just that, for a subset of\nKISS which I've chosen to call TINY.\n\nThe process  will be essentially that outlined in Installment IX,\nexcept  for  one  notable  difference.   In that  installment,  I\nsuggested  that  you  begin  with  a full BNF description of  the\nlanguage.  That's fine for something like Pascal or C,  for which\nthe language definition is  firm.   In the case of TINY, however,\nwe don't yet have a full  description  ... we seem to be defining\nthe language as we go.  That's OK.    
In  fact,  it's preferable,\nsince we can tailor the language  slightly  as we go, to keep the\nparsing easy.\n\nSo in the development  that  follows,  we'll  actually be doing a\ntop-down development of BOTH the  language and its compiler.  The\nBNF description will grow along with the compiler.\n\nIn this process, there will be a number of decisions to  be made,\neach of which will influence the BNF and therefore the  nature of\nthe language.   At  each  decision  point I'll try to remember to\nexplain  the  decision  and the rationale behind my choice.  That\nway, if you happen to hold a different opinion and would prefer a\ndifferent option, you can choose it instead.  You  now  have  the\nbackground  to  do  that.  I guess the important thing to note is\nthat  nothing  we  do  here  is  cast  in  concrete.  When YOU'RE\ndesigning YOUR language, you should feel free to do it YOUR way.\n\nMany of you may be asking at this point: Why bother starting over\nfrom  scratch?  We had a working subset of KISS as the outcome of\nInstallment  VII  (lexical  scanning).  Why not just extend it as\nneeded?  The  answer  is  threefold.    First of all, I have been\nmaking  a  number  of changes to further simplify the program ...\nchanges  like  encapsulating  the  code generation procedures, so\nthat  we  can  convert to a different target machine more easily.\nSecond, I want you to see how the development can indeed  be done\nfrom the top down as outlined in the last installment.   Finally,\nwe both need the practice.  Each time I go through this exercise,\nI get a little better at it, and you will, also.\n\n\nGETTING STARTED\n\nMany  years  ago  there were languages called  Tiny  BASIC,  Tiny\nPascal, and Tiny C, each of which was a subset of its parent full\nlanguage.  Tiny BASIC,  for  example,  had  only single-character\nvariable names and global variables.   It supported only a single\ndata type.  Sound familiar?  
At this point we have almost all the\ntools we need to build a compiler like that.\n\nYet a language called Tiny-anything  still  carries  some baggage\ninherited from its parent language.   I've often wondered if this\nis a  good  idea.    Granted,  a  language based upon some parent\nlanguage will have the  advantage  of  familiarity, but there may\nalso  be  some  peculiar syntax carried over from the parent that\nmay tend  to add unnecessary complexity to the compiler. (Nowhere\nis this more true than in Small C.)\n\nI've wondered just how small and simple a compiler could  be made\nand  still  be  useful, if it were designed from the outset to be\nboth easy to use and to  parse.    Let's find out.  This language\nwill just be called \"TINY,\" period.  It's a subset of KISS, which\nI  also  haven't  fully  defined,  so  that  at  least  makes  us\nconsistent (!).  I suppose you could call it TINY KISS.  But that\nopens  up a whole can of worms involving  cuter  and  cuter  (and\nperhaps more risque) names, so let's just stick with TINY.\n\nThe main limitations  of  TINY  will  be because of the things we\nhaven't yet covered, such as data types.  Like its cousins Tiny C\nand Tiny BASIC,  TINY  will  have  only one data type, the 16-bit\ninteger.    The  first  version  we  develop  will also  have  no\nprocedure  calls  and  will  use single-character variable names,\nalthough as you will see we can remove these restrictions without\nmuch effort.\n\nThe language I have in mind will share some of the  good features\nof  Pascal,  C,  and Ada.  Taking a lesson from the comparison of\nthe Pascal and  C  compilers in the previous installment, though,\nTINY will have a decided Pascal flavor.  Wherever  feasible,    a\nlanguage structure will  be  bracketed by keywords or symbols, so\nthat  the parser will know where it's  going  without  having  to\nguess.\n\nOne other ground rule:  As we go, I'd like  to  keep the compiler\nproducing real, executable code.  
Even though it may not  DO much\nat the beginning, it will at least do it correctly.\n\nFinally,  I'll  use  a couple of Pascal  restrictions  that  make\nsense:  All data and procedures must be declared before  they are\nused.  That makes good sense,  even  though for now the only data\ntype we'll use  is a word.  This rule in turn means that the only\nreasonable place to put the  executable code for the main program\nis at the end of the listing.\n\nThe top-level definition will be similar to Pascal:\n\n\n     <program> ::= PROGRAM <top-level decls> <main> '.'\n\n\nAlready, we've reached a decision point.  My first thought was to\nmake the main block optional.   It  doesn't seem to make sense to\nwrite a \"program\" with no main program, but it does make sense if\nwe're  allowing  for  multiple modules, linked together.    As  a\nmatter of fact,  I intend to allow for this in KISS.  But then we\nbegin  to open up a can of worms that I'd rather leave closed for\nnow.  For example, the  term \"PROGRAM\" really becomes a misnomer.\nThe MODULE of Modula-2 or the Unit of Turbo Pascal would  be more\nappropriate.  Second,  what  about  scope  rules?    We'd  need a\nconvention for  dealing  with  name  visibility  across  modules.\nBetter  for  now  to  just  keep  it  simple  and ignore the idea\naltogether.\n\nThere's also a decision in choosing to require  the  main program\nto  be  last.    I  toyed  with  the idea of making its  position\noptional,  as  in  C.  The nature of SK*DOS, the OS I'm compiling\nfor, makes this very easy to do.  But  this  doesn't  really make\nmuch sense in view of the Pascal-like requirement  that  all data\nand procedures  be declared before they're referenced.  
Since the\nmain  program can only call procedures  that  have  already  been\ndeclared, the only position that makes sense is at the end,  a la\nPascal.\n\nGiven  the  BNF  above, let's write a parser that just recognizes\nthe brackets:\n\n\n{--------------------------------------------------------------}\n{  Parse and Translate a Program }\n\nprocedure Prog;\nbegin\n   Match('p');\n   Header;\n   Prolog;\n   Match('.');\n   Epilog;\nend;\n{--------------------------------------------------------------}\n\n\nThe procedure Header just emits  the startup code required by the\nassembler:\n                              \n\n{--------------------------------------------------------------}\n{ Write Header Info }\n\nprocedure Header;\nbegin\n   WriteLn('WARMST', TAB, 'EQU $A01E');\nend;\n{--------------------------------------------------------------}\n\n\nThe procedures Prolog and  Epilog  emit  the code for identifying\nthe main program, and for returning to the OS:\n\n\n{--------------------------------------------------------------}\n{ Write the Prolog }\n\nprocedure Prolog;\nbegin\n   PostLabel('MAIN');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write the Epilog }\n\nprocedure Epilog;\nbegin\n   EmitLn('DC WARMST');\n   EmitLn('END MAIN');\nend;\n{--------------------------------------------------------------}\n\n\nThe  main program just calls Prog, and then  looks  for  a  clean\nending:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   Prog;\n   if Look <> CR then Abort('Unexpected data after ''.''');\nend.\n{--------------------------------------------------------------}\n\n\nAt this point, TINY  will  accept  only  one input \"program,\" the\nnull program:\n\n\n     PROGRAM .   (or 'p.' in our shorthand.)\n\nNote, though, that the  compiler  DOES  generate correct code for\nthis program.  
It will run, and do  what  you'd  expect  the null\nprogram to do, that is, nothing but return gracefully to the OS.\n\nAs a matter of interest, one of my  favorite  compiler benchmarks\nis to compile, link,  and  execute  the  null program in whatever\nlanguage   is   involved.     You  can  learn  a  lot  about  the\nimplementation by measuring  the  overhead  in  time  required to\ncompile what should be a trivial case.  It's also  interesting to\nmeasure the amount of code produced.  In many compilers, the code\ncan be fairly large, because they always include  the  whole run-\ntime  library whether they need it or not.    Early  versions  of\nTurbo Pascal produced a 12K object file for  this  case.    VAX C\ngenerates 50K!\n\nThe  smallest  null  programs  I've  seen are those  produced  by\nModula-2 compilers, and they run about 200-800 bytes.\n\nIn the case of TINY, we HAVE no run-time library  as  yet, so the\nobject code is indeed tiny:  two  bytes.    That's  got  to  be a\nrecord, and it's  likely  to  remain  one since it is the minimum\nsize required by the OS.\n\nThe  next step is to process the code for the main program.  I'll\nuse the Pascal BEGIN-block:\n\n\n     <main> ::= BEGIN <block> END\n\n\nHere,  again,  we  have made a decision.  We could have chosen to\nrequire a \"PROCEDURE MAIN\" sort of declaration, similar to C.   I\nmust  admit  that  this  is  not  a bad idea at all ...  I  don't\nparticularly  like  the  Pascal  approach  since I tend  to  have\ntrouble locating the main  program  in a Pascal listing.  But the\nalternative is a little awkward, too, since you have to deal with\nthe  error condition where the user omits  the  main  program  or\nmisspells its name.  Here I'm taking the easy way out.\n\nAnother solution to the \"where is the main program\" problem might\nbe to require a name for  the  program, and then bracket the main\nby\n\n\n     BEGIN <name>\n     END <name>\n\n\nsimilar to the convention of  Modula  2.    
This  adds  a  bit of\n\"syntactic sugar\" to the language.  Things like this are  easy to\nadd or change to your liking, if the language is your own design.\n\nTo parse this definition of a main block,  change  procedure Prog\nto read:\n\n{--------------------------------------------------------------}\n{  Parse and Translate a Program }\n\nprocedure Prog;\nbegin\n   Match('p');\n   Header;\n   Main;\n   Match('.');\nend;\n{--------------------------------------------------------------}\n\n\nand add the new procedure:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Main Program }\n\nprocedure Main;\nbegin\n   Match('b');\n   Prolog;\n   Match('e');\n   Epilog;\nend;\n{--------------------------------------------------------------}\n\n\nNow, the only legal program is:\n\n\n     PROGRAM BEGIN END . (or 'pbe.')\n\n\nAren't we making progress???  Well, as usual it gets better.  You\nmight try some deliberate errors here, like omitting  the  'b' or\nthe 'e', and see what happens.  As always,  the  compiler  should\nflag all illegal inputs.\n\n\nDECLARATIONS\n\nThe obvious next step is to decide what we mean by a declaration.\nMy  intent  here  is to have two kinds of declarations: variables\nand  procedures/functions.    At  the  top  level,   only  global\ndeclarations are allowed, just as in C.\n\nFor now, there  can  only be variable declarations, identified by\nthe keyword VAR (abbreviated 'v'):\n\n\n     <top-level decls> ::= ( <data declaration> )*\n\n     <data declaration> ::= VAR <var-list>\n\n\nNote that since there is only one variable type, there is no need\nto  declare the type.  
Later on, for full KISS, we can easily add\na type description.\n\nThe procedure Prog becomes:\n\n\n{--------------------------------------------------------------}\n{  Parse and Translate a Program }\n\nprocedure Prog;\nbegin\n   Match('p');\n   Header;\n   TopDecls;\n   Main;\n   Match('.');\nend;\n{--------------------------------------------------------------}\n\n\nNow, add the two new procedures:\n\n\n{--------------------------------------------------------------}\n{ Process a Data Declaration }\n\nprocedure Decl;\nbegin\n   Match('v');\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n   while Look <> 'b' do\n      case Look of\n        'v': Decl;\n      else Abort('Unrecognized Keyword ''' + Look + '''');\n      end;\nend;\n{--------------------------------------------------------------}\n\n\nNote that at this point, Decl  is  just  a stub.  It generates no\ncode, and it doesn't process a list ... every variable must occur\nin a separate VAR statement.\n\nOK,  now  we  can have any  number  of  data  declarations,  each\nstarting with a 'v' for VAR,  before  the BEGIN-block.  Try a few\ncases and see what happens.\n\n\nDECLARATIONS AND SYMBOLS\n\nThat looks pretty good, but  we're still only generating the null\nprogram  for  output.    A  real compiler would  issue  assembler\ndirectives to allocate storage for  the  variables.    It's about\ntime we actually produced some code.\n\nWith  a  little  extra  code,  that's  an  easy  thing to do from\nprocedure Decl.  
Modify it as follows:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Data Declaration }\n\nprocedure Decl;\nvar Name: char;\nbegin\n   Match('v');\n   Alloc(GetName);\nend;\n{--------------------------------------------------------------}\n\n\nThe procedure Alloc just  issues  a  command  to the assembler to\nallocate storage:\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N: char);\nbegin\n   WriteLn(N, ':', TAB, 'DC 0');\nend;\n{--------------------------------------------------------------}\n\n\nGive  this  one  a  whirl.    Try  an  input  that declares  some\nvariables, such as:\n\n     pvxvyvzbe.\n\nSee how the storage is allocated?    Simple, huh?  Note also that\nthe entry point, \"MAIN,\" comes out in the right place.\n\nFor the record, a \"real\" compiler would also have a  symbol table\nto record the variables being used.  Normally,  the  symbol table\nis necessary to record the type  of  each variable.  But since in\nthis case  all  variables  have  the  same  type, we don't need a\nsymbol  table  for  that reason.  As it turns out, we're going to\nfind a symbol table necessary even without different types, but\nlet's postpone that need until it arises.\n\nOf course, we haven't really parsed the correct syntax for a data\ndeclaration, since it involves a variable list.  Our version only\npermits a single variable.  
That's easy to fix, too.\n\nThe BNF for <var-list> is\n\n\n     <var-list> ::= <ident> (, <ident>)*\n\n\nAdding this syntax to Decl gives this new version:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Data Declaration }\n\nprocedure Decl;\nvar Name: char;\nbegin\n   Match('v');\n   Alloc(GetName);\n   while Look = ',' do begin\n      GetChar;\n      Alloc(GetName);\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nOK, now compile this code and give it  a  try.    Try a number of\nlines of VAR declarations, try a list of several variables on one\nline, and try combinations of the two.  Does it work?\n\n\nINITIALIZERS\n\nAs long as we're dealing with data declarations, one thing that's\nalways  bothered  me  about  Pascal  is  that  it  doesn't  allow\ninitializing  data items in the declaration.    That  feature  is\nadmittedly sort of a frill, and it  may  be  out  of  place  in a\nlanguage that purports to  be  a minimal language.  But it's also\nSO easy to add that it seems a shame not  to  do  so.    The  BNF\nbecomes:\n\n\n     <var-list> ::= <var> ( , <var> )*\n\n     <var> ::= <ident> [ = <integer> ]\n\nChange Alloc as follows:\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N: char);\nbegin\n   Write(N, ':', TAB, 'DC ');\n   if Look = '=' then begin\n      Match('=');\n      WriteLn(GetNum);\n      end\n   else\n      WriteLn('0');\nend;\n{--------------------------------------------------------------}\n\n\nThere you are: an initializer with six added lines of Pascal.\n\nOK, try this  version  of  TINY  and verify that you can, indeed,\ngive the variables initial values.\n\nBy golly, this thing is starting to look  real!    Of  course, it\nstill doesn't DO anything, but it looks good, doesn't it?\n\nBefore leaving this section, I should point out  that  we've used\ntwo versions of function GetNum.  
One, the earlier one, returns a\ncharacter value, a single digit.  The other accepts a multi-digit\ninteger and returns an integer value.  Either one will work here,\nsince WriteLn will handle either type.  But there's no  reason to\nlimit ourselves  to  single-digit  values  here,  so  the correct\nversion to use is the one that returns an integer.  Here it is:\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: integer;\nvar Val: integer;\nbegin\n   Val := 0;\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n      Val := 10 * Val + Ord(Look) - Ord('0');\n      GetChar;\n   end;\n   GetNum := Val;\nend;\n{--------------------------------------------------------------}\n\nAs a matter  of  fact,  strictly  speaking  we  should  allow for\nexpressions in the data field of the initializer, or at  the very\nleast  for  negative  values.  For  now,  let's  just  allow  for\nnegative values by changing the code for Alloc as follows:\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N: char);\nbegin\n   if InTable(N) then Abort('Duplicate Variable Name ' + N);\n   ST[N] := 'v';\n   Write(N, ':', TAB, 'DC ');\n   if Look = '=' then begin\n      Match('=');\n      If Look = '-' then begin\n         Write(Look);\n         Match('-');\n      end;\n      WriteLn(GetNum);\n      end\n   else\n      WriteLn('0');\nend;\n{--------------------------------------------------------------}\n\n\nNow  you should be able to  initialize  variables  with  negative\nand/or multi-digit values.\n\n\nTHE SYMBOL TABLE\n\nThere's one problem  with  the  compiler  as it stands so far: it\ndoesn't do anything to record a variable when we declare it.   So\nthe compiler is perfectly content to allocate storage for several\nvariables with the same name.  
You can easily verify this with an\ninput like\n\n\n     pvavavabe.\n\n\nHere we've declared the variable A three times.  As you  can see,\nthe compiler will  cheerfully  accept  that,  and  generate three\nidentical labels.  Not good.\n\nLater on,  when we start referencing variables, the compiler will\nalso let us reference variables  that don't exist.  The assembler\nwill  catch  both  of these error conditions, but it doesn't seem\nfriendly at all to pass such errors along to the assembler.   The\ncompiler should catch such things at the source language level.\n\nSo even though we don't need a symbol table to record data types,\nwe ought to install  one  just to check for these two conditions.\nSince at this  point  we are still restricted to single-character\nvariable names, the symbol table can be trivial.  To  provide for\nit, first add the following  declaration at the beginning of your\nprogram:\n\n\n     var ST: array['A'..'Z'] of char;\n\n\nand insert the following function:\n\n\n{--------------------------------------------------------------}\n{ Look for Symbol in Table }\n\nfunction InTable(n: char): Boolean;\nbegin\n   InTable := ST[n] <> ' ';\nend;\n{--------------------------------------------------------------}\n\n\nWe  also  need  to initialize the  table  to  all  blanks.    The\nfollowing lines in Init will do the job:\n\n\nvar i: char;\nbegin\n   for i := 'A' to 'Z' do\n      ST[i] := ' ';\n   ...\n\n\nFinally,  insert  the  following two lines at  the  beginning  of\nAlloc:\n\n\n   if InTable(N) then Abort('Duplicate Variable Name ' + N);\n   ST[N] := 'v';\n\n\nThat  should  do  it.  The  compiler  will  now  catch  duplicate\ndeclarations.  Later, we can  also  use  InTable  when generating\nreferences to the variables.\n\n\nEXECUTABLE STATEMENTS\n\nAt this point, we can generate a null program that has  some data\nvariables  declared  and  possibly initialized.  
But  so  far  we\nhaven't arranged to generate the first line of executable code.\n\nBelieve  it or not, though, we almost  have  a  usable  language!\nWhat's missing is the executable code that must go into  the main\nprogram.  But that code is just assignment statements and control\nstatements ... all stuff we have done before.   So  it  shouldn't\ntake us long to provide for them, as well.\n\nThe BNF definition given earlier  for the main program included a\nstatement block, which we have so far ignored:\n\n\n     <main> ::= BEGIN <block> END\n\n\nFor now,  we  can  just  consider  a  block  to  be  a  series of\nassignment statements:\n\n\n     <block> ::= (Assignment)*\n\n\nLet's start things off by adding  a  parser for the block.  We'll\nbegin with a stub for the assignment statement:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nbegin\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   while Look <> 'e' do\n      Assignment;\nend;\n{--------------------------------------------------------------}\n\n\nModify procedure Main to call Block as shown below:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Main Program }\n\nprocedure Main;\nbegin\n   Match('b');\n   Prolog;\n   Block;\n   Match('e');\n   Epilog;\nend;\n{--------------------------------------------------------------}\n\n\nThis version still won't generate any code for  the   \"assignment\nstatements\" ... all it does is to eat characters  until  it  sees\nthe 'e' for 'END.'  But it sets the stage for what is to follow.\n\nThe  next  step,  of  course,  is  to  flesh out the code for  an\nassignment statement.  This  is  something  we've done many times\nbefore,  so  I  won't belabor it.  
This time, though, I'd like to\ndeal with the code generation a little differently.  Up till now,\nwe've always just inserted the Emits that generate output code in\nline with  the parsing routines.  A little unstructured, perhaps,\nbut it seemed the most straightforward approach, and made it easy\nto see what kind of code would be emitted for each construct.\n\nHowever, I realize that most of you are using an  80x86 computer,\nso  the 68000 code generated is of little use to you.  Several of\nyou have asked me if the CPU-dependent code couldn't be collected\ninto one spot  where  it  would  be easier to retarget to another\nCPU.  The answer, of course, is yes.\n\nTo  accomplish  this,  insert  the  following  \"code  generation\"\nroutines:\n\n\n{---------------------------------------------------------------}\n{ Clear the Primary Register }\n\nprocedure Clear;\nbegin\n   EmitLn('CLR D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Negate the Primary Register }\n\nprocedure Negate;\nbegin\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Load a Constant Value to Primary Register }\n\nprocedure LoadConst(n: integer);\nbegin\n   Emit('MOVE #');\n   WriteLn(n, ',D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to Primary Register }\n\nprocedure LoadVar(Name: char);\nbegin\n   if not InTable(Name) then Undefined(Name);\n   EmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Push Primary onto Stack }\n\nprocedure Push;\nbegin\n   EmitLn('MOVE D0,-(SP)');\nend;\n\n\n{---------------------------------------------------------------}\n{ Add Top of Stack to Primary }\n\nprocedure PopAdd;\nbegin\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Subtract Primary from Top of Stack }\n\nprocedure PopSub;\nbegin\n   
EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Multiply Top of Stack by Primary }\n\nprocedure PopMul;\nbegin\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Divide Top of Stack by Primary }\n\nprocedure PopDiv;\nbegin\n   EmitLn('MOVE (SP)+,D7');\n   EmitLn('EXT.L D7');\n   EmitLn('DIVS D0,D7');\n   EmitLn('MOVE D7,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Store Primary to Variable }\n\nprocedure Store(Name: char);\nbegin\n   if not InTable(Name) then Undefined(Name);\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)')\nend;\n{---------------------------------------------------------------}\n\n\nThe  nice  part  of  this  approach,  of  course,  is that we can\nretarget  the compiler to a new CPU  simply  by  rewriting  these\n\"code generator\" procedures.  In  addition,  we  will  find later\nthat we can improve the code quality by tweaking these routines a\nbit, without having to modify the compiler proper.\n\nNote that both LoadVar  and  Store check the symbol table to make\nsure that the variable is defined.  The  error  handler Undefined\nsimply calls Abort:\n\n\n{--------------------------------------------------------------}\n{ Report an Undefined Identifier }\n\nprocedure Undefined(n: string);\nbegin\n   Abort('Undefined Identifier ' + n);\nend;\n{--------------------------------------------------------------}\n\n\nOK, we are now finally ready to begin processing executable code.\nWe'll  do  that  by  replacing  the  stub  version  of  procedure\nAssignment.\n\nWe've been down this  road  many times before, so this should all\nbe familiar to you.    In fact, except for the changes associated\nwith the code generation, we  could just copy the procedures from\nPart  VII.    
Since we are making some changes, I won't just copy\nthem, but we will go a little faster than usual.\n\nThe BNF for the assignment statement is:\n\n     <assignment> ::= <ident> = <expression>\n\n     <expression> ::= <first term> ( <addop> <term> )*\n\n     <first term> ::= <first factor> <rest>\n\n     <term> ::= <factor> <rest>\n\n     <rest> ::= ( <mulop> <factor> )*\n\n     <first factor> ::= [ <addop> ] <factor>\n\n     <factor> ::= <var> | <number> | ( <expression> )\n\n\nThis version of the BNF is  also  a bit different than we've used\nbefore ... yet another \"variation on the theme of an expression.\"\nThis particular version  has  what  I  consider  to  be  the best\ntreatment  of  the  unary minus.  As you'll see later, it lets us\nhandle   negative  constant  values  efficiently.    It's   worth\nmentioning  here  that  we  have  often  seen  the advantages  of\n\"tweaking\"  the  BNF  as we go, to help make the language easy to\nparse.    What  you're looking at here is a bit different:  we've\ntweaked  the  BNF  to make the CODE  GENERATION  more  efficient!\nThat's a first for this series.\n\nAnyhow, the following code implements the BNF:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      LoadVar(GetName)\n   else\n      LoadConst(GetNum);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Negative Factor }\n\nprocedure NegFactor;\nbegin\n   Match('-');\n   if IsDigit(Look) then\n      LoadConst(-GetNum)\n   else begin\n      Factor;\n      Negate;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Leading Factor }\n\nprocedure FirstFactor;\nbegin\n   case Look of\n     '+': begin\n         
    Match('+');\n             Factor;\n          end;\n     '-': NegFactor;\n   else  Factor;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Match('*');\n   Factor;\n   PopMul;\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Match('/');\n   Factor;\n   PopDiv;\nend;\n\n\n{---------------------------------------------------------------}\n{ Common Code Used by Term and FirstTerm }\n\nprocedure Term1;\nbegin\n   while IsMulop(Look) do begin\n      Push;\n      case Look of\n       '*': Multiply;\n       '/': Divide;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   Factor;\n   Term1;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Leading Term }\n\nprocedure FirstTerm;\nbegin\n   FirstFactor;\n   Term1;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   PopAdd;\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   PopSub;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   FirstTerm;\n   while IsAddop(Look) do begin\n      Push;\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      end;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := GetName;\n   Match('=');\n   Expression;\n   
Store(Name);\nend;\n{--------------------------------------------------------------}\n\n\nOK, if you've  got  all  this  code inserted, then compile it and\ncheck  it out.  You should  be  seeing  reasonable-looking  code,\nrepresenting a complete program that will  assemble  and execute.\nWe have a compiler!\n\n\nBOOLEANS\n\nThe next step should also  be  familiar  to  you.    We  must add\nBoolean  expressions  and relational operations.    Again,  since\nwe've already dealt with them more than once,  I  won't elaborate\nmuch on them, except  where  they  are  different from what we've\ndone before.  Again, we won't just copy from other  files because\nI've changed a few things just a bit.  Most  of  the changes just\ninvolve encapsulating the machine-dependent parts as  we  did for\nthe   arithmetic  operations.    I've  also  modified   procedure\nNotFactor  somewhat,  to  parallel  the structure of FirstFactor.\nFinally,  I  corrected  an  error  in  the  object code  for  the\nrelational operators:  The Scc instruction I used  only  sets the\nlow 8 bits of D0.  
We want all 16 bits set for a logical true, so\nI've added an instruction to sign-extend the low byte.\n\nTo begin, we're going to need some more recognizers:\n\n\n{--------------------------------------------------------------}\n{ Recognize a Boolean Orop }\n\nfunction IsOrop(c: char): boolean;\nbegin\n   IsOrop := c in ['|', '~'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Relop }\n\nfunction IsRelop(c: char): boolean;\nbegin\n   IsRelop := c in ['=', '#', '<', '>'];\nend;\n{--------------------------------------------------------------}\n\n\nAlso, we're going to need some more code generation routines:\n\n\n{---------------------------------------------------------------}\n{ Complement the Primary Register }\n\nprocedure NotIt;\nbegin\n   EmitLn('NOT D0');\nend;\n{---------------------------------------------------------------}\n.\n.\n.\n{---------------------------------------------------------------}\n{ AND Top of Stack with Primary }\n\nprocedure PopAnd;\nbegin\n   EmitLn('AND (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ OR Top of Stack with Primary }\n\nprocedure PopOr;\nbegin\n   EmitLn('OR (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ XOR Top of Stack with Primary }\n\nprocedure PopXor;\nbegin\n   EmitLn('EOR (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Compare Top of Stack with Primary }\n\nprocedure PopCompare;\nbegin\n   EmitLn('CMP (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was = }\n\nprocedure SetEqual;\nbegin\n   EmitLn('SEQ D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was != }\n\nprocedure SetNEqual;\nbegin\n   EmitLn('SNE D0');\n   EmitLn('EXT 
D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was > }\n\nprocedure SetGreater;\nbegin\n   EmitLn('SLT D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was < }\n\nprocedure SetLess;\nbegin\n   EmitLn('SGT D0');\n   EmitLn('EXT D0');\nend;\n{---------------------------------------------------------------}\n\nAll of this  gives us the tools we need.  The BNF for the Boolean\nexpressions is:\n\n\n     <bool-expr> ::= <bool-term> ( <orop> <bool-term> )*\n\n     <bool-term> ::= <not-factor> ( <andop> <not-factor> )*\n\n     <not-factor> ::= [ '!' ] <relation>\n\n     <relation> ::= <expression> [ <relop> <expression> ]\n\n\nSharp-eyed readers might  note  that this syntax does not include\nthe non-terminal  \"bool-factor\" used in earlier versions.  It was\nneeded then because I also allowed for the Boolean constants TRUE\nand FALSE.   But  remember  that  in TINY there is no distinction\nmade between Boolean and arithmetic  types ... they can be freely\nintermixed.   So there is really no  need  for  these  predefined\nvalues ... we can just use -1 and 0, respectively.\n\nIn C terminology, we could always use the defines:\n\n\n     #define TRUE -1\n     #define FALSE 0\n\n\n(That is, if TINY had a  preprocessor.)   Later on, when we allow\nfor  declarations  of  constants,  these  two   values   will  be\npredefined by the language.\n\nThe reason that I'm harping on this is that  I've  already  tried\nthe alternative, which is to  include TRUE and FALSE as keywords.\nThe problem with that approach is that it  then  requires lexical\nscanning for EVERY variable name  in every expression.  If you'll\nrecall,  I pointed out in Installment VII  that  this  slows  the\ncompiler  down considerably.  As long as  keywords  can't  be  in\nexpressions, we need to do the scanning only at the  beginning of\nevery  new  statement  ...  
quite  an improvement.  So using  the\nsyntax above not only simplifies the parsing, but  speeds  up the\nscanning as well.\n\nOK, given that we're  all  satisfied  with  the syntax above, the\ncorresponding code is shown below:\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Equals\" }\n\nprocedure Equals;\nbegin\n   Match('=');\n   Expression;\n   PopCompare;\n   SetEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Not Equals\" }\n\nprocedure NotEquals;\nbegin\n   Match('#');\n   Expression;\n   PopCompare;\n   SetNEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than\" }\n\nprocedure Less;\nbegin\n   Match('<');\n   Expression;\n   PopCompare;\n   SetLess;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Greater Than\" }\n\nprocedure Greater;\nbegin\n   Match('>');\n   Expression;\n   PopCompare;\n   SetGreater;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Relation }\n\n\nprocedure Relation;\nbegin\n   Expression;\n   if IsRelop(Look) then begin\n      Push;\n      case Look of\n       '=': Equals;\n       '#': NotEquals;\n       '<': Less;\n       '>': Greater;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Factor with Leading NOT }\n\nprocedure NotFactor;\nbegin\n   if Look = '!' 
then begin\n      Match('!');\n      Relation;\n      NotIt;\n      end\n   else\n      Relation;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Term }\n\nprocedure BoolTerm;\nbegin\n   NotFactor;\n   while Look = '&' do begin\n      Push;\n      Match('&');\n      NotFactor;\n      PopAnd;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Boolean OR }\n\nprocedure BoolOr;\nbegin\n   Match('|');\n   BoolTerm;\n   PopOr;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Exclusive Or }\n\nprocedure BoolXor;\nbegin\n   Match('~');\n   BoolTerm;\n   PopXor;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Expression }\n\nprocedure BoolExpression;\nbegin\n   BoolTerm;\n   while IsOrOp(Look) do begin\n      Push;\n      case Look of\n       '|': BoolOr;\n       '~': BoolXor;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nTo tie it all together, don't forget to change the  references to\nExpression in  procedures Factor and Assignment so that they call\nBoolExpression instead.\n\nOK, if  you've  got  all  that typed in, compile it and give it a\nwhirl.    First,  make  sure  you  can  still parse  an  ordinary\narithmetic expression.  Then, try a Boolean one.    Finally, make\nsure  that you can assign the results of  relations.    Try,  for\nexample:\n\n     pvx,y,zbx=z>ye.\n\nwhich stands for:\n\n     PROGRAM\n     VAR X,Y,Z\n     BEGIN\n     X = Z > Y\n     END.\n\n\nSee how this assigns a Boolean value to X?\n\nCONTROL STRUCTURES\n\nWe're almost home.   With  Boolean  expressions  in place, it's a\nsimple  matter  to  add control structures.  
For TINY, we'll only\nallow two kinds of them, the IF and the WHILE:\n\n\n     <if> ::= IF <bool-expression> <block> [ ELSE <block>] ENDIF\n\n     <while> ::= WHILE <bool-expression> <block> ENDWHILE\n\nOnce  again,  let  me  spell  out the decisions implicit in  this\nsyntax, which departs strongly from that of C or Pascal.  In both\nof those languages, the \"body\" of an IF or WHILE is regarded as a\nsingle  statement.  If you intend to use a block of more than one\nstatement, you have to build a compound statement using BEGIN-END\n(in Pascal) or  '{}' (in C).  In TINY (and KISS) there is no such\nthing as a compound statement  ... single or multiple they're all\njust blocks to these languages.\n\nIn KISS, all the control structures will have explicit and unique\nkeywords  bracketing  the  statement block, so there  can  be  no\nconfusion as to where things begin  and  end.  This is the modern\napproach, used in such respected languages as Ada  and  Modula 2,\nand it completely eliminates the problem of the \"dangling else.\"\n\nNote  that I could have chosen to use the same keyword END to end\nall  the constructs, as is done in Pascal.  (The closing '}' in C\nserves the same purpose.)  But this has always led  to confusion,\nwhich is why Pascal programmers tend to write things like\n\n\n     end { loop }\n\nor   end { if }\n\n\nAs I explained in  Part  V,  using  unique terminal keywords does\nincrease  the  size  of the keyword list and therefore slows down\nthe  scanning, but in this case it seems a small price to pay for\nthe added insurance.   Better  to find the errors at compile time\nrather than run time.\n\nOne last thought:  The two constructs above each  have  the  non-\nterminals\n\n\n      <bool-expression> and <block>\n\n\njuxtaposed with no separating keyword.  
In Pascal we would expect\nthe keywords THEN and DO in these locations.\n\nI have no problem with leaving out these keywords, and the parser\nhas no trouble either, ON CONDITION that we make no errors in the\nbool-expression part.  On  the  other hand, if we were to include\nthese extra keywords we would get yet one more level of insurance\nat very little  cost,  and  I  have no problem with that, either.\nUse your best judgment as to which way to go.\n\nOK, with that bit of explanation let's proceed.  As  usual, we're\ngoing to need some new  code generation routines.  These generate\nthe code for conditional and unconditional branches:\n\n{---------------------------------------------------------------}\n{ Branch Unconditional  }\n\nprocedure Branch(L: string);\nbegin\n   EmitLn('BRA ' + L);\nend;\n\n\n{---------------------------------------------------------------}\n{ Branch False }\n\nprocedure BranchFalse(L: string);\nbegin\n   EmitLn('TST D0');\n   EmitLn('BEQ ' + L);\nend;\n{--------------------------------------------------------------}\n\n\nExcept for the encapsulation of  the code generation, the code to\nparse the control constructs is the same as you've seen before:\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block; Forward;\n\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   Match('i');\n   BoolExpression;\n   L1 := NewLabel;\n   L2 := L1;\n   BranchFalse(L1);\n   Block;\n   if Look = 'l' then begin\n      Match('l');\n      L2 := NewLabel;\n      Branch(L2);\n      PostLabel(L1);\n      Block;\n   end;\n   PostLabel(L2);\n   Match('e');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a WHILE Statement }\n\nprocedure DoWhile;\nvar L1, L2: string;\nbegin\n   Match('w');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   BoolExpression;\n   BranchFalse(L2);\n   Block;\n   Match('e');\n   Branch(L1);\n   
PostLabel(L2);\nend;\n{--------------------------------------------------------------}\n\n\nTo tie everything  together,  we need only modify procedure Block\nto recognize the \"keywords\" for the  IF  and WHILE.  As usual, we\nexpand the definition of a block like so:\n\n\n     <block> ::= ( <statement> )*\n\n\nwhere\n\n\n     <statement> ::= <if> | <while> | <assignment>\n\n\nThe corresponding code is:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   while not(Look in ['e', 'l']) do begin\n      case Look of\n       'i': DoIf;\n       'w': DoWhile;\n      else Assignment;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nOK,  add the routines I've given, compile and  test  them.    You\nshould be able to parse the single-character versions  of  any of\nthe control constructs.  It's looking pretty good!\n\nAs a matter  of  fact, except for the single-character limitation\nwe've got a virtually complete version of TINY.  I call  it, with\ntongue planted firmly in cheek, TINY Version 0.1.\n\n\nLEXICAL SCANNING\n\nOf course, you know what's next:  We have to convert  the program\nso that  it can deal with multi-character keywords, newlines, and\nwhitespace.   We have just gone through all  that  in  Part  VII.\nWe'll use the distributed scanner  technique that I showed you in\nthat  installment.    The  actual  implementation  is   a  little\ndifferent because the way I'm handling newlines is different.\n\nTo begin with, let's simply  allow for whitespace.  This involves\nonly adding calls to SkipWhite at the end of the  three routines,\nGetName, GetNum, and Match.    A call to SkipWhite in Init primes\nthe pump in case there are leading spaces.\n\nNext, we need to deal with  newlines.   
This is really a two-step\nprocess,  since  the  treatment  of  the  newlines  with  single-\ncharacter tokens is different from that for multi-character ones.\nWe can eliminate some work by doing both  steps  at  once,  but I\nfeel safer taking things one step at a time.\n\nInsert the new procedure:\n\n\n{--------------------------------------------------------------}\n{ Skip Over an End-of-Line }\n\nprocedure NewLine;\nbegin\n   while Look = CR do begin\n      GetChar;\n      if Look = LF then GetChar;\n      SkipWhite;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNote that  we  have  seen  this  procedure  before in the form of\nProcedure Fin.  I've changed the name since this  new  one  seems\nmore descriptive of the actual function.  I've  also  changed the\ncode  to  allow  for multiple newlines and lines with nothing but\nwhite space.\n\nThe next step is to insert calls to NewLine wherever we  decide a\nnewline is permissible.  As I've pointed out before, this  can be\nvery different in different languages.   In TINY, I've decided to\nallow them virtually anywhere.  This means that we need  calls to\nNewLine at the BEGINNING (not the end, as with SkipWhite)  of the\nprocedures GetName, GetNum, and Match.\n\nFor procedures that have while loops, such as TopDecls, we need a\ncall  to NewLine at the beginning of the  procedure  AND  at  the\nbottom  of  each  loop.  That way, we can be assured that NewLine\nhas just been called at the beginning of each  pass  through  the\nloop.\n\nIf you've got all this done, try the program out and  verify that\nit will indeed handle white space and newlines.\n\nIf it does, then we're  ready to deal with multi-character tokens\nand keywords.   
To begin, add the additional declarations (copied\nalmost verbatim from Part VII):\n\n\n{--------------------------------------------------------------}\n{ Type Declarations }\n\ntype Symbol = string[8];\n\n     SymTab = array[1..1000] of Symbol;\n\n     TabPtr = ^SymTab;\n\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look : char;             { Lookahead Character }\n    Token: char;             { Encoded Token       }\n    Value: string[16];       { Unencoded Token     }\n\n    ST: Array['A'..'Z'] of char;\n\n{--------------------------------------------------------------}\n{ Definition of Keywords and Token Types }\n\nconst NKW =   9;\n      NKW1 = 10;\n\nconst KWlist: array[1..NKW] of Symbol =\n              ('IF', 'ELSE', 'ENDIF', 'WHILE', 'ENDWHILE',\n               'VAR', 'BEGIN', 'END', 'PROGRAM');\n\nconst KWcode: string[NKW1] = 'xilewevbep';\n{--------------------------------------------------------------}\n\n\nNext, add the three procedures, also from Part VII:\n\n\n{--------------------------------------------------------------}\n{ Table Lookup }\n\nfunction Lookup(T: TabPtr; s: string; n: integer): integer;\nvar i: integer;\n    found: Boolean;\nbegin\n   found := false;\n   i := n;\n   while (i > 0) and not found do\n      if s = T^[i] then\n         found := true\n      else\n         dec(i);\n   Lookup := i;\nend;\n{--------------------------------------------------------------}\n.\n.\n{--------------------------------------------------------------}\n{ Get an Identifier and Scan it for Keywords }\n\nprocedure Scan;\nbegin\n   GetName;\n   Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];\nend;\n{--------------------------------------------------------------}\n.\n.\n{--------------------------------------------------------------}\n{ Match a Specific Input String }\n\nprocedure MatchString(x: string);\nbegin\n   if Value <> x then Expected('''' + x + 
'''');\nend;\n{--------------------------------------------------------------}\n\n\nNow, we have to make a  fairly  large number of subtle changes to\nthe remaining procedures.  First,  we  must  change  the function\nGetName to a procedure, again as we did in Part VII:\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nprocedure GetName;\nbegin\n   NewLine;\n   if not IsAlpha(Look) then Expected('Name');\n   Value := '';\n   while IsAlNum(Look) do begin\n      Value := Value + UpCase(Look);\n      GetChar;\n   end;\n   SkipWhite;\nend;\n{--------------------------------------------------------------}\n\n\nNote that this procedure leaves its result in  the  global string\nValue.\n\nNext, we have to change every reference to GetName to reflect its\nnew form. These occur in Factor, Assignment, and Decl:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure BoolExpression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      BoolExpression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then begin\n      GetName;\n      LoadVar(Value[1]);\n      end\n   else\n      LoadConst(GetNum);\nend;\n{--------------------------------------------------------------}\n.\n.\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := Value[1];\n   Match('=');\n   BoolExpression;\n   Store(Name);\nend;\n{---------------------------------------------------------------}\n.\n.\n{--------------------------------------------------------------}\n{ Parse and Translate a Data Declaration }\n\nprocedure Decl;\nbegin\n   GetName;\n   Alloc(Value[1]);\n   while Look = ',' do begin\n      Match(',');\n      GetName;\n      Alloc(Value[1]);\n   
end;\nend;\n{--------------------------------------------------------------}\n\n\n(Note that we're still  only  allowing  single-character variable\nnames,  so we take the easy way out here and simply use the first\ncharacter of the string.)\n\nFinally, we must make the changes to use Token instead of Look as\nthe  test  character  and to call Scan at the appropriate places.\nMostly, this  involves  deleting  calls  to  Match,  occasionally\nreplacing calls to  Match  by calls to MatchString, and replacing\ncalls  to  NewLine  by  calls  to  Scan.    Here are the affected\nroutines:\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block; Forward;\n\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   BoolExpression;\n   L1 := NewLabel;\n   L2 := L1;\n   BranchFalse(L1);\n   Block;\n   if Token = 'l' then begin\n      L2 := NewLabel;\n      Branch(L2);\n      PostLabel(L1);\n      Block;\n   end;\n   PostLabel(L2);\n   MatchString('ENDIF');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a WHILE Statement }\n\nprocedure DoWhile;\nvar L1, L2: string;\nbegin\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   BoolExpression;\n   BranchFalse(L2);\n   Block;\n   MatchString('ENDWHILE');\n   Branch(L1);\n   PostLabel(L2);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Scan;\n   while not(Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       'w': DoWhile;\n      else Assignment;\n      end;\n      Scan;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n   Scan;\n   while Token <> 'b' do begin\n      case Token of\n        'v': Decl;\n      else Abort('Unrecognized Keyword ' + Value);\n      end;\n   
   Scan;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Main Program }\n\nprocedure Main;\nbegin\n   MatchString('BEGIN');\n   Prolog;\n   Block;\n   MatchString('END');\n   Epilog;\nend;\n\n{--------------------------------------------------------------}\n{  Parse and Translate a Program }\n\nprocedure Prog;\nbegin\n   MatchString('PROGRAM');\n   Header;\n   TopDecls;\n   Main;\n   Match('.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nvar i: char;\nbegin\n   for i := 'A' to 'Z' do\n      ST[i] := ' ';\n   GetChar;\n   Scan;\nend;\n{--------------------------------------------------------------}\n\n\nThat should do  it.    If  all  the changes got in correctly, you\nshould now be parsing programs that look like programs.   (If you\ndidn't  make  it  through all the  changes,  don't  despair.    A\ncomplete listing of the final form is given later.)\n\nDid it work?  If so, then we're just about home.  In fact, with a\nfew minor  exceptions we've already got a compiler that's usable.\nThere are still a few areas that need improvement.\n\n\nMULTI-CHARACTER VARIABLE NAMES\n\nOne of those is  the  restriction  that  we still have, requiring\nsingle-character variable names.    Now that we can handle multi-\ncharacter keywords, this one  begins  to  look  very much like an\narbitrary  and  unnecessary  limitation.    And  indeed   it  is.\nBasically, its only virtue is  that it permits a trivially simple\nimplementation  of  the   symbol   table.    But  that's  just  a\nconvenience to the compiler writers, and needs to be eliminated.\n\nWe've done this step before.  This time, as usual, I'm doing it a\nlittle differently.  
I think  the approach used here keeps things\njust about as simple as possible.\n\nThe natural  way  to  implement  a  symbol  table in Pascal is by\ndeclaring a record type, and making the symbol table an  array of\nsuch records.  Here, though, we don't really need  a  type  field\nyet  (there is only one kind of entry allowed so far), so we only\nneed an array of symbols.  This has the advantage that we can use\nthe existing procedure Lookup to  search the symbol table as well\nas the  keyword  list.    As it turns out, even when we need more\nfields we can still use the same approach, simply by  storing the\nother fields in separate arrays.\n\nOK, here are the changes that  need  to  be made.  First, add the\nnew typed constant:\n\n\n      NEntry: integer = 0;\n\n\nThen change the definition of the symbol table as follows:\n\n\nconst MaxEntry = 100;\n\nvar ST   : array[1..MaxEntry] of Symbol;\n    SType: array[1..MaxEntry] of char;\n\n\nThe second array, SType, is one of the separate arrays just\nmentioned.  It holds the one-character type code that AddEntry\nand Init fill in.\n\n(Note that ST is _NOT_ declared as a SymTab.  That declaration is\na phony one to get Lookup to work.  
A SymTab  would  take  up too\nmuch RAM space, and so one is never actually allocated.)\n\nNext, we need to replace InTable:\n\n\n{--------------------------------------------------------------}\n{ Look for Symbol in Table }\n\nfunction InTable(n: Symbol): Boolean;\nbegin\n   InTable := Lookup(@ST, n, MaxEntry) <> 0;\nend;\n{--------------------------------------------------------------}\n\n\nWe also need a new procedure, AddEntry, that adds a new  entry to\nthe table:\n\n\n{--------------------------------------------------------------}\n{ Add a New Entry to Symbol Table }\n\nprocedure AddEntry(N: Symbol; T: char);\nbegin\n   if InTable(N) then Abort('Duplicate Identifier ' + N);\n   if NEntry = MaxEntry then Abort('Symbol Table Full');\n   Inc(NEntry);\n   ST[NEntry] := N;\n   SType[NEntry] := T;\nend;\n{--------------------------------------------------------------}\n\n\nThis procedure is called by Alloc:\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N: Symbol);\nbegin\n   if InTable(N) then Abort('Duplicate Variable Name ' + N);\n   AddEntry(N, 'v');\n.\n.\n.\n{--------------------------------------------------------------}\n\n\nFinally, we must change all the routines that currently treat the\nvariable name as a single character.  These include   LoadVar and\nStore (just change the  type  from  char  to string), and Factor,\nAssignment, and Decl (just change Value[1] to Value).\n\nOne  last  thing:  change  procedure  Init to clear the array  as\nshown:\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nvar i: integer;\nbegin\n   for i := 1 to MaxEntry do begin\n      ST[i] := '';\n      SType[i] := ' ';\n   end;\n   GetChar;\n   Scan;\nend;\n{--------------------------------------------------------------}\n\n\nThat should do it.  
Try it out and verify  that  you can, indeed,\nuse multi-character variable names.\n\n\nMORE RELOPS\n\nWe still have one remaining single-character restriction: the one\non relops.  Some of the relops are indeed single  characters, but\nothers  require two.  These are '<=' and '>='.  I also prefer the\nPascal '<>' for \"not equals,\"  instead of '#'.\n\nIf you'll recall, in Part VII I pointed out that the conventional\nway  to  deal  with  relops  is  to  include them in the list  of\nkeywords, and let the  lexical  scanner  find  them.  But, again,\nthis requires scanning throughout the expression parsing process,\nwhereas so far we've been able to limit the use of the scanner to\nthe beginning of a statement.\n\nI mentioned then that we can still get away with this,  since the\nmulti-character relops are so few  and so limited in their usage.\nIt's easy to just treat them as special cases and handle  them in\nan ad hoc manner.\n\nThe changes required affect only the code generation routines and\nprocedures Relation and friends.   
First, we're going to need two\nmore code generation routines:\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was <= }\n\nprocedure SetLessOrEqual;\nbegin\n   EmitLn('SGE D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was >= }\n\nprocedure SetGreaterOrEqual;\nbegin\n   EmitLn('SLE D0');\n   EmitLn('EXT D0');\nend;\n{---------------------------------------------------------------}\n\n\nThen, modify the relation parsing routines as shown below:\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than or Equal\" }\n\nprocedure LessOrEqual;\nbegin\n   Match('=');\n   Expression;\n   PopCompare;\n   SetLessOrEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Not Equals\" }\n\nprocedure NotEqual;\nbegin\n   Match('>');\n   Expression;\n   PopCompare;\n   SetNEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than\" }\n\nprocedure Less;\nbegin\n   Match('<');\n   case Look of\n     '=': LessOrEqual;\n     '>': NotEqual;\n   else begin\n           Expression;\n           PopCompare;\n           SetLess;\n        end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Greater Than\" }\n\nprocedure Greater;\nbegin\n   Match('>');\n   if Look = '=' then begin\n      Match('=');\n      Expression;\n      PopCompare;\n      SetGreaterOrEqual;\n      end\n   else begin\n      Expression;\n      PopCompare;\n      SetGreater;\n   end;\nend;\n{---------------------------------------------------------------}\n\n\nThat's all it takes.  Now  you  can  process all the relops.  
Try\nit.\n\n\nINPUT/OUTPUT\n\nWe  now  have  a complete, working language, except for one minor\nembarrassment: we have no way to get data in or out.  We need some\nI/O.\n\nNow, the convention these days, established in C and continued in\nAda and Modula 2, is to leave I/O statements out of  the language\nitself,  and  just  include them in the subroutine library.  That\nwould  be  fine, except that so far  we  have  no  provision  for\nsubroutines.  Anyhow, with this approach you run into the problem\nof variable-length argument lists.  In Pascal, the I/O statements\nare built into the language because they are the  only  ones  for\nwhich  the  argument  list can have a variable number of entries.\nIn C, we settle for kludges like scanf and printf, and  must pass\nthe argument count to the called procedure.  In Ada and  Modula 2\nwe must use the  awkward  (and SLOW!) approach of a separate call\nfor each argument.\n\nSo I think I prefer the  Pascal  approach of building the I/O in,\neven though we don't need to.\n\nAs  usual,  for  this we need some more code generation routines.\nThese turn out  to be the easiest of all, because all we do is to\ncall library procedures to do the work:\n\n\n{---------------------------------------------------------------}\n{ Read Variable to Primary Register }\n\nprocedure ReadVar;\nbegin\n   EmitLn('BSR READ');\n   Store(Value);\nend;\n\n\n{---------------------------------------------------------------}\n{ Write Variable from Primary Register }\n\nprocedure WriteVar;\nbegin\n   EmitLn('BSR WRITE');\nend;\n{--------------------------------------------------------------}\n\n\nThe idea is that READ loads the value from input into D0, and\nWRITE outputs it from there.\n\nThese two procedures represent  our  first  encounter with a need\nfor library procedures ... the components of a  Run  Time Library\n(RTL).    Of  course, someone (namely  us)  has  to  write  these\nroutines, but they're not  part  of the compiler itself.  
I won't\neven bother  showing the routines here, since these are obviously\nvery much OS-dependent.   I  _WILL_  simply  say that for SK*DOS,\nthey  are  particularly  simple ... almost trivial.  One reason I\nwon't show them here is that  you  can add all kinds of fanciness\nto the things, for  example  by prompting in READ for the inputs,\nand by giving the user a chance to reenter a bad input.\n\nBut that is really separate from compiler design, so for now I'll\njust assume that a library called TINYLIB.LIB exists.  Since we now\nneed  it  loaded,  we need to add a statement to  include  it  in\nprocedure Header:\n\n\n{--------------------------------------------------------------}\n{ Write Header Info }\n\nprocedure Header;\nbegin\n   WriteLn('WARMST', TAB, 'EQU $A01E');\n   EmitLn('LIB TINYLIB');\nend;\n{--------------------------------------------------------------}\n\nThat takes care of that part.  Now, we also need to recognize the\nread  and  write  commands.  We can do this by  adding  two  more\nkeywords to our list:\n\n\n{--------------------------------------------------------------}\n{ Definition of Keywords and Token Types }\n\nconst NKW =   11;\n      NKW1 = 12;\n\nconst KWlist: array[1..NKW] of Symbol =\n              ('IF', 'ELSE', 'ENDIF', 'WHILE', 'ENDWHILE',\n               'READ',    'WRITE',    'VAR',    'BEGIN',   'END',\n'PROGRAM');\n\nconst KWcode: string[NKW1] = 'xileweRWvbep';\n{--------------------------------------------------------------}\n\n\n(Note how I'm using upper case codes here to avoid  conflict with\nthe 'w' of WHILE.)\n\nNext, we need procedures for processing the  read/write statement\nand its argument list:\n\n\n{--------------------------------------------------------------}\n{ Process a Read Statement }\nprocedure DoRead;\nbegin\n   Match('(');\n   GetName;\n   ReadVar;\n   while Look = ',' do begin\n      Match(',');\n      GetName;\n      ReadVar;\n   end;\n   
Match(')');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process a Write Statement }\n\nprocedure DoWrite;\nbegin\n   Match('(');\n   Expression;\n   WriteVar;\n   while Look = ',' do begin\n      Match(',');\n      Expression;\n      WriteVar;\n   end;\n   Match(')');\nend;\n{--------------------------------------------------------------}\n\n\nFinally,  we  must  expand  procedure  Block  to  handle the  new\nstatement types:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Scan;\n   while not(Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       'w': DoWhile;\n       'R': DoRead;\n       'W': DoWrite;\n      else Assignment;\n      end;\n      Scan;\n   end;\nend;\n{--------------------------------------------------------------}\n\nThat's all there is to it.  _NOW_ we have a language!\n\n\nCONCLUSION\n\nAt this point we have TINY completely defined.  It's not much ...\nactually a toy  compiler.    TINY  has  only one data type and no\nsubroutines  ... but it's a complete,  usable  language.    While\nyou're not likely to be able to write another compiler in  it, or\ndo anything else very seriously, you could write programs to read\nsome input, perform calculations,  and  output  the results.  Not\ntoo bad for a toy.\n\nMost importantly, we have a firm base upon which to build further\nextensions.  I know you'll be glad to hear this: this is the last\ntime  I'll  start  over in building a parser ... from  now  on  I\nintend to just add features to  TINY  until it becomes KISS.  Oh,\nthere'll be other times we will  need  to try things out with new\ncopies  of  the  Cradle, but once we've found out how to do those\nthings they'll be incorporated into TINY.\n\nWhat  will  those  features  be?    Well,  for starters  we  need\nsubroutines and functions.    
Then  we  need to be able to handle\ndifferent types, including arrays, strings, and other structures.\nThen we need to deal with the idea of pointers.  All this will be\nupcoming in future installments.\n\nSee you then.\n\nFor reference purposes, the complete listing of TINY Version 1.0\nis shown below:\n\n\n{--------------------------------------------------------------}\nprogram Tiny10;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n      CR  = ^M;\n      LF  = ^J;\n\n      LCount: integer = 0;\n      NEntry: integer = 0;\n\n\n{--------------------------------------------------------------}\n{ Type Declarations }\n\ntype Symbol = string[8];\n\n     SymTab = array[1..1000] of Symbol;\n     TabPtr = ^SymTab;\n\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look : char;             { Lookahead Character }\n    Token: char;             { Encoded Token       }\n    Value: string[16];       { Unencoded Token     }\n\n\nconst MaxEntry = 100;\n\nvar ST   : array[1..MaxEntry] of Symbol;\n    SType: array[1..MaxEntry] of char;\n\n\n{--------------------------------------------------------------}\n{ Definition of Keywords and Token Types }\n\nconst NKW =   11;\n      NKW1 = 12;\n\nconst KWlist: array[1..NKW] of Symbol =\n              ('IF', 'ELSE', 'ENDIF', 'WHILE', 'ENDWHILE',\n               'READ',    'WRITE',    'VAR',    'BEGIN',   'END',\n'PROGRAM');\n\nconst KWcode: string[NKW1] = 'xileweRWvbep';\n\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure 
Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n{--------------------------------------------------------------}\n{ Report an Undefined Identifier }\n\nprocedure Undefined(n: string);\nbegin\n   Abort('Undefined Identifier ' + n);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := UpCase(c) in ['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an AlphaNumeric Character }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Mulop }\n\nfunction IsMulop(c: char): boolean;\nbegin\n   IsMulop := c in ['*', '/'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Boolean Orop }\n\nfunction IsOrop(c: char): boolean;\nbegin\n   IsOrop := c in ['|', '~'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Relop }\n\nfunction IsRelop(c: char): boolean;\nbegin\n   IsRelop := c in ['=', '#', '<', '>'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over 
Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over an End-of-Line }\n\nprocedure NewLine;\nbegin\n   while Look = CR do begin\n      GetChar;\n      if Look = LF then GetChar;\n      SkipWhite;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   NewLine;\n   if Look = x then GetChar\n   else Expected('''' + x + '''');\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Table Lookup }\n\nfunction Lookup(T: TabPtr; s: string; n: integer): integer;\nvar i: integer;\n    found: Boolean;\nbegin\n   found := false;\n   i := n;\n   while (i > 0) and not found do\n      if s = T^[i] then\n         found := true\n      else\n         dec(i);\n   Lookup := i;\nend;\n\n\n{--------------------------------------------------------------}\n{ Locate a Symbol in Table }\n{ Returns the index of the entry.  Zero if not present. 
}\n\nfunction Locate(N: Symbol): integer;\nbegin\n   Locate := Lookup(@ST, n, MaxEntry);\nend;\n\n\n{--------------------------------------------------------------}\n{ Look for Symbol in Table }\n\nfunction InTable(n: Symbol): Boolean;\nbegin\n   InTable := Lookup(@ST, n, MaxEntry) <> 0;\nend;\n\n\n{--------------------------------------------------------------}\n{ Add a New Entry to Symbol Table }\n\nprocedure AddEntry(N: Symbol; T: char);\nbegin\n   if InTable(N) then Abort('Duplicate Identifier ' + N);\n   if NEntry = MaxEntry then Abort('Symbol Table Full');\n   Inc(NEntry);\n   ST[NEntry] := N;\n   SType[NEntry] := T;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nprocedure GetName;\nbegin\n   NewLine;\n   if not IsAlpha(Look) then Expected('Name');\n   Value := '';\n   while IsAlNum(Look) do begin\n      Value := Value + UpCase(Look);\n      GetChar;\n   end;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: integer;\nvar Val: integer;\nbegin\n   NewLine;\n   if not IsDigit(Look) then Expected('Integer');\n   Val := 0;\n   while IsDigit(Look) do begin\n      Val := 10 * Val + Ord(Look) - Ord('0');\n      GetChar;\n   end;\n   GetNum := Val;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier and Scan it for Keywords }\n\nprocedure Scan;\nbegin\n   GetName;\n   Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input String }\n\nprocedure MatchString(x: string);\nbegin\n   if Value <> x then Expected('''' + x + '''');\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String 
with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{--------------------------------------------------------------}\n{ Generate a Unique Label }\n\nfunction NewLabel: string;\nvar S: string;\nbegin\n   Str(LCount, S);\n   NewLabel := 'L' + S;\n   Inc(LCount);\nend;\n\n\n{--------------------------------------------------------------}\n{ Post a Label To Output }\n\nprocedure PostLabel(L: string);\nbegin\n   WriteLn(L, ':');\nend;\n\n\n{---------------------------------------------------------------}\n{ Clear the Primary Register }\n\nprocedure Clear;\nbegin\n   EmitLn('CLR D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Negate the Primary Register }\n\nprocedure Negate;\nbegin\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Complement the Primary Register }\n\nprocedure NotIt;\nbegin\n   EmitLn('NOT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Load a Constant Value to Primary Register }\n\nprocedure LoadConst(n: integer);\nbegin\n   Emit('MOVE #');\n   WriteLn(n, ',D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to Primary Register }\n\nprocedure LoadVar(Name: string);\nbegin\n   if not InTable(Name) then Undefined(Name);\n   EmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Push Primary onto Stack }\n\nprocedure Push;\nbegin\n   EmitLn('MOVE D0,-(SP)');\nend;\n\n\n{---------------------------------------------------------------}\n{ Add Top of Stack to Primary }\n\nprocedure PopAdd;\nbegin\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Subtract Primary from Top of Stack }\n\nprocedure PopSub;\nbegin\n   EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG 
D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Multiply Top of Stack by Primary }\n\nprocedure PopMul;\nbegin\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Divide Top of Stack by Primary }\n\nprocedure PopDiv;\nbegin\n   EmitLn('MOVE (SP)+,D7');\n   EmitLn('EXT.L D7');\n   EmitLn('DIVS D0,D7');\n   EmitLn('MOVE D7,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ AND Top of Stack with Primary }\n\nprocedure PopAnd;\nbegin\n   EmitLn('AND (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ OR Top of Stack with Primary }\n\nprocedure PopOr;\nbegin\n   EmitLn('OR (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ XOR Top of Stack with Primary }\n\nprocedure PopXor;\nbegin\n   EmitLn('EOR (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Compare Top of Stack with Primary }\n\nprocedure PopCompare;\nbegin\n   EmitLn('CMP (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was = }\n\nprocedure SetEqual;\nbegin\n   EmitLn('SEQ D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was != }\n\nprocedure SetNEqual;\nbegin\n   EmitLn('SNE D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was > }\n\nprocedure SetGreater;\nbegin\n   EmitLn('SLT D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was < }\n\nprocedure SetLess;\nbegin\n   EmitLn('SGT D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was <= }\n\nprocedure SetLessOrEqual;\nbegin\n   EmitLn('SGE D0');\n   
EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was >= }\n\nprocedure SetGreaterOrEqual;\nbegin\n   EmitLn('SLE D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Store Primary to Variable }\n\nprocedure Store(Name: string);\nbegin\n   if not InTable(Name) then Undefined(Name);\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)')\nend;\n\n\n{---------------------------------------------------------------}\n{ Branch Unconditional  }\n\nprocedure Branch(L: string);\nbegin\n   EmitLn('BRA ' + L);\nend;\n\n\n{---------------------------------------------------------------}\n{ Branch False }\n\nprocedure BranchFalse(L: string);\nbegin\n   EmitLn('TST D0');\n   EmitLn('BEQ ' + L);\nend;\n\n\n{---------------------------------------------------------------}\n{ Read Variable to Primary Register }\n\nprocedure ReadVar;\nbegin\n   EmitLn('BSR READ');\n   Store(Value);\nend;\n\n\n{---------------------------------------------------------------}\n{ Write Variable from Primary Register }\n\nprocedure WriteVar;\nbegin\n   EmitLn('BSR WRITE');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write Header Info }\n\nprocedure Header;\nbegin\n   WriteLn('WARMST', TAB, 'EQU $A01E');\n   EmitLn('LIB TINYLIB');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write the Prolog }\n\nprocedure Prolog;\nbegin\n   PostLabel('MAIN');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write the Epilog }\n\nprocedure Epilog;\nbegin\n   EmitLn('DC WARMST');\n   EmitLn('END MAIN');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure BoolExpression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      BoolExpression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then begin\n      GetName;\n      LoadVar(Value);\n      end\n   else\n      
LoadConst(GetNum);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Negative Factor }\n\nprocedure NegFactor;\nbegin\n   Match('-');\n   if IsDigit(Look) then\n      LoadConst(-GetNum)\n   else begin\n      Factor;\n      Negate;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Leading Factor }\n\nprocedure FirstFactor;\nbegin\n   case Look of\n     '+': begin\n             Match('+');\n             Factor;\n          end;\n     '-': NegFactor;\n   else  Factor;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Match('*');\n   Factor;\n   PopMul;\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Match('/');\n   Factor;\n   PopDiv;\nend;\n\n\n{---------------------------------------------------------------}\n{ Common Code Used by Term and FirstTerm }\n\nprocedure Term1;\nbegin\n   while IsMulop(Look) do begin\n      Push;\n      case Look of\n       '*': Multiply;\n       '/': Divide;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   Factor;\n   Term1;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Leading Term }\n\nprocedure FirstTerm;\nbegin\n   FirstFactor;\n   Term1;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   PopAdd;\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   
PopSub;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   FirstTerm;\n   while IsAddop(Look) do begin\n      Push;\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Equals\" }\n\nprocedure Equal;\nbegin\n   Match('=');\n   Expression;\n   PopCompare;\n   SetEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than or Equal\" }\n\nprocedure LessOrEqual;\nbegin\n   Match('=');\n   Expression;\n   PopCompare;\n   SetLessOrEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Not Equals\" }\n\nprocedure NotEqual;\nbegin\n   Match('>');\n   Expression;\n   PopCompare;\n   SetNEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than\" }\n\nprocedure Less;\nbegin\n   Match('<');\n   case Look of\n     '=': LessOrEqual;\n     '>': NotEqual;\n   else begin\n           Expression;\n           PopCompare;\n           SetLess;\n        end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Greater Than\" }\n\nprocedure Greater;\nbegin\n   Match('>');\n   if Look = '=' then begin\n      Match('=');\n      Expression;\n      PopCompare;\n      SetGreaterOrEqual;\n      end\n   else begin\n      Expression;\n      PopCompare;\n      SetGreater;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Relation }\n\n\nprocedure Relation;\nbegin\n   Expression;\n   if IsRelop(Look) then begin\n      Push;\n      case Look of\n       '=': Equal;\n       '<': Less;\n       '>': 
Greater;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Factor with Leading NOT }\n\nprocedure NotFactor;\nbegin\n   if Look = '!' then begin\n      Match('!');\n      Relation;\n      NotIt;\n      end\n   else\n      Relation;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Term }\n\nprocedure BoolTerm;\nbegin\n   NotFactor;\n   while Look = '&' do begin\n      Push;\n      Match('&');\n      NotFactor;\n      PopAnd;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Boolean OR }\n\nprocedure BoolOr;\nbegin\n   Match('|');\n   BoolTerm;\n   PopOr;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Exclusive Or }\n\nprocedure BoolXor;\nbegin\n   Match('~');\n   BoolTerm;\n   PopXor;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Expression }\n\nprocedure BoolExpression;\nbegin\n   BoolTerm;\n   while IsOrOp(Look) do begin\n      Push;\n      case Look of\n       '|': BoolOr;\n       '~': BoolXor;\n      end;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: string;\nbegin\n   Name := Value;\n   Match('=');\n   BoolExpression;\n   Store(Name);\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block; Forward;\n\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   BoolExpression;\n   L1 := NewLabel;\n   L2 := L1;\n   BranchFalse(L1);\n   Block;\n   if Token = 'l' then begin\n      L2 := NewLabel;\n      Branch(L2);\n      PostLabel(L1);\n      Block;\n   end;\n   PostLabel(L2);\n   
MatchString('ENDIF');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a WHILE Statement }\n\nprocedure DoWhile;\nvar L1, L2: string;\nbegin\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   BoolExpression;\n   BranchFalse(L2);\n   Block;\n   MatchString('ENDWHILE');\n   Branch(L1);\n   PostLabel(L2);\nend;\n\n\n{--------------------------------------------------------------}\n{ Process a Read Statement }\n\nprocedure DoRead;\nbegin\n   Match('(');\n   GetName;\n   ReadVar;\n   while Look = ',' do begin\n      Match(',');\n      GetName;\n      ReadVar;\n   end;\n   Match(')');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process a Write Statement }\n\nprocedure DoWrite;\nbegin\n   Match('(');\n   Expression;\n   WriteVar;\n   while Look = ',' do begin\n      Match(',');\n      Expression;\n      WriteVar;\n   end;\n   Match(')');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Scan;\n   while not(Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       'w': DoWhile;\n       'R': DoRead;\n       'W': DoWrite;\n      else Assignment;\n      end;\n      Scan;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N: Symbol);\nbegin\n   if InTable(N) then Abort('Duplicate Variable Name ' + N);\n   AddEntry(N, 'v');\n   Write(N, ':', TAB, 'DC ');\n   if Look = '=' then begin\n      Match('=');\n      If Look = '-' then begin\n         Write(Look);\n         Match('-');\n      end;\n      WriteLn(GetNum);\n      end\n   else\n      WriteLn('0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Data Declaration }\n\nprocedure Decl;\nbegin\n   GetName;\n   Alloc(Value);\n   while Look = ',' do begin\n      
Match(',');\n      GetName;\n      Alloc(Value);\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n   Scan;\n   while Token <> 'b' do begin\n      case Token of\n        'v': Decl;\n      else Abort('Unrecognized Keyword ' + Value);\n      end;\n      Scan;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Main Program }\n\nprocedure Main;\nbegin\n   MatchString('BEGIN');\n   Prolog;\n   Block;\n   MatchString('END');\n   Epilog;\nend;\n\n\n{--------------------------------------------------------------}\n{  Parse and Translate a Program }\n\nprocedure Prog;\nbegin\n   MatchString('PROGRAM');\n   Header;\n   TopDecls;\n   Main;\n   Match('.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nvar i: integer;\nbegin\n   for i := 1 to MaxEntry do begin\n      ST[i] := '';\n      SType[i] := ' ';\n   end;\n   GetChar;\n   Scan;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   Prog;\n   if Look <> CR then Abort('Unexpected data after ''.''');\nend.\n{--------------------------------------------------------------}\n\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n"
  },
  {
    "path": "11/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "11/cradle.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <stdbool.h>\n#include <string.h>\n\n#include \"cradle.h\"\n#include <malloc.h>\n\n#define MaxEntry 100\n#define MAX_SYMBOL_LENGTH 10\nstatic int LCount = 0;\nstatic char labelName[MAX_BUF];\nchar tmp[MAX_BUF];\n\n/*char ST[TABLE_SIZE];*/\nstatic int NEntry = 0;\nconst char *ST[MaxEntry];\nchar SType[MaxEntry];\n\n\n/* Keywords symbol table */\nconst char const *KWList[] = {\n    \"IF\",\n    \"ELSE\",\n    \"ENDIF\",\n    \"WHILE\",\n    \"ENDWHILE\",\n    \"VAR\",\n    \"END\",\n};\nconst char KWCode[] = \"xileweve\";\nconst int KWNum = sizeof(KWList)/sizeof(*KWList);\n\nchar Token;             /* current token */\nchar Value[MAX_BUF];    /* string token of Look */\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    if (IsAlpha(c)) {\n        return (c & 0xDF);\n    } else {\n        return c;\n    }\n}\n\n/* Table Lookup\n * If the input string matches a table entry, return the entry index, else\n * return -1.\n * *n* is the size of the table */\nint Lookup(const char const *table[], const char *string, int n)\n{\n    int i;\n    bool found = false;\n\n    for (i = 0; i < n; ++i) {\n        if (strcmp(table[i], string) == 0) {\n            found = true;\n            break;\n        }\n    }\n    return found ? 
i : -1;\n}\n\nint Locate(char *symbol)\n{\n    return Lookup(ST, symbol, NEntry);\n}\n\n/* Add a new entry to symbol table */\nvoid AddEntry(char *symbol, char type)\n{\n    CheckDup(symbol);\n    if (NEntry == MaxEntry) {\n        Abort(\"Symbol Table Full\");\n    }\n\n    char *new_entry = (char *)malloc((strlen(symbol)+1)*sizeof(*new_entry));\n    if (new_entry == NULL) {\n        Abort(\"AddEntry: not enough memory allocating new_entry.\");\n    }\n    strcpy(new_entry, symbol);\n    ST[NEntry] = new_entry;\n    SType[NEntry] = type;\n\n    NEntry++;\n}\n\n/* Get an Identifier and Scan it for keywords */\nvoid Scan()\n{\n    if (Token == 'x') {\n        int index = Lookup(KWList, Value, KWNum);\n        Token = KWCode[index+1];\n    }\n}\n\nvoid MatchString(char *str)\n{\n    if (strcmp(Value, str) != 0) {\n        sprintf(tmp, \"\\\"%s\\\"\", str);\n        Expected(tmp);\n    }\n    Next();\n}\n\nvoid GetChar()\n{\n    Look = getchar();\n    /* printf(\"Getchar: %c\\n\", Look); */\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    NewLine();\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n    SkipWhite();\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n}\n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nint IsMulop(char c)\n{\n    return (c == '*') || (c == '/');\n}\n\nint IsOrOp(char c)\n{\n    return strchr(\"|~\", c) != NULL;\n}\n\nint IsRelop(char c)\n{\n    return strchr(\"=#<>\", c) != NULL;\n}\n\nint IsWhite(char c)\n{\n    return strchr(\" \\t\\r\\n\", c) != NULL;\n}\n\nint IsAlNum(char c)\n{\n    return IsAlpha(c) || IsDigit(c);\n}\n\nvoid 
GetName()\n{\n    SkipWhite();\n    if( !IsAlpha(Look)) {\n        Expected(\"Name\");\n    }\n\n    Token = 'x';\n    char *p = Value;\n    do {\n        *p++ = uppercase(Look);\n        GetChar();\n    } while(IsAlNum(Look)) ;\n    *p = '\\0';\n}\n\nvoid GetNum()\n{\n    SkipWhite();\n    if( !IsDigit(Look)) {\n        Expected(\"Integer\");\n    }\n\n    Token = '#';\n    char *p = Value;\n    do {\n        *p++ = Look;\n        GetChar();\n    } while (IsDigit(Look));\n    *p = '\\0';\n}\n\n/* Get an operator */\nvoid GetOp()\n{\n    SkipWhite();\n    Token = Look;\n    Value[0] = Look;\n    Value[1] = '\\0';\n    GetChar();\n}\n\n/* Get the next input token */\nvoid Next()\n{\n    SkipWhite();\n    if (IsAlpha(Look)) {\n        GetName();\n    } else if (IsDigit(Look)) {\n        GetNum();\n    } else {\n        GetOp();\n    }\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    LCount = 0;\n\n    InitTable();\n    GetChar();\n    Next();\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < MaxEntry; i++) {\n        ST[i] = NULL;\n        SType[i] = ' ';\n    }\n\n}\n\n/* look for symbol in table */\nbool InTable(char *symbol)\n{\n    return Locate(symbol) != -1;\n}\n\n/* Check to see if an identifier is in the symbol table,\n * report an error if it's not */\nvoid CheckTable(char *symbol)\n{\n    if (! 
InTable(symbol)) {\n        Undefined(symbol);\n    }\n}\n\nvoid CheckDup(char *symbol)\n{\n    if (InTable(symbol)) {\n        Duplicate(symbol);\n    }\n}\n\nchar *NewLabel()\n{\n    sprintf(labelName, \"L%02d\", LCount);\n    LCount ++;\n    return labelName;\n}\n\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n\nvoid SkipWhite()\n{\n    while (IsWhite(Look)) {\n        GetChar();\n    }\n}\n\n/* Skip over an End-of-Line */\nvoid NewLine()\n{\n    while(Look == '\\n') {\n        GetChar();\n        if (Look == '\\r') {\n            GetChar();\n        }\n        SkipWhite();\n    }\n}\n\n/* re-targetable routines */\nvoid Clear()\n{\n    EmitLn(\"xor %eax, %eax\");\n}\n\nvoid Negate()\n{\n    EmitLn(\"neg %eax\");\n}\n\nvoid LoadConst(char *value)\n{\n    sprintf(tmp, \"movl $%s, %%eax\", value);\n    EmitLn(tmp);\n}\n\n/* Load a variable to primary register */\nvoid LoadVar(char *name)\n{\n    if (!InTable(name)) {\n        /* report the offending identifier itself */\n        Undefined(name);\n    }\n    sprintf(tmp, \"movl %s, %%eax\", name);\n    EmitLn(tmp);\n}\n\n\n/* Push Primary onto stack */\nvoid Push()\n{\n    EmitLn(\"pushl %eax\");\n}\n\n/* Add Top of Stack to primary */\nvoid PopAdd()\n{\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* Subtract Primary from Top of Stack */\nvoid PopSub()\n{\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"neg %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* multiply top of stack by primary */\nvoid PopMul()\n{\n    EmitLn(\"imull (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* divide top of stack by primary */\nvoid PopDiv()\n{\n    /* for an expression like a/b we have eax=b and (%esp)=a\n     * but we need eax=a, and b on the stack\n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n    EmitLn(\"pushl %eax\");\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extension of eax into edx */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    
EmitLn(\"addl $4, %esp\");\n}\n\n/* store primary to variable */\nvoid Store(char *name)\n{\n    if (!InTable(name)) {\n        /* report the offending identifier itself */\n        Undefined(name);\n    }\n    sprintf(tmp, \"movl %%eax, %s\", name);\n    EmitLn(tmp);\n}\n\nvoid Undefined(char *name)\n{\n    sprintf(tmp, \"Undefined Identifier: %s\", name);\n    Abort(tmp);\n}\n\nvoid Duplicate(char *name)\n{\n    sprintf(tmp, \"Duplicate Identifier: %s\", name);\n    Abort(tmp);\n}\n\n/* Complement the primary register */\nvoid NotIt()\n{\n    EmitLn(\"not %eax\");\n}\n\n/* AND top of Stack with primary */\nvoid PopAnd()\n{\n    EmitLn(\"and (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* OR top of Stack with primary */\nvoid PopOr()\n{\n    EmitLn(\"or (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* XOR top of Stack with primary */\nvoid PopXor()\n{\n    EmitLn(\"xor (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* Compare top of Stack with primary */\nvoid PopCompare()\n{\n    EmitLn(\"addl $4, %esp\");\n    EmitLn(\"cmp -4(%esp), %eax\");\n}\n\n/* set %eax if Compare was = */\nvoid SetEqual()\n{\n    EmitLn(\"sete %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was != */\nvoid SetNEqual()\n{\n    EmitLn(\"setne %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was > */\nvoid SetGreater()\n{\n    EmitLn(\"setl %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was >= */\nvoid SetGreaterOrEqual()\n{\n    EmitLn(\"setle %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was < */\nvoid SetLess()\n{\n    EmitLn(\"setg %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was <= */\nvoid SetLessOrEqual()\n{\n    EmitLn(\"setge %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* Branch unconditional */\nvoid Branch(char *label)\n{\n    sprintf(tmp, \"jmp %s\", label);\n    EmitLn(tmp);\n}\n\n/* Branch False */\nvoid BranchFalse(char *label)\n{\n    
EmitLn(\"test $1, %eax\");\n    sprintf(tmp, \"jz %s\", label);\n    EmitLn(tmp);\n}\n"
  },
  {
    "path": "11/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n#include <stdbool.h>\n\n#define MAX_BUF 100\n#define MaxEntry 100\nextern char tmp[MAX_BUF];\nextern const char *ST[];\nextern char SType[];\nextern char Token;\nextern char Value[MAX_BUF];\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\nvoid MatchString(char *str);\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\nint IsMulop(char c);\nint IsOrOp(char c);\nint IsRelop(char c);\nint IsWhite(char c);\nint IsAlNum(char c);\n\nvoid GetName();\nvoid GetNum();\nvoid GetOp();\nvoid Next();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\nint Locate(char *symbol);\nbool InTable(char *symbol);\nvoid CheckTable(char *symbol);\nvoid CheckDup(char *symbol);\nvoid AddEntry(char *symbol, char type);\n\nchar *NewLabel();\nvoid PostLabel(char *label);\nvoid SkipWhite();\nvoid NewLine();\nvoid Scan();\n\n/* re-targetable routines */\nvoid Clear();\nvoid Negate();\nvoid LoadConst(char *value);\nvoid LoadVar(char *name);\nvoid Push();\nvoid PopAdd();\nvoid PopSub();\nvoid PopMul();\nvoid PopDiv();\nvoid Store(char *name);\nvoid Undefined(char *name);\nvoid Duplicate(char *name);\nvoid NotIt();\nvoid PopAnd();\nvoid PopOr();\nvoid PopXor();\nvoid PopCompare();\nvoid SetEqual();\nvoid SetNEqual();\nvoid SetGreater();\nvoid SetGreaterOrEqual();\nvoid SetLess();\nvoid SetLessOrEqual();\nvoid Branch(char *label);\nvoid BranchFalse(char *label);\n\n#endif\n"
  },
  {
    "path": "11/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <stdbool.h>\n\n#include \"cradle.h\"\n\n#ifdef DEBUG\n#define dprint(fmt, ...) printf(fmt, __VA_ARGS__);\n#else\n#define dprint(fmt, ...)\n#endif\n\n\nvoid TopDecls();\nvoid Allocate(char *name, char *value);\nvoid Alloc();\nvoid Block();\nvoid Assignment();\n\nvoid Factor();\nvoid Expression();\nvoid Subtract();\nvoid Term();\nvoid Divide();\nvoid Multiply();\nvoid FirstFactor();\nvoid Add();\nvoid Equals();\nvoid NotEqual();\nvoid Less();\nvoid LessOrEqual();\nvoid Greater();\nvoid Relation();\nvoid NotFactor();\nvoid BoolTerm();\nvoid BoolOr();\nvoid BoolXor();\nvoid BoolExpression();\nvoid DoIf();\nvoid DoWhile();\nvoid CompareExpression();\nvoid NextExpression();\n\nvoid Header()\n{\n    EmitLn(\".global _start\");\n}\n\nvoid Prolog()\n{\n    EmitLn(\".section .text\");\n    EmitLn(\"_start:\");\n}\n\nvoid Epilog()\n{\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n}\n\nvoid TopDecls()\n{\n    Scan();\n    while(Token == 'v') {\n        EmitLn(\".section .data\"); /* in case that the variable and function\n                                     declarations are mixed */\n        Alloc();\n        while(Token == ',') {\n            Alloc();\n        }\n    }\n}\n\n/* Allocate Storage for a static variable */\nvoid Allocate(char *name, char *value)\n{\n    sprintf(tmp, \"%s: .int %s\", name, value);\n    EmitLn(tmp);\n}\n\nvoid Alloc()\n{\n    char name[MAX_BUF];\n    Next();\n    if (Token != 'x') {\n        Expected(\"Variable Name\");\n    }\n    CheckDup(Value);\n\n    sprintf(name, Value);\n    AddEntry(name, 'v');\n    Next();\n    if (Token == '=') {\n        Next();\n        if (Token != '#') {\n            Expected(\"Integer\");\n        }\n        Allocate(name, Value);\n        Next();\n    } else {\n        Allocate(name, \"0\");\n    }\n}\n\n/* Parse and Translate a Block of Statements \n * <block> ::= ( <statement> 
)*\n * <statement> ::= <if> | <while> | <assignment>\n * */\nvoid Block()\n{\n    Scan();\n    while(strchr(\"el\", Token) == NULL) {\n        switch (Token) {\n            case 'i':\n                DoIf();\n                break;\n            case 'w':\n                DoWhile();\n                break;\n            default:\n                Assignment();\n                break;\n        }\n        Scan();\n    }\n}\n\nvoid Assignment()\n{\n    char name[MAX_BUF];\n    /* copy via \"%s\" so a '%' in Value is never read as a format */\n    sprintf(name, \"%s\", Value);\n    Next();\n    MatchString(\"=\");\n    BoolExpression();\n    Store(name);\n}\n\nvoid Factor()\n{\n    if (Token == '(') {\n        Next();\n        BoolExpression();\n        MatchString(\")\");\n    } else {\n        if (Token == 'x') {\n            LoadVar(Value);\n        } else if (Token == '#') {\n            LoadConst(Value);\n        } else {\n            Expected(\"Math Factor\");\n        }\n        Next();\n    }\n}\n\n\nvoid Multiply()\n{\n    Next();\n    Factor();\n    PopMul();\n}\n\nvoid Divide()\n{\n    Next();\n    Factor();\n    PopDiv();\n}\n\nvoid Term()\n{\n    Factor();\n    while(IsMulop(Token)) {\n        Push();\n        switch(Token) {\n            case '*':\n                Multiply();\n                break;\n            case '/':\n                Divide();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\nvoid Add()\n{\n    Next();\n    Term();\n    PopAdd();\n}\n\nvoid Subtract()\n{\n    Next();\n    Term();\n    PopSub();\n}\n\nvoid Expression()\n{\n    if (IsAddop(Token)) {\n        Clear();\n    } else {\n        Term();\n    }\n\n    while(IsAddop(Token)) {\n        Push();\n        switch(Token) {\n            case '+':\n                Add();\n                break;\n            case '-':\n                Subtract();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\n/* Get another expression and compare */\nvoid CompareExpression()\n{\n    
Expression();\n    PopCompare();\n}\n\n/* Get the next expression and compare */\nvoid NextExpression()\n{\n    Next();\n    CompareExpression();\n}\n\n/* Recognize and Translate a Relational \"Equals\" */\nvoid Equals()\n{\n    NextExpression();\n    SetEqual();\n}\n\n/* Recognize and Translate a Relational \"Not Equals\" */\nvoid NotEqual()\n{\n    NextExpression();\n    SetNEqual();\n}\n\n/* Recognize and Translate a Relational \"Less Than\" */\nvoid Less()\n{\n    Next();\n    switch(Token) {\n        case '=':\n            LessOrEqual();\n            break;\n        case '>':\n            NotEqual();\n            break;\n        default:\n            CompareExpression();\n            SetLess();\n            break;\n    }\n}\n\n/* Recognize and Translate a Relational \"Less or Equal\" */\nvoid LessOrEqual()\n{\n    NextExpression();\n    SetLessOrEqual();\n}\n\n/* Recognize and Translate a Relational \"Greater Than\" */\nvoid Greater()\n{\n    Next();\n    if (Token == '=') {\n        NextExpression();\n        SetGreaterOrEqual();\n    } else {\n        CompareExpression();\n        SetGreater();\n    }\n}\n\n/* Parse and Translate a Relation */\nvoid Relation()\n{\n    Expression();\n    if (IsRelop(Token)) {\n        Push();\n        switch (Token) {\n            case '=':\n                Equals();\n                break;\n            case '<':\n                Less();\n                break;\n            case '>':\n                Greater();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\n/* Parse and Translate a Boolean Factor with Leading NOT */\nvoid NotFactor()\n{\n    if (Token == '!') {\n        Next();\n        Relation();\n        NotIt();\n    } else {\n        Relation();\n    }\n}\n\n/* Parse and Translate a Boolean Term \n * <bool_term> ::= <not_factor> ( and_op <not_factor> )*\n * */\nvoid BoolTerm()\n{\n    NotFactor();\n    while(Token == '&') {\n        Push();\n        Next();\n        
NotFactor();\n        PopAnd();\n    }\n}\n\n/* Recognize and Translate a Boolean OR */\nvoid BoolOr()\n{\n    Next();\n    BoolTerm();\n    PopOr();\n}\n\n/* Recognize and Translate a Boolean XOR */\nvoid BoolXor()\n{\n    Next();\n    BoolTerm();\n    PopXor();\n}\n\n/* Parse and Translate a Boolean Expression \n * <bool_expression> ::= <bool_term> ( or_op <bool_term> )* */\nvoid BoolExpression()\n{\n    BoolTerm();\n    while(IsOrOp(Token)) {\n        Push();\n        /* dispatch on Token: Look is no longer meaningful here */\n        switch(Token) {\n            case '|':\n                BoolOr();\n                break;\n            case '~':\n                BoolXor();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\n/* Recognize and Translate an IF construct */\nvoid DoIf()\n{\n    Next();\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    sprintf(L1, \"%s\", NewLabel());\n    sprintf(L2, \"%s\", L1);\n    BoolExpression();\n    BranchFalse(L1);\n    Block();\n    if (Token == 'l') {\n        Next();\n        sprintf(L2, \"%s\", NewLabel());\n        Branch(L2);\n        PostLabel(L1);\n        Block();\n    }\n    PostLabel(L2);\n    MatchString(\"ENDIF\");\n}\n\nvoid DoWhile()\n{\n    Next();\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    sprintf(L1, \"%s\", NewLabel());\n    sprintf(L2, \"%s\", NewLabel());\n    PostLabel(L1);\n    BoolExpression();\n    BranchFalse(L2);\n    Block();\n    MatchString(\"ENDWHILE\");\n    Branch(L1);\n    PostLabel(L2);\n}\n\n\nint main()\n{\n    Init();\n    MatchString(\"PROGRAM\");\n    Header();\n    TopDecls();\n    MatchString(\"BEGIN\");\n    Prolog();\n    Block();\n    MatchString(\"END\");\n    Epilog();\n\n    return 0;\n}\n"
  },
  {
    "path": "11/prog.txt",
    "content": "PROGRAM\nVAR xx,\nyy=1,\nzz=10\nBEGIN\n  WHILE yy <= zz\n    IF yy <> 5 \n      xx=xx+yy\n    ELSE\n      xx=xx+5\n    ENDIF\n  yy=yy+1\n  ENDWHILE\nEND.\n\n"
  },
  {
    "path": "11/tutor11.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           3 June 1989\n\n\n                 Part XI: LEXICAL SCAN REVISITED\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nI've got some  good news and some bad news.  The bad news is that\nthis installment is  not  the  one  I promised last time.  What's\nmore, the one after this one won't be, either.\n\nThe good news is the reason for this installment:  I've  found  a\nway  to simplify and improve the lexical  scanning  part  of  the\ncompiler.  Let me explain.\n\n\nBACKGROUND\n\nIf  you'll remember, we talked at length  about  the  subject  of\nlexical  scanners in Part VII, and I left you with a design for a\ndistributed scanner that I felt was about as simple  as  I  could\nmake it ... more than most that I've  seen  elsewhere.    We used\nthat idea in Part X.  The compiler structure  that  resulted  was\nsimple, and it got the job done.\n\nRecently, though, I've begun  to  have  problems, and they're the\nkind that send a message that you might be doing something wrong.\n\nThe  whole thing came to a head when I tried to address the issue\nof  semicolons.  Several people have asked  me  about  them,  and\nwhether or not KISS will have them separating the statements.  
My\nintention has been NOT to  use semicolons, simply because I don't\nlike them and, as you can see, they have not proved necessary.\n\nBut I know that many of you, like me, have  gotten  used to them,\nand so  I  set  out  to write a short installment to show you how\nthey could easily be added, if you were so inclined.\n\nWell, it  turned  out  that  they weren't easy to add at all.  In\nfact it was darned difficult.\n\nI guess I should have  realized that something was wrong, because\nof the issue  of  newlines.    In the last couple of installments\nwe've addressed that issue,  and  I've shown you how to deal with\nnewlines with a  procedure called, appropriately enough, NewLine.\nIn  TINY  Version  1.0,  I  sprinkled calls to this procedure  in\nstrategic spots in the code.\n\nIt  seems  that  every time I've addressed the issue of newlines,\nthough,  I've found it to be tricky,  and  the  resulting  parser\nturned out to be quite fragile ... one addition or  deletion here\nor  there and things tended to go to pot.  Looking back on it,  I\nrealize that  there  was  a  message  in  this that I just wasn't\npaying attention to.\n\nWhen I tried to add semicolons  on  top of the newlines, that was\nthe last straw.   I ended up with much too complex a solution.  I\nbegan to realize that something fundamental had to change.\n\nSo,  in  a  way this installment will cause us to backtrack a bit\nand revisit the issue of scanning all over again.    Sorry  about\nthat.  That's the price you pay for watching me  do  this in real\ntime.  But the new version is definitely an improvement, and will\nserve us well for what is to come.\n\nAs  I said, the scanner we used in Part X was about as simple  as\none can get.  But anything can be improved.   The  new scanner is\nmore like the classical  scanner,  and  not  as simple as before.\nBut the overall  compiler  structure is even simpler than before.\nIt's also more robust, and easier to add  to  and/or  modify.   
I\nthink that's worth the time spent in this digression.  So in this\ninstallment, I'll be showing  you  the  new  structure.  No doubt\nyou'll  be  happy  to  know  that, while the changes affect  many\nprocedures, they aren't very profound  and so we lose very little\nof what's been done so far.\n\nIronically, the new scanner  is  much  more conventional than the\nold one, and is very much like the more generic scanner  I showed\nyou  earlier  in  Part VII.  Then I started trying to get clever,\nand I almost clevered myself clean out of business.   You'd think\none day I'd learn: K-I-S-S!\n\n\nTHE PROBLEM\n\nThe problem begins to show  itself in procedure Block, which I've\nreproduced below:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Scan;\n   while not(Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       'w': DoWhile;\n       'R': DoRead;\n       'W': DoWrite;\n      else Assignment;\n      end;\n      Scan;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nAs  you   can  see,  Block  is  oriented  to  individual  program\nstatements.  At each pass through  the  loop, we know that we are\nat  the beginning of a statement.  We exit the block when we have\nscanned an END or an ELSE.\n\nBut suppose that we see a semicolon instead.   The  procedure  as\nit's shown above  can't  handle that, because procedure Scan only\nexpects and can only accept tokens that begin with a letter.\n\nI  tinkered  around for quite awhile to come up with a  fix.    I\nfound many possible approaches, but none were very satisfying.  I\nfinally figured out the reason.\n\nRecall that when we started with our single-character parsers, we\nadopted a convention that the lookahead character would always be\nprefetched.    
That   is,   we  would  have  the  character  that\ncorresponds to our  current  position in the input stream fetched\ninto the global character Look, so that we could  examine  it  as\nmany  times  as  needed.    The  rule  we  adopted was that EVERY\nrecognizer, if it found its target token, would  advance  Look to\nthe next character in the input stream.\n\nThat simple and fixed convention served us very well when  we had\nsingle-character tokens, and it still does.  It would make  a lot\nof sense to apply the same rule to multi-character tokens.\n\nBut when we got into lexical scanning, I began  to  violate  that\nsimple rule.  The scanner of Part X  did  indeed  advance  to the\nnext token if it found an identifier or keyword, but it DIDN'T do\nthat if it found a carriage return, a whitespace character, or an\noperator.\n\nNow, that sort of mixed-mode  operation gets us into deep trouble\nin procedure Block, because whether or not the  input  stream has\nbeen advanced depends upon the kind of token we  encounter.    If\nit's  a keyword or the target of  an  assignment  statement,  the\n\"cursor,\" as defined by the contents of Look,  has  been advanced\nto  the next token OR to the beginning of whitespace.  If, on the\nother  hand,  the  token  is  a  semicolon,  or if we have hit  a\ncarriage return, the cursor has NOT advanced.\n\nNeedless to say, we can add enough logic  to  keep  us  on track.\nBut it's tricky, and makes the whole parser very fragile.\n\nThere's a much  better  way,  and  that's just to adopt that same\nrule that's worked so well before, to apply to TOKENS as  well as\nsingle characters.  In other words, we'll prefetch tokens just as\nwe've always done for  characters.   It seems so obvious once you\nthink about it that way.\n\nInterestingly enough, if we do things this way  the  problem that\nwe've had with newline characters goes away.  
We  can  just  lump\nthem in as  whitespace  characters, which means that the handling\nof  newlines  becomes  very trivial, and MUCH less prone to error\nthan we've had to deal with in the past.\n\n\nTHE SOLUTION\n\nLet's  begin  to  fix  the  problem  by  re-introducing  the  two\nprocedures:\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nprocedure GetName;\nbegin\n   SkipWhite;\n   if Not IsAlpha(Look) then Expected('Identifier');\n   Token := 'x';\n   Value := '';\n   repeat\n      Value := Value + UpCase(Look);\n      GetChar;\n   until not IsAlNum(Look);\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nprocedure GetNum;\nbegin\n   SkipWhite;\n   if not IsDigit(Look) then Expected('Number');\n   Token := '#';\n   Value := '';\n   repeat\n      Value := Value + Look;\n      GetChar;\n   until not IsDigit(Look);\nend;\n{--------------------------------------------------------------}\n\n\nThese two procedures are  functionally  almost  identical  to the\nones  I  showed  you in Part VII.  They each  fetch  the  current\ntoken, either an identifier or a number, into  the  global string\nValue.    They  also  set  the  encoded  version, Token,  to  the\nappropriate code.  The input  stream is left with Look containing\nthe first character NOT part of the token.\n\nWe  can do the same thing  for  operators,  even  multi-character\noperators, with a procedure such as:\n\n\n{--------------------------------------------------------------}\n{ Get an Operator }\n\nprocedure GetOp;\nbegin\n   Token := Look;\n   Value := '';\n   repeat\n      Value := Value + Look;\n      GetChar;\n   until IsAlpha(Look) or IsDigit(Look) or IsWhite(Look);\nend;\n{--------------------------------------------------------------}\n\nNote  that  GetOp  returns,  as  its  encoded  token,  the  FIRST\ncharacter of the operator.  
This is important,  because  it means\nthat we can now use that single character to  drive  the  parser,\ninstead of the lookahead character.\n\nWe need to tie these  procedures together into a single procedure\nthat can handle all three  cases.  The  following  procedure will\nread any one of the token types and always leave the input stream\nadvanced beyond it:\n\n\n{--------------------------------------------------------------}\n{ Get the Next Input Token }\n\nprocedure Next;\nbegin\n   SkipWhite;\n   if IsAlpha(Look) then GetName\n   else if IsDigit(Look) then GetNum\n   else GetOp;\nend;\n{--------------------------------------------------------------}\n\n\n***NOTE  that  here  I have put SkipWhite BEFORE the calls rather\nthan after.  This means that, in general, the variable  Look will\nNOT have a meaningful value in it, and therefore  we  should  NOT\nuse it as a test value for parsing, as we have been doing so far.\nThat's the big departure from our normal approach.\n\nNow, remember that before I was careful not to treat the carriage\nreturn (CR) and line  feed  (LF) characters as white space.  This\nwas  because,  with  SkipWhite  called  as the last thing in  the\nscanner, the encounter with  LF  would  trigger a read statement.\nIf we were on the last line of the program,  we  couldn't get out\nuntil we input another line with a non-white  character.   That's\nwhy I needed the second procedure, NewLine, to handle the CRLF's.\n\nBut now, with the call  to SkipWhite coming first, that's exactly\nthe behavior we want.    The  compiler  must know there's another\ntoken coming or it wouldn't be calling Next.  In other words,  it\nhasn't found the terminating  END  yet.  So we're going to insist\non more data until we find something.\n\nAll this means that we can greatly simplify both the  program and\nthe concepts, by treating CR and LF as whitespace characters, and\neliminating NewLine.  
You  can  do  that  simply by modifying the\nfunction IsWhite:\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB, CR, LF];\nend;\n{--------------------------------------------------------------}\n\n\nWe've already tried similar routines in Part VII,  but  you might\nas well try these new ones out.  Add them to a copy of the Cradle\nand call Next with the following main program:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   repeat\n      Next;\n      WriteLn(Token, ' ', Value);\n   until Token = '.';\nend.\n{--------------------------------------------------------------}\n\n\nCompile  it and verify that you can separate  a  program  into  a\nseries of tokens, and that you get the right  encoding  for  each\ntoken.\n\nThis ALMOST works,  but  not  quite.    There  are  two potential\nproblems:    First,  in KISS/TINY almost all of our operators are\nsingle-character operators.  The only exceptions  are  the relops\n>=, <=, and <>.  It seems  a  shame  to  treat  all  operators as\nstrings and do a  string  compare,  when  only a single character\ncompare  will  almost  always  suffice.   Second, and  much  more\nimportant, the  thing  doesn't  WORK  when  two  operators appear\ntogether, as in (a+b)*(c+d).  Here the string following 'b' would\nbe interpreted as a single operator \")*(.\"\n\nIt's possible to fix that problem.  For example,  we  could  just\ngive GetOp a  list  of  legal  characters, and we could treat the\nparentheses as different operator types  than  the  others.   But\nthis begins to get messy.\n\nFortunately, there's a  better  way that solves all the problems.\nSince almost  all the operators are single characters, let's just\ntreat  them  that  way, and let GetOp get only one character at a\ntime.  
This not only simplifies GetOp, but also speeds  things up\nquite a  bit.    We  still have the problem of the relops, but we\nwere treating them as special cases anyway.\n\nSo here's the final version of GetOp:\n\n\n{--------------------------------------------------------------}\n{ Get an Operator }\n\nprocedure GetOp;\nbegin\n   SkipWhite;\n   Token := Look;\n   Value := Look;\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nNote that I still give the string Value a value.  If you're truly\nconcerned about efficiency, you could leave this out.  When we're\nexpecting an operator, we will only be testing  Token  anyhow, so\nthe  value of the string won't matter.  But to me it seems to  be\ngood practice to give the thing a value just in case.\n\nTry  this  new  version with some realistic-looking  code.    You\nshould  be  able  to  separate  any program into  its  individual\ntokens, with the  caveat  that the two-character relops will scan\ninto two separate tokens.  That's OK ... we'll  parse  them  that\nway.\n\nNow, in Part VII the function of Next was combined with procedure\nScan,  which  also  checked every identifier against  a  list  of\nkeywords and encoded each one that was found.  As I  mentioned at\nthe time, the last thing we would want  to  do  is  to use such a\nprocedure in places where keywords  should not appear, such as in\nexpressions.  If we  did  that, the keyword list would be scanned\nfor every identifier appearing in the code.  Not good.\n\nThe  right  way  to  deal  with  that  is  to simply separate the\nfunctions  of  fetching  tokens and looking for  keywords.    
The\nversion of Scan shown below  does NOTHING but check for keywords.\nNotice that it operates on the current token and does NOT advance\nthe input stream.\n\n\n{--------------------------------------------------------------}\n{ Scan the Current Identifier for Keywords }\n\nprocedure Scan;\nbegin\n   if Token = 'x' then\n      Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];\nend;\n{--------------------------------------------------------------}\n\n\nThere is one last detail.  In the compiler there are a few places\nthat we must  actually  check  the  string  value  of  the token.\nMainly, this  is done to distinguish between the different END's,\nbut there are a couple  of  other  places.    (I  should  note in\npassing that we could always  eliminate the need for matching END\ncharacters by encoding each one  to a different character.  Right\nnow we are definitely taking the lazy man's route.)\n\nThe  following  version  of MatchString takes the  place  of  the\ncharacter-oriented Match.  Note that, like Match, it DOES advance\nthe input stream.\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input String }\n\nprocedure MatchString(x: string);\nbegin\n   if Value <> x then Expected('''' + x + '''');\n   Next;\nend;\n{--------------------------------------------------------------}\n\n\nFIXING UP THE COMPILER\n\nArmed with these new scanner procedures, we can now begin  to fix\nthe compiler to  use  them  properly.   The changes are all quite\nminor,  but  there  are quite a  few  places  where  changes  are\nnecessary.  
Rather than  showing  you each place, I will give you\nthe general idea and then just give the finished product.\n\n\nFirst of all, the code for procedure Block doesn't change, though\nits function does:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Scan;\n   while not(Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       'w': DoWhile;\n       'R': DoRead;\n       'W': DoWrite;\n      else Assignment;\n      end;\n      Scan;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nRemember that the new version of Scan doesn't  advance  the input\nstream, it only  scans  for  keywords.   The input stream must be\nadvanced by each procedure that Block calls.\n\nIn general, we have to replace every test on Look with  a similar\ntest on Token.  For example:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Expression }\n\nprocedure BoolExpression;\nbegin\n   BoolTerm;\n   while IsOrOp(Token) do begin\n      Push;\n      case Token of\n       '|': BoolOr;\n       '~': BoolXor;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nIn procedures like Add, we don't  have  to use Match anymore.  We\nneed only call Next to advance the input stream:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Next;\n   Term;\n   PopAdd;\nend;\n{-------------------------------------------------------------}\n\n\nControl  structures  are  actually simpler.  
We just call Next to\nadvance over the control keywords:\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block; Forward;\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   Next;\n   BoolExpression;\n   L1 := NewLabel;\n   L2 := L1;\n   BranchFalse(L1);\n   Block;\n   if Token = 'l' then begin\n      Next;\n      L2 := NewLabel;\n      Branch(L2);\n      PostLabel(L1);\n      Block;\n   end;\n   PostLabel(L2);\n   MatchString('ENDIF');\nend;\n{--------------------------------------------------------------}\n\n\nThat's about the extent of the REQUIRED changes.  In  the listing\nof TINY  Version  1.1  below,  I've  also  made a number of other\n\"improvements\" that  aren't really required.  Let me explain them\nbriefly:\n\n (1)  I've deleted the two procedures Prog and Main, and combined\n      their functions into the main program.  They didn't seem to\n      add  to program clarity ... in fact  they  seemed  to  just\n      muddy things up a little.\n\n (2)  I've  deleted  the  keywords  PROGRAM  and  BEGIN  from the\n      keyword list.  Each  one  only occurs in one place, so it's\n      not necessary to search for it.\n\n (3)  Having been  bitten  by  an  overdose  of  cleverness, I've\n      reminded myself that TINY  is  supposed  to be a minimalist\n      program.  Therefore I've  replaced  the  fancy  handling of\n      unary minus with the dumbest one I could think of.  A giant\n      step backwards in code quality, but a  great simplification\n      of the compiler.  KISS is the right place to use  the other\n      version.\n\n (4)  I've added some  error-checking routines such as CheckTable\n      and CheckDup, and  replaced  in-line code by calls to them.\n      This cleans up a number of routines.\n\n (5)  I've  taken  the  error  checking  out  of  code generation\n      routines  like Store, and put it in  the  parser  where  it\n      belongs.  
See Assignment, for example.\n\n (6)  There was an error in InTable and Locate  that  caused them\n      to search all locations  instead  of  only those with valid\n      data  in them.  They now search only  valid  cells.    This\n      allows us to eliminate  the  initialization  of  the symbol\n      table, which was done in Init.\n\n (7)  Procedure AddEntry now has two  arguments,  which  helps to\n      make things a bit more modular.\n\n (8)  I've cleaned up the  code  for  the relational operators by\n      the addition of the  new  procedures  CompareExpression and\n      NextExpression.\n\n (9)  I fixed an error in the Read routine ... the  earlier value\n      did not check for a valid variable name.\n\n\n CONCLUSION\n\nThe resulting compiler for  TINY  is given below.  Other than the\nremoval  of  the  keyword PROGRAM, it parses the same language as\nbefore.    It's  just  a  bit cleaner, and more importantly  it's\nconsiderably more robust.  I feel good about it.\n\nThe next installment will be another  digression:  the discussion\nof  semicolons  and  such that got me into this mess in the first\nplace.  THEN we'll press on  into  procedures and types.  Hang in\nthere with me.  The addition of those features will go a long way\ntowards removing KISS from  the  \"toy  language\" category.  
We're\ngetting very close to being able to write a serious compiler.\n\n\nTINY VERSION 1.1\n\n\n{--------------------------------------------------------------}\nprogram Tiny11;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n      CR  = ^M;\n      LF  = ^J;\n\n      LCount: integer = 0;\n      NEntry: integer = 0;\n\n\n{--------------------------------------------------------------}\n{ Type Declarations }\n\ntype Symbol = string[8];\n\n     SymTab = array[1..1000] of Symbol;\n\n     TabPtr = ^SymTab;\n\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look : char;             { Lookahead Character }\n    Token: char;             { Encoded Token       }\n    Value: string[16];       { Unencoded Token     }\n\n\nconst MaxEntry = 100;\n\nvar ST   : array[1..MaxEntry] of Symbol;\n    SType: array[1..MaxEntry] of char;\n\n\n{--------------------------------------------------------------}\n{ Definition of Keywords and Token Types }\n\nconst NKW =   9;\n      NKW1 = 10;\n\nconst KWlist: array[1..NKW] of Symbol =\n              ('IF', 'ELSE', 'ENDIF', 'WHILE', 'ENDWHILE',\n               'READ', 'WRITE', 'VAR', 'END');\n\nconst KWcode: string[NKW1] = 'xileweRWve';\n\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' 
Expected');\nend;\n\n{--------------------------------------------------------------}\n{ Report an Undefined Identifier }\n\nprocedure Undefined(n: string);\nbegin\n   Abort('Undefined Identifier ' + n);\nend;\n\n\n{--------------------------------------------------------------}\n{ Report a Duplicate Identifier }\n\nprocedure Duplicate(n: string);\nbegin\n   Abort('Duplicate Identifier ' + n);\nend;\n\n\n{--------------------------------------------------------------}\n{ Check to Make Sure the Current Token is an Identifier }\n\nprocedure CheckIdent;\nbegin\n   if Token <> 'x' then Expected('Identifier');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := UpCase(c) in ['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an AlphaNumeric Character }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Mulop }\n\nfunction IsMulop(c: char): boolean;\nbegin\n   IsMulop := c in ['*', '/'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Boolean Orop }\n\nfunction IsOrop(c: char): boolean;\nbegin\n   IsOrop := c in ['|', '~'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Relop }\n\nfunction IsRelop(c: char): boolean;\nbegin\n   IsRelop := c in ['=', '#', '<', 
'>'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB, CR, LF];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Table Lookup }\n\nfunction Lookup(T: TabPtr; s: string; n: integer): integer;\nvar i: integer;\n    found: Boolean;\nbegin\n   found := false;\n   i := n;\n   while (i > 0) and not found do\n      if s = T^[i] then\n         found := true\n      else\n         dec(i);\n   Lookup := i;\nend;\n\n\n{--------------------------------------------------------------}\n{ Locate a Symbol in Table }\n{ Returns the index of the entry.  Zero if not present. }\n\nfunction Locate(N: Symbol): integer;\nbegin\n   Locate := Lookup(@ST, n, NEntry);\nend;\n\n\n{--------------------------------------------------------------}\n{ Look for Symbol in Table }\n\nfunction InTable(n: Symbol): Boolean;\nbegin\n   InTable := Lookup(@ST, n, NEntry) <> 0;\nend;\n\n\n{--------------------------------------------------------------}\n{ Check to See if an Identifier is in the Symbol Table         }\n{ Report an error if it's not. }\n\n\nprocedure CheckTable(N: Symbol);\nbegin\n   if not InTable(N) then Undefined(N);\nend;\n\n\n{--------------------------------------------------------------}\n{ Check the Symbol Table for a Duplicate Identifier }\n{ Report an error if identifier is already in table. 
}\n\n\nprocedure CheckDup(N: Symbol);\nbegin\n   if InTable(N) then Duplicate(N);\nend;\n\n\n{--------------------------------------------------------------}\n{ Add a New Entry to Symbol Table }\n\nprocedure AddEntry(N: Symbol; T: char);\nbegin\n   CheckDup(N);\n   if NEntry = MaxEntry then Abort('Symbol Table Full');\n   Inc(NEntry);\n   ST[NEntry] := N;\n   SType[NEntry] := T;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nprocedure GetName;\nbegin\n   SkipWhite;\n   if Not IsAlpha(Look) then Expected('Identifier');\n   Token := 'x';\n   Value := '';\n   repeat\n      Value := Value + UpCase(Look);\n      GetChar;\n   until not IsAlNum(Look);\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nprocedure GetNum;\nbegin\n   SkipWhite;\n   if not IsDigit(Look) then Expected('Number');\n   Token := '#';\n   Value := '';\n   repeat\n      Value := Value + Look;\n      GetChar;\n   until not IsDigit(Look);\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Operator }\n\nprocedure GetOp;\nbegin\n   SkipWhite;\n   Token := Look;\n   Value := Look;\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get the Next Input Token }\n\nprocedure Next;\nbegin\n   SkipWhite;\n   if IsAlpha(Look) then GetName\n   else if IsDigit(Look) then GetNum\n   else GetOp;\nend;\n\n\n{--------------------------------------------------------------}\n{ Scan the Current Identifier for Keywords }\n\nprocedure Scan;\nbegin\n   if Token = 'x' then\n      Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input String }\n\nprocedure MatchString(x: string);\nbegin\n   if Value <> x then Expected('''' + x + '''');\n   Next;\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab 
}\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{--------------------------------------------------------------}\n{ Generate a Unique Label }\n\nfunction NewLabel: string;\nvar S: string;\nbegin\n   Str(LCount, S);\n   NewLabel := 'L' + S;\n   Inc(LCount);\nend;\n\n\n{--------------------------------------------------------------}\n{ Post a Label To Output }\n\nprocedure PostLabel(L: string);\nbegin\n   WriteLn(L, ':');\nend;\n\n\n{---------------------------------------------------------------}\n{ Clear the Primary Register }\n\nprocedure Clear;\nbegin\n   EmitLn('CLR D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Negate the Primary Register }\n\nprocedure Negate;\nbegin\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Complement the Primary Register }\n\nprocedure NotIt;\nbegin\n   EmitLn('NOT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Load a Constant Value to Primary Register }\n\nprocedure LoadConst(n: string);\nbegin\n   Emit('MOVE #');\n   WriteLn(n, ',D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to Primary Register }\n\nprocedure LoadVar(Name: string);\nbegin\n   if not InTable(Name) then Undefined(Name);\n   EmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Push Primary onto Stack }\n\nprocedure Push;\nbegin\n   EmitLn('MOVE D0,-(SP)');\nend;\n\n\n{---------------------------------------------------------------}\n{ Add Top of Stack to Primary }\n\nprocedure PopAdd;\nbegin\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Subtract Primary from Top of 
Stack }\n\nprocedure PopSub;\nbegin\n   EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Multiply Top of Stack by Primary }\n\nprocedure PopMul;\nbegin\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Divide Top of Stack by Primary }\n\nprocedure PopDiv;\nbegin\n   EmitLn('MOVE (SP)+,D7');\n   EmitLn('EXT.L D7');\n   EmitLn('DIVS D0,D7');\n   EmitLn('MOVE D7,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ AND Top of Stack with Primary }\n\nprocedure PopAnd;\nbegin\n   EmitLn('AND (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ OR Top of Stack with Primary }\n\nprocedure PopOr;\nbegin\n   EmitLn('OR (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ XOR Top of Stack with Primary }\n\nprocedure PopXor;\nbegin\n   EmitLn('EOR (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Compare Top of Stack with Primary }\n\nprocedure PopCompare;\nbegin\n   EmitLn('CMP (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was = }\n\nprocedure SetEqual;\nbegin\n   EmitLn('SEQ D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was != }\n\nprocedure SetNEqual;\nbegin\n   EmitLn('SNE D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was > }\n\nprocedure SetGreater;\nbegin\n   EmitLn('SLT D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was < }\n\nprocedure SetLess;\nbegin\n   EmitLn('SGT D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If 
Compare was <= }\n\nprocedure SetLessOrEqual;\nbegin\n   EmitLn('SGE D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Set D0 If Compare was >= }\n\nprocedure SetGreaterOrEqual;\nbegin\n   EmitLn('SLE D0');\n   EmitLn('EXT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Store Primary to Variable }\n\nprocedure Store(Name: string);\nbegin\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)')\nend;\n\n\n{---------------------------------------------------------------}\n{ Branch Unconditional  }\n\nprocedure Branch(L: string);\nbegin\n   EmitLn('BRA ' + L);\nend;\n\n\n{---------------------------------------------------------------}\n{ Branch False }\n\nprocedure BranchFalse(L: string);\nbegin\n   EmitLn('TST D0');\n   EmitLn('BEQ ' + L);\nend;\n\n\n{---------------------------------------------------------------}\n{ Read Variable to Primary Register }\n\nprocedure ReadIt(Name: string);\nbegin\n   EmitLn('BSR READ');\n   Store(Name);\nend;\n\n\n{ Write from Primary Register }\n\nprocedure WriteIt;\nbegin\n   EmitLn('BSR WRITE');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write Header Info }\n\nprocedure Header;\nbegin\n   WriteLn('WARMST', TAB, 'EQU $A01E');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write the Prolog }\n\nprocedure Prolog;\nbegin\n   PostLabel('MAIN');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write the Epilog }\n\nprocedure Epilog;\nbegin\n   EmitLn('DC WARMST');\n   EmitLn('END MAIN');\nend;\n\n\n{---------------------------------------------------------------}\n{ Allocate Storage for a Static Variable }\n\nprocedure Allocate(Name, Val: string);\nbegin\n   WriteLn(Name, ':', TAB, 'DC ', Val);\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure BoolExpression; 
Forward;\n\nprocedure Factor;\nbegin\n   if Token = '(' then begin\n      Next;\n      BoolExpression;\n      MatchString(')');\n      end\n   else begin\n      if Token = 'x' then\n         LoadVar(Value)\n      else if Token = '#' then\n         LoadConst(Value)\n      else Expected('Math Factor');\n      Next;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Next;\n   Factor;\n   PopMul;\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Next;\n   Factor;\n   PopDiv;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   Factor;\n   while IsMulop(Token) do begin\n      Push;\n      case Token of\n       '*': Multiply;\n       '/': Divide;\n      end;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Next;\n   Term;\n   PopAdd;\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Next;\n   Term;\n   PopSub;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   if IsAddop(Token) then\n      Clear\n   else\n      Term;\n   while IsAddop(Token) do begin\n      Push;\n      case Token of\n       '+': Add;\n       '-': Subtract;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Get Another Expression and Compare }\n\nprocedure CompareExpression;\nbegin\n   Expression;\n   PopCompare;\nend;\n\n\n{---------------------------------------------------------------}\n{ Get The Next Expression and Compare }\n\nprocedure NextExpression;\nbegin\n   
Next;\n   CompareExpression;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Equals\" }\n\nprocedure Equal;\nbegin\n   NextExpression;\n   SetEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than or Equal\" }\n\nprocedure LessOrEqual;\nbegin\n   NextExpression;\n   SetLessOrEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Not Equals\" }\n\nprocedure NotEqual;\nbegin\n   NextExpression;\n   SetNEqual;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than\" }\n\nprocedure Less;\nbegin\n   Next;\n   case Token of\n     '=': LessOrEqual;\n     '>': NotEqual;\n   else begin\n           CompareExpression;\n           SetLess;\n        end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Greater Than\" }\n\nprocedure Greater;\nbegin\n   Next;\n   if Token = '=' then begin\n      NextExpression;\n      SetGreaterOrEqual;\n      end\n   else begin\n      CompareExpression;\n      SetGreater;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Relation }\n\n\nprocedure Relation;\nbegin\n   Expression;\n   if IsRelop(Token) then begin\n      Push;\n      case Token of\n       '=': Equal;\n       '<': Less;\n       '>': Greater;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Factor with Leading NOT }\n\nprocedure NotFactor;\nbegin\n   if Token = '!' 
then begin\n      Next;\n      Relation;\n      NotIt;\n      end\n   else\n      Relation;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Term }\n\nprocedure BoolTerm;\nbegin\n   NotFactor;\n   while Token = '&' do begin\n      Push;\n      Next;\n      NotFactor;\n      PopAnd;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Boolean OR }\n\nprocedure BoolOr;\nbegin\n   Next;\n   BoolTerm;\n   PopOr;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Exclusive Or }\n\nprocedure BoolXor;\nbegin\n   Next;\n   BoolTerm;\n   PopXor;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Expression }\n\nprocedure BoolExpression;\nbegin\n   BoolTerm;\n   while IsOrOp(Token) do begin\n      Push;\n      case Token of\n       '|': BoolOr;\n       '~': BoolXor;\n      end;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: string;\nbegin\n   CheckTable(Value);\n   Name := Value;\n   Next;\n   MatchString('=');\n   BoolExpression;\n   Store(Name);\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block; Forward;\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   Next;\n   BoolExpression;\n   L1 := NewLabel;\n   L2 := L1;\n   BranchFalse(L1);\n   Block;\n   if Token = 'l' then begin\n      Next;\n      L2 := NewLabel;\n      Branch(L2);\n      PostLabel(L1);\n      Block;\n   end;\n   PostLabel(L2);\n   MatchString('ENDIF');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a WHILE Statement }\n\nprocedure DoWhile;\nvar L1, L2: string;\nbegin\n   Next;\n   L1 := NewLabel;\n   L2 := 
NewLabel;\n   PostLabel(L1);\n   BoolExpression;\n   BranchFalse(L2);\n   Block;\n   MatchString('ENDWHILE');\n   Branch(L1);\n   PostLabel(L2);\nend;\n\n\n{--------------------------------------------------------------}\n{ Read a Single Variable }\n\nprocedure ReadVar;\nbegin\n   CheckIdent;\n   CheckTable(Value);\n   ReadIt(Value);\n   Next;\nend;\n\n\n{--------------------------------------------------------------}\n{ Process a Read Statement }\n\nprocedure DoRead;\nbegin\n   Next;\n   MatchString('(');\n   ReadVar;\n   while Token = ',' do begin\n      Next;\n      ReadVar;\n   end;\n   MatchString(')');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process a Write Statement }\n\nprocedure DoWrite;\nbegin\n   Next;\n   MatchString('(');\n   Expression;\n   WriteIt;\n   while Token = ',' do begin\n      Next;\n      Expression;\n      WriteIt;\n   end;\n   MatchString(')');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Scan;\n   while not(Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       'w': DoWhile;\n       'R': DoRead;\n       'W': DoWrite;\n      else Assignment;\n      end;\n      Scan;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc;\nbegin\n   Next;\n   if Token <> 'x' then Expected('Variable Name');\n   CheckDup(Value);\n   AddEntry(Value, 'v');\n   Allocate(Value, '0');\n   Next;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n{ Each VAR introduces a comma-separated list of names; Scan is }\n{ called again so that a following VAR is recognized.          }\n\nprocedure TopDecls;\nbegin\n   Scan;\n   while Token = 'v' do begin\n      Alloc;\n      while Token = ',' do\n         Alloc;\n      Scan;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nbegin\n   GetChar;\n   
Next;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   MatchString('PROGRAM');\n   Header;\n   TopDecls;\n   MatchString('BEGIN');\n   Prolog;\n   Block;\n   MatchString('END');\n   Epilog;\nend.\n{--------------------------------------------------------------}\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n"
  },
  {
    "path": "12/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "12/cradle.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <stdbool.h>\n#include <string.h>\n\n#include \"cradle.h\"\n#include <malloc.h>\n\n#define MaxEntry 100\n#define MAX_SYMBOL_LENGTH 10\nstatic int LCount = 0;\nstatic char labelName[MAX_BUF];\nchar tmp[MAX_BUF];\nchar TempChar;\n\n/*char ST[TABLE_SIZE];*/\nstatic int NEntry = 0;\nconst char *ST[MaxEntry];\nchar SType[MaxEntry];\n\n\n/* Keywords symbol table */\nconst char const *KWList[] = {\n    \"IF\",\n    \"ELSE\",\n    \"ENDIF\",\n    \"WHILE\",\n    \"ENDWHILE\",\n    \"VAR\",\n    \"END\",\n};\nconst char KWCode[] = \"xileweve\";\nconst int KWNum = sizeof(KWList)/sizeof(*KWList);\n\nchar Token;             /* current token */\nchar Value[MAX_BUF];    /* string token of Look */\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    if (IsAlpha(c)) {\n        return (c & 0xDF);\n    } else {\n        return c;\n    }\n}\n\n/* Table Lookup\n * If the input string matches a table entry, return the entry index, else\n * return -1.\n * *n* is the size of the table */\nint Lookup(const char const *table[], const char *string, int n)\n{\n    int i;\n    bool found = false;\n\n    for (i = 0; i < n; ++i) {\n        if (strcmp(table[i], string) == 0) {\n            found = true;\n            break;\n        }\n    }\n    return found ? 
i : -1;\n}\n\nint Locate(char *symbol)\n{\n    return Lookup(ST, symbol, NEntry);\n}\n\n/* Add a new entry to symbol table */\nvoid AddEntry(char *symbol, char type)\n{\n    CheckDup(symbol);\n    if (NEntry == MaxEntry) {\n        Abort(\"Symbol Table Full\");\n    }\n\n    char *new_entry = (char *)malloc((strlen(symbol)+1)*sizeof(*new_entry));\n    if (new_entry == NULL) {\n        Abort(\"AddEntry: not enough memory allocating new_entry.\");\n    }\n    strcpy(new_entry, symbol);\n    ST[NEntry] = new_entry;\n    SType[NEntry] = type;\n\n    NEntry++;\n}\n\n/* Get an Identifier and Scan it for keywords */\nvoid Scan()\n{\n    if (Token == 'x') {\n        int index = Lookup(KWList, Value, KWNum);\n        Token = KWCode[index+1];\n    }\n}\n\nvoid MatchString(char *str)\n{\n    if (strcmp(Value, str) != 0) {\n        sprintf(tmp, \"\\\"%s\\\"\", str);\n        Expected(tmp);\n    }\n    Next();\n}\n\nvoid GetCharX()\n{\n    Look = getchar();\n    /* printf(\"Getchar: %c\\n\", Look); */\n}\n\nvoid GetChar()\n{\n    if (TempChar != ' ') {\n        Look = TempChar;\n        TempChar = ' ';\n    } else {\n        GetCharX();\n        if (Look == '/') {\n            TempChar = getchar();\n            if (TempChar == '*') {\n                Look = '{';\n                TempChar = ' ';\n            }\n        }\n    }\n}\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    NewLine();\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n    SkipWhite();\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n}\n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nint IsMulop(char c)\n{\n    
return (c == '*') || (c == '/');\n}\n\nint IsOrOp(char c)\n{\n    return strchr(\"|~\", c) != NULL;\n}\n\nint IsRelop(char c)\n{\n    return strchr(\"=#<>\", c) != NULL;\n}\n\nint IsWhite(char c)\n{\n    return strchr(\" \\t\\r\\n{\", c) != NULL;\n}\n\nint IsAlNum(char c)\n{\n    return IsAlpha(c) || IsDigit(c);\n}\n\nvoid GetName()\n{\n    SkipWhite();\n    if( !IsAlpha(Look)) {\n        Expected(\"Name\");\n    }\n\n    Token = 'x';\n    char *p = Value;\n    do {\n        *p++ = uppercase(Look);\n        GetChar();\n    } while(IsAlNum(Look)) ;\n    *p = '\\0';\n}\n\nvoid GetNum()\n{\n    SkipWhite();\n    if( !IsDigit(Look)) {\n        Expected(\"Integer\");\n    }\n\n    Token = '#';\n    char *p = Value;\n    do {\n        *p++ = Look;\n        GetChar();\n    } while (IsDigit(Look));\n    *p = '\\0';\n}\n\n/* Get an operator */\nvoid GetOp()\n{\n    SkipWhite();\n    Token = Look;\n    Value[0] = Look;\n    Value[1] = '\\0';\n    GetChar();\n}\n\n/* Get the next input token */\nvoid Next()\n{\n    SkipWhite();\n    if (IsAlpha(Look)) {\n        GetName();\n    } else if (IsDigit(Look)) {\n        GetNum();\n    } else {\n        GetOp();\n    }\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    LCount = 0;\n\n    InitTable();\n    GetChar();\n    Next();\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < MaxEntry; i++) {\n        ST[i] = NULL;\n        SType[i] = ' ';\n    }\n\n}\n\n/* look for symbol in table */\nbool InTable(char *symbol)\n{\n    return Locate(symbol) != -1;\n}\n\n/* Check to see if an identifier is in the symbol table,\n * report an error if it's not */\nvoid CheckTable(char *symbol)\n{\n    if (! 
InTable(symbol)) {\n        Undefined(symbol);\n    }\n}\n\nvoid CheckDup(char *symbol)\n{\n    if (InTable(symbol)) {\n        Duplicate(symbol);\n    }\n}\n\nchar *NewLabel()\n{\n    sprintf(labelName, \"L%02d\", LCount);\n    LCount ++;\n    return labelName;\n}\n\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n\nvoid SkipWhite()\n{\n    while (IsWhite(Look)) {\n        if (Look == '{') {\n            SkipComment();\n        } else {\n            GetChar();\n        }\n    }\n}\n\nvoid SkipComment()\n{\n    do {\n        do {\n            GetChar();\n            if (Look == '{') {\n                SkipComment();\n            }\n        } while(Look != '*');\n        GetCharX();\n    } while(Look != '/');\n    GetChar();\n}\n\n/* Skip over an End-of-Line */\nvoid NewLine()\n{\n    while(Look == '\\n') {\n        GetChar();\n        if (Look == '\\r') {\n            GetChar();\n        }\n        SkipWhite();\n    }\n}\n\n/* re-targetable routines */\nvoid Clear()\n{\n    EmitLn(\"xor %eax, %eax\");\n}\n\nvoid Negate()\n{\n    EmitLn(\"neg %eax\");\n}\n\nvoid LoadConst(char *value)\n{\n    sprintf(tmp, \"movl $%s, %%eax\", value);\n    EmitLn(tmp);\n}\n\n/* Load a variable to primary register */\nvoid LoadVar(char *name)\n{\n    if (!InTable(name)) {\n        /* report the offending identifier itself */\n        Undefined(name);\n    }\n    sprintf(tmp, \"movl %s, %%eax\", name);\n    EmitLn(tmp);\n}\n\n\n/* Push Primary onto stack */\nvoid Push()\n{\n    EmitLn(\"pushl %eax\");\n}\n\n/* Add Top of Stack to primary */\nvoid PopAdd()\n{\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* Subtract Primary from Top of Stack */\nvoid PopSub()\n{\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"neg %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* multiply top of stack by primary */\nvoid PopMul()\n{\n    EmitLn(\"imull (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* divide top of stack by primary */\nvoid PopDiv()\n{\n    
/* for an expression like a/b we have eax=b and (%esp)=a\n     * but we need eax=a, and b on the stack\n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n    EmitLn(\"pushl %eax\");\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extension */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* store primary to variable */\nvoid Store(char *name)\n{\n    if (!InTable(name)) {\n        /* report the offending identifier itself */\n        Undefined(name);\n    }\n    sprintf(tmp, \"movl %%eax, %s\", name);\n    EmitLn(tmp);\n}\n\nvoid Undefined(char *name)\n{\n    sprintf(tmp, \"Undefined Identifier: %s\", name);\n    Abort(tmp);\n}\n\nvoid Duplicate(char *name)\n{\n    sprintf(tmp, \"Duplicate Identifier: %s\", name);\n    Abort(tmp);\n}\n\n/* Complement the primary register */\nvoid NotIt()\n{\n    EmitLn(\"not %eax\");\n}\n\n/* AND top of Stack with primary */\nvoid PopAnd()\n{\n    EmitLn(\"and (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* OR top of Stack with primary */\nvoid PopOr()\n{\n    EmitLn(\"or (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* XOR top of Stack with primary */\nvoid PopXor()\n{\n    EmitLn(\"xor (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n/* Compare top of Stack with primary */\nvoid PopCompare()\n{\n    EmitLn(\"addl $4, %esp\");\n    EmitLn(\"cmp -4(%esp), %eax\");\n}\n\n/* set %eax if Compare was = */\nvoid SetEqual()\n{\n    EmitLn(\"sete %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was != */\nvoid SetNEqual()\n{\n    EmitLn(\"setne %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was > */\nvoid SetGreater()\n{\n    EmitLn(\"setl %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was >= */\nvoid SetGreaterOrEqual()\n{\n    EmitLn(\"setle %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was < */\nvoid SetLess()\n{\n    EmitLn(\"setg %al\");\n    
EmitLn(\"movsx %al, %eax\");\n}\n\n/* set %eax if Compare was <= */\nvoid SetLessOrEqual()\n{\n    EmitLn(\"setge %al\");\n    EmitLn(\"movsx %al, %eax\");\n}\n\n/* Branch unconditional */\nvoid Branch(char *label)\n{\n    sprintf(tmp, \"jmp %s\", label);\n    EmitLn(tmp);\n}\n\n/* Branch False */\nvoid BranchFalse(char *label)\n{\n    EmitLn(\"test $1, %eax\");\n    sprintf(tmp, \"jz %s\", label);\n    EmitLn(tmp);\n}\n"
  },
  {
    "path": "12/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n#include <stdbool.h>\n\n#define MAX_BUF 100\n#define MaxEntry 100\nextern char tmp[MAX_BUF];\nextern const char *ST[];\nextern char SType[];\nextern char Token;\nextern char Value[MAX_BUF];\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\nvoid MatchString(char *str);\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\nint IsMulop(char c);\nint IsOrOp(char c);\nint IsRelop(char c);\nint IsWhite(char c);\nint IsAlNum(char c);\n\nvoid GetName();\nvoid GetNum();\nvoid GetOp();\nvoid Next();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\nint Locate(char *symbol);\nbool InTable(char *symbol);\nvoid CheckTable(char *symbol);\nvoid CheckDup(char *symbol);\nvoid AddEntry(char *symbol, char type);\n\nchar *NewLabel();\nvoid PostLabel(char *label);\nvoid SkipWhite();\nvoid SkipComment();\nvoid NewLine();\nvoid Scan();\n\n/* re-targetable routines */\nvoid Clear();\nvoid Negate();\nvoid LoadConst(char *value);\nvoid LoadVar(char *name);\nvoid Push();\nvoid PopAdd();\nvoid PopSub();\nvoid PopMul();\nvoid PopDiv();\nvoid Store(char *name);\nvoid Undefined(char *name);\nvoid Duplicate(char *name);\nvoid NotIt();\nvoid PopAnd();\nvoid PopOr();\nvoid PopXor();\nvoid PopCompare();\nvoid SetEqual();\nvoid SetNEqual();\nvoid SetGreater();\nvoid SetGreaterOrEqual();\nvoid SetLess();\nvoid SetLessOrEqual();\nvoid Branch(char *label);\nvoid BranchFalse(char *label);\n\n#endif\n"
  },
  {
    "path": "12/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <stdbool.h>\n\n#include \"cradle.h\"\n\n#ifdef DEBUG\n#define dprint(fmt, ...) printf(fmt, __VA_ARGS__);\n#else\n#define dprint(fmt, ...)\n#endif\n\n\nvoid TopDecls();\nvoid Allocate(char *name, char *value);\nvoid Alloc();\nvoid Block();\nvoid Assignment();\n\nvoid Factor();\nvoid Expression();\nvoid Subtract();\nvoid Term();\nvoid Divide();\nvoid Multiply();\nvoid FirstFactor();\nvoid Add();\nvoid Equals();\nvoid NotEqual();\nvoid Less();\nvoid LessOrEqual();\nvoid Greater();\nvoid Relation();\nvoid NotFactor();\nvoid BoolTerm();\nvoid BoolOr();\nvoid BoolXor();\nvoid BoolExpression();\nvoid DoIf();\nvoid DoWhile();\nvoid CompareExpression();\nvoid NextExpression();\n\nvoid Semi();\n\nvoid Header()\n{\n    EmitLn(\".global _start\");\n}\n\nvoid Prolog()\n{\n    EmitLn(\".section .text\");\n    EmitLn(\"_start:\");\n}\n\nvoid Epilog()\n{\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n}\n\nvoid TopDecls()\n{\n    Scan();\n    while(Token == 'v') {\n        EmitLn(\".section .data\"); /* in case that the variable and function\n                                     declarations are mixed */\n        Alloc();\n        while(Token == ',') {\n            Alloc();\n        }\n        Semi();\n    }\n}\n\n/* Allocate Storage for a static variable */\nvoid Allocate(char *name, char *value)\n{\n    sprintf(tmp, \"%s: .int %s\", name, value);\n    EmitLn(tmp);\n}\n\nvoid Alloc()\n{\n    char name[MAX_BUF];\n    Next();\n    if (Token != 'x') {\n        Expected(\"Variable Name\");\n    }\n    CheckDup(Value);\n\n    sprintf(name, Value);\n    AddEntry(name, 'v');\n    Next();\n    if (Token == '=') {\n        Next();\n        if (Token != '#') {\n            Expected(\"Integer\");\n        }\n        Allocate(name, Value);\n        Next();\n    } else {\n        Allocate(name, \"0\");\n    }\n}\n\n/* Parse and Translate a Block of Statements 
\n * <block> ::= ( <statement> )*\n * <statement> ::= <if> | <while> | <assignment>\n * */\nvoid Block()\n{\n    Scan();\n    while(strchr(\"el\", Token) == NULL) {\n        switch (Token) {\n            case 'i':\n                DoIf();\n                break;\n            case 'w':\n                DoWhile();\n                break;\n            case 'x':\n                Assignment();\n                break;\n            default:\n                break;\n        }\n        Semi();\n        Scan();\n    }\n}\n\nvoid Assignment()\n{\n    char name[MAX_BUF];\n    sprintf(name, \"%s\", Value);\n    Next();\n    MatchString(\"=\");\n    BoolExpression();\n    Store(name);\n}\n\nvoid Factor()\n{\n    if (Token == '(') {\n        Next();\n        BoolExpression();\n        MatchString(\")\");\n    } else {\n        if (Token == 'x') {\n            LoadVar(Value);\n        } else if (Token == '#') {\n            LoadConst(Value);\n        } else {\n            Expected(\"Math Factor\");\n        }\n        Next();\n    }\n}\n\n\nvoid Multiply()\n{\n    Next();\n    Factor();\n    PopMul();\n}\n\nvoid Divide()\n{\n    Next();\n    Factor();\n    PopDiv();\n}\n\nvoid Term()\n{\n    Factor();\n    while(IsMulop(Token)) {\n        Push();\n        switch(Token) {\n            case '*':\n                Multiply();\n                break;\n            case '/':\n                Divide();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\nvoid Add()\n{\n    Next();\n    Term();\n    PopAdd();\n}\n\nvoid Subtract()\n{\n    Next();\n    Term();\n    PopSub();\n}\n\nvoid Expression()\n{\n    if (IsAddop(Token)) {\n        Clear();\n    } else {\n        Term();\n    }\n\n    while(IsAddop(Token)) {\n        Push();\n        switch(Token) {\n            case '+':\n                Add();\n                break;\n            case '-':\n                Subtract();\n                break;\n            default:\n                break;\n        }\n   
 }\n}\n\n/* Get another expression and compare */\nvoid CompareExpression()\n{\n    Expression();\n    PopCompare();\n}\n\n/* Get the next expression and compare */\nvoid NextExpression()\n{\n    Next();\n    CompareExpression();\n}\n\n/* Recognize and Translate a Relational \"Equals\" */\nvoid Equals()\n{\n    NextExpression();\n    SetEqual();\n}\n\n/* Recognize and Translate a Relational \"Not Equals\" */\nvoid NotEqual()\n{\n    NextExpression();\n    SetNEqual();\n}\n\n/* Recognize and Translate a Relational \"Less Than\" */\nvoid Less()\n{\n    Next();\n    switch(Token) {\n        case '=':\n            LessOrEqual();\n            break;\n        case '>':\n            NotEqual();\n            break;\n        default:\n            CompareExpression();\n            SetLess();\n            break;\n    }\n}\n\n/* Recognize and Translate a Relational \"Less or Equal\" */\nvoid LessOrEqual()\n{\n    NextExpression();\n    SetLessOrEqual();\n}\n\n/* Recognize and Translate a Relational \"Greater Than\" */\nvoid Greater()\n{\n    Next();\n    if (Token == '=') {\n        NextExpression();\n        SetGreaterOrEqual();\n    } else {\n        CompareExpression();\n        SetGreater();\n    }\n}\n\n/* Parse and Translate a Relation */\nvoid Relation()\n{\n    Expression();\n    if (IsRelop(Token)) {\n        Push();\n        switch (Token) {\n            case '=':\n                Equals();\n                break;\n            case '<':\n                Less();\n                break;\n            case '>':\n                Greater();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\n/* Parse and Translate a Boolean Factor with Leading NOT */\nvoid NotFactor()\n{\n    if (Token == '!') {\n        Next();\n        Relation();\n        NotIt();\n    } else {\n        Relation();\n    }\n}\n\n/* Parse and Translate a Boolean Term \n * <bool_term> ::= <not_factor> ( and_op <not_factor> )*\n * */\nvoid BoolTerm()\n{\n    
NotFactor();\n    while(Token == '&') {\n        Push();\n        Next();\n        NotFactor();\n        PopAnd();\n    }\n}\n\n/* Recognize and Translate a Boolean OR */\nvoid BoolOr()\n{\n    Next();\n    BoolTerm();\n    PopOr();\n}\n\n/* Recognize and Translate a Boolean XOR */\nvoid BoolXor()\n{\n    Next();\n    BoolTerm();\n    PopXor();\n}\n\n/* Parse and Translate a Boolean Expression \n * <bool_expression> ::= <bool_term> ( or_op <bool_term> )* */\nvoid BoolExpression()\n{\n    BoolTerm();\n    while(IsOrOp(Token)) {\n        Push();\n        switch(Token) {\n            case '|':\n                BoolOr();\n                break;\n            case '~':\n                BoolXor();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\n/* Recognize and Translate an IF construct */\nvoid DoIf()\n{\n    Next();\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    sprintf(L1, \"%s\", NewLabel());\n    sprintf(L2, \"%s\", L1);\n    BoolExpression();\n    BranchFalse(L1);\n    Block();\n    if (Token == 'l') {\n        Next();\n        sprintf(L2, \"%s\", NewLabel());\n        Branch(L2);\n        PostLabel(L1);\n        Block();\n    }\n    PostLabel(L2);\n    MatchString(\"ENDIF\");\n}\n\nvoid DoWhile()\n{\n    Next();\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    sprintf(L1, \"%s\", NewLabel());\n    sprintf(L2, \"%s\", NewLabel());\n    PostLabel(L1);\n    BoolExpression();\n    BranchFalse(L2);\n    Block();\n    MatchString(\"ENDWHILE\");\n    Branch(L1);\n    PostLabel(L2);\n}\n\nvoid Semi()\n{\n    /* make a semicolon optional */\n    if (Token == ';') {\n        Next();\n    }\n}\n\nint main()\n{\n    Init();\n    MatchString(\"PROGRAM\");\n    Semi();\n    Header();\n    TopDecls();\n    MatchString(\"BEGIN\");\n    Prolog();\n    Block();\n    MatchString(\"END\");\n    Epilog();\n\n    return 0;\n}\n"
  },
  {
    "path": "12/prog.txt",
    "content": "PROGRAM;\n/* \n    calculate 1+2+...+10 \n    /* nested comment */\n*/\nVAR xx, yy=1, zz=10;\nBEGIN\n  WHILE yy <= zz\n    IF yy <> 5 \n      xx=xx+yy;\n    ELSE\n      xx=xx+5;\n    ENDIF;\n  yy=yy+1;\n  ENDWHILE;\nEND.\n\n"
  },
  {
    "path": "12/tutor12.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           5 June 1989\n\n\n                       Part XII: MISCELLANY\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nThis installment is another one  of  those  excursions  into side\nalleys  that  don't  seem to fit  into  the  mainstream  of  this\ntutorial  series.    As I mentioned last time, it was while I was\nwriting this installment that I realized some changes  had  to be\nmade  to  the  compiler structure.  So I had to digress from this\ndigression long enough to develop the new structure  and  show it\nto you.\n\nNow that that's behind us, I can tell you what I  set  out  to in\nthe first place.  This shouldn't  take  long, and then we can get\nback into the mainstream.\n\nSeveral people have asked  me  about  things that other languages\nprovide, but so far I haven't addressed in this series.   The two\nbiggies are semicolons and  comments.    Perhaps  you've wondered\nabout them, too, and  wondered  how things would change if we had\nto  deal with them.  Just so you can proceed with what's to come,\nwithout being  bothered by that nagging feeling that something is\nmissing, we'll address such issues here.\n\n\nSEMICOLONS\n\nEver since the introduction of Algol, semicolons have been a part\nof  almost every modern language.  We've all  used  them  to  the\npoint that they are taken for  granted.   
Yet I suspect that more\ncompilation errors have  occurred  due  to  misplaced  or missing\nsemicolons  than  any  other single cause.  And if we had a penny\nfor  every  extra  keystroke programmers have used  to  type  the\nlittle rascals, we could pay off the national debt.\n\nHaving  been  brought  up with FORTRAN, it took me a long time to\nget used to using semicolons, and to tell the  truth  I've  never\nquite understood why they  were  necessary.    Since I program in\nPascal, and since the use of semicolons in Pascal is particularly\ntricky,  that one little character is still  by  far  my  biggest\nsource of errors.\n\nWhen  I  began  developing  KISS,  I resolved to  question  EVERY\nconstruct in other languages, and to try to avoid the most common\nproblems that occur with them.  That puts the semicolon very high\non my hit list.\n\nTo  understand  the  role of the semicolon, you have to look at a\nlittle history.\n\nEarly programming languages were line-oriented.  In  FORTRAN, for\nexample, various parts  of  the statement had specific columns or\nfields that they had to appear in.  Since  some  statements  were\ntoo  long for one line, the  \"continuation  card\"  mechanism  was\nprovided to let  the  compiler  know  that a given card was still\npart of the previous  line.   The mechanism survives to this day,\neven though punched cards are now things of the distant past.\n\nWhen  other  languages  came  along,  they  also  adopted various\nmechanisms for dealing with multiple-line statements.  BASIC is a\ngood  example.  It's important to  recognize,  though,  that  the\nFORTRAN  mechanism  was   not   so  much  required  by  the  line\norientation of that  language,  as by the column-orientation.  
In\nthose versions of FORTRAN  where  free-form  input  is permitted,\nit's no longer needed.\n\nWhen the fathers  of  Algol introduced that language, they wanted\nto get away  from  line-oriented programs like FORTRAN and BASIC,\nand allow for free-form input.   This included the possibility of\nstringing multiple statements on a single line, as in\n\n\n     a=b; c=d; e=e+1;\n\n\nIn cases like this,  the  semicolon is almost REQUIRED.  The same\nline, without the semicolons, just looks \"funny\":\n\n\n     a=b c=d e=e+1\n\nI suspect that this is the major ... perhaps ONLY ...  reason for\nsemicolons: to keep programs from looking funny.\n\nBut  the  idea  of stringing multiple statements  together  on  a\nsingle  line  is  a  dubious  one  at  best.  It's not very  good\nprogramming  style,  and  harks back to  the  days  when  it  was\nconsidered important to conserve cards.  In these  days  of CRT's\nand indented code, the clarity of programs is  far  better served\nby  keeping statements separate.  It's still  nice  to  have  the\nOPTION  of  multiple  statements,  but  it seems a shame to  keep\nprogrammers  in  slavery  to the semicolon, just to keep that one\nrare case from \"looking funny.\"\n\nWhen I started in with KISS, I tried  to  keep  an  open mind.  I\ndecided that I would use  semicolons when it became necessary for\nthe parser, but not until then.  I figured this would happen just\nabout  the time I added the ability  to  spread  statements  over\nmultiple lines.  But, as you  can  see, that never happened.  The\nTINY compiler is perfectly  happy  to  parse the most complicated\nstatement, spread over any number of lines, without semicolons.\n\nStill, there are people  who  have  used  semicolons for so long,\nthey feel naked  without them.  I'm one of them.  Once I had KISS\ndefined sufficiently well, I began to write a few sample programs\nin the language.    I  discovered,  somewhat to my horror, that I\nkept  putting  semicolons  in anyway.   
So  now  I'm  facing  the\nprospect of a NEW  rash  of  compiler  errors, caused by UNWANTED\nsemicolons.  Phooey!\n\nPerhaps more to the point, there are readers out  there  who  are\ndesigning their own languages, which may  include  semicolons, or\nwho  want to use the techniques of  these  tutorials  to  compile\nconventional languages like  C.    In  either case, we need to be\nable to deal with semicolons.\n\n\nSYNTACTIC SUGAR\n\nThis whole discussion brings  up  the  issue of \"syntactic sugar\"\n... constructs that are added to a language, not because they are\nneeded, but because they help make the programs look right to the\nprogrammer.    After  all, it's nice  to  have  a  small,  simple\ncompiler,    but  it  would  be  of  little  use if the resulting\nlanguage  were  cryptic  and hard to program.  The language FORTH\ncomes  to mind (a premature OUCH! for the  barrage  I  know  that\none's going to fetch me).  If we can add features to the language\nthat  make the programs easier to read  and  understand,  and  if\nthose features  help keep the programmer from making errors, then\nwe should do so.    Particularly if the constructs don't add much\nto the complexity of the language or its compiler.\n\nThe  semicolon  could  be considered an example,  but  there  are\nplenty of others, such as the 'THEN' in an IF-statement, the 'DO'\nin a WHILE-statement,  and  even the 'PROGRAM' statement, which I\ncame within a gnat's eyelash of leaving out  of  TINY.    None of\nthese tokens  add  much  to  the  syntax  of the language ... the\ncompiler can figure out  what's  going on without them.  But some\nfolks feel that they  DO  add to the readability of programs, and\nthat can be very important.\n\nThere are two schools of thought on this subject, which  are well\nrepresented by two of our most popular languages, C and Pascal.\n\nTo  the minimalists, all such sugar should be  left  out.    
They\nargue that it clutters up the language and adds to the  number of\nkeystrokes  programmers  must type.   Perhaps  more  importantly,\nevery extra token or keyword represents a trap lying in wait  for\nthe inattentive programmer.  If you leave out  a  token, misplace\nit, or misspell it, the compiler  will  get you.  So these people\nargue that the best approach is to get rid of such things.  These\nfolks tend to like C, which has a minimum of unnecessary keywords\nand punctuation.\n\nThose from the other school tend to like Pascal.  They argue that\nhaving to type a few extra characters is a small price to pay for\nlegibility.    After  all, humans have to read the programs, too.\nTheir best argument is that each such construct is an opportunity\nto tell the compiler that you really mean for it  to  do what you\nsaid to.  The sugary tokens serve as useful landmarks to help you\nfind your way.\n\nThe differences are well represented by the two  languages.   The\nmost oft-heard complaint about  C  is  that  it is too forgiving.\nWhen you make a mistake in C, the  erroneous  code  is  too often\nanother  legal  C  construct.    So  the  compiler  just  happily\ncontinues to compile, and  leaves  you  to  find the error during\ndebug.    I guess that's why debuggers  are  so  popular  with  C\nprogrammers.\n\nOn the  other  hand,  if  a  Pascal  program compiles, you can be\npretty  sure that the program will do what you told it.  If there\nis an error at run time, it's probably a design error.\n\nThe  best  example  of  useful  sugar  is  the semicolon  itself.\nConsider the code fragment:\n\n\n     a=1+(2*b+c)   b...\n\n\nSince there is no operator connecting the token 'b' with the rest\nof the  statement, the compiler will conclude that the expression\nends  with  the  ')', and the 'b'  is  the  beginning  of  a  new\nstatement.    
But  suppose  I  have simply left out the  intended\noperator, and I really want to say:\n\n\n     a=1+(2*b+c)*b...\n\n\nIn  this  case  the compiler will get an error, all right, but it\nwon't be very meaningful  since  it will be expecting an '=' sign\nafter the 'b' that really shouldn't be there.\n\nIf, on the other hand, I include a semicolon after the  'b', THEN\nthere  can  be no doubt where I  intend  the  statement  to  end.\nSyntactic  sugar,  then,  can  serve  a  very  useful purpose  by\nproviding some additional insurance that we remain on track.\n\nI find  myself  somewhere  in  the middle of all this.  I tend to\nfavor the Pascal-ers' view ... I'd much rather find  my  bugs  at\ncompile time rather than run time.  But I also hate to just throw\nverbosity  in  for  no apparent reason, as in COBOL.  So far I've\nconsistently left most of the Pascal sugar out of KISS/TINY.  But\nI certainly have no strong feelings either way, and  I  also  can\nsee the value of sprinkling a little sugar around  just  for  the\nextra  insurance  that  it  brings.    If  you like  this  latter\napproach, things like that are easy to add.  Just  remember that,\nlike  the semicolon, each item of sugar  is  something  that  can\npotentially cause a compile error by its omission.\n\n\nDEALING WITH SEMICOLONS\n\nThere  are  two  distinct  ways  in which semicolons are used  in\npopular  languages.    In Pascal, the semicolon is regarded as  a\nstatement SEPARATOR.  No semicolon  is  required  after  the last\nstatement in a block.  The syntax is:\n\n\n     <block> ::= <statement> ( ';' <statement>)*\n\n     <statement> ::= <assignment> | <if> | <while> ... | null\n\n\n(The null statement is IMPORTANT!)\n\nPascal  also defines some semicolons in  other  places,  such  as\nafter the PROGRAM statement.\n\nIn  C  and  Ada, on the other hand, the semicolon is considered a\nstatement TERMINATOR,  and  follows  all  statements  (with  some\nembarrassing and confusing  exceptions).   
The syntax for this is\nsimply:\n\n\n     <block> ::= ( <statement> ';')*\n\n\nOf  the two syntaxes, the Pascal one seems on the face of it more\nrational, but experience has shown  that it leads to some strange\ndifficulties.  People get  so  used  to  typing a semicolon after\nevery  statement  that  they tend to  type  one  after  the  last\nstatement in a block, also.  That usually doesn't cause  any harm\n...  it  just gets treated as a  null  statement.    Many  Pascal\nprogrammers, including yours truly,  do  just  that. But there is\none  place you absolutely CANNOT type  a  semicolon,  and  that's\nright before an ELSE.  This little gotcha  has  cost  me  many an\nextra  compilation,  particularly  when  the  ELSE  is  added  to\nexisting code.    So  the  C/Ada  choice  turns out to be better.\nApparently Niklaus Wirth thinks so, too:  In  his  Modula  2,  he\nabandoned the Pascal approach.\n\nGiven either of these two syntaxes, it's an easy matter (now that\nwe've  reorganized  the  parser!) to add these  features  to  our\nparser.  Let's take the last case first, since it's simpler.\n\nTo begin, I've made things easy by introducing a new recognizer:\n\n\n{--------------------------------------------------------------}\n{ Match a Semicolon }\n\nprocedure Semi;\nbegin\n   MatchString(';');\nend;\n{--------------------------------------------------------------}\n\n\nThis procedure works very much like our old Match.  It insists on\nfinding a semicolon as the next token.  
Having found it, it skips\nto the next one.\n\nSince a  semicolon follows a statement, procedure Block is almost\nthe only one we need to change:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Scan;\n   while not(Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       'w': DoWhile;\n       'R': DoRead;\n       'W': DoWrite;\n       'x': Assignment;\n      end;\n      Semi;\n      Scan;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNote carefully the subtle change in the case statement.  The call\nto  Assignment  is now guarded by a test on Token.   This  is  to\navoid calling Assignment when the  token  is  a  semicolon (which\ncould happen if the statement is null).\n\nSince declarations are also  statements,  we  also  need to add a\ncall to Semi within procedure TopDecls:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n   Scan;\n   while Token = 'v' do begin\n      Alloc;\n      while Token = ',' do\n         Alloc;\n      Semi;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nFinally, we need one for the PROGRAM statement:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   MatchString('PROGRAM');\n   Semi;\n   Header;\n   TopDecls;\n   MatchString('BEGIN');\n   Prolog;\n   Block;\n   MatchString('END');\n   Epilog;\nend.\n{--------------------------------------------------------------}\n\n\nIt's as easy as that.  Try it with a copy of TINY and see how you\nlike it.\n\nThe Pascal version  is  a  little  trickier,  but  it  still only\nrequires  minor  changes,  and those only to procedure Block.  To\nkeep things as simple as possible, let's split the procedure into\ntwo parts.  
The following procedure handles just one statement:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Single Statement }\n\nprocedure Statement;\nbegin\n   Scan;\n   case Token of\n    'i': DoIf;\n    'w': DoWhile;\n    'R': DoRead;\n    'W': DoWrite;\n    'x': Assignment;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nUsing this procedure, we can now rewrite Block like this:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   Statement;\n   while Token = ';' do begin\n      Next;\n      Statement;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nThat  sure  didn't  hurt, did it?  We can now parse semicolons in\nPascal-like fashion.\n\n\nA COMPROMISE\n\nNow that we know how to deal with semicolons, does that mean that\nI'm going to put them in KISS/TINY?  Well, yes and  no.    I like\nthe extra sugar and the security that comes with knowing for sure\nwhere the  ends  of  statements  are.    But I haven't changed my\ndislike for the compilation errors associated with semicolons.\n\nSo I have what I think is a nice compromise: Make them OPTIONAL!\n\nConsider the following version of Semi:\n\n\n{--------------------------------------------------------------}\n{ Match a Semicolon }\n\nprocedure Semi;\nbegin\n   if Token = ';' then Next;\nend;\n{--------------------------------------------------------------}\n\n\nThis procedure will ACCEPT a semicolon whenever it is called, but\nit won't INSIST on one.  That means that when  you  choose to use\nsemicolons, the compiler  will  use the extra information to help\nkeep itself on track.  But if you omit one (or omit them all) the\ncompiler won't complain.  
The best of both worlds.\n\nPut this procedure in place in the first version of  your program\n(the  one for C/Ada syntax), and you have  the  makings  of  TINY\nVersion 1.2.\n\n\nCOMMENTS\n\nUp  until  now  I have carefully avoided the subject of comments.\nYou would think that this would be an easy subject ... after all,\nthe compiler doesn't have to deal with comments at all; it should\njust ignore them.  Well, sometimes that's true.\n\nComments can be just about as easy or as difficult as  you choose\nto make them.    At  one  extreme,  we can arrange things so that\ncomments  are  intercepted  almost  the  instant  they  enter the\ncompiler.  At the  other,  we can treat them as lexical elements.\nThings  tend to get interesting when  you  consider  things  like\ncomment delimiters contained in quoted strings.\n\n\nSINGLE-CHARACTER DELIMITERS\n\nHere's an example.  Suppose we assume the  Turbo  Pascal standard\nand use curly braces for comments.  In this case we  have single-\ncharacter delimiters, so our parsing is a little easier.\n\nOne  approach  is  to  strip  the  comments  out the  instant  we\nencounter them in the input stream; that is,  right  in procedure\nGetChar.    To  do  this,  first  change  the  name of GetChar to\nsomething else, say GetCharX.  (For the record, this is  going to\nbe a TEMPORARY change, so best not do this with your only copy of\nTINY.  I assume you understand that you should  always  do  these\nexperiments with a working copy.)\n\nNow, we're going to need a  procedure  to skip over comments.  So\nkey in the following one:\n\n\n{--------------------------------------------------------------}\n{ Skip A Comment Field }\n\nprocedure SkipComment;\nbegin\n   while Look <> '}' do\n      GetCharX;\n   GetCharX;\nend;\n{--------------------------------------------------------------}\n\n\nClearly, what this procedure is going to do is to simply read and\ndiscard characters from the input  stream, until it finds a right\ncurly brace.  
Then it reads one more character and returns  it in\nLook.\n\nNow we can write a new version of GetChar that uses SkipComment to\nstrip out comments:\n\n\n{--------------------------------------------------------------}\n{ Get Character from Input Stream }\n{ Skip Any Comments }\n\nprocedure GetChar;\nbegin\n   GetCharX;\n   if Look = '{' then SkipComment;\nend;\n{--------------------------------------------------------------}\n\n\nCode this up  and  give  it  a  try.    You'll find that you can,\nindeed, bury comments anywhere you like.  The comments never even\nget into the parser proper ... every call to GetChar just returns\nany character that's NOT part of a comment.\n\nAs a matter of fact, while  this  approach gets the job done, and\nmay even be  perfectly  satisfactory  for  you, it does its job a\nlittle  TOO  well.    First  of all, most  programming  languages\nspecify that a comment should be treated like a  space,  so  that\ncomments aren't allowed  to  be embedded in, say, variable names.\nThis current version doesn't care WHERE you put comments.\n\nSecond, since the  rest  of  the  parser can't even receive a '{'\ncharacter, you will not be allowed to put one in a quoted string.\n\nBefore you turn up your nose at this simplistic solution, though,\nI should point out  that  as respected a compiler as Turbo Pascal\nalso won't allow  a  '{' in a quoted string.  Try it.  And as for\nembedding a comment in an  identifier, I can't imagine why anyone\nwould want to do such a  thing,  anyway, so the question is moot.\nFor 99% of all  applications,  what I've just shown you will work\njust fine.\n\nBut,  if  you  want  to  be  picky  about it  and  stick  to  the\nconventional treatment, then we  need  to  move  the interception\npoint downstream a little further.\n\nTo  do  this,  first change GetChar back to the way  it  was  and\nchange the name called in SkipComment.  
Then, let's add  the left\nbrace as a possible whitespace character:\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB, CR, LF, '{'];\nend;\n{--------------------------------------------------------------}\n\n\nNow, we can deal with comments in procedure SkipWhite:\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do begin\n      if Look = '{' then\n         SkipComment\n      else\n         GetChar;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNote  that SkipWhite is written so that we  will  skip  over  any\ncombination of whitespace characters and comments, in one call.\n\nOK, give this one a try, too.   You'll  find  that  it will let a\ncomment serve to delimit tokens.  It's worth mentioning that this\napproach also gives us the  ability to handle curly braces within\nquoted strings, since within such  strings we will not be testing\nfor or skipping over whitespace.\n\nThere's one last  item  to  deal  with:  Nested  comments.   Some\nprogrammers like the idea  of  nesting  comments, since it allows\nyou to comment out code during debugging.  The  code  I've  given\nhere won't allow that and, again, neither will Turbo Pascal.\n\nBut the fix is incredibly easy.  All  we  need  to  do is to make\nSkipComment recursive:\n\n\n{--------------------------------------------------------------}\n{ Skip A Comment Field }\n\nprocedure SkipComment;\nbegin\n   while Look <> '}' do begin\n      GetChar;\n      if Look = '{' then SkipComment;\n   end;\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nThat does it.  
As  sophisticated a comment-handler as you'll ever\nneed.\n\n\nMULTI-CHARACTER DELIMITERS\n\nThat's all well and  good  for cases where a comment is delimited\nby single  characters,  but  what  about  the  cases such as C or\nstandard Pascal, where two  characters  are  required?  Well, the\nprinciples are still the same, but we have to change our approach\nquite a bit.  I'm sure it won't surprise you to learn that things\nget harder in this case.\n\nFor the multi-character situation, the  easiest thing to do is to\nintercept the left delimiter  back  at the GetChar stage.  We can\n\"tokenize\" it right there, replacing it by a single character.\n\nLet's assume we're using the C delimiters '/*' and '*/'.   First,\nwe  need  to  go back to the \"GetCharX\" approach.  In yet another\ncopy of your compiler, rename  GetChar to GetCharX and then enter\nthe following new procedure GetChar:\n\n\n{--------------------------------------------------------------}\n{ Read New Character.  Intercept '/*' }\n\nprocedure GetChar;\nbegin\n   if TempChar <> ' ' then begin\n      Look := TempChar;\n      TempChar := ' ';\n      end\n   else begin\n      GetCharX;\n      if Look = '/' then begin\n         Read(TempChar);\n         if TempChar = '*' then begin\n            Look := '{';\n            TempChar := ' ';\n         end;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nAs you can see, what this procedure does is  to  intercept  every\noccurrence of '/'.  It then examines the NEXT  character  in  the\nstream.  If the character  is  a  '*',  then  we  have  found the\nbeginning  of  a  comment,  and  GetChar  will  return  a  single\ncharacter replacement for it.   (For  simplicity,  I'm  using the\nsame '{' character  as I did for Pascal.  If you were writing a C\ncompiler, you'd no doubt want to pick some other character that's\nnot  used  elsewhere  in C.  Pick anything you like ... 
even $FF,\nanything that's unique.)\n\nIf the character  following  the  '/'  is NOT a '*', then GetChar\ntucks it away in the new global TempChar, and  returns  the  '/'.\n\nNote that you need to declare this new variable and initialize it\nto ' '.  I like to do  things  like  that  using the Turbo \"typed\nconstant\" construct:\n\n\n     const TempChar: char = ' ';\n\n\nNow we need a new version of SkipComment:\n\n\n{--------------------------------------------------------------}\n{ Skip A Comment Field }\n\nprocedure SkipComment;\nbegin\n   repeat\n      repeat\n         GetCharX;\n      until Look = '*';\n      GetCharX;\n   until Look = '/';\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nA  few  things  to  note:  first  of  all, function  IsWhite  and\nprocedure SkipWhite  don't  need  to  be  changed,  since GetChar\nreturns the '{' token.  If you change that token  character, then\nof  course you also need to change the  character  in  those  two\nroutines.\n\nSecond, note that  SkipComment  doesn't call GetChar in its loop,\nbut  GetCharX.    That  means   that  the  trailing  '/'  is  not\nintercepted and  is seen by SkipComment.  Third, although GetChar\nis the  procedure  doing  the  work,  we  can still deal with the\ncomment  characters  embedded  in  a  quoted  string,  by calling\nGetCharX  instead  of  GetChar  while  we're  within  the string.\nFinally,  note  that  we can again provide for nested comments by\nadding a single statement to SkipComment, just as we did before.\n\n\nONE-SIDED COMMENTS\n\nSo far I've shown you  how  to  deal  with  any  kind  of comment\ndelimited on the left and the  right.   That only leaves the one-\nsided comments like those in assembler language or  in  Ada, that\nare terminated by the end of the line.  In a  way,  that  case is\neasier.   
The only procedure that would need  to  be  changed  is\nSkipComment, which must now terminate at the newline characters:\n\n\n{--------------------------------------------------------------}\n{ Skip A Comment Field }\n\nprocedure SkipComment;\nbegin\n   repeat\n      GetCharX;\n   until Look = CR;\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nIf the leading character is  a  single  one,  as  in  the  ';' of\nassembly language, then we're essentially done.  If  it's  a two-\ncharacter token, as in the '--'  of  Ada, we need only modify the\ntests  within  GetChar.   Either way, it's an easier problem than\nthe balanced case.\n\n\nCONCLUSION\n\nAt this point we now have the ability to deal with  both comments\nand semicolons, as well as other kinds of syntactic sugar.   I've\nshown  you several ways to deal with  each,  depending  upon  the\nconvention  desired.    The  only  issue left is: which of  these\nconventions should we use in KISS/TINY?\n\nFor the reasons that I've given as we went  along,  I'm  choosing\nthe following:\n\n\n (1) Semicolons are TERMINATORS, not separators\n\n (2) Semicolons are OPTIONAL\n\n (3) Comments are delimited by curly braces\n\n (4) Comments MAY be nested\n\n\nPut the code corresponding to these cases into your copy of TINY.\nYou now have TINY Version 1.2.\n\nNow that we  have  disposed  of  these  sideline  issues,  we can\nfinally get back into the mainstream.  In  the  next installment,\nwe'll talk  about procedures and parameter passing, and we'll add\nthese important features to TINY.  See you then.\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   
*\n*                                                               *\n*****************************************************************\n\n"
  },
  {
    "path": "13/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "13/cradle.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\n\n#define MaxEntry 26\nconst char TAB = '\\t';\nconst char CR = '\\n';\nconst char LF = '\\r';\n\nchar tmp[MAX_BUF];  /* temporary buffer */\n\nchar Look;\nchar ST[MaxEntry];   /* symbol table */\nint Params[MaxEntry];    /* parameter table */\nint NumParams = 0;\nint Base;\n\n/* read new character from input stream */\nvoid GetChar()\n{\n    Look = getchar();\n}\n\n/* Report an Error */\nvoid Error(char *str)\n{\n    printf(\"\\n\");\n    printf(\"\\aError: %s.\\n\", str);\n}\n\n/* report Error and Halt */\nvoid Abort(char *str)\n{\n    Error(str);\n    exit(1);\n}\n\n/* report what was expected */\nvoid Expected(char *str)\n{\n    sprintf(tmp, \"Expected: %s\", str);\n    Abort(tmp);\n}\n\n/* report an undefined identifier */\nvoid Undefined(char symbol)\n{\n    sprintf(tmp, \"Undefined Identifier: %c\", symbol);\n    Abort(tmp);\n}\n\n/* report an duplicate identifier */\nvoid Duplicate(char symbol)\n{\n    sprintf(tmp, \"Duplicate Identifier: %c\", symbol);\n    Abort(tmp);\n}\n\n/* Get type of symbole */\nchar TypeOf(char symbol)\n{\n    if (IsParam(symbol)) {\n        return 'f';\n    } else {\n        return ST[symbol - 'A'];\n    }\n}\n\n/* check if a symbol is in table */\nbool InTable(char symbol)\n{\n    return ST[symbol - 'A'] != ' ';\n}\n\n/* add a new symbol to table */\nvoid AddEntry(char symbol, char type)\n{\n    if (InTable(symbol)) {\n        Duplicate(symbol);\n    }\n    ST[symbol-'A'] = type;\n}\n\n/* check an entry to make sure it's a variable */\nvoid CheckVar(char name)\n{\n    char tmp_buf[MAX_BUF];\n    if (!InTable(name)) {\n        Undefined(name);\n    }\n    if (TypeOf(name) != 'v') {\n        sprintf(tmp_buf, \"%c is not a variable\", name);\n        Abort(tmp_buf);\n    }\n}\n\n/* turn an character into uppercase */\nchar upcase(char c)\n{\n    return (c & 0xDF);\n}\n\nbool IsAlpha(char c)\n{\n    char upper = upcase(c);\n    
return (upper >= 'A') && (upper <= 'Z');\n}\n\nbool IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nbool IsAlNum(char c)\n{\n    return IsAlpha(c) || IsDigit(c);\n}\n\nbool IsAddop(char c)\n{\n    return strchr(\"+-\", c) != NULL;\n}\n\nbool IsMulop(char c)\n{\n    return strchr(\"*/\", c) != NULL;\n}\n\nbool IsRelop(char c)\n{\n    return strchr(\"=#<>\", c) != NULL;\n}\n\nbool IsWhite(char c)\n{\n    return strchr(\" \\t\", c) != NULL;\n}\n\n/* skip over leading white space */\nvoid SkipWhite(void)\n{\n    while(IsWhite(Look)) {\n        GetChar();\n    }\n}\n\n/* skip over an End-Of-Line */\nvoid Fin(void)\n{\n    if (Look == CR) {\n        GetChar();\n        if (Look == LF) {\n            GetChar();\n        }\n    } else if (Look == LF){\n        GetChar();\n    }\n}\n\n/* match a specific input character */\nvoid Match(char c)\n{\n    if (Look == c) {\n        GetChar();\n    } else {\n        char tmp_buf[MAX_BUF];\n        sprintf(tmp_buf, \"'%c'\", c);\n        Expected(tmp_buf);\n    }\n    SkipWhite();\n}\n\n/* Get an identifier */\nchar GetName(void)\n{\n    if (! 
IsAlpha(Look)) {\n        Expected(\"Name\");\n    }\n    char name = upcase(Look);\n    GetChar();\n    SkipWhite();\n    return name;\n}\n\n/* Get a number */\nchar GetNum(void)\n{\n    if (!IsDigit(Look)) {\n        Expected(\"Integer\");\n    }\n    char num = Look;\n    GetChar();\n    SkipWhite();\n    return num;\n}\n\n/* output a string with TAB */\nvoid Emit(char *str)\n{\n    printf(\"\\t%s\", str);\n}\n\n/* Output a string with TAB and CRLF */\nvoid EmitLn(char *str)\n{\n    Emit(str);\n    printf(\"\\n\");\n}\n\n/* Post a label to output */\nvoid PostLabel(char label)\n{\n    printf(\"%c:\\n\", label);\n}\n\n/* Load a variable to the primary register */\nvoid LoadVar(char name)\n{\n    CheckVar(name);\n    sprintf(tmp, \"movl %c, %%eax\", name);\n    EmitLn(tmp);\n}\n\n/* store the primary register */\nvoid StoreVar(char name)\n{\n    CheckVar(name);\n    sprintf(tmp, \"movl %%eax, %c\", name);\n    EmitLn(tmp);\n}\n\n/* load a parameter to the primary register */\nvoid LoadParam(int n)\n{\n    int offset = 8 + 4*(Base - n);\n    sprintf(tmp, \"movl %d(%%ebp), %%eax\", offset);\n    EmitLn(tmp);\n}\n\n/* store a parameter from the primary register */\nvoid StoreParam(int n)\n{\n    int offset = 8 + 4*(Base - n);\n    sprintf(tmp, \"movl %%eax, %d(%%ebp)\", offset);\n    EmitLn(tmp);\n}\n\n/* push the primary register to the stack */\nvoid Push()\n{\n    EmitLn(\"push %eax\");\n}\n\n/* Adjust the stack pointer upwards by n bytes */\nvoid CleanStack(int bytes)\n{\n    if (bytes > 0) {\n        sprintf(tmp, \"addl $%d, %%esp\", bytes);\n        EmitLn(tmp);\n    }\n}\n\n/* initialize the symbol table */\nvoid InitTable(void)\n{\n    int i;\n    for (i = 0; i < MaxEntry; ++i) {\n        ST[i] = ' ';\n    }\n}\n\n/* initialize parameter table to NULL */\nvoid ClearParams()\n{\n    int i;\n    for (i = 0; i < MaxEntry; ++i) {\n        Params[i] = 0;\n    }\n    NumParams = 0;\n}\n\n/* find the parameter number */\nint ParamNumber(char name)\n{\n    return 
Params[name - 'A'];\n}\n\n/* see if an identifier is a parameter */\nbool IsParam(char name)\n{\n    return Params[name-'A'] != 0;\n}\n\n/* Add a new parameter to table */\nvoid AddParam(char name)\n{\n    if (IsParam(name)) {\n        Duplicate(name);\n    }\n    NumParams++;\n    Params[name - 'A'] = NumParams;\n}\n\n/* initialize */\nvoid Init()\n{\n    GetChar();\n    SkipWhite();\n    InitTable();\n    ClearParams();\n}\n"
  },
  {
    "path": "13/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#include <stdbool.h>\n\n#define MAX_BUF 100\nextern const char TAB;\nextern const char CR;\nextern const char LF;\n\nextern char Look;   /* lookahead character */\nextern char ST[];   /* symbol table */\nextern int Params[];    /* parameter table */\nextern int NumParams;\nextern int Base;\n\n/* read new character from input stream */\nvoid GetChar();\n\n/* Report an Error */\nvoid Error(char *str);\n\n/* report Error and Halt */\nvoid Abort(char *str);\n\n/* report what was expected */\nvoid Expected(char *str);\n\n/* report an undefined identifier */\nvoid Undefined(char symbol);\n\n/* report an duplicate identifier */\nvoid Duplicate(char symbol);\n\n/* Get type of symbole */\nchar TypeOf(char symbol);\n\n/* check if a symbol is in table */\nbool InTable(char symbol);\n\n/* add a new symbol to table */\nvoid AddEntry(char symbol, char type);\n\n/* check an entry to make sure it's a variable */\nvoid CheckVar(char name);\n\n\nbool IsAlpha(char c);\nbool IsDigit(char c);\nbool IsAlNum(char c);\nbool IsAddop(char c);\nbool IsMulop(char c);\nbool IsRelop(char c);\nbool IsWhite(char c);\n\n/* skip over leading white space */\nvoid SkipWhite(void);\n/* skip over an End-Of-Line */\nvoid Fin(void);\n\n/* match a specific input character */\nvoid Match(char c);\n\n/* Get an identifier */\nchar GetName(void);\n\n/* Get a number */\nchar GetNum(void);\n\n/* output a string with TAB */\nvoid Emit(char *str);\n/* Output a string with TAB and CRLF */\nvoid EmitLn(char *str);\n\n\n/* Post a label to output */\nvoid PostLabel(char label);\n\n/* Load a variable to the primary register */\nvoid LoadVar(char name);\n\n/* store the primary register */\nvoid StoreVar(char name);\n\n/* load a parameter to the primary register */\nvoid LoadParam(int n);\n\n/* store a parameter from the primary register */\nvoid StoreParam(int n);\n\n/* push the primary register to the stack */\nvoid Push();\n\n/* Adjust the stack pointer upwards 
by n bytes */\nvoid CleanStack(int bytes);\n\n/* initialize the symbol table */\nvoid InitTable(void);\n\n/* initialize parameter table to NULL */\nvoid ClearParams();\n\n/* find the parameter number */\nint ParamNumber(char name);\n\n/* see if an identifier is a parameter */\nbool IsParam(char name);\n\n/* Add a new parameter to table */\nvoid AddParam(char name);\n\n/* initialize */\nvoid Init(void);\n\n#endif\n"
  },
  {
    "path": "13/main.c",
    "content": "#include <stdio.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\n\nvoid Expression();\nvoid AssignOrProc();\nvoid Assignment(char name);\nvoid DoBlock();\nvoid BeginBlock();\nvoid Alloc(char name);\nvoid Decl(void);\nvoid TopDecls(void);\nvoid DoProc(void);\nvoid DoMain(void);\nvoid Return();\nvoid CallProc(char name);\nvoid Call(char name);\nvoid FormalList();\nvoid FormalParam();\nvoid Param();\nint ParamList();\nvoid LocDecl();\nint LocDecls();\n\nvoid Header();\nvoid Prolog();\nvoid Epilog();\n\nvoid ProcProlog(char name, int num_local_params);\nvoid ProcEpilog();\n\n\n/* parse and tranlate an expression\n * vestigial version */\nvoid Expression()\n{\n    char name = GetName();\n    if (IsParam(name)) {\n        LoadParam(ParamNumber(name));\n    } else {\n        LoadVar(name);\n    }\n}\n\n/* decide if a statement is an assignment or procedure call */\nvoid AssignOrProc()\n{\n    char name = GetName();\n    char tmp_buf[MAX_BUF];\n    switch (TypeOf(name)) {\n        case ' ':\n            Undefined(name);\n            break;\n        case 'v':\n        case 'f':\n            Assignment(name);\n            break;\n        case 'p':\n            CallProc(name);\n            break;\n        default:\n            sprintf(tmp_buf, \"Identifier %c cannot be used here\", name);\n            Abort(tmp_buf);\n            break;\n    }\n}\n\n/* parse and tranlate an assignment statement */\nvoid Assignment(char name)\n{\n    Match('=');\n    Expression();\n    if (IsParam(name)) {\n        StoreParam(ParamNumber(name));\n    } else {\n        StoreVar(name);\n    }\n}\n\n/* parse and translate a block of statement */\nvoid DoBlock()\n{\n    while(strchr(\"e\", Look) == NULL) {\n        AssignOrProc();\n        Fin();\n    }\n}\n\nvoid CallProc(char name)\n{\n    int bytes_pushed = ParamList();\n    Call(name);\n    CleanStack(bytes_pushed);\n}\n\n/* call a procedure */\nvoid Call(char name)\n{\n    char tmp_buf[MAX_BUF];\n    sprintf(tmp_buf, \"call 
%c\", name);\n    EmitLn(tmp_buf);\n}\n\n/* parse and translate a Begin-Block */\nvoid BeginBlock()\n{\n    Match('b');\n    Fin();\n    DoBlock();\n    Match('e');\n    Fin();\n}\n\n/* allocate storage for a variable */\nvoid Alloc(char name)\n{\n    if (InTable(name)) {\n        Duplicate(name);\n    }\n    ST[name-'A'] = 'v';\n    printf(\"\\t%c : .int 0\\n\", name);\n}\n\n/* parse and translate a data declaration */\nvoid Decl(void)\n{\n    printf(\".section .data\\n\");\n    Match('v');\n    Alloc(GetName());\n}\n\n/* parse and translate global declarations */\nvoid TopDecls(void)\n{\n    char tmp_buf[MAX_BUF];\n    while(Look != '.') {\n        switch(Look) {\n            case 'v':\n                Decl();\n                break;\n            case 'p':\n                DoProc();\n                break;\n            case 'P':\n                DoMain();\n                break;\n            default:\n                sprintf(tmp_buf, \"Unrecognized keyword %c\", Look);\n                Abort(tmp_buf);\n                break;\n        }\n        Fin();\n    }\n}\n\nvoid Header()\n{\n    printf(\".global _start\\n\");\n}\n\nvoid Prolog()\n{\n    EmitLn(\".section .text\");\n    EmitLn(\"_start:\");\n}\n\nvoid Epilog()\n{\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n}\n\nvoid DoProc(void)\n{\n    Match('p');\n    char name = GetName();\n    Fin();\n    if (InTable(name)) {\n        Duplicate(name);\n    }\n    ST[name-'A'] = 'p';\n    FormalList();\n    int num_local_params = LocDecls();\n    ProcProlog(name, num_local_params);\n    BeginBlock();\n    ProcEpilog();\n    ClearParams();\n}\n\nvoid Return()\n{\n    EmitLn(\"ret\");\n}\n\n/* parse and translate a main program \n * <main program> ::= PROGRAM <ident> <begin-block>\n * */\nvoid DoMain(void)\n{\n    Match('P');\n    char name = GetName();\n    Fin();\n    if (InTable(name)) {\n        Duplicate(name);\n    }\n    Prolog();\n    BeginBlock();\n}\n\n/* process 
the formal parameter list of a procedure */\nvoid FormalList()\n{\n    Match('(');\n    if (Look != ')') {\n        FormalParam();\n        while(Look == ',') {\n            Match(',');\n            FormalParam();\n        }\n    }\n    Match(')');\n    Fin();\n    Base = NumParams;\n    NumParams = NumParams + 1;\n}\n\n/* process a formal parameter */\nvoid FormalParam()\n{\n    AddParam(GetName());\n}\n\n/* process an actual parameter */\nvoid Param()\n{\n    Expression();\n    Push();\n}\n\n/* process the parameter list for a procedure call */\nint ParamList()\n{\n    int num_params = 0;\n    Match('(');\n    if (Look != ')') {\n        Param();\n        num_params++;\n        while(Look == ',') {\n            Match(',');\n            Param();\n            num_params++;\n        }\n    }\n    Match(')');\n    return 4*num_params;\n}\n\n/* write the prolog for a procedure */\nvoid ProcProlog(char name, int num_local_params)\n{\n    char tmp_buf[MAX_BUF];\n    PostLabel(name);\n    EmitLn(\"pushl %ebp\");\n    EmitLn(\"movl %esp, %ebp\");\n    sprintf(tmp_buf, \"subl $%d, %%esp\", 4*num_local_params);\n    EmitLn(tmp_buf);\n}\n\n/* write the epilog for a procedure */\nvoid ProcEpilog()\n{\n    EmitLn(\"movl %ebp, %esp\");\n    EmitLn(\"popl %ebp\");\n    EmitLn(\"ret\");\n}\n\n/* parse and translate a local data declaration */\nvoid LocDecl()\n{\n    Match('v');\n    AddParam(GetName());\n    Fin();\n}\n\n/* parse and translate local declarations */\nint LocDecls()\n{\n    int num_params = 0;\n    while(Look == 'v') {\n        LocDecl();\n        num_params ++;\n    }\n    return num_params;\n}\n\nint main(int argc, char *argv[])\n{\n    Init();\n    Header();\n    TopDecls();\n    Epilog();\n    return 0;\n}\n"
  },
  {
    "path": "13/prog.txt",
    "content": "vx\nvy\nvz\npm(a,b,c)\nvt\nb\nt=a\nx=t\ne\nPtb\nm(z,z,z)\ne.\n"
  },
  {
    "path": "13/tutor13.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                          27 August 1989\n\n\n                      Part XIII: PROCEDURES\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nAt last we get to the good part!\n\nAt  this point we've studied almost all  the  basic  features  of\ncompilers  and  parsing.    We  have  learned  how  to  translate\narithmetic expressions, Boolean expressions, control  constructs,\ndata  declarations,  and  I/O  statements.    We  have defined  a\nlanguage, TINY 1.3, that embodies all these features, and we have\nwritten  a  rudimentary  compiler that can translate  them.    By\nadding some file I/O we could indeed have a working compiler that\ncould produce executable object files  from  programs  written in\nTINY.  With such a compiler, we could write simple  programs that\ncould read integer data, perform calculations with it, and output\nthe results.\n\nThat's nice, but what we have is still only a  toy  language.  We\ncan't read or write even a single character of text, and we still\ndon't have procedures.\n\nIt's  the  features  to  be  discussed  in  the  next  couple  of\ninstallments  that  separate  the men from the toys, so to speak.\n\"Real\" languages have more than one data type,  and  they support\nprocedure calls.  
More than any others, it's  these  two features\nthat give a language much of its character and personality.  Once\nwe  have  provided   for   them,  our  languages,  TINY  and  its\nsuccessors, will cease  to  become  toys  and  will  take  on the\ncharacter  of  real  languages,  suitable for serious programming\njobs.\n\nFor several installments now, I've been promising you sessions on\nthese  two  important  subjects.  Each time, other issues came up\nthat required me to  digress  and deal with them.  Finally, we've\nbeen able to put all those issues to rest and can get on with the\nmainstream  of  things.    In   this   installment,   I'll  cover\nprocedures.  Next time, we'll talk about the basic data types.\n\n\nONE LAST DIGRESSION\n\nThis has  been an extraordinarily difficult installment for me to\nwrite.  The reason has nothing to do with the subject  itself ...\nI've  known  what I wanted to say for some time, and  in  fact  I\npresented  most  of  this at Software Development  '89,  back  in\nFebruary.  It has more to do with the approach.  Let me explain.\n\nWhen I first  began  this  series,  I  told you that we would use\nseveral \"tricks\" to  make  things  easy,  and to let us learn the\nconcepts without getting too bogged down in the  details.   Among\nthese tricks was the idea of looking at individual  pieces  of  a\ncompiler at  a time, i.e. performing experiments using the Cradle\nas a base.  When we studied expressions, for  example,  we  dealt\nwith only that part of compiler theory.  When we  studied control\nstructures,  we wrote a different program,  still  based  on  the\nCradle, to do that part. We only incorporated these concepts into\na complete language fairly recently. These techniques have served\nus very well indeed, and led us to the development of  a compiler\nfor TINY version 1.3.\n\nWhen  I  first  began this session, I tried to build upon what we\nhad already done, and  just  add the new features to the existing\ncompiler.  
That turned out to be a little awkward and  tricky ...\nmuch too much to suit me.\n\nI finally figured out why.  In this series of experiments,  I had\nabandoned the very useful techniques that had allowed  us  to get\nhere, and  without  meaning  to  I  had  switched over into a new\nmethod of  working, that involved incremental changes to the full\nTINY compiler.\n\nYou  need  to  understand that what we are doing here is a little\nunique.  There have been a number of articles, such as  the Small\nC articles by Cain and Hendrix, that presented finished compilers\nfor one language or another.  This is different.  In  this series\nof tutorials, you are  watching  me  design  and implement both a\nlanguage and a compiler, in real time.\n\nIn the experiments that I've been doing in  preparation  for this\narticle,  I  was  trying to inject  the  changes  into  the  TINY\ncompiler  in such a way that, at every step, we still had a real,\nworking  compiler.     In   other  words,  I  was  attempting  an\nincremental enhancement of the language and  its  compiler, while\nat the same time explaining to you what I was doing.\n\nThat's a tough act to pull off!  I finally  realized  that it was\ndumb to try.    Having  gotten  this  far using the idea of small\nexperiments   based   on   single-character  tokens  and  simple,\nspecial-purpose  programs,  I  had  abandoned  them  in  favor of\nworking with the full compiler.  It wasn't working.\n\nSo we're going to go back to our  roots,  so  to  speak.  In this\ninstallment and the next, I'll be  using  single-character tokens\nagain as we study the concepts of procedures,  unfettered  by the\nother baggage  that we have accumulated in the previous sessions.\nAs a  matter  of  fact,  I won't even attempt, at the end of this\nsession, to merge the constructs into the TINY  compiler.   
We'll\nsave that for later.\n\nAfter all this time, you don't need more buildup  than  that,  so\nlet's waste no more time and dive right in.\n\n\nTHE BASICS\n\nAll modern  CPU's provide direct support for procedure calls, and\nthe  68000  is no exception.  For the 68000, the call  is  a  BSR\n(PC-relative version) or JSR, and the return is RTS.  All we have\nto do is to arrange for  the  compiler to issue these commands at\nthe proper place.\n\nActually, there are really THREE things we have to address.   One\nof  them  is  the  call/return  mechanism.    The second  is  the\nmechanism  for  DEFINING  the procedure in the first place.  And,\nfinally, there is the issue of passing parameters  to  the called\nprocedure.  None of these things are really  very  difficult, and\nwe can of course borrow heavily on what people have done in other\nlanguages ... there's no need to reinvent the wheel here.  Of the\nthree issues, that of parameter passing will occupy  most  of our\nattention, simply because there are so many options available.\n\n\nA BASIS FOR EXPERIMENTS\n\nAs always, we will need some software to  serve  as  a  basis for\nwhat  we are doing.  We don't need the full TINY compiler, but we\ndo need enough of a program so that some of the  other constructs\nare present.  Specifically, we need at least to be able to handle\nstatements of some sort, and data declarations.\n\nThe program shown below is that basis.  It's a vestigial  form of\nTINY, with single-character tokens.   It  has  data declarations,\nbut only in their simplest form ... no lists or initializers.  It\nhas assignment statements, but only of the kind\n\n     <ident> = <ident>\n\nIn  other  words,  the only legal expression is a single variable\nname.    There  are no control  constructs  ...  the  only  legal\nstatement is the assignment.\n\nMost of the program  is  just the standard Cradle routines.  
I've\nshown the whole thing here, just to make sure we're  all starting\nfrom the same point:\n\n\n{--------------------------------------------------------------}\nprogram Calls;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n      CR  = ^M;\n      LF  = ^J;\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look: char;              { Lookahead Character }\n\nvar ST: Array['A'..'Z'] of char;\n\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report an Undefined Identifier }\n\nprocedure Undefined(n: string);\nbegin\n   Abort('Undefined Identifier ' + n);\nend;\n\n\n{--------------------------------------------------------------}\n{ Report an Duplicate Identifier }\n\nprocedure Duplicate(n: string);\nbegin\n     Abort('Duplicate Identifier ' + n);\nend;\n\n\n{--------------------------------------------------------------}\n{ Get Type of Symbol }\n\nfunction TypeOf(n: char): char;\nbegin\n     TypeOf := ST[n];\nend;\n\n\n{--------------------------------------------------------------}\n{ Look for Symbol in Table }\n\nfunction InTable(n: char): Boolean;\nbegin\n   InTable := ST[n] <> ' 
';\nend;\n\n\n{--------------------------------------------------------------}\n{ Add a New Symbol to Table }\n\nprocedure AddEntry(Name, T: char);\nbegin\n     if Intable(Name) then Duplicate(Name);\n     ST[Name] := T;\nend;\n\n\n{--------------------------------------------------------------}\n{ Check an Entry to Make Sure It's a Variable }\n\nprocedure CheckVar(Name: char);\nbegin\n     if not InTable(Name) then Undefined(Name);\n     if  TypeOf(Name)  <>  'v'  then    Abort(Name  +  ' is not a\nvariable');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := upcase(c) in ['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an AlphaNumeric Character }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Mulop }\n\nfunction IsMulop(c: char): boolean;\nbegin\n   IsMulop := c in ['*', '/'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Boolean Orop }\n\nfunction IsOrop(c: char): boolean;\nbegin\n   IsOrop := c in ['|', '~'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Relop }\n\nfunction IsRelop(c: char): boolean;\nbegin\n   IsRelop := c in ['=', '#', '<', '>'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' 
', TAB];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over an End-of-Line }\n\nprocedure Fin;\nbegin\n   if Look = CR then begin\n      GetChar;\n      if Look = LF then\n         GetChar;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   if Look = x then GetChar\n     else Expected('''' + x + '''');\n     SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: char;\nbegin\n   if not IsAlpha(Look) then Expected('Name');\n   GetName := UpCase(Look);\n     GetChar;\n     SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: char;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   GetNum := Look;\n     GetChar;\n     SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{--------------------------------------------------------------}\n{ Post a Label To Output }\n\nprocedure PostLabel(L: string);\nbegin\n   WriteLn(L, ':');\nend;\n\n\n{--------------------------------------------------------------}\n{ Load a Variable to the Primary Register }\n\nprocedure LoadVar(Name: char);\nbegin\n     CheckVar(Name);\n     EmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Store the Primary Register }\n\nprocedure StoreVar(Name: 
char);\nbegin\n     CheckVar(Name);\n     EmitLn('LEA ' + Name + '(PC),A0');\n     EmitLn('MOVE D0,(A0)')\nend;\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nvar i: char;\nbegin\n     GetChar;\n     SkipWhite;\n     for i := 'A' to 'Z' do\n          ST[i] := ' ';\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Expression }\n{ Vestigial Version }\n\nprocedure Expression;\nbegin\n     LoadVar(GetName);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n     Name := GetName;\n     Match('=');\n     Expression;\n     StoreVar(Name);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure DoBlock;\nbegin\n     while not(Look in ['e']) do begin\n          Assignment;\n          Fin;\n     end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Begin-Block }\n\nprocedure BeginBlock;\nbegin\n     Match('b');\n     Fin;\n     DoBlock;\n     Match('e');\n     Fin;\nend;\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N: char);\nbegin\n     if InTable(N) then Duplicate(N);\n     ST[N] := 'v';\n     WriteLn(N, ':', TAB, 'DC 0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Data Declaration }\n\nprocedure Decl;\nvar Name: char;\nbegin\n     Match('v');\n     Alloc(GetName);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n     while Look <> 'b' do begin\n      case Look of\n        'v': Decl;\n      else Abort('Unrecognized Keyword ' + Look);\n     
     end;\n          Fin;\n     end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n     Init;\n     TopDecls;\n     BeginBlock;\nend.\n{--------------------------------------------------------------}\n\n\nNote  that we DO have a symbol table, and there is logic to check\na variable name to make sure it's a legal one.    It's also worth\nnoting that I  have  included  the  code  you've  seen  before to\nprovide for white space  and  newlines.    Finally, note that the\nmain program is delimited, as usual, by BEGIN-END brackets.\n\nOnce you've copied  the  program  to  Turbo, the first step is to\ncompile it and make sure it  works.   Give it a few declarations,\nand then a begin-block.  Try something like:\n\n\n     va             (for VAR A)\n     vb             (for VAR B)\n     vc             (for VAR C)\n     b              (for BEGIN)\n     a=b\n     b=c\n     e.             (for END.)\n\n\nAs usual, you should also make some deliberate errors, and verify\nthat the program catches them correctly.\n\n\nDECLARING A PROCEDURE\n\nIf you're satisfied that our little program works, then it's time\nto  deal  with  the  procedures.  Since we haven't  talked  about\nparameters yet, we'll begin by considering  only  procedures that\nhave no parameter lists.\n\nAs a start, let's consider a simple program with a procedure, and\nthink about the code we'd like to see generated for it:\n\n\n     PROGRAM FOO;\n     .\n     .\n     PROCEDURE BAR;                     BAR:\n     BEGIN                                   .\n     .                                       .\n     .                                       .\n     END;                                    RTS\n\n     BEGIN { MAIN PROGRAM }             MAIN:\n     .                                       .\n     .                                       .\n     BAR;                                    BSR BAR\n     .                
                       .\n     .                                       .\n     END.                                    END MAIN\n\n\nHere I've shown  the  high-order language constructs on the left,\nand the desired assembler code on the right.  The first  thing to\nnotice  is that we certainly don't have  much  code  to  generate\nhere!  For  the  great  bulk  of  both the procedure and the main\nprogram,  our existing constructs take care of  the  code  to  be\ngenerated.\n\nThe key to dealing with the body of the procedure is to recognize\nthat  although a procedure may be quite  long,  declaring  it  is\nreally no different than  declaring  a  variable.   It's just one\nmore kind of declaration.  We can write the BNF:\n\n\n     <declaration> ::= <data decl> | <procedure>\n\n\nThis means that it should be easy to modify TopDecl to  deal with\nprocedures.  What about the syntax of a procedure?   Well, here's\na suggested syntax, which is essentially that of Pascal:\n\n\n     <procedure> ::= PROCEDURE <ident> <begin-block>\n\n\nThere is practically no code generation required, other than that\ngenerated within the begin-block.    We need only emit a label at\nthe beginning of the procedure, and an RTS at the end.\n\nHere's the required code:\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Procedure Declaration }\n\nprocedure DoProc;\nvar N: char;\nbegin\n     Match('p');\n     N := GetName;\n     Fin;\n     if InTable(N) then Duplicate(N);\n     ST[N] := 'p';\n     PostLabel(N);\n     BeginBlock;\n     Return;\nend;\n{--------------------------------------------------------------}\n\n\nNote that I've added a new code generation routine, Return, which\nmerely emits an RTS instruction.  
The creation of that routine is\n\"left as an exercise for the student.\"\n\nTo  finish  this  version, add the following line within the Case\nstatement in TopDecls:\n\n\n            'p': DoProc;\n\n\nI should mention that  this  structure  for declarations, and the\nBNF that drives it, differs from standard Pascal.  In  the Jensen\n& Wirth  definition of Pascal, variable declarations, in fact ALL\nkinds of declarations,  must  appear in a specific sequence, i.e.\nlabels,   constants,  types,  variables,  procedures,  and   main\nprogram.  To  follow  such  a  scheme, we should separate the two\ndeclarations, and have code in the main program something like\n\n\n     DoVars;\n     DoProcs;\n     DoMain;\n\n\nHowever,  most implementations of Pascal, including Turbo,  don't\nrequire  that  order  and  let  you  freely  mix up  the  various\ndeclarations,  as  long  as  you  still  don't  try to  refer  to\nsomething  before  it's  declared.    Although  it  may  be  more\naesthetically pleasing to declare all the global variables at the\ntop of the  program,  it  certainly  doesn't do any HARM to allow\nthem to be sprinkled around.   In  fact,  it may do some GOOD, in\nthe  sense  that it gives you the  opportunity  to  do  a  little\nrudimentary  information  hiding.     Variables  that  should  be\naccessed only by the main program, for example,  can  be declared\njust before it and will thus be inaccessible to the procedures.\n\nOK, try this new version out.  Note that we  can  declare as many\nprocedures as we choose (as long  as  we don't run out of single-\ncharacter names!), and the  labels  and RTS's all come out in the\nright places.\n\nIt's  worth  noting  here  that  I  do  _NOT_  allow  for  nested\nprocedures.   In TINY, all procedures must  be  declared  at  the\nglobal level,  the  same  as  in  C.    There  has  been  quite a\ndiscussion about this point in  the  Computer  Language  Forum of\nCompuServe.  
It turns out that there is a significant  penalty in\ncomplexity that must be paid for the luxury of nested procedures.\nWhat's  more,  this  penalty gets paid at RUN TIME, because extra\ncode must be added and executed every time a procedure is called.\nI also happen to believe that nesting is not a good  idea, simply\non the grounds that I have seen too many abuses of the feature.\n\nBefore going on to the next step, it's also worth noting that the\n\"main program\" as it stands  is incomplete, since it doesn't have\nthe label and END statement.  Let's fix that little oversight:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Main Program }\n\nprocedure DoMain;\nbegin\n     Match('b');\n     Fin;\n     Prolog;\n     DoBlock;\n     Epilog;\nend;\n{--------------------------------------------------------------}\n.\n.\n.\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n     Init;\n     TopDecls;\n     DoMain;\nend.\n{--------------------------------------------------------------}\n\n\nNote  that  DoProc  and DoMain are not quite symmetrical.  DoProc\nuses a call to BeginBlock, whereas DoMain cannot.  That's because\na procedure  is signaled by the keyword PROCEDURE (abbreviated by\na 'p' here), while the main program gets no  keyword  other  than\nthe BEGIN itself.\n\nAnd _THAT_ brings up an interesting question: WHY?\n\nIf  we  look  at the structure of C programs, we  find  that  all\nfunctions are treated just  alike,  except  that the main program\nhappens to be identified by its name, \"main.\"  Since  C functions\ncan appear in any order, the main program can also be anywhere in\nthe compilation unit.\n\nIn Pascal, on the other hand, all variables  and  procedures must\nbe declared before they're  used,  which  means  that there is no\npoint putting anything after the  main program ... it could never\nbe accessed.  
The \"main program\" is not identified at  all, other\nthan  being that part of the code that  comes  after  the  global\nBEGIN.  In other words, if it ain't anything else, it must be the\nmain program.\n\nThis  causes  no  small  amount   of   confusion   for  beginning\nprogrammers, and for big Pascal programs sometimes it's difficult\nto  find the beginning of the main program at all.  This leads to\nconventions such as identifying it in comments:\n\n\n     BEGIN { of MAIN }\n\n\nThis  has  always  seemed  to  me to be a bit of a kludge.    The\nquestion comes up:    Why  should  the main program be treated so\nmuch  differently  than  a  procedure?   In fact, now that  we've\nrecognized that  procedure declarations are just that ... part of\nthe global declarations ... isn't  the main program just one more\ndeclaration, also?\n\nThe answer is yes, and by  treating  it that way, we can simplify\nthe code and make  it  considerably  more  orthogonal.  I propose\nthat  we  use  an explicit keyword, PROGRAM, to identify the main\nprogram (Note that this  means  that we can't start the file with\nit, as in Pascal).  In this case, our BNF becomes:\n\n\n     <declaration> ::= <data decl> | <procedure> | <main program>\n\n\n     <procedure> ::= PROCEDURE <ident> <begin-block>\n\n\n     <main program> ::= PROGRAM <ident> <begin-block>\n\n\nThe code  also  looks  much  better,  at  least in the sense that\nDoMain and DoProc look more alike:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Main Program }\n\nprocedure DoMain;\nvar N: char;\nbegin\n     Match('P');\n     N := GetName;\n     Fin;\n     if InTable(N) then Duplicate(N);\n     Prolog;\n     BeginBlock;\nend;\n{--------------------------------------------------------------}\n.\n.\n.\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n     while Look <> '.' 
do begin\n          case Look of\n            'v': Decl;\n            'p': DoProc;\n            'P': DoMain;\n          else Abort('Unrecognized Keyword ' + Look);\n          end;\n          Fin;\n     end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n     Init;\n     TopDecls;\n     Epilog;\nend.\n{--------------------------------------------------------------}\n\n\nSince the declaration of the main program is now within  the loop\nof  TopDecls, that  does  present  some difficulties.  How do  we\nensure that it's  the last thing in the file?  And how do we ever\nexit  from  the  loop?  My answer for the second question, as you\ncan see, was to bring back our old friend the  period.   Once the\nparser sees that, we're done.\n\nTo  answer  the first question:  it  depends  on  how  far  we're\nwilling to go to  protect  the programmer from dumb mistakes.  In\nthe code that I've shown,  there's nothing to keep the programmer\nfrom adding code after  the  main  program  ... even another main\nprogram.   The code will just not be  accessible.    However,  we\nCOULD access it via a FORWARD statement, which we'll be providing\nlater. As a  matter  of fact, many assembler language programmers\nlike to use  the  area  just  after the program to declare large,\nuninitialized data blocks, so there may indeed be  some  value in\nnot  requiring the main program to be last.  We'll leave it as it\nis.\n\nIf we decide  that  we  should  give the programmer a little more\nhelp than that, it's pretty easy to add some logic to kick us out\nof the loop  once  the  main  program  has been processed.  Or we\ncould  at least flag an error if someone  tries  to  include  two\nmains.\n\n\nCALLING THE PROCEDURE\n\nIf you're satisfied that  things  are  working, let's address the\nsecond half of the equation ... 
the call.\n\nConsider the BNF for a procedure call:\n\n\n     <proc_call> ::= <identifier>\n\n\nFor an assignment statement, on the other hand, the BNF is:\n\n\n     <assignment> ::= <identifier> '=' <expression>\n\n\nAt this point we seem to  have  a problem. The two BNF statements\nboth begin on the  right-hand  side  with the token <identifier>.\nHow are we supposed to know, when we see the  identifier, whether\nwe have a procedure call or an assignment statement?   This looks\nlike a case where our  parser ceases being predictive, and indeed\nthat's exactly the case.  However, it turns  out  to  be  an easy\nproblem to fix, since all we have to do is to look at the type of\nthe identifier, as  recorded  in  the  symbol  table.    As we've\ndiscovered before, a  minor  local  violation  of  the predictive\nparsing rule can be easily handled as a special case.\n\nHere's how to do it:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment(Name: char);\nbegin\n     Match('=');\n     Expression;\n     StoreVar(Name);\nend;\n\n\n{--------------------------------------------------------------}\n{ Decide if a Statement is an Assignment or Procedure Call }\n\nprocedure AssignOrProc;\nvar Name: char;\nbegin\n     Name := GetName;\n     case TypeOf(Name) of\n          ' ': Undefined(Name);\n          'v': Assignment(Name);\n          'p': CallProc(Name);\n          else Abort('Identifier ' + Name +\n                                   ' Cannot Be Used Here');\n     end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure DoBlock;\nbegin\n     while not(Look in ['e']) do begin\n          AssignOrProc;\n          Fin;\n     end;\nend;\n{--------------------------------------------------------------}\n\n\nAs you can see, procedure DoBlock now calls AssignOrProc  instead\nof Assignment.  
The function of this new procedure is to simply read\nthe identifier,  determine  its  type,  and  then  call whichever\nprocedure  is  appropriate  for  that  type.  Since the name  has\nalready been read,  we  must  pass  it to the two procedures, and\nmodify Assignment to match.   Procedure CallProc is a simple code\ngeneration routine:\n\n\n{--------------------------------------------------------------}\n{ Call a Procedure }\n\nprocedure CallProc(N: char);\nbegin\n     EmitLn('BSR ' + N);\nend;\n{--------------------------------------------------------------}\n\n\nWell,  at  this  point  we  have  a  compiler  that can deal with\nprocedures.    It's  worth  noting  that   procedures   can  call\nprocedures to any depth.  So even though we  don't  allow  nested\nDECLARATIONS, there  is certainly nothing to keep us from nesting\nCALLS, just as  we  would  expect  to  do in any language.  We're\ngetting there, and it wasn't too hard, was it?\n\nOf course, so far we can  only  deal with procedures that have no\nparameters.    The  procedures  can  only operate on  the  global\nvariables  by  their  global names.  So at this point we have the\nequivalent of BASIC's GOSUB construct.  Not too bad ... after all\nlots of serious programs were written using GOSUBs, but we can do\nbetter, and we will.  That's the next step.\n\n\nPASSING PARAMETERS\n\nAgain, we all know the basic idea of passed parameters, but let's\nreview them just to be safe.\n\nIn general the procedure is given a parameter list, for example\n\n     PROCEDURE FOO(X, Y, Z)\n\nIn  the declaration of a procedure,  the  parameters  are  called\nformal  parameters, and may be referred to in  the  body  of  the\nprocedure  by  those  names.    The  names  used for  the  formal\nparameters  are  really  arbitrary.    Only  the  position really\ncounts.  
In  the  example  above,  the name 'X' simply means \"the\nfirst parameter\" wherever it is used.\n\nWhen a procedure is called,  the \"actual parameters\" passed to it\nare associated  with  the  formal  parameters,  on  a one-for-one\nbasis.\n\nThe BNF for the syntax looks something like this:\n\n\n     <procedure> ::= PROCEDURE <ident>\n                    '(' <param-list> ')' <begin-block>\n\n\n     <param-list> ::= <parameter> ( ',' <parameter> )* | null\n\nSimilarly, the procedure call looks like:\n\n\n     <proc call> ::= <ident> '(' <param-list> ')'\n\n\nNote that there is already an implicit decision  built  into this\nsyntax.  Some languages, such as Pascal and Ada, permit parameter\nlists to be  optional.    If  there are no parameters, you simply\nleave off the parens  completely.    Other  languages, like C and\nModula 2, require the parens even if the list is empty.  Clearly,\nthe example we just finished corresponds to the  former  point of\nview.  But to tell the truth I prefer the latter.  For procedures\nalone, the  decision would seem to favor the \"listless\" approach.\nThe statement\n\n\n     Initialize;\n\n\nstanding alone, can only  mean  a procedure call.  In the parsers\nwe've  been  writing,  we've  made  heavy  use  of  parameterless\nprocedures, and it would seem a  shame  to have to write an empty\npair of parens for each case.\n\nBut later on we're going to  be  using functions, too.  And since\nfunctions  can  appear  in  the  same  places  as  simple  scalar\nidentifiers, you can't tell the  difference between the two.  You\nhave to go  back  to  the  declarations  to find out.  Some folks\nconsider  this to be an advantage.  Their  argument  is  that  an\nidentifier gets replaced by a value, and what do you care whether\nit's done by  substitution  or  by  a function?  But we sometimes\n_DO_ care, because the function may be quite time-consuming.  
If,\nby  writing  a  simple identifier into a given expression, we can\nincur a heavy run-time penalty, it seems to  me  we  ought  to be\nmade aware of it.\n\nAnyway,  Niklaus  Wirth  designed both Pascal and Modula 2.  I'll\ngive him the benefit of the doubt and assume that  he  had a good\nreason for changing the rules the second time around!\n\nNeedless to say, it's an easy thing to accommodate  either  point\nof view as we design a language, so this one is strictly a matter\nof personal preference.  Do it whichever way you like best.\n\nBefore we go any further, let's alter the translator to  handle a\n(possibly empty) parameter list.  For now we  won't  generate any\nextra code ... just parse the syntax.  The  code  for  processing\nthe declaration has very  much  the  same  form we've seen before\nwhen dealing with VAR-lists:\n\n\n{--------------------------------------------------------------}\n{ Process the Formal Parameter List of a Procedure }\n\nprocedure FormalList;\nbegin\n     Match('(');\n     if Look <> ')' then begin\n          FormalParam;\n          while Look = ',' do begin\n               Match(',');\n               FormalParam;\n          end;\n     end;\n     Match(')');\nend;\n{--------------------------------------------------------------}\n\n\nProcedure DoProc needs to have a line added to call FormalList:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Procedure Declaration }\n\nprocedure DoProc;\nvar N: char;\nbegin\n     Match('p');\n     N := GetName;\n     FormalList;\n     Fin;\n     if InTable(N) then Duplicate(N);\n     ST[N] := 'p';\n     PostLabel(N);\n     BeginBlock;\n     Return;\nend;\n{--------------------------------------------------------------}\n\n\nFor now, the code for FormalParam is just a dummy one that simply\nskips the parameter name:\n\n\n{--------------------------------------------------------------}\n{ Process a Formal Parameter }\n\nprocedure FormalParam;\nvar Name: 
 char;\nbegin\n     Name := GetName;\nend;\n{--------------------------------------------------------------}\n\n\nFor  the actual procedure call, there must  be  similar  code  to\nprocess the actual parameter list:\n\n\n{--------------------------------------------------------------}\n{ Process an Actual Parameter }\n\nprocedure Param;\nvar Name: char;\nbegin\n     Name := GetName;\nend;\n\n\n{--------------------------------------------------------------}\n{ Process the Parameter List for a Procedure Call }\n\nprocedure ParamList;\nbegin\n     Match('(');\n     if Look <> ')' then begin\n          Param;\n          while Look = ',' do begin\n               Match(',');\n               Param;\n          end;\n     end;\n     Match(')');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process a Procedure Call }\n\nprocedure CallProc(Name: char);\nbegin\n     ParamList;\n     Call(Name);\nend;\n{--------------------------------------------------------------}\n\n\nNote  here  that  CallProc  is  no  longer  just  a  simple  code\ngeneration  routine.  It has some structure to  it.    To  handle\nthis, I've renamed the code  generation routine to just Call, and\ncalled it from within CallProc.\n\nOK, if you'll add all this code to  your  translator  and  try it\nout, you'll find that you can indeed parse the syntax properly.\n\nI'll note in  passing  that  there  is _NO_ checking to make sure\nthat  the  number  (and,  later,  types)  of  formal  and  actual\nparameters match up.  In a production compiler, we must of course\ndo  this.  We'll ignore the issue now if for no other reason than\nthat the structure of our  symbol table doesn't currently give us\na place to store the necessary information.  Later on, we'll have\na place for that data and we can deal with the issue then.\n\n\nTHE SEMANTICS OF PARAMETERS\n\nSo  far we've dealt with the SYNTAX  of  parameter  passing,  and\nwe've got the parsing mechanisms in place to handle it.  
Next, we\nhave to look at the SEMANTICS, i.e., the actions to be taken when\nwe encounter parameters. This brings  us  square  up  against the\nissue of the different ways parameters can be passed.\n\nThere is more than one way to pass a parameter, and the way we do\nit can have a  profound  effect on the character of the language.\nSo  this is another of those areas where I can't just give you my\nsolution.  Rather, it's important that we spend some time looking\nat the  alternatives  so  that  you  can  go another route if you\nchoose to.\n\nThere are two main ways parameters are passed:\n\n     o By value\n     o By reference (address)\n\nThe differences are best seen in the light of a little history.\n\nThe old FORTRAN compilers passed all parameters by reference.  In\nother  words, what was actually passed was  the  address  of  the\nparameter.  This meant  that  the  called  subroutine was free to\neither read or  write  that  parameter,  as often as it chose to,\njust  as though it were a global variable.    This  was  actually\nquite an efficient  way  to  do  things, and it was pretty simple\nsince  the  same  mechanism  was  used  in  all cases,  with  one\nexception that I'll get to shortly.\n\nThere were problems, though.  Many people felt  that  this method\ncreated entirely too much coupling between the  called subroutine\nand  its  caller.    In  effect, it gave the subroutine  complete\naccess to all variables that appeared in the parameter list.\n\nMany  times,  we  didn't want to actually change a parameter, but\nonly use it as an input.  For example, we  might  pass an element\ncount  to a subroutine, and wish we could  then  use  that  count\nwithin a DO-loop.    To  avoid  changing the value in the calling\nprogram, we had to make a local copy of the input  parameter, and\noperate only on the  copy.    Some  FORTRAN programmers, in fact,\nmade it a practice to copy ALL parameters except those  that were\nto be used as return values.    
Needless to say, all this copying\ndefeated  a  good  bit  of  the  efficiency  associated with  the\napproach.\n\nThere was, however, an even more insidious problem, which was not\nreally just the fault of  the \"pass by reference\" convention, but\na bad convergence of several implementation decisions.\n\nSuppose we have a subroutine:\n\n\n     SUBROUTINE FOO(X, Y, N)\n\n\nwhere N is some kind of  input  count  or flag.  Many times, we'd\nlike  to be able to pass a literal or even an expression in place\nof a variable, such as:\n\n\n     CALL FOO(A, B, J + 1)\n\n\nHere the third  parameter  is  not  a  variable, and so it has no\naddress.    The  earliest FORTRAN compilers did  not  allow  such\nthings, so we had to resort to subterfuges like:\n\n\n     K = J + 1\n     CALL FOO(A, B, K)\n\n\nHere again, there was copying required, and the burden was on the\nprogrammer to do it.  Not good.\n\nLater  FORTRAN  implementations  got  rid  of  this  by  allowing\nexpressions  as  parameters.   What they  did  was  to  assign  a\ncompiler-generated variable, store the value of the expression in\nthe variable, and then pass the address of the variable.\n\nSo far, so good.    Even if the subroutine mistakenly altered the\nanonymous variable, who was to know  or  care?  On the next call,\nit would be recalculated anyway.\n\nThe  problem  arose  when  someone  decided to make  things  more\nefficient.  They  reasoned,  rightly enough, that the most common\nkind of \"expression\" was a single integer value, as in:\n\n\n     CALL FOO(A, B, 4)\n\n\nIt seemed inefficient to go to the trouble of \"computing\" such an\ninteger and storing it  in  a temporary variable, just to pass it\nthrough  the  calling  list.  
Since we had to pass the address of\nthe  thing  anyway,  it seemed to make lots of sense to just pass\nthe address of the literal integer, 4 in the example above.\n\nTo make matters  more  interesting, most compilers, then and now,\nidentify all literals and store  them  separately  in  a \"literal\npool,\"  so that we only have to store one  value  for each unique\nliteral.    That  combination  of  design  decisions:     passing\nexpressions, optimization for literals as a special case, and use\nof a literal pool, is what led to disaster.\n\nTo  see  how  it works, imagine that we call subroutine FOO as in\nthe example above, passing  it  a literal 4.  Actually, what gets\npassed  is  the  address of the literal 4, which is stored in the\nliteral pool.   This address corresponds to the formal parameter,\nN, in the subroutine itself.\n\nNow suppose that, unbeknownst to the  programmer,  subroutine FOO\nactually modifies N to be, say, -7.  Suddenly, that literal  4 in\nthe literal pool  gets  CHANGED,  to  a  -7.  From then on, every\nexpression that uses  a  4  and  every subroutine that passes a 4\nwill be using the value of -7 instead!  Needless to say, this can\nlead to some  bizarre  and difficult-to-find behavior.  The whole\nthing gave  the concept of pass-by-reference a bad name, although\nas we have seen, it was really a combination of  design decisions\nthat led to the problem.\n\nIn spite of  the  problem,  the  FORTRAN  approach  had  its good\npoints.    Chief  among them is the fact that we  don't  have  to\nsupport  multiple  mechanisms.    
The  same  scheme,  passing the\naddress of  the argument, works for EVERY case, including arrays.\nSo the size of the compiler can be reduced.\n\nPartly because of the FORTRAN  gotcha, and partly just because of\nthe reduced coupling involved, modern languages  like  C, Pascal,\nAda, and Modula 2 generally pass scalars by value.\n\nThis means that the value of the scalar is COPIED into a separate\nvalue  used only for the call.  Since the value passed is a copy,\nthe called procedure can use it as a local variable and modify it\nany way it likes.  The value in the caller will not be changed.\n\nIt may seem at first that  this  is a bit inefficient, because of\nthe need to copy the parameter.  But remember that we're going to\nhave  to  fetch SOME value to pass  anyway,  whether  it  be  the\nparameter  itself  or  an address for it.  Inside the subroutine,\nusing  pass-by-value  is  definitely  more  efficient,  since  we\neliminate one level of indirection.  Finally, we saw earlier that\nwith  FORTRAN,  it  was often necessary to make copies within the\nsubroutine anyway, so pass-by-value reduces the  number  of local\nvariables.  All in all, pass-by-value is better.\n\nExcept for one small little detail:  if all parameters are passed\nby value, there is no way  for  a  called  procedure  to return a\nresult to its caller!  The parameter passed is NOT altered in the\ncaller,  only  in  the called procedure.  Clearly, that won't get\nthe job done.\n\nThere  have  been   two   answers  to  this  problem,  which  are\nequivalent.   In Pascal, Wirth provides for VAR parameters, which\nare  passed-by-reference.    What a VAR parameter is, in fact, is\nnone other than our old friend the FORTRAN parameter, with  a new\nname and paint job for disguise.  Wirth neatly  gets  around  the\n\"changing a literal\"  problem  as  well  as  the  \"address  of an\nexpression\" problem, by  the  simple expedient of allowing only a\nvariable to be the actual parameter.  
In other  words,  it's  the\nsame restriction that the earliest FORTRANs imposed.\n\nC does the same thing, but explicitly.  In  C,  _ALL_  parameters\nare passed  by  value.    One  kind  of variable that C supports,\nhowever, is the pointer.  So  by  passing a pointer by value, you\nin effect pass what it points to by reference.  In some ways this\nworks even better yet,  because  even  though  you can change the\nvariable  pointed to all you like, you  still  CAN'T  change  the\npointer itself.  In a function such as strcpy, for example, where\nthe  pointers are incremented as the string  is  copied,  we  are\nreally only incrementing copies of the pointers, so the values of\nthose  pointers in the calling procedure  still  remain  as  they\nwere.  To modify a  pointer,  you  must  pass  a  pointer  to the\npointer.\n\nSince we are simply  performing  experiments  here, we'll look at\nBOTH pass-by-value and pass-by-reference.    That  way,  we'll be\nable to use either one as we need to.  It's worth mentioning that\nit's  going  to  be tough to use the C approach to pointers here,\nsince a pointer is a different type and we haven't  studied types\nyet!\n\n\nPASS-BY-VALUE\n\nLet's just try some simple-minded  things and see where they lead\nus.    Let's begin with the pass-by-value  case.    Consider  the\nprocedure call:\n\n\n     FOO(X, Y)\n\n\nAlmost the only reasonable way to pass the data  is  through  the\nCPU stack.  So the code we'd like  to  see  generated  might look\nsomething like this:\n\n\n     MOVE X(PC),-(SP)    ; Push X\n     MOVE Y(PC),-(SP)    ; Push Y\n     BSR FOO             ; Call FOO\n\n\nThat certainly doesn't seem too complex!\n\nWhen the BSR is executed, the CPU pushes the return  address onto\nthe stack and jumps to FOO.    
At  this point the stack will look\nlike this:\n\n          .\n          .\n          Value of X (2 bytes)\n          Value of Y (2 bytes)\n  SP -->  Return Address (4 bytes)\n\n\nSo the values of  the  parameters  have  addresses that are fixed\noffsets from the stack pointer.  In this  example,  the addresses\nare:\n\n\n     X:  6(SP)\n     Y:  4(SP)\n\n\nNow consider what the called procedure might look like:\n\n\n     PROCEDURE FOO(A, B)\n     BEGIN\n          A = B\n     END\n\n(Remember, the names  of  the formal parameters are arbitrary ...\nonly the positions count.)\n\nThe desired output code might look like:\n\n\n     FOO: MOVE 4(SP),D0\n          MOVE D0,6(SP)\n          RTS\n\n\nNote that, in order to address the formal parameters, we're going\nto have to know  which  position they have in the parameter list.\nThis means some changes to the symbol table stuff.  In  fact, for\nour single-character case it's best to just create  a  new symbol\ntable for the formal parameters.\n\nLet's begin by declaring a new table:\n\n\n     var Params: Array['A'..'Z'] of integer;\n\n\nWe  also  will  need to keep track of how many parameters a given\nprocedure has:\n\n\n     var NumParams: integer;\n\n\nAnd we need to initialize the new table.  Now, remember  that the\nformal parameter list  will  be different for each procedure that\nwe process, so we'll need to initialize that table anew  for each\nprocedure.  
Here's the initializer:\n\n\n{--------------------------------------------------------------}\n{ Initialize Parameter Table to Null }\n\nprocedure ClearParams;\nvar i: char;\nbegin\n     for i := 'A' to 'Z' do\n          Params[i] := 0;\n     NumParams := 0;\nend;\n{--------------------------------------------------------------}\n\n\nWe'll put a call to this procedure in Init, and  also  at the end\nof DoProc:\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nvar i: char;\nbegin\n     GetChar;\n     SkipWhite;\n     for i := 'A' to 'Z' do\n          ST[i] := ' ';\n     ClearParams;\nend;\n{--------------------------------------------------------------}\n.\n.\n.\n{--------------------------------------------------------------}\n{ Parse and Translate a Procedure Declaration }\n\nprocedure DoProc;\nvar N: char;\nbegin\n     Match('p');\n     N := GetName;\n     FormalList;\n     Fin;\n     if InTable(N) then Duplicate(N);\n     ST[N] := 'p';\n     PostLabel(N);\n     BeginBlock;\n     Return;\n     ClearParams;\nend;\n{--------------------------------------------------------------}\n\n\nNote that the call  within  DoProc ensures that the table will be\nclear when we're in the main program.\n\n\nOK, now  we  need  a  few procedures to work with the table.  
The\nnext few functions are  essentially  copies  of  InTable, TypeOf,\netc.:\n\n\n{--------------------------------------------------------------}\n{ Find the Parameter Number }\n\nfunction ParamNumber(N: char): integer;\nbegin\n     ParamNumber := Params[N];\nend;\n\n\n{--------------------------------------------------------------}\n{ See if an Identifier is a Parameter }\n\nfunction IsParam(N: char): boolean;\nbegin\n     IsParam := Params[N] <> 0;\nend;\n\n\n{--------------------------------------------------------------}\n{ Add a New Parameter to Table }\n\nprocedure AddParam(Name: char);\nbegin\n     if IsParam(Name) then Duplicate(Name);\n     Inc(NumParams);\n     Params[Name] := NumParams;\nend;\n{--------------------------------------------------------------}\n\n\nFinally, we need some code generation routines:\n\n\n{--------------------------------------------------------------}\n{ Load a Parameter to the Primary Register }\n\nprocedure LoadParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 4 + 2 * (NumParams - N);\n     Emit('MOVE ');\n     WriteLn(Offset, '(SP),D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Store a Parameter from the Primary Register }\n\nprocedure StoreParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 4 + 2 * (NumParams - N);\n     Emit('MOVE D0,');\n     WriteLn(Offset, '(SP)');\nend;\n\n\n{--------------------------------------------------------------}\n{ Push The Primary Register to the Stack }\n\nprocedure Push;\nbegin\n     EmitLn('MOVE D0,-(SP)');\nend;\n{--------------------------------------------------------------}\n\n\n( The last routine is one we've seen  before,  but  it  wasn't in\nthis vestigial version of the program.)\n\nWith those preliminaries in place, we're ready to  deal  with the\nsemantics of procedures with calling lists (remember, the code to\ndeal with the syntax is already in place).\n\nLet's begin by processing a formal parameter.  
All we have  to do\nis to add each parameter to the parameter symbol table:\n\n\n{--------------------------------------------------------------}\n{ Process a Formal Parameter }\n\nprocedure FormalParam;\nbegin\n     AddParam(GetName);\nend;\n{--------------------------------------------------------------}\n\n\nNow, what about dealing with a formal parameter  when  it appears\nin the body of the procedure?  That takes a little more work.  We\nmust first determine that it IS a formal parameter.  To  do this,\nI've written a modified version of TypeOf:\n\n\n{--------------------------------------------------------------}\n{ Get Type of Symbol }\n\nfunction TypeOf(n: char): char;\nbegin\n     if IsParam(n) then\n          TypeOf := 'f'\n     else\n          TypeOf := ST[n];\nend;\n{--------------------------------------------------------------}\n\n\n(Note that, since  TypeOf  now  calls  IsParam, it may need to be\nrelocated in your source.)\n\nWe also must modify AssignOrProc to deal with this new type:\n\n\n{--------------------------------------------------------------}\n{ Decide if a Statement is an Assignment or Procedure Call }\n\nprocedure AssignOrProc;\nvar Name: char;\nbegin\n     Name := GetName;\n     case TypeOf(Name) of\n          ' ': Undefined(Name);\n          'v', 'f': Assignment(Name);\n          'p': CallProc(Name);\n          else  Abort('Identifier ' + Name +  '  Cannot  Be  Used\nHere');\n     end;\nend;\n{--------------------------------------------------------------}\n\n\nFinally,  the  code  to process an assignment  statement  and  an\nexpression must be extended:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Expression }\n{ Vestigial Version }\n\nprocedure Expression;\nvar Name: char;\nbegin\n     Name := GetName;\n     if IsParam(Name) then\n          LoadParam(ParamNumber(Name))\n     else\n          
LoadVar(Name);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment(Name: char);\nbegin\n     Match('=');\n     Expression;\n     if IsParam(Name) then\n          StoreParam(ParamNumber(Name))\n     else\n          StoreVar(Name);\nend;\n{--------------------------------------------------------------}\n\n\nAs you can see, these procedures will treat  every  variable name\nencountered as either a  formal  parameter  or a global variable,\ndepending  on  whether  or not it appears in the parameter symbol\ntable.   Remember  that  we  are  using  only a vestigial form of\nExpression.  In the  final  program,  the  change shown here will\nhave to be added to Factor, not Expression.\n\nThe rest is easy.  We need only add the  semantics  to the actual\nprocedure call, which we can do with one new line of code:\n\n\n{--------------------------------------------------------------}\n{ Process an Actual Parameter }\n\nprocedure Param;\nbegin\n     Expression;\n     Push;\nend;\n{--------------------------------------------------------------}\n\n\nThat's  it.  Add these changes to your program and give it a try.\nTry declaring one or two procedures, each with a formal parameter\nlist.  Then do some assignments, using combinations of global and\nformal  parameters.    You  can  call one procedure  from  within\nanother, but you cannot DECLARE a nested procedure.  You can even\npass formal parameters from one procedure to another.  If  we had\nthe  full  syntax  of the language here, you'd also be able to do\nthings like read  or  write  formal  parameters  or  use  them in\ncomplicated expressions.\n\n\nWHAT'S WRONG?\n\nAt this point, you might be thinking: Surely there's more to this\nthan a few pushes and  pops.    There  must  be  more  to passing\nparameters than this.\n\nYou'd  be  right.    
As  a  matter  of fact, the code that  we're\ngenerating here leaves a lot to be desired in several respects.\n\nThe most glaring oversight is that it's wrong!   If  you'll  look\nback at the code for a procedure call, you'll see that the caller\npushes each actual parameter onto the stack before  it  calls the\nprocedure.  The  procedure  USES that information, but it doesn't\nchange the stack  pointer.    That  means that the stuff is still\nthere when we return. SOMEBODY needs to clean up  the  stack,  or\nwe'll soon be in very hot water!\n\nFortunately,  that's  easily fixed.  All we  have  to  do  is  to\nincrement the stack pointer when we're finished.\n\nShould  we  do  that  in  the  calling  program,  or  the  called\nprocedure?   Some folks let the called  procedure  clean  up  the\nstack,  since  that  requires less code to be generated per call,\nand since the procedure, after  all,  knows  how  many parameters\nit's got.   But  that  means  that  it must do something with the\nreturn address so as not to lose it.\n\nI prefer letting  the  caller  clean  up, so that the callee need\nonly execute a return.  Also, it seems a bit more balanced, since\nthe caller is  the  one  who  \"messed  up\" the stack in the first\nplace.  But  THAT  means  that  the caller must remember how many\nitems  it  pushed.    
To  make  things  easy, I've  modified  the\nprocedure  ParamList to be a function  instead  of  a  procedure,\nreturning the number of bytes pushed:\n\n\n{--------------------------------------------------------------}\n{ Process the Parameter List for a Procedure  Call }\n\nfunction ParamList: integer;\nvar N: integer;\nbegin\n     N := 0;\n     Match('(');\n     if Look <> ')' then begin\n          Param;\n          inc(N);\n          while Look = ',' do begin\n               Match(',');\n               Param;\n               inc(N);\n          end;\n     end;\n     Match(')');\n     ParamList := 2 * N;\nend;\n{--------------------------------------------------------------}\n\n\nProcedure CallProc then uses this to clean up the stack:\n\n\n{--------------------------------------------------------------}\n{ Process a Procedure Call }\n\nprocedure CallProc(Name: char);\nvar N: integer;\nbegin\n     N := ParamList;\n     Call(Name);\n     CleanStack(N);\nend;\n{--------------------------------------------------------------}\n\n\nHere I've created yet another code generation procedure:\n\n\n{--------------------------------------------------------------}\n{ Adjust the Stack Pointer Upwards by N Bytes }\n\nprocedure CleanStack(N: integer);\nbegin\n     if N > 0 then begin\n          Emit('ADD #');\n          WriteLn(N, ',SP');\n     end;\nend;\n{--------------------------------------------------------------}\n\n\nOK, if you'll add this code to your compiler, I think you'll find\nthat the stack is now under control.\n\nThe next problem has to do with our way of addressing relative to\nthe stack pointer.  That works fine in our simple examples, since\nwith our rudimentary  form  of expressions nobody else is messing\nwith the stack.  
But consider a different example as simple as:\n\n\n     PROCEDURE FOO(A, B)\n     BEGIN\n          A = A + B\n     END\n\n\nThe code generated by a simple-minded parser might be:\n\n\n     FOO: MOVE 6(SP),D0       ; Fetch A\n          MOVE D0,-(SP)       ; Push it\n          MOVE 4(SP),D0       ; Fetch B\n          ADD (SP)+,D0        ; Add A\n          MOVE D0,6(SP)       ; Store A\n          RTS\n\n\nThis  would  be  wrong.  When we push the first argument onto the\nstack, the offsets for the two formal parameters are no  longer 4\nand 6, but are 6 and 8.  So the second fetch would pick up part of\nthe return address, not B.\n\nThis is not  the  end of the world.  I think you can see that all\nwe really have to do is to alter the offset every  time  we  do a\npush, and that in fact is what's done if the  CPU  has no support\nfor other methods.\n\nFortunately,   though,   the   68000   does  have  such  support.\nRecognizing that this CPU  would  be  used  a lot with high-order\nlanguage compilers, Motorola decided to  add  direct  support for\nthis kind of thing.\n\nThe problem, as you  can  see, is that as the procedure executes,\nthe stack  pointer  bounces  up  and  down,  and so it becomes an\nawkward  thing  to  use  as  a  reference  to access  the  formal\nparameters.  The solution is to define some _OTHER_ register, and\nuse  it instead.  This register is typically  set  equal  to  the\noriginal stack pointer, and is called the frame pointer.\n\nThe  68000  instruction  LINK  lets  you  declare  such  a  frame\npointer, and  sets  it  equal  to  the  stack pointer, all in one\ninstruction.  As a matter of  fact,  it does even more than that.\nSince this register may have been in use for  something  else  in\nthe calling procedure, LINK also pushes the current value of that\nregister onto the stack.  
It  can  also  add a value to the stack\npointer, to make room for local variables.\n\nThe complement of LINK is UNLK, which simply  restores  the stack\npointer and pops the old value back into the register.\n\nUsing these two  instructions,  the code for the previous example\nbecomes:\n\n\n     FOO: LINK A6,#0\n          MOVE 10(A6),D0      ; Fetch A\n          MOVE D0,-(SP)       ; Push it\n          MOVE 8(A6),D0       ; Fetch B\n          ADD (SP)+,D0        ; Add A\n          MOVE D0,10(A6)      ; Store A\n          UNLK A6\n          RTS\n\n\nFixing the compiler to generate this code is a lot easier than it\nis  to  explain  it.    All we need to do is to modify  the  code\ngeneration created by DoProc.  Since that makes the code a little\nmore than one line, I've created new procedures to deal  with it,\nparalleling the Prolog and Epilog procedures called by DoMain:\n\n\n{--------------------------------------------------------------}\n{ Write the Prolog for a Procedure }\n\nprocedure ProcProlog(N: char);\nbegin\n     PostLabel(N);\n     EmitLn('LINK A6,#0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write the Epilog for a Procedure }\n\nprocedure ProcEpilog;\nbegin\n     EmitLn('UNLK A6');\n     EmitLn('RTS');\nend;\n{--------------------------------------------------------------}\n\n\nProcedure DoProc now just calls these:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Procedure Declaration }\n\nprocedure DoProc;\nvar N: char;\nbegin\n     Match('p');\n     N := GetName;\n     FormalList;\n     Fin;\n     if InTable(N) then Duplicate(N);\n     ST[N] := 'p';\n     ProcProlog(N);\n     BeginBlock;\n     ProcEpilog;\n     ClearParams;\nend;\n{--------------------------------------------------------------}\n\n\nFinally, we need to  change  the  references  to SP in procedures\nLoadParam and StoreParam:\n\n\n{--------------------------------------------------------------}\n{ 
Load a Parameter to the Primary Register }\n\nprocedure LoadParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 8 + 2 * (NumParams - N);\n     Emit('MOVE ');\n     WriteLn(Offset, '(A6),D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Store a Parameter from the Primary Register }\n\nprocedure StoreParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 8 + 2 * (NumParams - N);\n     Emit('MOVE D0,');\n     WriteLn(Offset, '(A6)');\nend;\n{--------------------------------------------------------------}\n\n\n(Note that the Offset computation  changes to allow for the extra\npush of A6.)\n\nThat's all it takes.  Try this out and see how you like it.\n\nAt this point we  are  generating  some  relatively nice code for\nprocedures and procedure calls.  Within the limitation that there\nare no local variables  (yet)  and  that  no procedure nesting is\nallowed, this code is just what we need.\n\nThere is still just one little small problem remaining:\n\n\n     WE HAVE NO WAY TO RETURN RESULTS TO THE CALLER!\n\n\nBut  that,  of course, is not a  limitation  of  the  code  we're\ngenerating, but  one  inherent  in  the  call-by-value  protocol.\nNotice that we CAN use formal parameters in any  way  inside  the\nprocedure.  We  can  calculate  new  values for them, use them as\nloop counters (if we had loops, that is!), etc.   So  the code is\ndoing what it's supposed to.   To  get over this last problem, we\nneed to look at the alternative protocol.\n\n\nCALL-BY-REFERENCE\n\nThis  one is easy, now that we have  the  mechanisms  already  in\nplace.    We  only  have  to  make  a few  changes  to  the  code\ngeneration.  Instead of  pushing  a value onto the stack, we must\npush an address.  
As it turns out, the 68000 has  an instruction,\nPEA, that does just that.\n\nWe'll be  making  a  new  version  of  the test program for this.\nBefore we do anything else,\n\n>>>> MAKE A COPY <<<<\n\nof  the program as it now stands, because  we'll  be  needing  it\nagain later.\n\nLet's begin by looking at the code we'd like to see generated for\nthe new case. Using the same example as before, we need the call\n\n\n     FOO(X, Y)\n\n\nto be translated to:\n\n\n     PEA X(PC)           ; Push the address of X\n     PEA Y(PC)           ; Push the address of Y\n     BSR FOO             ; Call FOO\n\n\nThat's a simple matter of a slight change to Param:\n\n\n{--------------------------------------------------------------}\n{ Process an Actual Parameter }\n\nprocedure Param;\nbegin\n     EmitLn('PEA ' + GetName + '(PC)');\nend;\n{--------------------------------------------------------------}\n\n\n(Note that with pass-by-reference, we can't  have  expressions in\nthe calling list, so Param can just read the name directly.)\n\nAt the other end, the references to the formal parameters must be\ngiven one level of indirection:\n\n\n     FOO: LINK A6,#0\n          MOVE.L 12(A6),A0    ; Fetch the address of A\n          MOVE (A0),D0        ; Fetch A\n          MOVE D0,-(SP)       ; Push it\n          MOVE.L 8(A6),A0     ; Fetch the address of B\n          MOVE (A0),D0        ; Fetch B\n          ADD (SP)+,D0        ; Add A\n          MOVE.L 12(A6),A0    ; Fetch the address of A\n          MOVE D0,(A0)        ; Store A\n          UNLK A6\n          RTS\n\n\nAll  of  this  can  be   handled  by  changes  to  LoadParam  and\nStoreParam:\n\n\n{--------------------------------------------------------------}\n{ Load a Parameter to the Primary Register }\n\nprocedure LoadParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 8 + 4 * (NumParams - N);\n     Emit('MOVE.L ');\n     WriteLn(Offset, '(A6),A0');\n     EmitLn('MOVE 
(A0),D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Store a Parameter from the Primary Register }\n\nprocedure StoreParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 8 + 4 * (NumParams - N);\n     Emit('MOVE.L ');\n     WriteLn(Offset, '(A6),A0');\n     EmitLn('MOVE D0,(A0)');\nend;\n{--------------------------------------------------------------}\n\nTo  get  the  count  right,  we  must  also  change  one line  in\nParamList:\n\n\n     ParamList := 4 * N;\n\n\nThat  should  do it.  Give it a try and see  if  it's  generating\nreasonable-looking code.  As  you  will  see,  the code is hardly\noptimal,  since  we  reload  the  address register every  time  a\nparameter  is  needed.    But  that's  consistent  with our  KISS\napproach  here,  of  just being sure to generate code that works.\nWe'll  just  make  a  little  note here, that here's yet  another\ncandidate for optimization, and press on.\n\nNow we've learned to process parameters  using  pass-by-value and\npass-by-reference.  In the real world, of course, we'd like to be\nable  to  deal  with BOTH methods.  We can't do that yet, though,\nbecause we have not yet had a session on types,  and  that has to\ncome first.\n\nIf  we can only have ONE method, then of course it has to be  the\ngood ol' FORTRAN method of  pass-by-reference,  since  that's the\nonly way procedures can ever return values to their caller.\n\nThis, in fact, will be one of the differences  between  TINY  and\nKISS.  In the next version of TINY,  we'll  use pass-by-reference\nfor all parameters.  KISS will support both methods.\n\n\nLOCAL VARIABLES\n\nSo  far,  we've  said  nothing  about  local  variables, and  our\ndefinition of procedures doesn't allow  for  them.    
Needless to\nsay, that's a big gap in our language, and one  that  needs to be\ncorrected.\n\nHere again we are faced with a choice: Static or dynamic storage?\n\nIn those  old FORTRAN programs, local variables were given static\nstorage just like global ones.  That is, each local  variable got\na  name  and  allocated address, like any other variable, and was\nreferenced by that name.\n\nThat's easy for us to do, using the allocation mechanisms already\nin place.  Remember,  though,  that local variables can have  the\nsame  names as global ones.  We need to somehow deal with that by\nassigning unique names for these variables.\n\nThe characteristic of static storage, of course, is that the data\nsurvives  a procedure call and return.   When  the  procedure  is\ncalled  again,  the  data will still be there.  That  can  be  an\nadvantage in some applications.    In the FORTRAN days we used to\ndo tricks like initialize a flag, so that you could tell when you\nwere entering a  procedure  for  the  first time and could do any\none-time initialization that needed to be done.\n\nOf  course,  the  same  \"feature\"  is also what  makes  recursion\nimpossible with static storage.  Any new call to a procedure will\noverwrite the data already in the local variables.\n\nThe alternative is dynamic storage, in which storage is allocated\non the stack just as for passed parameters.    We  also  have the\nmechanisms  already  for  doing this.  In fact, the same routines\nthat  deal with passed (by value) parameters  on  the  stack  can\neasily deal  with  local  variables  as  well  ... the code to be\ngenerated  is  the  same.  The purpose of the offset in the 68000\nLINK instruction is there just for that reason:  we can use it to\nadjust the stack  pointer  to  make  room  for  locals.   Dynamic\nstorage, of course, inherently supports recursion.\n\nWhen  I  first  began  planning  TINY,  I  must  admit  to  being\nprejudiced in favor of static  storage.    
That's  simply because\nthose old FORTRAN  programs  were pretty darned efficient ... the\nearly FORTRAN compilers  produced  a quality of code that's still\nrarely matched by modern compilers.   Even today, a given program\nwritten  in  FORTRAN  is likely to outperform  the  same  program\nwritten in C or Pascal, sometimes  by  wide margins. (Whew!  Am I\ngoing to hear about THAT statement!)\n\nI've always supposed that the reason had to do with the  two main\ndifferences  between  FORTRAN  implementations  and  the  others:\nstatic  storage  and  pass-by-reference.    I  know  that dynamic\nstorage  supports  recursion,  but it's always seemed to me a bit\npeculiar to be willing to accept slower code in the 95%  of cases\nthat don't need recursion, just to get that feature when you need\nit.  The idea is that, with static storage, you can  use absolute\naddressing  rather than indirect addressing, which should  result\nin faster code.\n\nMore recently, though, several folks  have pointed out to me that\nthere really is no performance  penalty  associated  with dynamic\nstorage.  With the 68000, for example, you shouldn't use absolute\naddressing  anyway  ...  most  operating systems require position\nindependent code.  And the 68000 instruction\n\n     MOVE 8(A6),D0\n\nhas exactly the same timing as\n\n     MOVE X(PC),D0\n\nSo  I'm  convinced,  now, that there is no good reason NOT to use\ndynamic storage.\n\nSince this use of local variables fits so well into the scheme of\npass-by-value  parameters,  we'll  use   that   version   of  the\ntranslator to illustrate it. (I _SURE_ hope you kept a copy!)\n\nThe general idea is to keep track of how  many  local  variables\nthere  are.    Then we use the integer in the LINK instruction to\nadjust the stack pointer downward to make room for them.   Formal\nparameters are  addressed  as  positive  offsets  from  the frame\npointer, and locals as negative offsets.  
With a  little  bit  of\nwork, the same procedures we've  already created can take care of\nthe whole thing.\n\nLet's start by creating a new variable, Base:\n\n\n     var Base: integer;\n\nWe'll use this  variable,  instead of NumParams, to compute stack\noffsets.  That means changing  the two references to NumParams in\nLoadParam and StoreParam:\n\n\n{--------------------------------------------------------------}\n{ Load a Parameter to the Primary Register }\n\nprocedure LoadParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 8 + 2 * (Base - N);\n     Emit('MOVE ');\n     WriteLn(Offset, '(A6),D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Store a Parameter from the Primary Register }\n\nprocedure StoreParam(N: integer);\nvar Offset: integer;\nbegin\n     Offset := 8 + 2 * (Base - N);\n     Emit('MOVE D0,');\n     WriteLn(Offset, '(A6)');\nend;\n{--------------------------------------------------------------}\n\n\nThe idea is that the value of Base will be  frozen  after we have\nprocessed the formal parameters, and  won't  increase  further as\nthe new, local variables, are inserted in the symbol table.  
This\nis taken care of at the end of FormalList:\n\n\n{--------------------------------------------------------------}\n{ Process the Formal Parameter List of a Procedure }\n\nprocedure FormalList;\nbegin\n     Match('(');\n     if Look <> ')' then begin\n          FormalParam;\n          while Look = ',' do begin\n               Match(',');\n               FormalParam;\n          end;\n     end;\n     Match(')');\n     Fin;\n     Base := NumParams;\n     NumParams := NumParams + 4;\nend;\n{--------------------------------------------------------------}\n\n\n(We add four words to make allowances for the return  address and\nold frame pointer, which end up between the formal parameters and\nthe locals.)\n\nAbout all we  need  to  do  next  is to install the semantics for\ndeclaring local variables into the parser.  The routines are very\nsimilar to Decl and TopDecls:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Local Data Declaration }\n\nprocedure LocDecl;\nvar Name: char;\nbegin\n   Match('v');\n     AddParam(GetName);\n     Fin;\nend;\n\n\n{--------------------------------------------------------------}\n\n\n{ Parse and Translate Local Declarations }\n\nfunction LocDecls: integer;\nvar n: integer;\nbegin\n     n := 0;\n     while Look = 'v' do begin\n          LocDecl;\n          inc(n);\n     end;\n     LocDecls := n;\nend;\n{--------------------------------------------------------------}\n\n\nNote that LocDecls is a  FUNCTION, returning the number of locals\nto DoProc.\n\nNext, we modify DoProc to use this information:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Procedure Declaration }\n\nprocedure DoProc;\nvar N: char;\n      k: integer;\nbegin\n     Match('p');\n     N := GetName;\n     if InTable(N) then Duplicate(N);\n     ST[N] := 'p';\n     FormalList;\n     k := LocDecls;\n     ProcProlog(N, k);\n     BeginBlock;\n     ProcEpilog;\n     
ClearParams;\nend;\n{--------------------------------------------------------------}\n\n\n(I've  made   a  couple  of  changes  here  that  weren't  really\nnecessary.  Aside from rearranging things a bit, I moved the call\nto  Fin  to  within FormalList, and placed one inside LocDecl  as\nwell.   Don't forget to put one at the end of FormalList, so that\nwe're together here.)\n\nNote the change in the call  to  ProcProlog.  The new argument is\nthe number of WORDS (not bytes) to allocate space  for.    Here's\nthe new version of ProcProlog:\n\n\n{--------------------------------------------------------------}\n{ Write the Prolog for a Procedure }\n\nprocedure ProcProlog(N: char; k: integer);\nbegin\n     PostLabel(N);\n     Emit('LINK A6,#');\n     WriteLn(-2 * k);\nend;\n{--------------------------------------------------------------}\n\n\nThat should do it.  Add these changes and see how they work.\n\n\nCONCLUSION\n\nAt this point you know  how to compile procedure declarations and\nprocedure calls,  with  parameters  passed  by  reference  and by\nvalue.  You can also handle local variables.  As you can see, the\nhard part is not  in  providing  the  mechanisms, but in deciding\njust which mechanisms to use.  Once we make these  decisions, the\ncode to translate the constructs is really not that difficult.\n\nI didn't  show  you  how  to  deal  with the combination of local\nvariables    and  pass-by-reference  parameters,  but  that's   a\nstraightforward extension to  what  you've already seen.  It just\ngets a little more messy, that's all, since we  need  to  support\nboth mechanisms instead of just one at a  time.    I'd  prefer to\nsave  that  one  until after we've  dealt  with  ways  to  handle\ndifferent variable types.\n\nThat will be the next installment, which will be coming soon to a\nForum near you.  
See you then.\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n"
  },
  {
    "path": "14/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "14/README.md",
    "content": "# Notes\n\nx86 instructions differ from 68000 instructions in some ways, so the generated\ncode only tries to mimic the author's 68k code. For example, a 68k \"MOVE\"\ninstruction carries the operand size as a suffix, as in \"MOVE.L A(PC),D0\" or\n\"MOVE.B A(PC),D0\", while x86 in Intel syntax implies the size through the\nregister name, as in \"MOV EAX, A\" or \"MOV AL, A\". Moreover, AT&T syntax puts\nthe size suffix on the mnemonic, as in \"movl A, %eax\" or \"movb A, %al\".\n\nSo the code-generating procedures may look huge compared to the ones that\ngenerate 68k code.\n"
  },
  {
    "path": "14/cradle.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\n\n#define MaxEntry 26\nconst char TAB = '\\t';\nconst char CR = '\\n';\nconst char LF = '\\r';\n\nchar tmp[MAX_BUF];  /* temporary buffer */\n\nchar Look;\nchar ST[MaxEntry];   /* symbol table */\n\n/* read new character from input stream */\nvoid GetChar()\n{\n    Look = getchar();\n}\n\n/* Report an Error */\nvoid Error(char *str)\n{\n    printf(\"\\n\");\n    printf(\"\\aError: %s.\\n\", str);\n}\n\n/* report Error and Halt */\nvoid Abort(char *str)\n{\n    Error(str);\n    exit(1);\n}\n\n/* report what was expected */\nvoid Expected(char *str)\n{\n    sprintf(tmp, \"Expected: %s\", str);\n    Abort(tmp);\n}\n\n/* report an undefined identifier */\nvoid Undefined(char name)\n{\n    sprintf(tmp, \"Undefined Identifier: %c\", name);\n    Abort(tmp);\n}\n\n/* report an duplicate identifier */\nvoid Duplicate(char name)\n{\n    sprintf(tmp, \"Duplicate Identifier: %c\", name);\n    Abort(tmp);\n}\n\n/* Get type of symbole */\nchar TypeOf(char symbol)\n{\n    return ST[symbol - 'A'];\n}\n\n/* check if a symbol is in table */\nbool InTable(char symbol)\n{\n    return ST[symbol - 'A'] != '?';\n}\n\n/* add a new symbol to table */\nvoid AddEntry(char symbol, char type)\n{\n    CheckDup(symbol);\n    ST[symbol-'A'] = type;\n}\n\n/* check an entry to make sure it's a variable */\nvoid CheckVar(char name)\n{\n    char tmp_buf[MAX_BUF];\n    if (!InTable(name)) {\n        Undefined(name);\n    }\n    if (TypeOf(name) != 'v') {\n        sprintf(tmp_buf, \"%c is not a variable\", name);\n        Abort(tmp_buf);\n    }\n}\n\n/* check for a duplicate variable name */\nvoid CheckDup(char name)\n{\n    if (InTable(name)) {\n        Duplicate(name);\n    }\n}\n\n/* turn an character into uppercase */\nchar upcase(char c)\n{\n    return (c & 0xDF);\n}\n\nbool IsAlpha(char c)\n{\n    char upper = upcase(c);\n    return (upper >= 'A') && (upper <= 'Z');\n}\n\nbool IsDigit(char 
c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nbool IsAlNum(char c)\n{\n    return IsAlpha(c) || IsDigit(c);\n}\n\nbool IsAddop(char c)\n{\n    return strchr(\"+-\", c) != NULL;\n}\n\nbool IsMulop(char c)\n{\n    return strchr(\"*/\", c) != NULL;\n}\n\nbool IsRelop(char c)\n{\n    return strchr(\"=#<>\", c) != NULL;\n}\n\nbool IsWhite(char c)\n{\n    return strchr(\" \\t\", c) != NULL;\n}\n\nbool IsVarType(char c)\n{\n    return strchr(\"BWLbwl\", c) != NULL;\n}\n\n/* get a variable type from the symbol table */\nchar VarType(char name)\n{\n    char type = TypeOf(name);\n    if (!IsVarType(type)) {\n        /* report the error instead of silently discarding the message */\n        sprintf(tmp, \"Identifier %c is not a variable\", name);\n        Abort(tmp);\n    }\n    return type;\n}\n\n/* skip over leading white space */\nvoid SkipWhite(void)\n{\n    while(IsWhite(Look)) {\n        GetChar();\n    }\n}\n\n/* skip over an End-Of-Line */\nvoid Fin(void)\n{\n    if (Look == CR) {\n        GetChar();\n        if (Look == LF) {\n            GetChar();\n        }\n    } else if (Look == LF){\n        GetChar();\n    }\n}\n\n/* match a specific input character */\nvoid Match(char c)\n{\n    if (Look == c) {\n        GetChar();\n    } else {\n        char tmp_buf[MAX_BUF];\n        sprintf(tmp_buf, \"'%c'\", c);\n        Expected(tmp_buf);\n    }\n    SkipWhite();\n}\n\n/* Get an identifier */\nchar GetName(void)\n{\n    if (! 
IsAlpha(Look)) {\n        Expected(\"Name\");\n    }\n    char name = upcase(Look);\n    GetChar();\n    SkipWhite();\n    return name;\n}\n\n/* Get a number */\nint GetNum(void)\n{\n    if (!IsDigit(Look)) {\n        Expected(\"Integer\");\n    }\n    int val = 0;\n    while(IsDigit(Look)) {\n        val = 10*val + Look - '0';\n        GetChar();\n    }\n    SkipWhite();\n    return val;\n}\n\n/* load a constant to the primary register */\nchar LoadNum(int val)\n{\n    char type;\n    if (abs(val) <= 127) {\n        type = 'B';\n    } else if (abs(val) <= 32767) {\n        type = 'W';\n    } else {\n        type = 'L';\n    }\n    LoadConst(val, type);\n    return type;\n}\n\n/* output a string with TAB */\nvoid Emit(char *str)\n{\n    printf(\"\\t%s\", str);\n}\n\n/* Output a string with TAB and CRLF */\nvoid EmitLn(char *str)\n{\n    Emit(str);\n    printf(\"\\n\");\n}\n\n/* Post a label to output */\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n\n/* Load a variable to the primary register */\nvoid LoadVar(char name, char type)\n{\n    char src[MAX_BUF];\n    src[0] = name;\n    src[1] = '\\0';\n    char *dst;\n    switch(type) {\n        case 'B':\n            dst = \"%al\";\n            break;\n        case 'W':\n            dst = \"%ax\";\n            break;\n        case 'L':\n            dst = \"%eax\";\n            break;\n        default:\n            dst = \"%eax\";\n            break;\n    }\n    Move(type, src, dst);\n}\n\nvoid Move(char size, char *src, char *dest)\n{\n    sprintf(tmp, \"MOV%c %s, %s\", size, src, dest);\n    EmitLn(tmp);\n}\n\n/* store the primary register */\nvoid StoreVar(char name, char type)\n{\n    char dest[MAX_BUF];\n    dest[0] = name;\n    dest[1] = '\\0';\n    char *src;\n    switch(type) {\n        case 'B':\n            src = \"%al\";\n            break;\n        case 'W':\n            src = \"%ax\";\n            break;\n        case 'L':\n            src = \"%eax\";\n            break;\n        
default:\n            src = \"%eax\";\n            break;\n    }\n    Move(type, src, dest);\n}\n\n/* load a variable to the primary register */\nchar Load(char name)\n{\n    char type = VarType(name);\n    LoadVar(name, type);\n    return type;\n}\n\n/* Load a constant to the primary register */\nvoid LoadConst(int val, char type)\n{\n    char src[MAX_BUF];\n    sprintf(src, \"$%d\", val);\n    char *dst;\n    switch(type) {\n        case 'B':\n            dst = \"%al\";\n            break;\n        case 'W':\n            dst = \"%ax\";\n            break;\n        case 'L':\n            dst = \"%eax\";\n            break;\n        default:\n            dst = \"%eax\";\n            break;\n    }\n    Move(type, src, dst);\n}\n\n\n/* store a variable from the primary register */\nvoid Store(char name, char src_type)\n{\n    char dst_type = VarType(name);\n    Convert(src_type, dst_type, 'a');\n    StoreVar(name, dst_type);\n}\n\n/* convert a data item from one type to another */\nvoid Convert(char src, char dst, char reg)\n{\n    /* this function only works when storing a variable and\n     * (B,W) -> (W,L)\n     * and the action are the same: zero extend %eax */\n    char tmp_buf[MAX_BUF];\n    if (src != dst) {\n        switch(src) {\n            case 'B':\n                sprintf(tmp_buf, \"movzx %%%cl, %%e%cx\", reg, reg);\n                EmitLn(tmp_buf);\n                break;\n            case 'W':\n                sprintf(tmp_buf, \"movzx %%%cx, %%e%cx\", reg, reg);\n                EmitLn(tmp_buf);\n                break;\n            default:\n                break;\n        }\n    }\n}\n\n/* promote the size of a register value */\nchar Promote(char src_type, char dst_type, char reg)\n{\n    char type = src_type;\n    if (src_type != dst_type) {\n        if ((src_type == 'B') || ((src_type == 'W' && dst_type == 'L'))) {\n            Convert(src_type, dst_type, reg);\n            type = dst_type;\n        }\n    }\n    return type;\n}\n\n/* force both 
arguments to same type */\nchar SameType(char src_type, char dst_type)\n{\n    src_type = Promote(src_type, dst_type, 'd');\n    return Promote(dst_type, src_type, 'a');\n}\n\n/* initialize the symbol table */\nvoid InitTable(void)\n{\n    int i;\n    for (i = 0; i < MaxEntry; ++i) {\n        ST[i] = '?';\n    }\n}\n\n/* Dump the symbol table */\nvoid DumpTable()\n{\n    int i;\n    for (i = 0; i < MaxEntry; ++i) {\n        if (ST[i] != '?') {\n            printf(\"%c: %c\\n\", i+'A', ST[i]);\n        }\n    }\n}\n\n/* initialize */\nvoid Init()\n{\n    GetChar();\n    SkipWhite();\n    InitTable();\n}\n\nvoid Clear()\n{\n    EmitLn(\"xor %eax, %eax\");\n}\n\n/* Push Primary onto stack */\nvoid Push(char type)\n{\n    switch(type) {\n        case 'B':\n        case 'W':\n            EmitLn(\"pushw %ax\");\n            break;\n        case 'L':\n            EmitLn(\"pushl %eax\");\n            break;\n        default:\n            break;\n    }\n}\n\nvoid Pop(char type)\n{\n    switch(type) {\n        case 'B':\n        case 'W':\n            EmitLn(\"popw %dx\");\n            break;\n        case 'L':\n            EmitLn(\"popl %edx\");\n            break;\n        default:\n            break;\n    }\n}\n\n/* Add Top of Stack to primary */\nchar PopAdd(char src_type, char dst_type)\n{\n    Pop(src_type);\n    dst_type = SameType(src_type, dst_type);\n    GenAdd(dst_type);\n    return dst_type;\n}\n\n/* Subtract Primary from Top of Stack */\nchar PopSub(char src_type, char dst_type)\n{\n    Pop(src_type);\n    dst_type = SameType(src_type, dst_type);\n    GenSub(dst_type);\n    return dst_type;\n}\n\n/* add top of stack to primary */\nvoid GenAdd(char type)\n{\n    switch(type) {\n        case 'B':\n            EmitLn(\"addb %dl, %al\");\n            break;\n        case 'W':\n            EmitLn(\"addw 
%dx, %ax\");\n            break;\n        case 'L':\n            EmitLn(\"addl %edx, %eax\");\n            break;\n        default:\n            EmitLn(\"addl %edx, %eax\");\n            break;\n    }\n}\n\n/* subtract primary from top of stack */\nvoid GenSub(char type)\n{\n    switch(type) {\n        case 'B':\n            EmitLn(\"subb %dl, %al\");\n            EmitLn(\"neg %al\");\n            break;\n        case 'W':\n            EmitLn(\"subw %dx, %ax\");\n            EmitLn(\"neg %ax\");\n            break;\n        case 'L':\n            EmitLn(\"subl %edx, %eax\");\n            EmitLn(\"neg %eax\");\n            break;\n        default:\n            EmitLn(\"subl %edx, %eax\");\n            EmitLn(\"neg %eax\");\n            break;\n    }\n}\n\n/* multiply top of stack by primary (Word) */\nvoid GenMul()\n{\n    EmitLn(\"imulw %dx, %ax\");\n}\n\n/* multiply top of stack by primary (Long) */\nvoid GenLongMul()\n{\n    EmitLn(\"imull %edx, %eax\");\n}\n\nvoid GenDiv()\n{\n    EmitLn(\"Division not implemented yet!\");\n}\n\nvoid GenLongDiv()\n{\n    EmitLn(\"Division not implemented yet!\");\n}\n\n/* multiply top of stack by primary */\nchar PopMul(char src_type, char dst_type)\n{\n    Pop(src_type);\n    char type = SameType(src_type, dst_type);\n    Convert(type, 'W', 'd');\n    Convert(type, 'W', 'a');\n    if (type == 'L') {\n        GenLongMul();\n    } else {\n        GenMul();\n    }\n\n    if (type == 'B') {\n        type = 'W';\n    } else {\n        type = 'L';\n    }\n    return type;\n}\n\n/* divide top of stack by primary */\nchar PopDiv(char src_type, char dst_type)\n{\n    char type;\n    Pop(src_type);\n    Convert(src_type, 'L', 'd');\n    if (src_type == 'L' || dst_type == 'L') {\n        Convert(dst_type, 'L', 'a');\n        GenLongDiv();\n        type = 'L';\n    } else {\n        Convert(dst_type, 'W', 'a');\n        GenDiv();\n        type = src_type;\n    }\n    return type;\n}\n"
  },
  {
    "path": "14/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#include <stdbool.h>\n\n#define MAX_BUF 100\nextern const char TAB;\nextern const char CR;\nextern const char LF;\n\nextern char Look;   /* lookahead character */\nextern char ST[];   /* symbol table */\n\n/* read new character from input stream */\nvoid GetChar();\n\n/* Report an Error */\nvoid Error(char *str);\n\n/* report Error and Halt */\nvoid Abort(char *str);\n\n/* report what was expected */\nvoid Expected(char *str);\n\n/* report an undefined identifier */\nvoid Undefined(char name);\n\n/* report an duplicate identifier */\nvoid Duplicate(char name);\n\n/* Get type of symbole */\nchar TypeOf(char symbol);\n\n/* check if a symbol is in table */\nbool InTable(char symbol);\n\n/* Dump the symbol table */\nvoid DumpTable();\n\n/* add a new symbol to table */\nvoid AddEntry(char symbol, char type);\n\n/* check an entry to make sure it's a variable */\nvoid CheckVar(char name);\n\n/* check for a duplicate variable name */\nvoid CheckDup(char name);\n\n\nbool IsAlpha(char c);\nbool IsDigit(char c);\nbool IsAlNum(char c);\nbool IsAddop(char c);\nbool IsMulop(char c);\nbool IsRelop(char c);\nbool IsWhite(char c);\nbool IsVarType(char c);\n\n/* skip over leading white space */\nvoid SkipWhite(void);\n/* skip over an End-Of-Line */\nvoid Fin(void);\n\n/* match a specific input character */\nvoid Match(char c);\n\n/* Get an identifier */\nchar GetName(void);\n\n/* Get a number */\nint GetNum(void);\n\n/* load a constant to the primary register */\nchar LoadNum(int val);\n\n/* output a string with TAB */\nvoid Emit(char *str);\n/* Output a string with TAB and CRLF */\nvoid EmitLn(char *str);\n\n/* Post a label to output */\nvoid PostLabel(char *label);\n\n/* Load a variable to the primary register */\nvoid LoadVar(char name, char type);\n\n/* Load a constant to the primary register */\nvoid LoadConst(int val, char type);\n\n/* store the primary register */\nvoid StoreVar(char name, char type);\n\n/* initialize 
the symbol table */\nvoid InitTable(void);\n\n/* initialize */\nvoid Init(void);\n\n/* get a variable type from the symbol table */\nchar VarType(char name);\n\nvoid Move(char size, char *src, char *dest);\n\n/* load a variable to the primary register */\nchar Load(char name);\n/* store a variable from the primary register */\nvoid Store(char name, char src_type);\n\n/* convert a data item from one type to another */\nvoid Convert(char src, char dst, char reg);\n\n/* promote the size of a register value */\nchar Promote(char src_type, char dst_type, char reg);\n\nvoid Clear();\nvoid Push(char type);\nvoid Pop(char type);\nchar PopAdd(char src_type, char dst_type);\nchar PopSub(char src_type, char dst_type);\nchar PopMul(char src_type, char dst_type);\nchar PopDiv(char src_type, char dst_type);\nvoid GenAdd(char type);\nvoid GenSub(char type);\nvoid GenMul();\nvoid GenDiv();\nvoid GenLongDiv();\nvoid GenLongMul();\n\n#endif\n"
  },
  {
    "path": "14/main.c",
    "content": "#include <stdio.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\nchar Term();\nchar Expression();\nvoid Assignment();\nchar Factor();\nvoid DoBlock();\nvoid BeginBlock();\nvoid Alloc(char name, char type);\nvoid Decl(void);\nvoid TopDecls(void);\n\nvoid Header();\nvoid Prolog();\nvoid Epilog();\n\nvoid Block();\n\nchar Unop();\nchar Add(char type);\nchar Subtract(char type);\nchar Multiply(char type);\nchar Divide(char type);\n\n/* parse and tranlate an expression\n * vestigial version */\nchar Expression()\n{\n    char type;\n    if (IsAddop(Look)) {\n        type = Unop();\n    } else {\n        type = Term();\n    }\n\n    while(IsAddop(Look)) {\n        Push(type);\n        switch (Look) {\n            case '+':\n                type = Add(type);\n                break;\n            case '-':\n                type = Subtract(type);\n                break;\n            default:\n                break;\n        }\n    }\n    return type;\n}\n\nchar Term()\n{\n    char type = Factor();\n    while(IsMulop(Look)) {\n        Push(type);\n        switch (Look) {\n            case '*':\n                type = Multiply(type);\n                break;\n            case '/':\n                type = Divide(type);\n                break;\n            default:\n                break;\n        }\n    }\n    return type;\n}\n\n/* parse and translate a Factor */\nchar Factor()\n{\n    char type;\n    if (Look == '(') {\n        Match('(');\n        type = Expression();\n        Match(')');\n    } else if (IsAlpha(Look)) {\n        type = Load(GetName());\n    } else {\n        type = LoadNum(GetNum());\n    }\n    return type;\n}\n\n/* process a term with leading unary operator */\nchar Unop()\n{\n    Clear();\n    return 'W';\n}\n\nchar Add(char type)\n{\n    Match('+');\n    return PopAdd(type, Term());\n}\n\nchar Subtract(char type)\n{\n    Match('-');\n    return PopSub(type, Term());\n}\n\nchar Multiply(char type)\n{\n    Match('*');\n    return 
PopMul(type, Factor());\n}\n\nchar Divide(char type)\n{\n    Match('/');\n    return PopDiv(type, Factor());\n}\n\n\n/* parse and translate an assignment statement */\nvoid Assignment()\n{\n    char name = GetName();\n    Match('=');\n    Store(name, Expression());\n}\n\nvoid Block()\n{\n    while(Look != '.') {\n        Assignment();\n        Fin();\n    }\n}\n\n/* parse and translate a block of statements */\nvoid DoBlock()\n{\n    while(strchr(\"e\", Look) == NULL) {\n        Assignment();\n        Fin();\n    }\n}\n\n/* parse and translate a Begin-Block */\nvoid BeginBlock()\n{\n    Match('b');\n    Fin();\n    DoBlock();\n    Match('e');\n    Fin();\n}\n\n\n/* Generate code for allocation of a variable */\nvoid AllocVar(char name, char type)\n{\n    char *p = \"\";\n    switch(type) {\n        case 'B':\n            p = \"byte\";\n            break;\n        case 'W':\n            p = \"word\";\n            break;\n        case 'L':\n            p = \"long\";\n            break;\n        default:\n            break;\n    }\n    printf(\"%c:\\t.%s 0\\n\", name, p);\n}\n\n/* allocate storage for a variable */\nvoid Alloc(char name, char type)\n{\n    AddEntry(name, type);\n    AllocVar(name, type);\n}\n\n/* parse and translate a data declaration */\nvoid Decl(void)\n{\n    char type = GetName();\n    Alloc(GetName(), type);\n}\n\n/* parse and translate global declarations */\nvoid TopDecls(void)\n{\n    printf(\".section .data\\n\");\n    char tmp_buf[MAX_BUF];\n    while(Look != 'B') {\n        switch(Look) {\n            case 'b':\n            case 'w':\n            case 'l':\n                Decl();\n                break;\n            default:\n                sprintf(tmp_buf, \"Unrecognized keyword %c\", Look);\n                Abort(tmp_buf);\n                break;\n        }\n        Fin();\n    }\n}\n\nvoid Header()\n{\n    printf(\".global _start\\n\");\n}\n\nvoid Prolog()\n{\n    EmitLn(\".section .text\");\n    EmitLn(\"_start:\");\n}\n\nvoid 
Epilog()\n{\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n}\n\nint main(int argc, char *argv[])\n{\n    Init();\n    TopDecls();\n    Match('B');\n    Fin();\n    Block();\n    DumpTable();\n    return 0;\n}\n"
  },
  {
    "path": "14/prog.txt",
    "content": "ba\nwb\nlc\nB \na=10\nb=70000\nc=a+b\na=c\nb=c\n.\n\n"
  },
  {
    "path": "14/tutor14.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           26 May 1990\n\n\n                         Part XIV: TYPES\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn the  last installment (Part XIII: PROCEDURES) I mentioned that\nin that part and this one,  we  would cover the two features that\ntend  to  separate  the toy language from a real, usable one.  We\ncovered  procedure  calls  in that installment.  Many of you have\nbeen  waiting patiently, since August '89, for  me  to  drop  the\nother shoe.  Well, here it is.\n\nIn this installment, we'll talk  about how to deal with different\ndata types.  As I did in the last segment, I will NOT incorporate\nthese  features directly into the TINY  compiler  at  this  time.\nInstead, I'll be using the same approach that has worked  so well\nfor  us  in the past: using only  fragments  of  the  parser  and\nsingle-character  tokens.    As  usual,  this  allows  us to  get\ndirectly to the  heart  of  the  matter  without  having  to wade\nthrough a lot of  unnecessary  code.  Since the major problems in\ndealing with multiple types occur in  the  arithmetic operations,\nthat's where we'll concentrate our focus.\n\nA  few words of warning:  First, there are some types that I will\nNOT  be  covering in this installment.   Here  we  will  ONLY  be\ntalking about the simple, predefined types.  
We  won't  even deal\nwith arrays, pointers or strings  in  this  installment;  I'll be\ncovering them in the next few.\n\nSecond, we also will not discuss user-defined types.    That will\nnot come until  much  later,  for  the simple reason that I still\nhaven't convinced myself  that  user-defined  types  belong  in a\nlanguage named KISS.  In later installments, I do intend to cover\nat least the general  concepts  of  user-defined  types, records,\netc., just so that the series  will  be complete.  But whether or\nnot they will be included as part of KISS is still an open issue.\nI am open to comments or suggestions on this question.\n\nFinally,  I  should  warn you: what we are about to  do  CAN  add\nconsiderable  extra  complication  to  both  the  parser  and the\ngenerated  code.    Handling  variables  of  different  types  is\nstraightforward enough.  The complexity  comes  in  when  you add\nrules about conversion between types.  In general,  you  can make\nthe  compiler  as  simple or as complex as you choose to make it,\ndepending upon the  way  you  define  the  type-conversion rules.\nEven if you decide not to allow ANY type conversions (as  in Ada,\nfor example) the problem is still there, and is  built  into  the\nmathematics.  When  you  multiply two short numbers, for example,\nyou can get a long result.\n\nI've approached this problem very  carefully,  in  an  attempt to\nKeep It Simple.  But we can't avoid the complexity entirely.   As\nhas so often has happened, we end up having to trade code quality\nagainst complexity,  and  as  usual  I  will  tend to opt for the\nsimplest approach.\n\n\nWHAT'S COMING NEXT?\n\nBefore diving into the tutorial, I think you'd like to know where\nwe are going  from  here  ...  especially since it's been so long\nsince the last installment.\n\nI have not been idle in  the  meantime.   What I've been doing is\nreorganizing  the  compiler  itself into Turbo Units.  
One of the\nproblems I've encountered is that  as we've covered new areas and\nthereby added features to  the  TINY  compiler, it's been getting\nlonger and longer.  I realized a couple of installments back that\nthis was causing trouble, and that's why I've gone back  to using\nonly compiler fragments for  the  last  installment and this one.\nThe problem is that it just  seems  dumb to have to reproduce the\ncode  for,  say,  processing  boolean  exclusive  OR's,  when the\nsubject of the discussion is parameter passing.\n\nThe obvious way  to have our cake and eat it, too, is to break up\nthe compiler into separately compilable  modules,  and  of course\nthe Turbo Unit is an ideal  vehicle  for doing this.  This allows\nus to hide some fairly complex code (such as the  full arithmetic\nand boolean expression parsing) into a single unit, and just pull\nit in whenever it's needed.  In that way, the only code I'll have\nto reproduce in these installments will be the code that actually\nrelates to the issue under discussion.\n\nI've  also  been  toying with Turbo 5.5, which of course includes\nthe Borland object-oriented  extensions  to  Pascal.    I haven't\ndecided whether to make use of these features,  for  two reasons.\nFirst of all, many of you who have been following this series may\nstill not have 5.5, and I certainly don't want to force anyone to\nhave to go out and  buy  a  new  compiler  just  to  complete the\nseries.  Secondly, I'm not convinced that the O-O extensions have\nall that much value for this application.  We've been having some\ndiscussions  about that in CompuServe's CLM  forum,  and  so  far\nwe've  not found any compelling reason  to  use  O-O  constructs.\nThis is another of those areas where I could  use  some  feedback\nfrom you readers.  
Anyone want to vote for Turbo 5.5 and O-O?\n\nIn any case, after  the  next few installments in the series, the\nplan  is  to  upload to you a complete set of Units, and complete\nfunctioning compilers as  well.    The  plan, in fact, is to have\nTHREE compilers:  One for  a single-character version of TINY (to\nuse  for  our  experiments), one for TINY and one for KISS.  I've\npretty much isolated the differences between TINY and KISS, which\nare these:\n\n   o TINY will support only two data types: The character and the\n     16-bit  integer.    I may also  try  to  do  something  with\n     strings, since  without  them  a  compiler  would  be pretty\n     useless.   KISS will support all  the  usual  simple  types,\n     including arrays and even floating point.\n\n   o TINY will only have two control constructs, the  IF  and the\n     WHILE.  KISS will  support  a  very  rich set of constructs,\n     including one we haven't discussed here before ... the CASE.\n\n   o KISS will support separately compilable modules.\n\nOne caveat: Since I still don't know much  about  80x86 assembler\nlanguage, all these compiler modules  will  still  be  written to\nsupport 68000 code.  However, for the programs I plan  to upload,\nall the code generation  has  been  carefully encapsulated into a\nsingle unit, so that any enterprising student should  be  able to\neasily retarget to any other processor.  This task is \"left as an\nexercise for the  student.\"    I'll  make an offer right here and\nnow:  For the person who provides us the first robust retarget to\n80x86, I will be happy to discuss shared copyrights and royalties\nfrom the book that's upcoming.\n\nBut enough talk.  Let's get on with  the  study  of  types.  
As I\nsaid  earlier,  we'll  do  this  one  as  we  did  in   the  last\ninstallment:  by  performing experiments  using  single-character\ntokens.\n\n\nTHE SYMBOL TABLE\n\nIt should be apparent that, if we're going to deal with variables\nof different types, we're going  to need someplace to record what\nthose  types are.  The obvious vehicle for  that  is  the  symbol\ntable, and we've already  used  it  that  way to distinguish, for\nexample,   between  local  and  global  variables,  and   between\nvariables and procedures.\n\nThe  symbol  table   structure  for  single-character  tokens  is\nparticularly simple, and we've used  it several times before.  To\ndeal with it, we'll steal some procedures that we've used before.\n\nFirst, we need to declare the symbol table itself:\n\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look: char;              { Lookahead Character }\n\n    ST: Array['A'..'Z'] of char;   {  *** ADD THIS LINE ***}\n{--------------------------------------------------------------}\n\n\nNext, we need to make sure it's initialized as part  of procedure\nInit:\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nvar i: char;\nbegin\n   for i := 'A' to 'Z' do\n      ST[i] := '?';\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nWe don't really need  the  next procedure, but it will be helpful\nfor debugging.  All it does is to dump the contents of the symbol\ntable:\n\n\n{--------------------------------------------------------------}\n{ Dump the Symbol Table }\n\nprocedure DumpTable;\nvar i: char;\nbegin\n   for i := 'A' to 'Z' do\n      WriteLn(i, ' ', ST[i]);\nend;\n{--------------------------------------------------------------}\n\n\nIt really doesn't matter much where you put this procedure  ... 
I\nplan to cluster all the symbol table routines together, so  I put\nmine just after the error reporting procedures.\n\nIf  you're  the  cautious type (as I am), you might want to begin\nwith a test program that does nothing but initializes, then dumps\nthe table.  Just to be sure that we're all on the same wavelength\nhere, I'm reproducing the entire program below, complete with the\nnew  procedures.  Note that this  version  includes  support  for\nwhite space:\n\n\n{--------------------------------------------------------------}\nprogram Types;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n      CR  = ^M;\n      LF  = ^J;\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look: char;              { Lookahead Character }\n\n    ST: Array['A'..'Z'] of char;\n\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n\n{--------------------------------------------------------------}\n{ Dump the Symbol Table }\n\nprocedure DumpTable;\nvar i: char;\nbegin\n   for i := 'A' to 'Z' do\n        WriteLn(i, ' ', ST[i]);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := UpCase(c) in 
['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an AlphaNumeric Character }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Mulop }\n\nfunction IsMulop(c: char): boolean;\nbegin\n   IsMulop := c in ['*', '/'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Boolean Orop }\n\nfunction IsOrop(c: char): boolean;\nbegin\n   IsOrop := c in ['|', '~'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Relop }\n\nfunction IsRelop(c: char): boolean;\nbegin\n   IsRelop := c in ['=', '#', '<', '>'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over an End-of-Line }\n\nprocedure Fin;\nbegin\n   if Look = CR then begin\n      GetChar;\n      if Look = LF then\n         GetChar;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   if Look = x then GetChar\n   else Expected('''' + x + '''');\n   
SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: char;\nbegin\n   if not IsAlpha(Look) then Expected('Name');\n   GetName := UpCase(Look);\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: char;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   GetNum := Look;\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nvar i: char;\nbegin\n   for i := 'A' to 'Z' do\n      ST[i] := '?';\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   DumpTable;\nend.\n{--------------------------------------------------------------}\n\n\nOK, run this program.  You  should  get a (very fast) printout of\nall the letters of  the  alphabet  (potential  identifiers), each\nfollowed by  a  question  mark.    Not  very exciting, but it's a\nstart.\n\nOf course, in general we  only  want  to  see  the  types  of the\nvariables that have been defined.  We can eliminate the others by\nmodifying DumpTable with an IF test.  Change the loop to read:\n\n\n  for i := 'A' to 'Z' do\n     if ST[i] <> '?' then\n         WriteLn(i, ' ', ST[i]);\n\n\nNow, run the program again.  What did you get?\n\nWell, that's even more  boring  than before!  
There was no output\nat all, since at this point NONE of the names have been declared.\nWe  can  spice  things up a  bit  by  inserting  some  statements\ndeclaring some entries in the main program.  Try these:\n\n\n     ST['A'] := 'a';\n     ST['P'] := 'b';\n     ST['X'] := 'c';\n\n\nThis time, when  you  run  the  program, you should get an output\nshowing that the symbol table is working right.\n\n\nADDING ENTRIES\n\nOf course, writing to the table directly is pretty poor practice,\nand not one that will  help  us  much  later.   What we need is a\nprocedure to add entries to the table.  At the same time, we know\nthat  we're going to need to test the table, to make sure that we\naren't redeclaring a variable that's already in use  (easy  to do\nwith only 26 choices!).  To handle all this, enter  the following\nnew procedures:\n\n\n{--------------------------------------------------------------}\n{ Report Type of a Variable }\n\n\nfunction TypeOf(N: char): char;\nbegin\n   TypeOf := ST[N];\nend;\n\n\n{--------------------------------------------------------------}\n{ Report if a Variable is in the Table }\n\n\nfunction InTable(N: char): boolean;\nbegin\n   InTable := TypeOf(N) <> '?';\nend;\n\n\n{--------------------------------------------------------------}\n{ Check for a Duplicate Variable Name }\n\nprocedure CheckDup(N: char);\nbegin\n   if InTable(N) then Abort('Duplicate Name ' + N);\nend;\n\n\n{--------------------------------------------------------------}\n{ Add Entry to Table }\n\nprocedure AddEntry(N, T: char);\nbegin\n   CheckDup(N);\n   ST[N] := T;\nend;\n{--------------------------------------------------------------}\n\n\nNow change the three lines in the main program to read:\n\n\n     AddEntry('A', 'a');\n     AddEntry('P', 'b');\n     AddEntry('X', 'c');\n                             \n\nand run the program again.  Did it work?  Then we have the symbol\ntable routines needed to support our work on types.  
In  the next\nsection, we'll actually begin to use them.\n\n\nALLOCATING STORAGE\n\nIn  other programs like this one,  including  the  TINY  compiler\nitself, we have  already  addressed the issue of declaring global\nvariables, and the  code  generated  for  them.    Let's  build a\nvestigial version of a \"compiler\" here, whose only function is to\nallow  us to declare variables.    Remember,  the  syntax  for  a\ndeclaration is:\n\n\n     <data decl> ::= VAR <identifier>\n\n\nAgain, we can lift a lot of the code from previous programs.  The\nfollowing are stripped-down versions of those  procedures.   They\nare greatly simplified  since  I  have  eliminated  niceties like\nvariable lists and  initializers.   In procedure Alloc, note that\nthe  new call to AddEntry will also  take  care  of  checking for\nduplicate declarations:\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N: char);\nbegin\n   AddEntry(N, 'v');\n   WriteLn(N, ':', TAB, 'DC 0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Data Declaration }\n\nprocedure Decl;\nvar Name: char;\nbegin\n   Match('v');\n   Alloc(GetName);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n   while Look <> '.' do begin\n      case Look of\n        'v': Decl;\n      else Abort('Unrecognized Keyword ' + Look);\n      end;\n      Fin;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNow, in the  main  program,  add  a  call to TopDecls and run the\nprogram.  Try allocating a  few variables, and note the resulting\ncode generated.  This is old stuff for you, so the results should\nlook familiar.  
Note from the code for TopDecls that  the program\nis ended by a terminating period.\n\nWhile you're at it,  try  declaring  two  variables with the same\nname, and verify that the parser catches the error.\n\n\nDECLARING TYPES\n\n\nAllocating storage of different sizes  is  as  easy  as modifying\nprocedure TopDecls to recognize more than one keyword.  There are\na  number  of  decisions to be made here, in terms  of  what  the\nsyntax should be, etc., but for now I'm  going  to  duck  all the\nissues and simply declare by  executive fiat that our syntax will\nbe:\n\n\n     <data decl> ::= <typename>  <identifier>\n\nwhere:\n\n\n     <typename> ::= BYTE | WORD | LONG\n\n\n(By  an amazing coincidence, the first  letters  of  these  names\nhappen  to  be  the  same  as  the  68000  assembly  code  length\nspecifications, so this choice saves us a little work.)\n\nWe can create the code to take care of  these  declarations  with\nonly slight modifications.  In the routines below, note that I've\nseparated  the  code  generation parts of Alloc  from  the  logic\nparts.  This  is  in  keeping  with our desire to encapsulate the\nmachine-dependent part of the compiler.\n\n\n{--------------------------------------------------------------}\n{ Generate Code for Allocation of a Variable }\n\nprocedure AllocVar(N, T: char);\nbegin\n   WriteLn(N, ':', TAB, 'DC.', T, ' 0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Allocate Storage for a Variable }\n\nprocedure Alloc(N, T: char);\nbegin\n   AddEntry(N, T);\n   AllocVar(N, T);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Data Declaration }\n\nprocedure Decl;\nvar Typ: char;\nbegin\n   Typ := GetName;\n   Alloc(GetName, Typ);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate Global Declarations }\n\nprocedure TopDecls;\nbegin\n   while Look <> '.' 
do begin\n      case Look of\n        'b', 'w', 'l': Decl;\n      else Abort('Unrecognized Keyword ' + Look);\n      end;\n      Fin;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nMake the changes shown to these procedures, and give the  thing a\ntry.    Use  the  single  characters  'b',  'w',  and 'l' for the\nkeywords (they must be lower case,  for  now).  You will see that\nin each case, we are allocating the proper storage  size.    Note\nfrom the dumped symbol table that the sizes are also recorded for\nlater use.  What later use?  Well, that's the subject of the rest\nof this installment.\n\n\nASSIGNMENTS\n\nNow that we can declare variables of different  sizes,  it stands\nto reason that we ought to be able  to  do  something  with them.\nFor our first trick, let's just try loading them into our working\nregister, D0.  It makes sense to use the same  idea  we used for\nAlloc; that is, make a load procedure that can load more than one\nsize.    We  also  want  to continue to encapsulate the  machine-\ndependent stuff.  The load procedure looks like this:\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to Primary Register }\n\nprocedure LoadVar(Name, Typ: char);\nbegin\n   Move(Typ, Name + '(PC)', 'D0');\nend;\n{---------------------------------------------------------------}\n\n\nOn  the  68000,  at least, it happens that many instructions turn\nout to be MOVE's.  It turns out to be useful to create a separate\ncode generator just for these instructions, and then  call  it as\nneeded:\n\n\n{---------------------------------------------------------------}\n{ Generate a Move Instruction }\n\nprocedure Move(Size: char; Source, Dest: String);\nbegin\n   EmitLn('MOVE.' 
+ Size + ' ' + Source + ',' + Dest);\nend;\n{---------------------------------------------------------------}\n\n\nNote that these  two  routines are strictly code generators; they\nhave no error-checking or other  logic.  To complete the picture,\nwe need one more layer of software that provides these functions.\n\nFirst of all, we need to make sure that the  type  we are dealing\nwith is a  loadable  type.    This  sounds like a job for another\nrecognizer:\n\n\n{--------------------------------------------------------------}\n{ Recognize a Legal Variable Type }\n\nfunction IsVarType(c: char): boolean;\nbegin\n   IsVarType := c in ['B', 'W', 'L'];\nend;\n{--------------------------------------------------------------}\n\n\nNext, it would be nice to have a routine that will fetch the type\nof a variable from the symbol table, while checking  it  to  make\nsure it's valid:\n\n\n{--------------------------------------------------------------}\n{ Get a Variable Type from the Symbol Table }\n\nfunction VarType(Name: char): char;\nvar Typ: char;\nbegin\n   Typ := TypeOf(Name);\n   if not IsVarType(Typ) then Abort('Identifier ' + Name +\n                                        ' is not a variable');\n   VarType := Typ;\nend;\n{--------------------------------------------------------------}\n\n\nArmed with these  tools,  a  procedure  to cause a variable to be\nloaded becomes trivial:\n\n\n{--------------------------------------------------------------}\n{ Load a Variable to the Primary Register }\n\nprocedure Load(Name: char);\nbegin\n     LoadVar(Name, VarType(Name));\nend;\n{--------------------------------------------------------------}\n\n\n(NOTE to the  concerned:  I  know,  I  know,  this  is  all  very\ninefficient.  In a production  program,  we  probably  would take\nsteps to avoid such deep nesting of procedure calls.  Don't worry\nabout it.  This is an EXERCISE, remember?  
It's more important to\nget it  right  and  understand  it, than it is to make it get the\nwrong  answer,  quickly.   If you get your compiler completed and\nfind that you're unhappy  with  the speed, feel free to come back\nand hack the code to speed it up!)\n\nIt would be a good idea to test the program at this point.  Since\nwe don't have a  procedure  for  dealing  with assignments yet, I\njust added the lines:\n\n\n     Load('A');\n     Load('B');\n     Load('C');\n     Load('X');\n\n\nto  the main program.  Thus, after  the  declaration  section  is\ncomplete, they will be executed to generate code  for  the loads.\nYou can play around with  this, and try different combinations of\ndeclarations to see how the errors are handled.\n\nI'm sure you won't be surprised to learn  that  storing variables\nis a lot like  loading  them.  The necessary procedures are shown\nnext:\n\n\n{---------------------------------------------------------------}\n{ Store Primary to Variable }\n\nprocedure StoreVar(Name, Typ: char);\nbegin\n   EmitLn('LEA ' + Name + '(PC),A0');\n   Move(Typ, 'D0', '(A0)');\nend;\n\n\n{--------------------------------------------------------------}\n{ Store a Variable from the Primary Register }\n\nprocedure Store(Name: char);\nbegin\n   StoreVar(Name, VarType(Name));\nend;\n{--------------------------------------------------------------}\n\n\nYou can test this one the same way as the loads.\n\nNow, of course, it's a RATHER  small  step to use these to handle\nassignment  statements.  What we'll do is  to  create  a  special\nversion   of  procedure  Block  that  supports  only   assignment\nstatements, and also a  special  version  of Expression that only\nsupports single variables as legal expressions.  
Here they are:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nvar Name: char;\nbegin\n   Load(GetName);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := GetName;\n   Match('=');\n   Expression;\n   Store(Name);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Block of Statements }\n\nprocedure Block;\nbegin\n   while Look <> '.' do begin\n      Assignment;\n      Fin;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\n(It's  worth  noting  that  the  new  procedures  that\npermit us to manipulate types  are, if anything, even simpler and\ncleaner than what we've seen before.  This is  mostly  thanks  to\nour efforts to encapsulate the code generator procedures.)\n\nThere is one small, nagging problem.  Before, we used  the Pascal\nterminating period to get us out of procedure TopDecls.   This is\nnow the wrong  character  ...  it's  used to terminate Block.  In\nprevious programs, we've used the BEGIN symbol  (abbreviated 'b')\nto get us out.  But that is now used as a type symbol.\n\nThe solution, while somewhat of a kludge, is easy enough.   We'll\nuse  an  UPPER CASE 'B' to stand for the BEGIN.   So  change  the\ncharacter in the WHILE loop within TopDecls, from '.' to 'B', and\neverything will be fine.\n\nNow, we can  complete  the  task  by changing the main program to\nread:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   TopDecls;\n   Match('B');\n   Fin;\n   Block;\n   DumpTable;\nend.\n{--------------------------------------------------------------}\n\n\n(Note  that I've had to sprinkle a few calls to Fin around to get\nus out of Newline troubles.)\n\nOK, run this program.  
Try the input:\n\n\n     ba        { byte a }   *** DON'T TYPE THE COMMENTS!!! ***\n     wb        { word b }\n     lc        { long c }\n     B         { begin  }\n     a=a\n     a=b\n     a=c\n     b=a\n     b=b\n     b=c\n     c=a\n     c=b\n     c=c\n     .\n\n\nFor  each  declaration,  you  should  get  code   generated  that\nallocates storage.  For each assignment, you should get code that\nloads a variable of the correct size, and stores one, also of the\ncorrect size.\n\nThere's only one small  little  problem:    The generated code is\nWRONG!\n\nLook at the code for a=c above.  The code is:\n\n\n     MOVE.L    C(PC),D0\n     LEA       A(PC),A0\n     MOVE.B    D0,(A0)\n\n\nThis code is correct.  It will cause the lower eight bits of C to\nbe stored into A, which is a reasonable behavior.  It's about all\nwe can expect to happen.\n\nBut now, look at the opposite case.  For c=a, the  code generated\nis:\n\n\n     MOVE.B A(PC),D0\n     LEA  C(PC),A0\n     MOVE.L D0,(A0)\n\n\nThis is  NOT  correct.    It will cause the byte variable A to be\nstored into the lower eight bits  of  D0.  According to the rules\nfor the 68000 processor,  the  upper 24 bits are unchanged.  This\nmeans  that when we store the entire 32  bits  into  C,  whatever\ngarbage  was  in  those  high bits will also get stored.  Not\ngood.\n\nSo what  we  have  run  into here, early on, is the issue of TYPE\nCONVERSION, or COERCION.\n\nBefore we do anything with  variables of different types, even if\nit's just to  copy  them, we have to face up to the issue.  It is\nnot the easiest part  of a compiler.  Most of  the  bugs  I have\nseen in production compilers  have  had to do with errors in type\nconversion for  some obscure combination of arguments.  As usual,\nthere is a tradeoff between compiler complexity and the potential\nquality of the  generated  code,  and  as usual, we will take the\npath that keeps the  compiler  simple.  
I think you'll find that,\nwith this approach, we can keep the potential complexity in check\nrather nicely.\n\n\nTHE COWARD'S WAY OUT\n\nBefore we get into the details (and potential complexity) of type\nconversion,  I'd  like  you to see that there is one super-simple\nway to solve the problem: simply promote every variable to a long\ninteger when we load it!\n\nThis takes the addition of only one line to LoadVar,  although if\nwe  are  not  going to COMPLETELY ignore efficiency, it should be\nguarded by an IF test.  Here is the modified version:\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to Primary Register }\n\nprocedure LoadVar(Name, Typ: char);\nbegin\n   if Typ <> 'L' then\n      EmitLn('CLR.L D0');\n   Move(Typ, Name + '(PC)', 'D0');\nend;\n{---------------------------------------------------------------}\n\n\n(Note that StoreVar needs no similar change.)\n\nIf you run some tests with  this  new version, you will find that\neverything  works correctly now, albeit sometimes  inefficiently.\nFor example, consider the case  a=b  (for  the  same declarations\nshown above).  Now the generated code turns out to be:\n\n\n     CLR.L D0\n     MOVE.W B(PC),D0\n     LEA  A(PC),A0\n     MOVE.B D0,(A0)\n\n\nIn  this  case,  the CLR turns out not to be necessary, since the\nresult is going into a byte-sized variable.  With a little bit of\nwork, we can do better.  Still, this is not bad, and it is typical\nof the kinds of inefficiencies  that we've seen before in simple-\nminded compilers.\n\nI should point out that, by setting the high bits to zero, we are\nin effect treating the numbers as UNSIGNED integers.  If  we want\nto treat them as signed ones instead (the more  likely  case)  we\nshould do a  sign  extension  after  the load, instead of a clear\nbefore it. 
Just  to  tie  this  part  of the discussion up with a\nnice, red ribbon, let's change LoadVar as shown below:\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to Primary Register }\n\nprocedure LoadVar(Name, Typ: char);\nbegin\n   if Typ = 'B' then\n      EmitLn('CLR.L D0');\n   Move(Typ, Name + '(PC)', 'D0');\n   if Typ = 'W' then\n      EmitLn('EXT.L D0');\nend;\n{---------------------------------------------------------------}\n\n\nWith this version, a byte is treated as unsigned  (as  in  Pascal\nand C), while a word is treated as signed.\n\n\nA MORE REASONABLE SOLUTION\n\nAs we've seen, promoting  every  variable  to  long while it's in\nmemory solves the problem, but it can hardly be called efficient,\nand  probably wouldn't be acceptable even for  those  of  us  who\nclaim to be unconcerned about efficiency.  It  will mean that all\narithmetic operations will be done to 32-bit accuracy, which will\nDOUBLE the run time  for  most operations, and make it even worse\nfor multiplication  and division.  For those operations, we would\nneed to call subroutines to do  them,  even if the data were byte\nor  word types.  The whole thing is sort of a cop-out, too, since\nit ducks all the real issues.\n\nOK, so that solution's no good.  Is there still a relatively easy\nway to get data conversion?  Can we still Keep It Simple?\n\nYes, indeed.   All we have to do is to make the conversion at the\nother end ... that is, we convert on the way _OUT_, when the data\nis stored, rather than on the way in.\n\nBut, remember, the storage part  of the assignment is pretty much\nindependent of the data load, which is taken care of by procedure\nExpression.    In  general  the  expression  may  be  arbitrarily\ncomplex, so how can procedure Assignment know what  type  of data\nis left in register D0?\n\nAgain,  the  answer  is  simple:    We'll  just  _ASK_  procedure\nExpression!  
The answer can be returned as a function value.\n\nAll of this requires several procedures to be  modified,  but the\nmods, like the method, are quite simple.  First of all,  since we\naren't requiring LoadVar to do  all the work of conversion, let's\ngo back to the simple version:\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to Primary Register }\n\nprocedure LoadVar(Name, Typ: char);\nbegin\n   Move(Typ, Name + '(PC)', 'D0');\nend;\n{--------------------------------------------------------------}\n\n\nNext, let's add a  new  procedure that will convert from one type\nto another:\n\n\n{---------------------------------------------------------------}\n{ Convert a Data Item from One Type to Another }\n\n\nprocedure Convert(Source, Dest: char);\nbegin\n   if Source <> Dest then begin\n      if Source  = 'B' then\n         EmitLn('AND.W #$FF,D0');\n      if Dest = 'L' then\n         EmitLn('EXT.L D0');\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNext, we need to do  the  logic  required  to  load  and  store a\nvariable of any type.  Here are the routines for that:\n\n\n{---------------------------------------------------------------}\n{ Load a Variable to the Primary Register }\n\nfunction Load(Name: char): char;\nvar Typ : char;\nbegin\n   Typ := VarType(Name);\n   LoadVar(Name, Typ);\n   Load := Typ;\nend;\n\n\n{--------------------------------------------------------------}\n{ Store a Variable from the Primary Register }\n\nprocedure Store(Name, T1: char);\nvar T2: char;\nbegin\n   T2 := VarType(Name);\n   Convert(T1, T2);\n   StoreVar(Name, T2);\nend;\n{--------------------------------------------------------------}\n\n\nNote that Load is a function, which not only emits the code for a\nload, but also returns the variable type.  In this way, we always\nknow what type of data we  are  dealing  with.  When we execute a\nStore,  we pass it the current type of the variable in D0.  
Since\nStore also knows the  type  of  the  destination variable, it can\nconvert as necessary.\n\nArmed  with all these new routines,  the  implementation  of  our\nrudimentary   assignment   statement  is   essentially   trivial.\nProcedure Expression now becomes a  function,  which  returns its\ntype to procedure Assignment:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nfunction Expression: char;\nbegin\n   Expression := Load(GetName);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := GetName;\n   Match('=');\n   Store(Name, Expression);\nend;\n{--------------------------------------------------------------}\n\nAgain, note how  incredibly  simple these two routines are. We've\nencapsulated  all the type logic into Load  and  Store,  and  the\ntrick of  passing  the  type  around  makes  the rest of the work\nextremely easy.    Of  course,  all  of  this is for our special,\ntrivial case of Expression.  Naturally, for the  general  case it\nwill have to get more complex.  But  you're  looking  now  at the\nFINAL version of procedure Assignment!\n\nAll this seems like a very  simple  and clean solution, and it is\nindeed.   Compile this program and run the  same  test  cases  as\nbefore.    You will see that all  types  of  data  are  converted\nproperly, and there are few if any wasted instructions.  Only the\nbyte-to-long conversion uses two instructions where one would do,\nand we could easily modify Convert to handle this case, too.\n\nAlthough we haven't considered unsigned variables in this case, I\nthink you can see  that  we could easily fix up procedure Convert\nto deal with these types as well.  
This is  \"left  as an exercise\nfor the student.\"\n\n\nLITERAL ARGUMENTS\n\nSharp-eyed readers might have noticed, though, that we don't even\nhave a proper form of a simple factor yet, because we don't allow\nfor loading literal constants,  only  variables.   Let's fix that\nnow.\n\nTo begin with, we'll need a GetNum function.  We've  seen several\nversions of this, some returning  only a single character, some a\nstring, and some an integer.   The  one needed here will return a\nLongInt, so that it can handle anything we  throw  at  it.   Note\nthat no type information is returned here: GetNum doesn't concern\nitself with how the number will be used:\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: LongInt;\nvar Val: LongInt;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   Val := 0;\n   while IsDigit(Look) do begin\n      Val := 10 * Val + Ord(Look) - Ord('0');\n      GetChar;\n   end;\n   GetNum := Val;\n   SkipWhite;\nend;\n{---------------------------------------------------------------}\n\n\nNow, when dealing with  literal  data,  we  have one little small\nproblem.   With variables, we know what  type  things  should  be\nbecause they've been declared to be  that  type.  We have no such\ntype information for  literals.   When the programmer says, \"-1,\"\ndoes that mean a byte, word, or longword  version?    We  have no\nclue.  The obvious thing to do would be to  use  the largest type\npossible, i.e. a longword.    
But that's a bad idea, because when\nwe get to more complex expressions, we'll find that it will cause\nevery expression involving literals  to  be  promoted to long, as\nwell.\n\nA better approach is to select a type based upon the value of the\nliteral, as shown next:\n\n\n{--------------------------------------------------------------}\n{ Load a Constant to the Primary Register }\n\nfunction LoadNum(N: LongInt): char;\nvar Typ : char;\nbegin\n   if abs(N) <= 127 then\n      Typ := 'B'\n   else if abs(N) <= 32767 then\n      Typ := 'W'\n   else Typ := 'L';\n   LoadConst(N, Typ);\n   LoadNum := Typ;\nend;\n{---------------------------------------------------------------}\n\n\n(I know, I know, the number base isn't really symmetric.  You can\nstore -128 in a single byte,  and  -32768  in a word.  But that's\neasily fixed, and not  worth  the time or the added complexity to\nfool with it here.  It's the thought that counts.)\n\nNote  that  LoadNum  calls  a  new version of the code  generator\nroutine  LoadConst, which has an added  argument  to  define  the\ntype:\n\n\n{---------------------------------------------------------------}\n{ Load a Constant to the Primary Register }\n\nprocedure LoadConst(N: LongInt; Typ: char);\nvar temp:string;\nbegin\n   Str(N, temp);\n   Move(Typ, '#' + temp, 'D0');\nend;\n{--------------------------------------------------------------}\n\n\nNow  we can modify procedure Expression  to  accommodate the  two\npossible kinds of factors:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nfunction Expression: char;\nbegin\n   if IsAlpha(Look) then\n      Expression := Load(GetName)\n   else\n      Expression := LoadNum(GetNum);\nend;\n{--------------------------------------------------------------}\n\n\n(Wow, that sure didn't hurt too bad!  
Just a  few  extra lines do\nthe job.)\n\nOK,  compile  this code into your program  and  give  it  a  try.\nYou'll see that it now works for either variables or constants as\nvalid expressions.\n\n\nADDITIVE EXPRESSIONS\n\nIf you've been following this series from the beginning, I'm sure\nyou  know  what's coming next:  We'll  expand  the  form  for  an\nexpression   to   handle   first   additive   expressions,   then\nmultiplicative, then general expressions with parentheses.\n\nThe nice part is that we already have a pattern for  dealing with\nthese more complex expressions.  All we have  to  do  is  to make\nsure that  all the procedures called by Expression (Term, Factor,\netc.)  always  return a type identifier.   If  we  do  that,  the\nprogram structure gets changed hardly at all.\n\nThe  first  step  is  easy:  We can rename our existing  function\nExpression  to  Term,  as  we've  done so many times before,  and\ncreate the new version of Expression:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nfunction Expression: char;\nvar Typ: char;\nbegin\n   if IsAddop(Look) then\n      Typ := Unop\n   else\n      Typ := Term;\n   while IsAddop(Look) do begin\n      Push(Typ);\n      case Look of\n       '+': Typ := Add(Typ);\n       '-': Typ := Subtract(Typ);\n      end;\n   end;\n   Expression := Typ;\nend;\n{--------------------------------------------------------------}\n\n\nNote  in  this  routine how each  procedure  call  has  become  a\nfunction call, and how  the  local  variable  Typ gets updated at\neach pass.\n\nNote also the new call to a function  Unop,  which  lets  us deal\nwith a leading unary minus.  This change is not necessary  ... we\ncould  still  use  a form more like what we've done before.  I've\nchosen  to  introduce  UnOp as a separate routine because it will\nmake it easier, later, to produce somewhat better code than we've\nbeen  doing.    
In other words, I'm looking ahead to optimization\nissues.\n\nFor  this  version,  though, we'll retain the same dumb old code,\nwhich makes the new routine trivial:\n\n\n{---------------------------------------------------------------}\n{ Process a Term with Leading Unary Operator }\n\nfunction Unop: char;\nbegin\n   Clear;\n   Unop := 'W';\nend;\n{---------------------------------------------------------------}\n\n\nProcedure  Push  is  a code-generator routine, and now has a type\nargument:\n\n\n{---------------------------------------------------------------}\n{ Push Primary onto Stack }\n\nprocedure Push(Size: char);\nbegin\n   Move(Size, 'D0', '-(SP)');\nend;\n{---------------------------------------------------------------}\n\n\nNow, let's take a look at functions Add  and  Subtract.    In the\nolder versions of these routines, we let them call code generator\nroutines PopAdd and PopSub.    We'll  continue  to do that, which\nmakes the functions themselves extremely simple:\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nfunction Add(T1: char): char;\nbegin\n   Match('+');\n   Add := PopAdd(T1, Term);\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nfunction Subtract(T1: char): char;\nbegin\n   Match('-');\n   Subtract := PopSub(T1, Term);\nend;\n{---------------------------------------------------------------}\n\n\nThe simplicity is  deceptive,  though, because what we've done is\nto defer all the logic to PopAdd and PopSub, which are  no longer\njust code generation routines.    They must also now take care of\nthe type conversions required.\n\nAnd just what conversion is that?  Simple: Both arguments must be\nof the same size, and the result  is  also  of  that  size.   The\nsmaller of the two arguments must be \"promoted\" to  the  size  of\nthe larger one.\n\nBut  this  presents a bit of a problem.  
If the  argument  to  be\npromoted is the second argument  (i.e.  in  the  primary register\nD0), we  are  in  great  shape.  If it's not, however, we're in a\nfix: we can't change the size of the  information  that's already\nbeen pushed onto the stack.\n\nThe solution is simple but a little painful: We must abandon those\nlovely  \"pop  the  data and do something  with  it\"  instructions\nthoughtfully provided by Motorola.\n\nThe alternative is to assign  a  secondary  register,  which I've\nchosen to be D7.  (Why not D1?  Because I  have  later  plans for\nthe other registers.)\n\nThe  first  step in this new structure  is  to  introduce  a  Pop\nprocedure analogous to the Push.   This procedure will always Pop\nthe top element of the stack into D7:\n\n\n{---------------------------------------------------------------}\n{ Pop Stack into Secondary Register }\n\nprocedure Pop(Size: char);\nbegin\n   Move(Size, '(SP)+', 'D7');\nend;\n{---------------------------------------------------------------}\n\n\nThe general idea is that all the \"Pop-Op\" routines can  call this\none.    When  this is done, we will then have  both  operands  in\nregisters, so we can promote whichever  one  we need to.  To deal\nwith this, procedure Convert needs another argument, the register\nname:\n\n\n{---------------------------------------------------------------}\n{ Convert a Data Item from One Type to Another }\n\nprocedure Convert(Source, Dest: char; Reg: String);\nbegin\n   if Source <> Dest then begin\n      if Source  = 'B' then\n         EmitLn('AND.W #$FF,' + Reg);\n      if Dest = 'L' then\n         EmitLn('EXT.L ' + Reg);\n   end;\nend;\n{---------------------------------------------------------------}\n\n\nThe next function does a conversion, but only if the current type\nT1  is  smaller  in size than the desired  type  T2.    
It  is  a\nfunction, returning the final type to let us know what it decided\nto do:\n\n\n{---------------------------------------------------------------}\n{ Promote the Size of a Register Value }\n\nfunction Promote(T1, T2: char; Reg: string): char;\nvar Typ: char;\nbegin\n   Typ := T1;\n   if T1 <> T2 then\n      if (T1 = 'B') or ((T1 = 'W') and (T2 = 'L')) then begin\n         Convert(T1, T2, Reg);\n         Typ := T2;\n      end;\n   Promote := Typ;\nend;\n{---------------------------------------------------------------}\n\n\nFinally, the following function forces the two registers to be of\nthe same type:\n\n\n{---------------------------------------------------------------}\n{ Force both Arguments to Same Type }\n\nfunction SameType(T1, T2: char): char;\nbegin\n   T1 := Promote(T1, T2, 'D7');\n   SameType := Promote(T2, T1, 'D0');\nend;\n{---------------------------------------------------------------}\n\n\nThese new routines give us the ammunition we need  to  flesh  out\nPopAdd and PopSub:\n\n\n{---------------------------------------------------------------}\n{ Generate Code to Add Primary to the Stack }\n\nfunction PopAdd(T1, T2: char): char;\nbegin\n   Pop(T1);\n   T2 := SameType(T1, T2);\n   GenAdd(T2);\n   PopAdd := T2;\nend;\n\n\n{---------------------------------------------------------------}\n{ Generate Code to Subtract Primary from the Stack }\n\nfunction PopSub(T1, T2: char): char;\nbegin\n   Pop(T1);\n   T2 := SameType(T1, T2);\n   GenSub(T2);\n   PopSub := T2;\nend;\n{---------------------------------------------------------------}\n\n\nAfter  all   the   buildup,   the   final   results   are  almost\nanticlimactic.  Once  again,  you can see that the logic is quite\nsimple.  All the two routines do is to pop the  top-of-stack into\nD7, force the two operands to be the same size, and then generate\nthe code.\n\nNote  the  new  code generator routines GenAdd and GenSub.  These\nare vestigial forms of the ORIGINAL PopAdd and PopSub.   
That is,\nthey  are pure code generators, producing a  register-to-register\nadd or subtract:\n\n\n{---------------------------------------------------------------}\n{ Add Top of Stack to Primary }\n\nprocedure GenAdd(Size: char);\nbegin\n   EmitLn('ADD.' + Size + ' D7,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Subtract Primary from Top of Stack }\n\nprocedure GenSub(Size: char);\nbegin\n   EmitLn('SUB.' + Size + ' D7,D0');\n   EmitLn('NEG.' + Size + ' D0');\nend;\n{---------------------------------------------------------------}\n\n\nOK,  I grant you:  I've thrown a lot of routines at you since  we\nlast tested the code.   But  you  have  to  admit  that  each new\nroutine is pretty simple and transparent.  If you (like me) don't\nlike to test so many new  routines  at  once, that's OK.  You can\nstub out routines like Convert, Promote, and SameType, since they\ndon't  read  any inputs.  You won't  get  the  correct  code,  of\ncourse, but things should work.  Then flesh  them  out  one  at a\ntime.\n\nWhen testing the program,  don't  forget  that  you first have to\ndeclare some variables, and then  start the \"body\" of the program\nwith an upper-case  'B'  (for  BEGIN).   You should find that the\nparser  will  handle  any  additive  expressions.  Once  all  the\nconversion routines are in, you should see that the  correct code\nis  generated,  with  type  conversions inserted where necessary.\nTry mixing up variables  of  different  sizes, and also literals.\nMake sure that everything's working properly.  As  usual,  it's a\ngood  idea  to  try  some  erroneous expressions and see how  the\ncompiler handles them.\n\n\nWHY SO MANY PROCEDURES?\n\nAt this point, you may think  I've  pretty much gone off the deep\nend in terms of deeply nested procedures.  There is  admittedly a\nlot of overhead here.  But there's a method in my madness.  
As in\nthe case of UnOp, I'm looking ahead to the time when  we're going\nto want better code  generation.   The way the code is organized,\nwe can achieve  this  without major modifications to the program.\nFor example, in cases where the value pushed onto the  stack does\n_NOT_ have to be converted, it's still better to use the \"pop and\nadd\"  instruction.    If we choose to test for such cases, we can\nembed the extra tests into  PopAdd  and  PopSub  without changing\nanything else much.\n\n\nMULTIPLICATIVE EXPRESSIONS\n\nThe procedure for dealing with multiplicative  operators  is much\nthe  same.    In  fact,  at  the  first  level,  they are  almost\nidentical, so I'll just show them here without much fanfare.  The\nfirst  one  is  our  general  form  for  Factor,  which  includes\nparenthetical subexpressions:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Factor }\n\nfunction Expression: char; Forward;\n\nfunction Factor: char;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Factor := Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      Factor := Load(GetName)\n   else\n      Factor := LoadNum(GetNum);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nFunction Multiply(T1: char): char;\nbegin\n   Match('*');\n   Multiply := PopMul(T1, Factor);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nfunction Divide(T1: char): char;\nbegin\n   Match('/');\n   DIvide := PopDiv(T1, Factor);\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nfunction Term: char;\nvar Typ: char;\nbegin\n   Typ := Factor;\n   while IsMulop(Look) do begin\n      Push(Typ);\n      case Look of\n       '*': Typ := Multiply(Typ);\n       '/': Typ := Divide(Typ);\n      end;\n   end;\n   Term := 
Typ;\nend;\n{---------------------------------------------------------------}\n\n\nThese routines parallel the additive  ones  almost  exactly.   As\nbefore, the complexity is encapsulated within PopMul  and PopDiv.\nIf  you'd  like  to test the program before we get into that, you\ncan build dummy versions of them, similar to  PopAdd  and PopSub.\nAgain, the code won't be correct at this point,  but  the  parser\nshould handle expressions of arbitrary complexity.\n\n\nMULTIPLICATION\n\nOnce you've  convinced yourself that the parser itself is working\nproperly, we need to figure out what it will take to generate the\nright code.  This is where  things  begin to get a little sticky,\nbecause the rules are more complex.\n\nLet's take the case of multiplication first.   This  operation is\nsimilar to the \"addops\" in that both operands should  be  of  the\nsame size.  It differs in three important respects:\n\n\n  o  The type of the product is typically not the same as that of\n     the  two  operands.   For the product of two words, we get a\n     longword result.\n\n  o  The 68000 does  not support a 32 x 32 multiply, so a call to\n     a software routine is needed.  
This routine will become part\n     of the run-time library.\n\n  o  It also does  not  support  an  8  x 8 multiply, so all byte\n     operands must be promoted to words.\n\n\nThe actions that we have to take are best shown in  the following\ntable:\n\n  T1 -->  |                 |                 |                 |\n          |                 |                 |                 |\n      |   |        B        |        W        |       L         |\n  T2  V   |                 |                 |                 |\n-----------------------------------------------------------------\n          |                 |                 |                 |\n     B    | Convert D0 to W | Convert D0 to W | Convert D0 to L |\n          | Convert D7 to W |                 |                 |\n          | MULS            | MULS            | JSR MUL32       |\n          | Result = W      | Result = L      | Result = L      |\n          |                 |                 |                 |\n-----------------------------------------------------------------\n          |                 |                 |                 |\n     W    | Convert D7 to W |                 | Convert D0 to L |\n          | MULS            | MULS            | JSR MUL32       |\n          | Result = L      | Result = L      | Result = L      |\n          |                 |                 |                 |\n-----------------------------------------------------------------\n          |                 |                 |                 |\n     L    | Convert D7 to L | Convert D7 to L |                 |\n          | JSR MUL32       | JSR MUL32       | JSR MUL32       |\n          | Result = L      | Result = L      | Result = L      |\n          |                 |                 |                 |\n-----------------------------------------------------------------\n\nThis table shows the actions to be taken for each  combination of\noperand types.  
There are three things to note: First,  we assume\na library routine  MUL32  which  performs  a  32  x  32 multiply,\nleaving a >> 32-bit << (not 64-bit) product.    If  there  is any\noverflow in the process,  we  choose to ignore it and return only\nthe lower 32 bits.\n\nSecond, note that the  table  is  symmetric  ... the two operands\nenter in the same way.  Finally, note that the product  is ALWAYS\na longword, except when  both  operands  are  bytes.  (It's worth\nnoting, in passing, that  this  means  that many expressions will\nend up being longwords, whether we  like  it or not.  Perhaps the\nidea  of  just  promoting  them  all  up  front wasn't  all  that\noutrageous, after all!)\n\nNow, clearly, we are going to have to generate different code for\nthe 16-bit and 32-bit multiplies.  This is best  done  by  having\nseparate code generator routines for the two cases:\n\n\n{---------------------------------------------------------------}\n{ Multiply Top of Stack by Primary (Word) }\n\nprocedure GenMult;\nbegin\n   EmitLn('MULS D7,D0')\nend;\n\n\n{---------------------------------------------------------------}\n{ Multiply Top of Stack by Primary (Long) }\n\nprocedure GenLongMult;\nbegin\n   EmitLn('JSR MUL32');\nend;\n{---------------------------------------------------------------}\n\n\nAn examination of the code below for PopMul  should  convince you\nthat the conditions in the table are met:\n\n\n{---------------------------------------------------------------}\n{ Generate Code to Multiply Primary by Stack }\n\nfunction PopMul(T1, T2: char): char;\nvar T: char;\nbegin\n   Pop(T1);\n   T := SameType(T1, T2);\n   Convert(T, 'W', 'D7');\n   Convert(T, 'W', 'D0');\n   if T = 'L' then\n      GenLongMult\n   else\n      GenMult;\n   if T = 'B' then\n      PopMul := 'W'\n   else\n      PopMul:= 'L';\nend;\n{---------------------------------------------------------------}\n\n\nAs you can see, the routine starts off just like PopAdd.  
The two\narguments are forced to the same type.  The two calls  to Convert\ntake  care  of  the case where both operands are bytes.  The data\nthemselves are promoted  to  words, but the routine remembers the\ntype so as to assign the correct type to the result.  Finally, we\ncall one of the two code generator routines, and then  assign the\nresult type.  Not too complicated, really.\n\nAt this point, I suggest that you go ahead and test  the program.\nTry all combinations of operand sizes.\n\n\nDIVISION\n\nThe case of division is not nearly so  symmetric.    I  also have\nsome bad news for you:\n\nAll  modern  16-bit   CPU's   support   integer   divide.     The\nmanufacturer's data  sheet  will  describe  this  operation  as a\n32 x 16-bit divide, meaning that you can divide a 32-bit dividend\nby a 16-bit divisor.  Here's the bad news:\n\n\n                     THEY'RE LYING TO YOU!!!\n\n\nIf you don't believe  it,  try  dividing  any large 32-bit number\n(meaning that it has non-zero bits  in  the upper 16 bits) by the\ninteger 1.  You are guaranteed to get an overflow exception.\n\nThe  problem is that the instruction  really  requires  that  the\nresulting quotient fit into a 16-bit result.   This  won't happen\nUNLESS the divisor is  sufficiently  large.    When any number is\ndivided by unity, the quotient will of course be the same  as the\ndividend, which had better fit into a 16-bit word.\n\nSince  the  beginning  of  time  (well,  computers,  anyway), CPU\narchitects have  provided  this  little  gotcha  in  the division\ncircuitry.  It provides a certain amount of  symmetry  in things,\nsince it is sort of the inverse of the way a multiply works.  But\nsince  unity  is  a perfectly valid (and rather common) number to\nuse as a divisor, the division as implemented  in  hardware needs\nsome help from us programmers.\n\nThe implications are as follows:\n\n  o  The type of the quotient must always be the same as  that of\n     the dividend.  
It is independent of the divisor.\n\n  o  In spite of  the  fact  that  the  CPU  supports  a longword\n     dividend,  the hardware-provided  instruction  can  only  be\n     trusted  for  byte  and  word  dividends.      For  longword\n     dividends, we need another library routine that can return a\n     long result.\n\n\n\nThis  looks  like  a job for  another  table,  to  summarize  the\nrequired actions:\n\n  T1 -->  |                 |                 |                 |\n          |                 |                 |                 |\n      |   |        B        |        W        |       L         |\n  T2  V   |                 |                 |                 |\n-----------------------------------------------------------------\n          |                 |                 |                 |\n     B    | Convert D0 to W | Convert D0 to W | Convert D0 to L |\n          | Convert D7 to L | Convert D7 to L |                 |\n          | DIVS            | DIVS            | JSR DIV32       |\n          | Result = B      | Result = W      | Result = L      |\n          |                 |                 |                 |\n-----------------------------------------------------------------\n          |                 |                 |                 |\n     W    | Convert D7 to L | Convert D7 to L | Convert D0 to L |\n          | DIVS            | DIVS            | JSR DIV32       |\n          | Result = B      | Result = W      | Result = L      |\n          |                 |                 |                 |\n-----------------------------------------------------------------\n          |                 |                 |                 |\n     L    | Convert D7 to L | Convert D7 to L |                 |\n          | JSR DIV32       | JSR DIV32       | JSR DIV32       |\n          | Result = B      | Result = W      | Result = L      |\n          |                 |                 |                 
|\n-----------------------------------------------------------------\n\n\n(You may wonder why it's necessary to do a 32-bit  division, when\nthe  dividend is, say, only a byte in the first place.  Since the\nnumber  of bits in the result can only be as many as that in  the\ndividend,  why  bother?   The reason is that, if the divisor is a\nlongword,  and  there  are any high bits set in it, the result of\nthe division must  be zero.  We might not get that if we only use\nthe lower word of the divisor.)\n\nThe following code provides the correct function for PopDiv:\n\n\n{---------------------------------------------------------------}\n{ Generate Code to Divide Stack by the Primary }\n\nfunction PopDiv(T1, T2: char): char;\nbegin\n   Pop(T1);\n   Convert(T1, 'L', 'D7');\n   if (T1 = 'L') or (T2 = 'L') then begin\n      Convert(T2, 'L', 'D0');\n      GenLongDiv;\n      PopDiv := 'L';\n      end\n   else begin\n      Convert(T2, 'W', 'D0');\n      GenDiv;\n      PopDiv := T1;\n   end;\nend;\n{---------------------------------------------------------------}\n\n\nThe two code generation procedures are:\n\n\n{---------------------------------------------------------------}\n{ Divide Top of Stack by Primary  (Word) }\n\nprocedure GenDiv;\nbegin\n   EmitLn('DIVS D0,D7');\n   Move('W', 'D7', 'D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Divide Top of Stack by Primary (Long) }\n\nprocedure GenLongDiv;\nbegin\n   EmitLn('JSR DIV32');\nend;\n{---------------------------------------------------------------}\n\n\nNote  that  we  assume that DIV32 leaves the (longword) result in\nD0.\n\nOK, install the new  procedures  for division.  At this point you\nshould be able  to  generate  code  for  any  kind  of arithmetic\nexpression.  Give it a whirl!\n\n\nBEGINNING TO WIND DOWN\n\nAt  last, in this installment, we've learned  how  to  deal  with\nvariables (and literals) of different types.  As you can  see, it\nhasn't been too tough.  
In  fact,  in  some ways most of the code\nlooks even more simple than it does in earlier  programs.    Only\nthe  multiplication  and  division  operators  require  a  little\nthinking and planning.\n\nThe main concept that  made  things  easy  was that of converting\nprocedures such as Expression into functions that return the type\nof the result.  Once this  was  done,  we were able to retain the\nsame general structure of the compiler.\n\nI won't pretend that  we've  covered  every  single aspect of the\nissue.  I conveniently  ignored  unsigned  arithmetic.  From what\nwe've  done, I think you can see that to include them adds no new\nchallenges, just extra possibilities to test for.\n\nI've also ignored the  logical  operators And, Or, etc.  It turns\nout  that  these are pretty easy to  handle.    All  the  logical\noperators are  bitwise  operations,  so  they  are  symmetric and\ntherefore work  in  the  same  fashion  as  PopAdd.  There is one\ndifference,  however:    if  it  is necessary to extend the  word\nlength for a logical variable, the extension should be done as an\nUNSIGNED  number.      Floating   point   numbers,   again,   are\nstraightforward  to  handle  ... just a few more procedures to be\nadded to the run-time library, or perhaps instructions for a math\nchip.\n\nPerhaps more importantly, I have also skirted the  issue  of type\nCHECKING,  as  opposed  to  conversion.   In other  words,  we've\nallowed for operations between variables of  all  combinations of\ntypes.  In general this will not be true ... certainly  you don't\nwant to add an integer, for example, to a string.  Most languages\nalso don't allow you to mix up character and integer variables.\n\nAgain, there are  really  no  new  issues to be addressed in this\ncase.  
We are already checking the types of the two  operands ...\nmuch  of this checking gets done  in  procedures  like  SameType.\nIt's  pretty  straightforward  to  include  a  call  to an  error\nhandler, if the types of the two operands are incompatible.\n\nIn the general  case,  we  can  think of every single operator as\nbeing handled by  a  different procedure, depending upon the type\nof the two operands.  This is straightforward, though tedious, to\nimplement simply by implementing  a  jump  table with the operand\ntypes  as indices.  In Pascal,  the  equivalent  operation  would\ninvolve nested Case statements.    Some  of the called procedures\ncould then be simple  error  routines,  while others could effect\nwhatever kind of conversion we need.  As more  types  are  added,\nthe number of procedures goes up by a square-law rule, but that's\nstill not an unreasonably large number of procedures.\n\nWhat  we've  done  here is to collapse such a jump table into far\nfewer  procedures, simply by making use  of  symmetry  and  other\nsimplifying rules.\n\n\nTO COERCE OR NOT TO COERCE\n\nIn case you haven't gotten this message yet, it sure appears that\nTINY and KISS will  probably  _NOT_  be strongly typed languages,\nsince I've allowed for  automatic  mixing  and conversion of just\nabout any type.  Which brings up the next issue:\n\n                Is this really what we want to do?\n\nThe answer depends on what kind of language you want, and the way\nyou'd like it to behave.  What we have not addressed is the issue\nof when to allow and when to deny the use of operations involving\ndifferent  data  types.   In other  words,  what  should  be  the\nSEMANTICS of our compiler?   Do we want automatic type conversion\nfor all cases, for some cases, or not at all?\n\nLet's pause here to think about this a bit more.   To  do  so, it\nwill help to look at a bit of history.\n\nFORTRAN  II supported only two simple  data  types:  Integer  and\nReal.    
It  allowed implicit type conversion  between  real  and\ninteger types during assignment, but not within expressions.  All\ndata items (including literal constants) on  the  right-hand side\nof an assignment statement had to be of the same type.  That made\nthings pretty easy  ...  much  simpler  than what we've had to do\nhere.\n\nThis  was  changed  in  FORTRAN   IV   to   support  \"mixed-mode\"\narithmetic.  If an expression had any real data items in it, they\nwere all converted to reals and the expression  itself  was real.\nTo round out  the  picture, functions were provided to explicitly\nconvert  from  one  type to the other, so that you could force an\nexpression to end up as either type.\n\nThis  led to two things:  code that was easier to write, and code\nthat was less efficient.  That's because sloppy programmers would\nwrite expressions with simple  constants  like  0  and 1 in them,\nwhich  the  compiler  would  dutifully  compile  to   convert  at\nexecution  time.  Still, the system  worked  pretty  well,  which\nwould  tend  to  indicate that implicit type conversion is a Good\nThing.\n\nC is also a weakly typed language, though it  supports  a  larger\nnumber  of types.  C won't complain if you try to add a character\nto an integer,  for  example.    Partly,  this is helped by the C\nconvention of promoting every char  to integer when it is loaded,\nor  passed  through  a  parameter  list.    This  simplifies  the\nconversions quite a  bit.    In  fact, in subset C compilers that\ndon't support long or float types,  we  end up back where we were\nin our earlier,  simple-minded  first try: every variable has the\nsame representation, once loaded into  a  register.    Makes life\npretty easy!\n\nThe  ultimate  language  in  the  direction  of   automatic  type\nconversion is PL/I.   This  language  supports  a large number of\ndata types, and you can mix them all  freely.    
If  the implicit\nconversions of FORTRAN seemed good,  then  those  of  PL/I should\nhave been Heaven, but it turned  out  to  be more like Hell!  The\nproblem was that with so many data types, there had to be a large\nnumber  of  different conversions, AND  a  correspondingly  large\nnumber of rules about how  mixed  operands  should  be converted.\nThese rules became so  complex  that  no  one could remember what\nthey  were!  A lot of the errors in PL/I programs had to do  with\nunexpected and unwanted type  conversions.    Too  much of a Good\nThing can be bad for you!\n\nPascal,  on  the  other hand, is a  language  which  is \"strongly\ntyped,\" which means that in general you can't mix types,  even if\nthey differ only in _NAME_, and yet have the same base type!\nNiklaus Wirth made Pascal strongly typed to help keep programmers\nout of trouble, and  the  restrictions  have  indeed saved many a\nprogrammer from himself, because the compiler kept him from doing\nsomething dumb.  Better  to  find  the  bug in compilation rather\nthan  the  debug  phase.    The same restrictions can also  cause\nfrustration when you really  WANT  to mix types, and they tend to\ndrive an ex-C-programmer up the wall.\n\nEven so, Pascal does permit some implicit conversions.    You can\nassign  an integer to a real value.  You can also mix integer and\nreal types in  expressions  of  type  Real.  The integers will be\nautomatically coerced to real, just as in FORTRAN  (and  with the\nsame hidden cost in run-time overhead).\n\nYou can't, however, convert the  other way, from real to integer,\nwithout applying an explicit  conversion  function,  Trunc.   
The\ntheory here is that,  since  the numerical value of a real number\nis  necessarily  going  to  be  changed  by  the conversion  (the\nfractional  part will be lost), you really  shouldn't  do  it  in\n\"secret.\"\n\nIn the spirit of strong typing, Pascal will not allow you  to mix\nChar  and  Integer   variables,  without  applying  the  explicit\ncoercion functions Chr and Ord.\n\nTurbo Pascal also includes the  types  Byte,  Word,  and LongInt.\nThe first two are basically the same as unsigned  integers.    In\nTurbo,  these can be freely intermixed  with  variables  of  type\nInteger,  and  Turbo will automatically  handle  the  conversion.\nThere are run-time  checks,  though, to keep you from overflowing\nor otherwise getting the wrong  answer. Note that you still can't\nmix Byte and Char types, even though they  are  stored internally\nin the same representation.\n\nThe ultimate in a  strongly-typed  language  is Ada, which allows\n_NO_  implicit  type  conversions at all, and also will not allow\nmixed-mode  arithmetic.    Jean   Ichbiah's   position   is  that\nconversions cost  execution time, and you shouldn't be allowed to\nbuild in such cost in a hidden manner.  By forcing the programmer\nto  explicitly  request  a  type  conversion,  you  make it  more\napparent that there could be a cost involved.\n\nI have been using another strongly-typed  language,  a delightful\nlittle  language  called  Whimsical,  by  John  Spray.   Although\nWhimsical is  intended as a systems programming language, it also\nrequires explicit conversion EVERY time.    There  are  NEVER any\nautomatic conversions, even the ones supported by Pascal.\n\nThis approach does  have  certain advantages:  The compiler never\nhas to guess what to do: the programmer always tells it precisely\nwhat  he  wants.  
As a result, there tends to be  a  more  nearly\none-to-one correspondence between  source code and compiled code,\nand John's compiler produces VERY tight code.\n\nOn the other hand, I sometimes find the  explicit  conversions to\nbe a pain.  If I want, for example, to add one to a character, or\nAND it with a mask, there are a lot of conversions to make.  If I\nget  it  wrong,  the  only   error  message  is  \"Types  are  not\ncompatible.\"  As it happens, John's particular  implementation of\nthe language in his compiler doesn't tell you exactly WHICH types\nare not compatible ... it only tells you which LINE the  error is\nin.\n\nI must admit that most of my errors with this compiler tend to be\nerrors of this type, and  I've  spent  a  lot  of  time  with the\nWhimsical compiler, trying to figure out just WHERE  in  the line\nI've offended it.   The only real way to fix the error is to keep\ntrying things until something works.\n\nSo what should we do in TINY and KISS?  For the first one, I have\nthe answer:  TINY  will  support only the types Char and Integer,\nand  we'll  use  the  C  trick  of  promoting Chars  to  Integers\ninternally.  That means  that  the  TINY  compiler will be _MUCH_\nsimpler  than  what  we've  already  done.    Type conversion  in\nexpressions is sort of moot, since none will be required!   Since\nlongwords will not be supported, we also won't need the MUL32 and\nDIV32 run-time routines, nor the logic to figure out when to call\nthem.  I _LIKE_ it!\n\nKISS, on the other hand, will support the type Long.\n\nShould it support both signed and unsigned arithmetic?    For the\nsake of simplicity I'd rather not.    It  does add quite a bit to\nthe  complexity  of  type conversions.  
Even  Niklaus  Wirth  has\neliminated  unsigned  (Cardinal) numbers from  his  new  language\nOberon, with the argument that  32-bit  integers  should  be long\nenough for anybody, in either case.\n\nBut KISS is supposed to  be a systems programming language, which\nmeans that we should  be  able to do whatever operations that can\nbe done in assembler.    Since the 68000 supports both flavors of\nintegers, I guess KISS  should,  also.    We've seen that logical\noperations  need to be able to extend  integers  in  an  unsigned\nfashion, so the unsigned conversion  procedures  are  required in\nany case.\n\n\nCONCLUSION\n\nThat wraps up our session on type conversions.  Sorry you  had to\nwait  so  long for it, but hope you feel that it  was  worth  the\nwait.\n\nIn  the  next  few installments, we'll extend the simple types to\ninclude arrays and pointers, and we'll have a look at what  to do\nabout  strings.    That should pretty well wrap up the mainstream\npart of the series.  After  that,  I'll give you the new versions\nof the TINY and KISS compilers,  and  then we'll start to look at\noptimization issues.\n\nSee you then.\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\n"
  },
  {
    "path": "15/tutor15.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           5 March 1994\n\n\nPart 15: BACK TO THE FUTURE\n\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1994 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nCan it really have been four years since I wrote installment \nfourteen of this series?  Is it really possible that six long \nyears have passed since I began it?  Funny how time flies when \nyou're having fun, isn't it?  \n\nI won't spend a lot of time making excuses; only point out that \nthings happen, and priorities change.  In the four years since \ninstallment fourteen, I've managed to get laid off, get divorced, \nhave a nervous breakdown, begin a new career as a writer, begin \nanother one as a consultant, move, work on two real-time systems, \nand raise fourteen baby birds, three pigeons, six possums, and a \nduck.  For awhile there, the parsing of source code was not high \non my list of priorities.  Neither was writing stuff for free, \ninstead of writing stuff for pay.  But I do try to be faithful, \nand I do recognize and feel my responsibility to you, the reader, \nto finish what I've started.  As the tortoise said in one of my \nson's old stories, I may be slow, but I'm sure.  I'm  sure that \nthere are people out there anxious to see the last reel of this \nfilm, and I intend to give it to them.  
So, if you're one of those \nwho's been waiting, more or less patiently, to see how this thing \ncomes out, thanks for your patience.  I apologize for the delay.  \nLet's move on.\n\n\nNEW STARTS, OLD DIRECTIONS\n\nLike many other things, programming languages and programming \nstyles change with time.  In 1994, it seems a little anachronistic \nto be programming in Turbo Pascal, when the rest of the world \nseems to have gone bananas over C++.  It also seems a little \nstrange to be programming in a classical style when the rest of \nthe world has switched to object-oriented methods.  Still, in \nspite of the four-year hiatus, it would be entirely too wrenching \na change, at this point, to switch to, say, C++ with object-\norientation.  Anyway, Pascal is still not only a powerful \nprogramming language (more than ever, in fact), but it's a \nwonderful medium for teaching.  C is a notoriously difficult \nlanguage to read ... it's often been accused, along with Forth, of \nbeing a \"write-only language.\"  When I program in C++, I find \nmyself spending at least 50% of my time struggling with language \nsyntax rather than with concepts.  A stray \"&\" or \"*\" can not only \nchange the functioning of the program, but its correctness as \nwell.  By contrast, Pascal code is usually quite transparent and \neasy to read, even if you don't know the language. What you see is \nalmost always what you get, and we can concentrate on concepts \nrather than implementation details.  I've said from the beginning \nthat the purpose of this tutorial series was not to generate the \nworld's fastest compiler, but to teach the fundamentals of \ncompiler technology, while spending the least amount of time \nwrestling with language syntax or other aspects of software \nimplementation. Finally, since a lot of what we do in this course \namounts to software experimentation, it's important to have a \ncompiler and associated environment that compiles quickly and with \nno fuss.  
In my opinion, by far the most significant time measure \nin software development is the speed of the edit/compile/test \ncycle.  In this department, Turbo Pascal is king.  The compilation \nspeed is blazing fast, and continues to get faster in every \nrelease (how do they keep doing that?).  Despite vast improvements \nin C compilation speed over the years, even Borland's fastest \nC/C++ compiler is still no match for Turbo Pascal.  Further, the \neditor built into their IDE, the make facility, and even their \nsuperb smart linker, all complement each other to produce a \nwonderful environment for quick turnaround.  For all of these \nreasons, I intend to stick with Pascal for the duration of this \nseries. We'll be using Turbo Pascal for Windows, one of the \ncompilers provided with Borland Pascal with Objects, version 7.0.  If \nyou don't have this compiler, don't worry ... nothing we do here \nis going to count on your having the latest version. Using the \nWindows version helps me a lot, by allowing me to use the \nClipboard to copy code from the compiler's editor into these \ndocuments.  It should also help you at least as much, copying the \ncode in the other direction.  \n\nI've thought long and hard about whether or not to introduce \nobjects to our discussion.  I'm a big advocate of object-oriented \nmethods for all uses, and such methods definitely have their place \nin compiler technology.  In fact, I've written papers on just this \nsubject (Refs. 1-3).  But the architecture of a compiler which is \nbased on object-oriented approaches is vastly different than that \nof the more classical compiler we've been building.  Again, it \nwould seem to be entirely too much to change these horses in mid-\nstream.  As I said, programming styles change.  
Who knows, it may \nbe another six years before we finish this thing, and if we keep \nchanging the code every time programming style changes, we may \nNEVER finish.\n\nSo for now, at least, I've determined to continue the classical \nstyle in Pascal, though we might indeed discuss objects and object \norientation as we go.  Likewise, the target machine will remain \nthe Motorola 68000 family.  Of all the decisions to be made here, \nthis one has been the easiest.  Though I know that many of you \nwould like to see code for the 80x86, the 68000 has become, if \nanything, even more popular as a platform for embedded systems, \nand it's for that application that this whole effort began in the \nfirst place.  Compiling for the PC, MSDOS platform, we'd have to \ndeal with all the issues of DOS system calls, DOS linker formats, \nthe PC file system and hardware, and all those other complications \nof a DOS environment.  An embedded system, on the other hand, must \nrun standalone, and it's for this kind of application, as an \nalternative to assembly language, that I've always imagined that a \nlanguage like KISS would thrive. Anyway, who wants to deal with \nthe 80x86 architecture if they don't have to?\n\nThe one feature of Turbo Pascal that I'm going to be making heavy \nuse of is units.  In the past, we've had to make compromises \nbetween code size and complexity, and program functionality.  A \nlot of our work has been in the nature of computer \nexperimentation, looking at only one aspect of compiler technology \nat a time. We did this to avoid having to carry around \nlarge programs, just to investigate simple concepts.  In the \nprocess, we've re-invented the wheel and re-programmed the same \nfunctions more times than I'd like to count.  Turbo units provide \na wonderful way to get functionality and simplicity at the same \ntime:  You write reusable code, and invoke it with a single line.  
\nYour test program stays small, but it can do powerful things.\n\nOne feature of Turbo Pascal units is their initialization block.  \nAs with an Ada package, any code in the main begin-end block of a \nunit gets executed as the program is initialized.  As you'll see \nlater, this sometimes gives us neat simplifications in the code.  \nOur procedure Init, which has been with us since Installment 1, \ngoes away entirely when we use units.  The various routines in the \nCradle, another key feature of our approach, will get distributed \namong the units.\n\nThe concept of units, of course, is no different than that of C \nmodules.  However, in C (and C++), the interface between modules \ncomes via preprocessor include statements and header files.  As \nsomeone who's had to read a lot of other people's C programs, I've \nalways found this rather bewildering.  It always seems that \nwhatever data structure you'd like to know about is in some other \nfile.  Turbo units are simpler for the very reason that they're \ncriticized by some:  The function interfaces and their \nimplementation are included in the same file.  While this \norganization may create problems with code security, it also \nreduces the number of files by half, which isn't half bad.  \nLinking of the object files is also easy, because the Turbo \ncompiler takes care of it without the need for make files or other \nmechanisms.\n\n\nSTARTING OVER?\n\nFour years ago, in Installment 14, I promised you that our days of \nre-inventing the wheel, and recoding the same software over and \nover for each lesson, were over, and that from now on we'd stick \nto more complete programs that we would simply add new features \nto.  I still intend to keep that promise; that's one of the main \npurposes for using units.  
However, because of the long time since \nInstallment 14, it's natural to want to at least do some review, \nand anyhow, we're going to have to make rather sweeping changes in \nthe code to make the transition to units.  Besides, frankly, after \nall this time I can't remember all the neat ideas I had in my head \nfour years ago.  The best way for me to recall them is to retrace \nsome of the steps we took to arrive at Installment 14.  So I hope \nyou'll be understanding and bear with me as we go back to our \nroots, in a sense, and rebuild the core of the software, \ndistributing the routines among the various units, and \nbootstrapping ourselves back up to the point we were at lo, those \nmany moons ago. As has always been the case, you're going to get  \nto see me make all the mistakes and execute changes of direction, \nin real time.  Please bear with me ... we'll start getting to the \nnew stuff before you know it.\n\nSince we're going to be using multiple modules in our new \napproach, we have to address the issue of file management.  If \nyou've followed all the other sections of this tutorial, you know \nthat, as our programs evolve, we're going to be replacing older, \nmore simple-minded units with more capable ones. This brings us to \nan issue of version control. There will almost certainly be times \nwhen we will overlay a simple file (unit), but later wish we had \nthe simple one again.  A case in point is embodied in our \npredilection for using single-character variable names, keywords, \netc., to test concepts without getting bogged down in the details \nof a lexical scanner.  Thanks to the use of units, we will be \ndoing much less of this in the future.  
Still, I not only suspect, \nbut am certain that we will need to save some older versions of \nfiles, for special purposes, even though they've been replaced by \nnewer, more capable ones.\n\nTo deal with this problem, I suggest that you create different \ndirectories, with different versions of the units as needed.  If \nwe do this properly, the code in each directory will remain self-\nconsistent.  I've tentatively created four directories:  SINGLE \n(for single-character experimentation), MULTI (for, of course, \nmulti-character versions), TINY, and KISS.\n\nEnough said about philosophy and details.  Let's get on with the \nresurrection of the software.\n\n\nTHE INPUT UNIT\n\nA key concept that we've used since Day 1 has been the idea of an \ninput stream with one lookahead character.  All the parsing \nroutines examine this character, without changing it, to decide \nwhat they should do next.  (Compare this approach with the C/Unix \napproach using getchar and ungetc, and I think you'll agree that \nour approach is simpler.) We'll begin our hike into the future by \ntranslating this concept into our new, unit-based organization.  
\nThe first unit, appropriately called Input, is shown below:\n\n\n{--------------------------------------------------------------}\nunit Input;\n{--------------------------------------------------------------}\ninterface\nvar Look: char;              \t{ Lookahead character }\nprocedure GetChar;            { Read new character  }\n\n{--------------------------------------------------------------}\nimplementation\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n\tRead(Look);\nend;\n\n{--------------------------------------------------------------}\n{ Unit Initialization }\nbegin\n\tGetChar;\nend.\n{--------------------------------------------------------------}\n\n\nAs you can see, there's nothing very profound, and certainly \nnothing complicated, about this unit, since it consists of only a \nsingle procedure.  But already, we can see how the use of units \ngives us advantages.  Note the executable code in the \ninitialization block.  This code \"primes the pump\" of the input \nstream for us, something we've always had to do before, by \ninserting the call to GetChar in line, or in procedure Init.  This \ntime, the call happens without any special reference to it on our \npart, except within the unit itself. As I predicted earlier, this \nmechanism is going to make our lives much simpler as we proceed.\nI consider it to be one of the most useful features of Turbo \nPascal, and I lean on it heavily. \n\nCopy this unit into your compiler's IDE, and compile it. To test \nthe software, of course, we always need a main program.  
I used \nthe following, really complex test program, which we'll later \nevolve into the Main for our compiler:\n\n\n{--------------------------------------------------------------}\nprogram Main;\nuses WinCRT, Input;\nbegin\n\tWriteLn(Look);\nend.\n{--------------------------------------------------------------}\n\n\nNote the use of the Borland-supplied unit, WinCRT.  This unit is \nnecessary if you intend to use the standard Pascal I/O routines, \nRead, ReadLn, Write, and WriteLn, which of course we intend to do.\nIf you forget to include this unit in the \"uses\" clause, you will \nget a really bizarre and indecipherable error message at run time.\n\nNote also that we can access the lookahead character, even though \nit's not declared in the main program.  All variables declared \nwithin the interface section of a unit are global, but they're \nhidden from prying eyes; to that extent, we get a modicum of \ninformation hiding.  Of course, if we were writing in an object-\noriented fashion, we should not allow outside modules to access \nthe unit's internal variables.  But, although Turbo units have a \nlot in common with objects, we're not doing object-oriented design \nor code here, so our use of Look is appropriate.\n\nGo ahead and save the test program as Main.pas.  To make life \neasier as we get more and more files, you might want to take this \nopportunity to declare this file as the compiler's Primary file.  \nThat way, you can execute the program from any file.  Otherwise, \nif you press Ctrl-F9 to compile and run from one of the units, \nyou'll get an error message.  You set the primary file using the \nmain submenu, \"Compile,\" in the Turbo IDE.\n\nI hasten to point out, as I've done before, that the function of \nunit Input is, and always has been, considered to be a dummy \nversion of the real thing.  In a production version of a compiler, \nthe input stream will, of course, come from a file rather than \nfrom the keyboard.  
And it will almost certainly include line \nbuffering, at the very least, and more likely, a rather large text \nbuffer to support efficient disk I/O.  The nice part about the \nunit approach is that, as with objects, we can modify the code in \nthe unit to be as simple or as sophisticated as we like. As long \nas the interface, as embodied in the public procedures and the \nlookahead character, doesn't change, the rest of the program is \ntotally unaffected.  And since units are compiled, rather than \nmerely included, the time required to link with them is virtually \nnil.  Again, the result is that we can get all the benefits of \nsophisticated implementations, without having to carry the code \naround as so much baggage.\n\nIn later installments, I intend to provide a full-blown IDE for \nthe KISS compiler, using a true Windows application generated by \nBorland's OWL applications framework.  For now, though, we'll obey \nmy #1 rule to live by:  Keep It Simple.\n\n\n\nTHE OUTPUT UNIT\n\nOf course, every decent program should have output, and ours is no \nexception.  Our output routines included the Emit functions.  
The \ncode for the corresponding output unit is shown next:\n\n\n{--------------------------------------------------------------}\nunit Output;\n{--------------------------------------------------------------}\ninterface\nprocedure Emit(s: string);\t\t\t{ Emit an instruction \t}\nprocedure EmitLn(s: string);\t\t{ Emit an instruction line }\n\n{--------------------------------------------------------------}\nimplementation\nconst TAB = ^I;\n\n{--------------------------------------------------------------}\n{ Emit an Instruction }\n\nprocedure Emit(s: string);\nbegin\n\tWrite(TAB, s);\nend;\n\n{--------------------------------------------------------------}\n{ Emit an Instruction, Followed By a Newline }\n\nprocedure EmitLn(s: string);\nbegin\n\tEmit(s);\n\tWriteLn;\nend;\n\nend.\n{--------------------------------------------------------------}\n\n\n(Notice that this unit has no initialization clause, so it needs \nno begin-block.)\n \nTest this unit with the following main program:\n\n{--------------------------------------------------------------}\nprogram Test;\nuses WinCRT, Input, Output;\nbegin\n\tWriteLn('MAIN:');\n\tEmitLn('Hello, world!');\nend.\n{--------------------------------------------------------------}\n\nDid you see anything that surprised you?  You may have been \nsurprised to see that you needed to type something, even though \nthe main program requires no input.  That's because of the \ninitialization in unit Input, which still requires something to \nput into the lookahead character.  Sorry, there's no way out of \nthat box, or rather, we don't _WANT_ to get out. Except for simple \ntest cases such as this, we will always want a valid lookahead \ncharacter, so the right thing to do about this \"problem\" is ... \nnothing.\n\nPerhaps more surprisingly, notice that the TAB character had no \neffect; our line of \"instructions\" begins at column 1, same as the \nfake label.  That's right:  WinCRT doesn't support tabs. 
We have a \nproblem.\n\nThere are a few ways we can deal with this problem. The one thing \nwe can't do is to simply ignore it.  Every assembler I've ever \nused reserves column 1 for labels, and will rebel to see \ninstructions starting there.  So, at the very least, we must space \nthe instructions over one column to keep the assembler happy.  \nThat's easy enough to do:  Simply replace, in procedure Emit, the \nline:\n\n\tWrite(TAB, s);\n\t\nwith:\n\n\tWrite(' ', s);\n\nI must admit that I've wrestled with this problem before, and find \nmyself changing my mind as often as a chameleon changes color.  \nFor the purposes we're going to be using, 99% of which will be \nexamining the output code as it's displayed on a CRT, it would be \nnice to see neatly blocked out \"object\" code.  The line:\n\nSUB1:\t\tMOVE\t#4,D0\n\njust plain looks neater than the different, but functionally \nidentical code,\n\nSUB1:\n MOVE #4,D0\n\nIn test versions of my code, I included a more sophisticated \nversion of the procedure PostLabel, that avoids having labels on \nseparate lines, but rather defers the printing of a label so it \ncan end up on the same line as the associated instruction.  As \nrecently as an hour ago, my version of unit Output provided full \nsupport for tabs, using an internal column count variable and \nsoftware to manage it.  I had, if I do say so myself, some rather \nelegant code to support the tab mechanism, with a minimum of code \nbloat. It was awfully tempting to show you the \"prettyprint\" \nversion, if for no other reason than to show off the elegance.\n\nNevertheless, the code of the \"elegant\" version was considerably \nmore complex and larger.  Since then, I've had second thoughts. 
In \nspite of our desire to see pretty output, the inescapable fact is \nthat the two versions of the SUB1: code fragment shown above are \nfunctionally identical; the assembler, which is the ultimate \ndestination of the code, couldn't care less which version it gets.  \nThe prettier one, however, not only takes more code to generate, but \nwill create a larger output file, with many more space characters \nthan the minimum needed, and will therefore use more disk space and \ntake longer to assemble.  When you look at it that way, it's not \nvery hard to decide which approach to use, is it?\n\nWhat finally clinched the issue for me was a reminder to consider \nmy own first commandment: KISS.  Although I was pretty proud of \nall my elegant little tricks to implement tabbing, I had to remind \nmyself that, to paraphrase Senator Barry Goldwater, elegance in \nthe pursuit of complexity is no virtue.  Another wise man once \nwrote, \"Any idiot can design a Rolls-Royce. It takes a genius to \ndesign a VW.\"  So the elegant, tab-friendly version of Output is \nhistory, and what you see is the simple, compact, VW version.\n\n\nTHE ERROR UNIT\n\nOur next set of routines are those that handle errors.  To refresh \nyour memory, we take the approach, pioneered by Borland in Turbo \nPascal, of halting on the first error.  Not only does this greatly \nsimplify our code, by completely avoiding the sticky issue of \nerror recovery, but it also makes much more sense, in my opinion, \nin an interactive environment.  I know this may be an extreme \nposition, but I consider the practice of reporting all errors in a \nprogram to be an anachronism, a holdover from the days of batch \nprocessing.  It's time to scuttle the practice.  So there.\n\nIn our original Cradle, we had two error-handling procedures: \nError, which didn't halt, and Abort, which did.  
But I don't think \nwe ever found a use for the procedure that didn't halt, so in the \nnew, lean and mean unit Errors, shown next, procedure Error takes \nthe place of Abort.\n\n\n{--------------------------------------------------------------}\nunit Errors;\n{--------------------------------------------------------------}\ninterface\nprocedure Error(s: string);\nprocedure Expected(s: string);\n\n{--------------------------------------------------------------}\nimplementation\n\n{--------------------------------------------------------------}\n{ Write error Message and Halt }\n\nprocedure Error(s: string);\nbegin\n\tWriteLn;\n\tWriteLn(^G, 'Error: ', s, '.');\n\tHalt;\nend;\n\n{--------------------------------------------------------------}\n{ Write \"<something> Expected\" }\n\nprocedure Expected(s: string);\nbegin\n\tError(s + ' Expected');\nend;\n\nend.\n{--------------------------------------------------------------}\n\n\nAs usual, here's a test program:\n\n\n\n\n{--------------------------------------------------------------}\nprogram Test;\nuses WinCRT, Input, Output, Errors;\n\nbegin\n\tExpected('Integer');\nend.\n{--------------------------------------------------------------}\n\nHave you noticed that the \"uses\" line in our main program keeps \ngetting longer?  That's OK. In the final version, the main program \nwill only call procedures in our parser, so its use clause will \nonly have a couple of entries. But for now, it's probably best to \ninclude all the units so we can test procedures in them.\n\n\nSCANNING AND PARSING\n\nThe classical compiler architecture consists of separate modules \nfor the lexical scanner, which supplies tokens in the language, \nand the parser, which tries to make sense of the tokens as syntax \nelements.  If you can still remember what we did in earlier \ninstallments, you'll recall that we didn't do things that way.  
\nBecause we're using a predictive parser, we can almost always tell \nwhat language element is coming next, just by examining the \nlookahead character.  Therefore, we found no need to prefetch \ntokens, as a scanner would do.\n\nBut, even though there is no functional procedure called \n\"Scanner,\" it still makes sense to separate the scanning functions \nfrom the parsing functions.  So I've created two more units \ncalled, amazingly enough, Scanner and Parser.  The Scanner unit \ncontains all of the routines known as recognizers.  Some of these, \nsuch as IsAlpha, are pure boolean routines which operate on the \nlookahead character only.  The other routines are those which \ncollect tokens, such as identifiers and numeric constants. The \nParser unit will contain all of the routines making up the \nrecursive-descent parser.  The general rule should be that unit \nParser contains all of the information that is language-specific; \nin other words, the syntax of the language should be wholly \ncontained in Parser.  In an ideal world, this rule should be true \nto the extent that we can change the compiler to compile a \ndifferent language, merely by replacing the single unit, Parser. \n\nIn practice, things are almost never this pure.  There's always a \nsmall amount of \"leakage\" of syntax rules into the scanner as \nwell.  For example, the rules concerning what makes up a legal \nidentifier or constant may vary from language to language.  In \nsome languages, the rules concerning comments permit them to be \nfiltered by the scanner, while in others they do not. So in \npractice, both units are likely to end up having language-\ndependent components, but the changes required to the scanner \nshould be relatively trivial. \n\nNow, recall that we've used two versions of the scanner routines: \nOne that handled only single-character tokens, which we used for a \nnumber of our tests, and another that provided full support for \nmulti-character tokens.  
Now that we have our software separated \ninto units, I don't anticipate getting much use out of the single-\ncharacter version, but it doesn't cost us much to provide for \nboth.  I've created two versions of the Scanner unit.  The first \none, called Scanner1, contains the single-character version of the \nrecognizers:\n\n\n{--------------------------------------------------------------}\nunit Scanner1;\n{--------------------------------------------------------------}\ninterface\nuses Input, Errors;\n\nfunction IsAlpha(c: char): boolean;\nfunction IsDigit(c: char): boolean;\nfunction IsAlNum(c: char): boolean;\nfunction IsAddop(c: char): boolean;\nfunction IsMulop(c: char): boolean;\n\nprocedure Match(x: char);\nfunction GetName: char;\nfunction GetNumber: char;\n\n{--------------------------------------------------------------}\nimplementation\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n\tIsAlpha := UpCase(c) in ['A'..'Z'];\nend;\n\n{--------------------------------------------------------------}\n{ Recognize a Numeric Character }\n\nfunction IsDigit(c: char): boolean;\nbegin\n\tIsDigit := c in ['0'..'9'];\nend;\n\n{--------------------------------------------------------------}\n{ Recognize an Alphanumeric Character }\n\nfunction IsAlnum(c: char): boolean;\nbegin\n\tIsAlnum := IsAlpha(c) or IsDigit(c);\nend;\n\n{--------------------------------------------------------------}\n{ Recognize an Addition Operator }\n\nfunction IsAddop(c: char): boolean;\nbegin\n\tIsAddop := c in ['+','-'];\nend;\n\n{--------------------------------------------------------------}\n{ Recognize a Multiplication Operator }\n\nfunction IsMulop(c: char): boolean;\nbegin\n\tIsMulop := c in ['*','/'];\nend;\n\n{--------------------------------------------------------------}\n{ Match One Character }\n\nprocedure Match(x: char);\nbegin\n\tif Look = x then GetChar\n\telse Expected('''' + x + 
'''');\nend;\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: char;\nbegin\n\tif not IsAlpha(Look) then Expected('Name');\n\tGetName := UpCase(Look);\n\tGetChar;\nend;\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNumber: char;\nbegin\n\tif not IsDigit(Look) then Expected('Integer');\n\tGetNumber := Look;\n\tGetChar;\nend;\n\nend.\n{--------------------------------------------------------------}\n\n\nThe following code fragment of the main program provides a good \ntest of the scanner.  For brevity, I'll only include the \nexecutable code here; the rest remains the same.  Don't forget, \nthough, to add the name Scanner1 to the \"uses\" clause.\n\n\tWrite(GetName);\n\tMatch('=');\n\tWrite(GetNumber);\n\tMatch('+');\n\tWriteLn(GetName);\n\nThis code will recognize all sentences of the form:\n\n\tx=0+y\n\nwhere x and y can be any single-character variable names, and 0 \nany digit.  The code should reject all other sentences, and give a \nmeaningful error message. If it did, you're in good shape and we \ncan proceed.\n\n\nTHE SCANNER UNIT\n\nThe next, and by far the most important, version of the scanner is \nthe one that handles the multi-character tokens that all real \nlanguages must have.  Only the two functions, GetName and \nGetNumber, change between the two units, but just to be sure there \nare no mistakes, I've reproduced the entire unit here.  
This is \nunit Scanner:\n\n\n{--------------------------------------------------------------}\nunit Scanner;\n{--------------------------------------------------------------}\ninterface\nuses Input, Errors;\n\nfunction IsAlpha(c: char): boolean;\nfunction IsDigit(c: char): boolean;\nfunction IsAlNum(c: char): boolean;\nfunction IsAddop(c: char): boolean;\nfunction IsMulop(c: char): boolean;\n\nprocedure Match(x: char);\nfunction GetName: string;\nfunction GetNumber: string;\n\n{--------------------------------------------------------------}\nimplementation\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n\tIsAlpha := UpCase(c) in ['A'..'Z'];\nend;\n\n{--------------------------------------------------------------}\n{ Recognize a Numeric Character }\n\nfunction IsDigit(c: char): boolean;\nbegin\n\tIsDigit := c in ['0'..'9'];\nend;\n\n{--------------------------------------------------------------}\n{ Recognize an Alphanumeric Character }\n\nfunction IsAlnum(c: char): boolean;\nbegin\n\tIsAlnum := IsAlpha(c) or IsDigit(c);\nend;\n\n{--------------------------------------------------------------}\n{ Recognize an Addition Operator }\n\nfunction IsAddop(c: char): boolean;\nbegin\n\tIsAddop := c in ['+','-'];\nend;\n\n{--------------------------------------------------------------}\n{ Recognize a Multiplication Operator }\n\nfunction IsMulop(c: char): boolean;\nbegin\n\tIsMulop := c in ['*','/'];\nend;\n\n{--------------------------------------------------------------}\n{ Match One Character }\n\nprocedure Match(x: char);\nbegin\n\tif Look = x then GetChar\n\telse Expected('''' + x + '''');\nend;\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: string;\nvar n: string;\nbegin\n\tn := '';\n\tif not IsAlpha(Look) then Expected('Name');\n\twhile IsAlnum(Look) do begin\n\t\tn := n + 
Look;\n\t\tGetChar;\n\tend;\n\tGetName := n;\nend;\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNumber: string;\nvar n: string;\nbegin\n\tn := '';\n\tif not IsDigit(Look) then Expected('Integer');\n\twhile IsDigit(Look) do begin\n\t\tn := n + Look;\n\t\tGetChar;\n\tend;\n\tGetNumber := n;\nend;\n\nend.\n{--------------------------------------------------------------}\n\n\nThe same test program will test this scanner, also. Simply change \nthe \"uses\" clause to use Scanner instead of Scanner1.  Now you \nshould be able to type multi-character names and numbers.\n\n\nDECISIONS, DECISIONS\n\nIn spite of the relative simplicity of both scanners, a lot of \nthought has gone into them, and a lot of decisions had to be made.  \nI'd like to share those thoughts with you now so you can make your \nown educated decision, appropriate for your application.  First, \nnote that both versions of GetName translate the input characters \nto upper case.  Obviously, there was a design decision made here, \nand this is one of those cases where the language syntax splatters \nover into the scanner.  In the C language, the case of characters \nin identifiers is significant.  For such a language, we obviously \ncan't map the characters to upper case.  The design I'm using \nassumes a language like Pascal, where the case of characters \ndoesn't matter.  For such languages, it's easier to go ahead and \nmap all identifiers to upper case in the scanner, so we don't have \nto worry later on when we're comparing strings for equality.\n\nWe could have even gone a step further, and mapped the characters to \nupper case right as they come in, in GetChar.  This approach works \ntoo, and I've used it in the past, but it's too confining. \nSpecifically, it will also map characters that may be part of \nquoted strings, which is not a good idea.  
So if you're going to \nmap to upper case at all, GetName is the proper place to do it.\n\nNote that the function GetNumber in this scanner returns a string, \njust as GetName does.  This is another one of those things I've \noscillated about almost daily, and the last swing was all of ten \nminutes ago.  The alternative approach, and one I've used many \ntimes in past installments, returns an integer result.\n\nBoth approaches have their good points. Since we're fetching a \nnumber, the approach that immediately comes to mind is to return \nit as an integer.  But bear in mind that the eventual use of the \nnumber will be in a write statement that goes back to the outside \nworld.  Someone -- either us or the code hidden inside the write \nstatement -- is going to have to convert the number back to a \nstring again.  Turbo Pascal includes such string conversion \nroutines, but why use them if we don't have to?  Why convert a \nnumber from string to integer form, only to convert it right back \nagain in the code generator, only a few statements later?\n\nFurthermore, as you'll soon see, we're going to need a temporary \nstorage spot for the value of the token we've fetched. If we treat \nthe number in its string form, we can store the value of either a \nvariable or a number in the same string.  Otherwise, we'll have to \ncreate a second, integer variable.\n\nOn the other hand, we'll find that carrying the number as a string \nvirtually eliminates any chance of optimization later on.  As we \nget to the point where we are beginning to concern ourselves with \ncode generation, we'll encounter cases in which we're doing \narithmetic on constants.  For such cases, it's really foolish to \ngenerate code that performs the constant arithmetic at run time.  \nFar better to let the parser do the arithmetic at compile time, \nand merely code the result.  
To do that, we'll wish we had the \nconstants stored as integers rather than strings.\n\nWhat finally swung me back over to the string approach was an \naggressive application of the KISS test, plus reminding myself \nthat we've studiously avoided issues of code efficiency.  One of \nthe things that makes our simple-minded parsing work, without the \ncomplexities of a \"real\" compiler, is that we've said up front \nthat we aren't concerned about code efficiency.  That gives us a \nlot of freedom to do things the easy way rather than the efficient \none, and it's a freedom we must be careful not to abandon \nvoluntarily, in spite of the urges for efficiency shouting in our \near.  In addition to being a big believer in the KISS philosophy, \nI'm also an advocate of \"lazy programming,\" which in this context \nmeans, don't program anything until you need it.  As P.J. Plauger \nsays, \"Never put off until tomorrow what you can put off \nindefinitely.\"  Over the years, much code has been written to \nprovide for eventualities that never happened.  I've learned that \nlesson myself, from bitter experience.  So the bottom line is:  We \nwon't convert to an integer here because we don't need to.  It's \nas simple as that.\n\nFor those of you who still think we may need the integer version \n(and indeed we may), here it is:\n\n\n{--------------------------------------------------------------}\n{ Get a Number (integer version) }\n\nfunction GetNumber: longint;\nvar n: longint;\nbegin\n\tn := 0;\n\tif not IsDigit(Look) then Expected('Integer');\n\twhile IsDigit(Look) do begin\n\t\tn := 10 * n + (Ord(Look) - Ord('0'));\n\t\tGetChar;\n\tend;\n\tGetNumber := n;\nend;\n{--------------------------------------------------------------}\n\nYou might file this one away, as I intend to, for a rainy day.\n\n\nPARSING\n\nAt this point, we have distributed all the routines that made up \nour Cradle into units that we can draw upon as we need them.  
\nObviously, they will evolve further as we continue the process of \nbootstrapping ourselves up again, but for the most part their \ncontent, and certainly the architecture that they imply, is \ndefined.  What remains is to embody the language syntax into the \nparser unit.  We won't do much of that in this installment, but I \ndo want to do a little, just to leave us with the good feeling \nthat we still know what we're doing.  So before we go, let's \ngenerate just enough of a parser to process single factors in an \nexpression.  In the process, we'll also, by necessity, find we \nhave created a code generator unit, as well.\n\nRemember the very first installment of this series?  We read an \ninteger value, say n, and generated the code to load it into the \nD0 register via an immediate move:\n\n\tMOVE #n,D0\n\nShortly afterwards, we repeated the process for a variable, \n\n\tMOVE X(PC),D0\n\nand then for a factor that could be either constant or variable.\nFor old times sake, let's revisit that process.  Define the \nfollowing new unit:\n\n\n{--------------------------------------------------------------}\nunit Parser;\n{--------------------------------------------------------------}\ninterface\nuses Input, Scanner, Errors, CodeGen;\nprocedure Factor;\n\n{--------------------------------------------------------------}\nimplementation\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Factor }\n\nprocedure Factor;\nbegin\n\tLoadConstant(GetNumber);\nend;\n\nend.\n{--------------------------------------------------------------}\n\n\nAs you can see, this unit calls a procedure, LoadConstant, which \nactually effects the output of the assembly-language code.  The \nunit also uses a new unit, CodeGen.  This step represents the last \nmajor change in our architecture, from earlier installments: The \nremoval of the machine-dependent code to a separate unit. 
If I \nhave my way, there will not be a single line of code, outside of \nCodeGen, that betrays the fact that we're targeting the 68000 CPU.  \nAnd this is one place I think that having my way is quite \nfeasible.  \n\nFor those of you who wish I were using the 80x86 architecture (or \nany other one) instead of the 68000, here's your answer:  Merely \nreplace CodeGen with one suitable for your CPU of choice.\n\nSo far, our code generator has only one procedure in it.  Here's \nthe unit:\n\n\n{--------------------------------------------------------------}\nunit CodeGen;\n\n{--------------------------------------------------------------}\ninterface\nuses Output;\nprocedure LoadConstant(n: string);\n\n{--------------------------------------------------------------}\nimplementation\n\n{--------------------------------------------------------------}\n{ Load the Primary Register with a Constant }\n\nprocedure LoadConstant(n: string);\nbegin\n\tEmitLn('MOVE #' + n + ',D0' );\nend;\n\nend.\n{--------------------------------------------------------------}\n\n\nCopy and compile this unit, and execute the following main \nprogram:\n\n{--------------------------------------------------------------}\nprogram Main;\nuses WinCRT, Input, Output, Errors, Scanner, Parser;\nbegin\n\tFactor;\nend.\n{--------------------------------------------------------------}\n\n\nThere it is, the generated code, just as we hoped it would be.\n\nNow, I hope you can begin to see the advantage of the unit-based \narchitecture of our new design.  Here we have a main program \nthat's all of five lines long. That's all of the program we need \nto see, unless we choose to see more.  And yet, all those units \nare sitting there, patiently waiting to serve us.  We can have our \ncake and eat it too, in that we have simple and short code, but \npowerful allies.  What remains to be done is to flesh out the \nunits to match the capabilities of earlier installments.  
We'll do \nthat in the next installment, but before I close, let's finish out \nthe parsing of a factor, just to satisfy ourselves that we still \nknow how.  The final version of CodeGen includes the new \nprocedure, LoadVariable:\n\n{--------------------------------------------------------------}\nunit CodeGen;\n\n{--------------------------------------------------------------}\ninterface\nuses Output;\nprocedure LoadConstant(n: string);\nprocedure LoadVariable(Name: string);\n\n{--------------------------------------------------------------}\nimplementation\n\n{--------------------------------------------------------------}\n{ Load the Primary Register with a Constant }\n\nprocedure LoadConstant(n: string);\nbegin\n\tEmitLn('MOVE #' + n + ',D0');\nend;\n\n{--------------------------------------------------------------}\n{ Load a Variable to the Primary Register }\n\nprocedure LoadVariable(Name: string);\nbegin\n\tEmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\nend.\n{--------------------------------------------------------------}\n\n\nThe parser unit itself doesn't change, but we have a more complex \nversion of procedure Factor:\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Factor }\n\nprocedure Factor;\nbegin\n\tif IsDigit(Look) then\n\t\tLoadConstant(GetNumber)\n\telse if IsAlpha(Look) then\n\t\tLoadVariable(GetName)\n\telse\n\t\tError('Unrecognized character ' + Look);\nend;\n{--------------------------------------------------------------}\n\n \nNow, without altering the main program, you should find that our \nprogram will process either a variable or a constant factor.  At \nthis point, our architecture is almost complete; we have units to \ndo all the dirty work, and enough code in the parser and code \ngenerator to demonstrate that everything works.  
What remains is \nto flesh out the units we've defined, particularly the parser and \ncode generator, to support the more complex syntax elements that \nmake up a real language.  Since we've done this many times before \nin earlier installments, it shouldn't take long to get us back to \nwhere we were before the long hiatus.  We'll continue this process \nin Installment 16, coming soon.  See you then.\n\n\n\nREFERENCES\n\n1. Crenshaw, J.W., \"Object-Oriented Design of Assemblers and \nCompilers,\" Proc. Software Development '91 Conference, Miller \nFreeman, San Francisco, CA, February 1991, pp. 143-155.\n\n2. Crenshaw, J.W., \"A Perfect Marriage,\" Computer Language, Volume \n8, #6, June 1991, pp. 44-55.\n\n3. Crenshaw, J.W., \"Syntax-Driven Object-Oriented Design,\" Proc. \n1991 Embedded Systems Conference, Miller Freeman, San \nFrancisco, CA, September 1991, pp. 45-60.\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1994 Jack W. Crenshaw. All rights reserved.   *  \n*                                                               *\n*                                                               *\n*****************************************************************\n\n\n"
  },
  {
    "path": "16/tutor16.txt",
    "content": " \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n                       LET'S BUILD A COMPILER! \n \n                                 By \n \n                       Jack W. Crenshaw, Ph.D. \n \n                            29 May, 1995 \n \n                     Part 16: UNIT CONSTRUCTION \n \n\n\n***************************************************************** \n*                                                               * \n*                        COPYRIGHT NOTICE                       * \n*                                                               * \n*   Copyright (C) 1995 Jack W. Crenshaw. All rights reserved.   * \n*                                                               * \n***************************************************************** \n \nINTRODUCTION \n \nThis series of tutorials promises to be perhaps one of the longest-\nrunning mini-series in history, rivalled only by the delay in Volume IV \nof Knuth.  Begun in 1988, the series ran into a four-year hiatus in 1990 \nwhen the \"cares of this world,\" changes in priorities and interests, and \nthe need to make a living seemed to stall it out after Installment 14.  \nThose of you with loads of patience were finally rewarded, in the spring \nof last year, with the long-awaited Installment 15.  In it, I began to \ntry to steer the series back on track, and in the process, to make it \neasier to continue on to the goal, which is to provide you with not only \nenough understanding of the difficult subject of compiler theory, but \nalso enough tools, in the form of canned subroutines and concepts, so \nthat you would be able to continue on your own and become proficient \nenough to build your own parsers and translators.  Because of that long \nhiatus, I thought it appropriate to go back and review the concepts we \nhave covered so far, and to redo some of the software, as well.  
In the \npast, we've never concerned ourselves much with the development of \nproduction-quality software tools ... after all, I was trying to teach \n(and learn) concepts, not production practice.  To do that, I tended to \ngive you, not complete compilers or parsers, but only those snippets of \ncode that illustrated the particular point we were considering at the \nmoment. \n \nI still believe that's a good way to learn any subject; no one wants to \nhave to make changes to 100,000 line programs just to try out a new \nidea.  But the idea of just dealing with code snippets, rather than \ncomplete programs, also has its drawbacks in that we often seemed to be \nwriting the same code fragments over and over.  Although repetition has \nbeen thoroughly proven to be a good way to learn new ideas, it's also \ntrue that one can have too much of a good thing.  By the time I had \ncompleted Installment 14 I seemed to have reached the limits of my \nabilities to juggle multiple files and multiple versions of the same \nsoftware functions.  Who knows, perhaps that's one reason I seemed to \nhave run out of gas at that point. \n \nFortunately, the later versions of Borland's Turbo Pascal allow us to \nhave our cake and eat it too.  By using their concept of separately \ncompilable units, we can still write small subroutines and functions, \nand keep our main programs and test programs small and simple.  But, \nonce written, the code in the Pascal units will always be there for us \nto use, and linking them in is totally painless and transparent. \n \nSince, by now, most of you are programming in either C or C++, I know \nwhat you're thinking:  Borland, with their Turbo Pascal (TP), certainly \ndidn't invent the concept of separately compilable modules.  And of \ncourse you're right.  But if you've not used TP lately, or ever, you may \nnot realize just how painless the whole process is.  
Even in C or C++, \nyou still have to build a make file, either manually or by telling the \ncompiler how to do so.  You must also list, using \"extern\" statements or \nheader files, the functions you want to import.  In TP, you don't even \nhave to do that.  You need only name the units you wish to use, and all \nof their procedures automatically become available.   \n \n \nIt's not my intention to get into a language-war debate here, so I won't \npursue the subject any further.  Even I no longer use Pascal on my job \n... I use C at work and C++ for my articles in Embedded Systems \nProgramming and other magazines.  Believe me, when I set out to \nresurrect this series, I thought long and hard about switching both \nlanguages and target systems to the ones that we're all using these \ndays, C/C++ and PC architecture, and possibly object-oriented methods as \nwell.  In the end, I felt it would cause more confusion than the hiatus \nitself has. And after all, Pascal still remains one of the best possible \nlanguages for teaching, not to mention production programming.  Finally, \nTP still compiles at the speed of light, much faster than competing \nC/C++ compilers. And Borland's smart linker, used in TP but not in their \nC++ products, is second to none.  Aside from being much faster than \nMicrosoft-compatible linkers, the Borland smart linker will cull unused \nprocedures and data items, even to the extent of trimming them out of \ndefined objects if they're not needed.  For one of the few times in our \nlives, we don't have to compromise between completeness and efficiency.  \nWhen we're writing a TP unit, we can make it as complete as we like, \nincluding any member functions and data items we may think we will ever \nneed, confident that doing so will not create unwanted bloat in the \ncompiled and linked executable. 
\n \nThe point, really, is simply this:  By using TP's unit mechanism, we can \nhave all the advantages and convenience of writing small, seemingly \nstand-alone test programs, without having to constantly rewrite the \nsupport functions that we need.  Once written, the TP units sit there, \nquietly waiting to do their duty and give us the support we need, when \nwe need it. \n \nUsing this principle, in Installment 15 I set out to minimize our \ntendency to re-invent the wheel by organizing our code into separate \nTurbo Pascal units, each containing different parts of the compiler.  We \nended up with the following units: \n \n*\tInput \n*\tOutput \n*\tErrors \n*\tScanner \n*\tParser \n*\tCodeGen \n \nEach of these units serves a different function, and encapsulates \nspecific areas of functionality.  The Input and Output units, as their \nnames imply, provide character stream I/O and the all-important \nlookahead character upon which our predictive parser is based.  The \nErrors unit, of course, provides standard error handling.  The Scanner \nunit contains all of our boolean functions such as IsAlpha, and the \nroutines GetName and GetNumber, which process multi-character tokens. \n \nThe two units we'll be working with the most, and the ones that most \nrepresent the personality of our compiler, are Parser and CodeGen.  \nTheoretically, the Parser unit should encapsulate all aspects of the \ncompiler that depend on the syntax of the compiled language (though, as \nwe saw last time, a small amount of this syntax spills over into \nScanner).  Similarly, the code generator unit, CodeGen, contains all of \nthe code dependent upon the target machine.  In this installment, we'll \nbe continuing with the development of the functions in these two all-\nimportant units. \n \n \n\n\nJUST LIKE CLASSICAL? \n \nBefore we proceed, however, I think I should clarify the relationship \nbetween, and the functionality of, these units.  
Those of you who are \nfamiliar with compiler theory as taught in universities will, of course, \nrecognize the names, Scanner, Parser, and CodeGen, all of which are \ncomponents of a classical compiler implementation.  You may be thinking \nthat I've abandoned my commitment to the KISS philosophy, and drifted \ntowards a more conventional architecture than we once had.  A closer \nlook, however, should convince you that, while the names are similar, \nthe functionalities are quite different. \n \nTogether, the scanner and parser of a classical implementation comprise \nthe so-called \"front end,\" and the code generator, the back end.  The \nfront end routines process the language-dependent, syntax-related \naspects of the source language, while the code generator, or back end, \ndeals with the target machine-dependent parts of the problem.  In \nclassical compilers, the two ends communicate via a file of instructions \nwritten in an intermediate language (IL). \n \nTypically, a classical scanner is a single procedure, operating as a co-\nprocedure with the parser.  It \"tokenizes\" the source file, reading it \ncharacter by character, recognizing language elements, translating them \ninto tokens, and passing them along to the parser.  You can think of the \nparser as an abstract machine, executing \"op codes,\" which are the \ntokens.  Similarly, the parser generates op codes of a second abstract \nmachine, which mechanizes the IL.  Typically, the IL file is written to \ndisk by the parser, and read back again by the code generator. \n \nOur organization is quite different.  We have no lexical scanner, in the \nclassical sense;  our unit Scanner, though it has a similar name, is not \na single procedure or co-procedure, but merely a set of separate \nsubroutines which are called by the parser as needed.  \n \nSimilarly, the classical code generator, the back end,  is a translator \nin its own right, reading an IL \"source\" file, and emitting an object \nfile.  
Our code generator doesn't work that way.  In our compiler, there \nIS no intermediate language; every construct in the source language \nsyntax is converted into assembly language as it is recognized by the \nparser.  Like Scanner, the unit CodeGen consists of individual \nprocedures which are called by the parser as needed. \n \nThis \"code 'em as you find 'em\" philosophy may not produce the world's \nmost efficient code -- for example, we haven't provided (yet!) a \nconvenient place for an optimizer to work its magic -- but it sure does \nsimplify the compiler, doesn't it? \n \nAnd that observation prompts me to reflect, once again, on how we have \nmanaged to reduce a compiler's functions to such comparatively simple \nterms.  I've waxed eloquent on this subject in past installments, so I \nwon't belabor the point too much here.  However, because of the time \nthat's elapsed since those last soliloquies, I hope you'll grant me just \na little time to remind myself, as well as you, how we got here.  We got \nhere by applying several principles that writers of commercial compilers \nseldom have the luxury of using.  These are: \n \no\tThe KISS philosophy -- Never do things the hard way without a \nreason \n \no\tLazy coding -- Never put off until tomorrow what you can put \noff forever (with credits to P.J. Plauger) \n \no\tSkepticism -- Stubborn refusal to do something just because \nthat's the way it's always been done. \n \no\tAcceptance of inefficient code \n \no\tRejection of arbitrary constraints \n \nAs I've reviewed the history of compiler construction, I've learned that \nvirtually every production compiler in history has suffered from pre-\nimposed conditions that strongly influenced its design. The original \nFORTRAN compiler of John Backus, et al, had to compete with assembly \nlanguage, and therefore was constrained to produce extremely efficient \ncode.  
The IBM compilers for the minicomputers of the 70's had to run in \nthe very small RAM memories then available -- as small as 4k.  The early \nAda compiler had to compile itself.  Per Brinch Hansen decreed that his \nPascal compiler developed for the IBM PC must execute in a 64k machine.  \nCompilers developed in Computer Science courses had to compile the \nwidest variety of languages, and therefore required LALR parsers. \n \nIn each of these cases, these preconceived constraints literally \ndominated the design of the compiler.  \n \nA good example is Brinch Hansen's compiler, described in his excellent \nbook, \"Brinch Hansen on Pascal Compilers\" (highly recommended).  Though \nhis compiler is one of the most clear and un-obscure compiler \nimplementations I've seen, that one decision, to compile large files in \na small RAM, totally drives the design, and he ends up with not just \none, but many intermediate files, together with the drivers to write and \nread them. \n \nIn time, the architectures resulting from such decisions have found \ntheir way into computer science lore as articles of faith. In this one \nman's opinion, it's time that they were re-examined critically.  The \nconditions, environments, and requirements that led to classical \narchitectures are not the same as the ones we have today.  There's no \nreason to believe the solutions should be the same, either. \n \nIn this tutorial, we've followed the leads of such pioneers in the world \nof small compilers for PCs as Leor Zolman, Ron Cain, and James Hendrix, \nwho didn't know enough compiler theory to know that they \"couldn't do it \nthat way.\"  We have resolutely refused to accept arbitrary constraints, \nbut rather have done whatever was easy.  As a result, we have evolved an \narchitecture that, while quite different from the classical one, gets \nthe job done in very simple and straightforward fashion. 
\n \nI'll end this philosophizing with an observation re the notion of an \nintermediate language.  While I've noted before that we don't have one \nin our compiler, that's not exactly true; we _DO_ have one, or at least \nare evolving one, in the sense that we are defining code generation \nfunctions for the parser to call.  In essence, every call to a code \ngeneration procedure can be thought of as an instruction in an \nintermediate language.  Should we ever find it necessary to formalize an \nintermediate language, this is the way we would do it:  emit codes from \nthe parser, each representing a call to one of the code generator \nprocedures, and then process each code by calling those procedures in a \nseparate pass, implemented in a back end. Frankly, I don't see that \nwe'll ever find a need for this approach, but there is the connection, \nif you choose to follow it, between the classical and the current \napproaches. \n \n\n\nFLESHING OUT THE PARSER \n \nThough I promised you, somewhere along about Installment 14, that we'd \nnever again write every single function from scratch, I ended up \nstarting to do just that in Installment 15.  One reason: that long \nhiatus between the two installments made a review seem eminently \njustified ... even imperative, both for you and for me. More \nimportantly, the decision to collect the procedures into modules \n(units), forced us to look at each one yet again, whether we wanted to \nor not.  And, finally and frankly, I've had some new ideas in the last \nfour years that warranted a fresh look at some old friends.  When I \nfirst began this series, I was frankly amazed, and pleased, to learn \njust how simple parsing routines can be made.  But this last time \naround, I've surprised myself yet again, and been able to make them just \nthat last little bit simpler, yet. \n \nStill, because of this total rewrite of the parsing modules, I was only \nable to include so much in the last installment.  
Because of this, our \nhero, the parser, when last seen, was a shadow of his former self,  \nconsisting of only enough code to parse and process a factor consisting \nof either a variable or a constant.  The main effort of this current \ninstallment will be to help flesh out the parser to its former glory.  \nIn the process, I hope you'll bear with me if we sometimes cover ground \nwe've long since been over and dealt with. \n \nFirst, let's take care of a problem that we've addressed before: Our \ncurrent version of procedure Factor, as we left it in Installment 15,  \ncan't handle negative arguments.  To fix that, we'll introduce the \nprocedure SignedFactor: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Factor with Optional Sign } \n \nprocedure SignedFactor; \nvar Sign: char; \nbegin \n\tSign := Look; \n\tif IsAddop(Look) then \n\t\tGetChar; \n\tFactor; \n\tif Sign = '-' then Negate; \nend; \n{--------------------------------------------------------------}  \n \n \nNote that this procedure calls a new code generation routine, Negate: \n \n \n{--------------------------------------------------------------} \n{ Negate Primary } \n \nprocedure Negate; \nbegin \n\tEmitLn('NEG D0'); \nend; \n{--------------------------------------------------------------} \n \n \n(Here, and elsewhere in this series, I'm only going to show you the new \nroutines. I'm counting on you to put them into the proper unit, which \nyou should normally have no trouble identifying.  Don't forget to add \nthe procedure's prototype to the interface section of the unit.) \n \nIn the main program, simply change the procedure called from Factor to \nSignedFactor, and give the code a test.  Isn't it neat how the Turbo \nlinker and make facility handle all the details? \n \nYes, I know, the code isn't very efficient.  If we input a number, -3, \nthe generated code is: \n \n\tMOVE #3,D0 \n\tNEG D0 \n \nwhich is really, really dumb.  
We can do better, of course, by simply \nprepending a minus sign to the string passed to LoadConstant, but it \nadds a few lines of code to SignedFactor, and I'm applying the KISS \nphilosophy very aggressively here. What's more, to tell the truth, I \nthink I'm subconsciously enjoying generating \"really, really dumb\" code, \nso I can have the pleasure of watching it get dramatically better when \nwe get into optimization methods. \n \nMost of you have never heard of John Spray, so allow me to introduce him \nto you here.  John's from New Zealand, and used to teach computer \nscience at one of its universities.  John wrote a compiler for the \nMotorola 6809, based on a delightful, Pascal-like language of his own \ndesign called \"Whimsical.\"  He later ported the compiler to the 68000, \nand for a while it was the only compiler I had for my homebrewed 68000 \nsystem.   \n \nFor the record, one of my standard tests for any new compiler is to see \nhow the compiler deals with a null program like: \n \n\tprogram main; \n\tbegin \n\tend. \n \nMy test is to measure the time required to compile and link, and the \nsize of the object file generated.  The undisputed _LOSER_ in the test \nis the DEC C compiler for the VAX, which took 60 seconds to compile, on \na VAX 11/780, and generated a 50k object file.  John's compiler is the \nundisputed, once, future, and forever king in the code size department.  \nGiven the null program, Whimsical generates precisely two bytes of code, \nimplementing the one instruction, \n \n\tRET \n \nBy setting a compiler option to generate an include file rather than a \nstandalone program, John can even cut this size, from two bytes to zero!  \nSort of hard to beat a null object file, wouldn't you say? 
\n \nNeedless to say, I consider John to be something of an expert on code \noptimization, and I like what he has to say: \"The best way to optimize \nis not to have to optimize at all, but to produce good code in the first \nplace.\" Words to live by.  When we get started on optimization, we'll \nfollow John's advice, and our first step will not be to add a peephole \noptimizer or other after-the-fact device, but to improve the quality of \nthe code emitted before optimization.  So make a note of SignedFactor as \na good first candidate for attention, and for now we'll leave it be. \n \nTERMS AND EXPRESSIONS \n \nI'm sure you know what's coming next: We must, yet again, create the \nrest of the procedures that implement the recursive-descent parsing of \nan expression.  We all know that the hierarchy of procedures for \narithmetic expressions is: \n \nexpression \n\tterm \n\t\tfactor \n \nHowever, for now let's continue to do things one step at a time, \nand consider only expressions with additive terms in them.  
The \ncode to implement expressions, including a possibly signed first \nterm, is shown next: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate an Expression } \n \nprocedure Expression; \nbegin \n\tSignedFactor; \n\twhile IsAddop(Look) do \n\t\tcase Look of \n\t\t\t'+': Add; \n\t\t\t'-': Subtract; \n\t\tend; \nend; \n{--------------------------------------------------------------} \n \n \nThis procedure calls two other procedures to process the \noperations: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate an Addition Operation } \n \nprocedure Add; \nbegin \n\tMatch('+'); \n\tPush; \n\tFactor; \n\tPopAdd; \nend; \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Subtraction Operation } \n \nprocedure Subtract; \nbegin \n\tMatch('-'); \n\tPush; \n\tFactor; \n\tPopSub; \nend; \n{--------------------------------------------------------------} \n \n \nThe three procedures Push, PopAdd, and PopSub are new code generation \nroutines.  As the name implies, procedure Push generates code to push \nthe primary register (D0, in our 68000 implementation) to the stack.  \nPopAdd and PopSub pop the top of the stack again, and add it to, or \nsubtract it from, the primary register.  
The code is shown next: \n \n \n\n\n{--------------------------------------------------------------} \n{ Push Primary to Stack } \n \nprocedure Push; \nbegin \n\tEmitLn('MOVE D0,-(SP)'); \nend; \n \n{--------------------------------------------------------------} \n{ Add TOS to Primary } \n \nprocedure PopAdd; \nbegin \n\tEmitLn('ADD (SP)+,D0'); \nend; \n \n{--------------------------------------------------------------} \n{ Subtract TOS from Primary } \n \nprocedure PopSub; \nbegin \n\tEmitLn('SUB (SP)+,D0'); \n\tNegate; \nend; \n{--------------------------------------------------------------} \n \n \nAdd these routines to Parser and CodeGen, and change the main program to \ncall Expression. Voila! \n \nThe next step, of course, is to add the capability for dealing with \nmultiplicative terms.  To that end, we'll add a procedure Term, and code \ngeneration procedures PopMul and PopDiv.  These code generation \nprocedures are shown next: \n \n \n{--------------------------------------------------------------} \n{ Multiply TOS by Primary } \n \nprocedure PopMul; \nbegin \n\tEmitLn('MULS (SP)+,D0'); \nend; \n \n{--------------------------------------------------------------} \n{ Divide TOS by Primary } \n \nprocedure PopDiv; \nbegin \n\tEmitLn('MOVE (SP)+,D7'); \n\tEmitLn('EXT.L D7'); \n\tEmitLn('DIVS D0,D7'); \n\tEmitLn('MOVE D7,D0'); \nend; \n{--------------------------------------------------------------} \n \n \nI admit, the division routine is a little busy, but there's no help for \nit.  Unfortunately, while the 68000 CPU allows a division using the top \nof stack (TOS), it wants the arguments in the wrong order, just as it \ndoes for subtraction.  So our only recourse is to pop the stack to a \nscratch register (D7), perform the division there, and then move the \nresult back to our primary register, D0. Note the use of signed multiply \nand divide operations.  
This follows an implied, but unstated, \nassumption, that all our variables will be signed 16-bit integers. This \ndecision will come back to haunt us later, when we start looking at \nmultiple data types, type conversions, etc. \n \nOur procedure Term is virtually a clone of Expression, and looks like \nthis: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Term } \n \nprocedure Term; \nbegin \n\tFactor; \n\twhile IsMulop(Look) do \n\t\tcase Look of \n\t\t\t'*': Multiply; \n\t\t\t'/': Divide; \n\t\tend; \nend; \n{--------------------------------------------------------------} \n \n \nOur next step is to change some names.  SignedFactor now becomes \nSignedTerm, and the calls to Factor in Expression, Add, Subtract and \nSignedTerm get changed to call Term: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Term with Optional Leading Sign } \n \nprocedure SignedTerm; \nvar Sign: char; \nbegin \n\tSign := Look; \n\tif IsAddop(Look) then \n\t\tGetChar; \n\tTerm; \n\tif Sign = '-' then Negate; \nend; \n{--------------------------------------------------------------} \n... \n{--------------------------------------------------------------} \n{ Parse and Translate an Expression } \n \nprocedure Expression; \nbegin \n\tSignedTerm; \n\twhile IsAddop(Look) do \n\t\tcase Look of \n\t\t\t'+': Add; \n\t\t\t'-': Subtract; \n\t\tend; \nend; \n{--------------------------------------------------------------} \n \n \nIf memory serves me correctly, we once had BOTH a procedure SignedFactor \nand a procedure SignedTerm. I had reasons for doing that at the time ... \nthey had to do with the handling of Boolean algebra and, in particular, \nthe Boolean \"not\" function.  But certainly, for arithmetic operations, \nthat duplication isn't necessary.  
In an expression like: \n \n\t-x*y \n \nit's very apparent that the sign goes with the whole TERM, x*y, and not \njust the factor x, and that's the way Expression is coded.   \n \nTest this new code by executing Main.  It still calls Expression, so you \nshould now be able to deal with expressions containing any of the four \narithmetic operators. \n \nOur last bit of business, as far as expressions go, is to modify \nprocedure Factor to allow for parenthetical expressions.  By using a \nrecursive call to Expression, we can reduce the needed code to virtually \nnothing.  Five lines added to Factor do the job: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Factor } \n \nprocedure Factor; \nbegin \n\tif Look = '(' then begin \n\t\tMatch('('); \n\t\tExpression; \n\t\tMatch(')'); \n\t\tend \n\telse if IsDigit(Look) then \n\t\tLoadConstant(GetNumber) \n\telse if IsAlpha(Look) then \n\t\tLoadVariable(GetName) \n\telse \n\t\tError('Unrecognized character ' + Look); \nend; \n{--------------------------------------------------------------} \n \n \nAt this point, your \"compiler\" should be able to handle any legal \nexpression you can throw at it.  Better yet, it should reject all \nillegal ones! \n \nASSIGNMENTS \n \nAs long as we're this close, we might as well create the code to deal \nwith an assignment statement.  This code needs only to remember the name \nof the target variable where we are to store the result of an \nexpression, call Expression, then store the number.  
The procedure is \nshown next: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate an Assignment Statement } \n \nprocedure Assignment; \nvar Name: string; \nbegin \n\tName := GetName; \n\tMatch('='); \n\tExpression; \n\tStoreVariable(Name); \nend; \n{--------------------------------------------------------------} \n \nThe assignment calls for yet another code generation routine: \n \n\n\n{--------------------------------------------------------------} \n{ Store the Primary Register to a Variable } \n \nprocedure StoreVariable(Name: string); \nbegin \n\tEmitLn('LEA ' + Name + '(PC),A0'); \n\tEmitLn('MOVE D0,(A0)'); \nend; \n{--------------------------------------------------------------} \n \n \nNow, change the call in Main to call Assignment, and you should see a \nfull assignment statement being processed correctly.  Pretty neat, eh?  \nAnd painless, too. \n \nIn the past, we've always tried to show BNF relations to define the \nsyntax we're developing. I haven't done that here, and it's high time I \ndid.  Here's the BNF: \n \n \n<factor>      ::= <variable> | <constant> | '(' <expression> ')'\t \n<signed_term> ::= [<addop>] <term> \n<term>        ::= <factor> (<mulop> <factor>)*\t \n<expression>  ::= <signed_term> (<addop> <term>)* \n<assignment>  ::= <variable> '=' <expression> \n \nBOOLEANS \n \nThe next step, as we've learned several times before, is to add Boolean \nalgebra.  In the past, this step has at least doubled the amount of code \nwe've had to write.  As I've gone over this step in my mind, I've found \nmyself diverging more and more from what we did in previous \ninstallments.  To refresh your memory, I noted that Pascal treats the \nBoolean operators pretty much identically to the way it treats \narithmetic ones.  A Boolean \"and\" has the same precedence level as \nmultiplication, and the \"or\" as addition.  
C, on the other hand, sets \nthem at different precedence levels, and all told has a whopping 17 \nlevels.  In our earlier work, I chose something in between, with seven \nlevels.  As a result, we ended up with things called Boolean \nexpressions, paralleling in most details the arithmetic expressions, but \nat a different precedence level.  All of this, as it turned out, came \nabout because I didn't like having to put parentheses around the Boolean \nexpressions in statements like: \n \n\t     IF (c >= 'A') and (c <= 'Z') then ... \n \nIn retrospect, that seems a pretty petty reason to add many layers of \ncomplexity to the parser.  Perhaps more to the point, I'm not sure I was \neven able to avoid the parens.   \n \nFor kicks, let's start anew, taking a more Pascal-ish approach, and just \ntreat the Boolean operators at the same precedence level as the \narithmetic ones. We'll see where it leads us.  If it seems to be down \nthe garden path, we can always backtrack to the earlier approach. \n \nFor starters, we'll add the \"addition-level\" operators to Expression. 
\nThat's easily done; first, modify the function IsAddop in unit Scanner \nto include two extra operators: '|' for \"or,\" and '~' for \"exclusive \nor\": \n \n \n\n\n{--------------------------------------------------------------} \nfunction IsAddop(c: char): boolean; \nbegin \n\tIsAddop := c in ['+','-', '|', '~']; \nend; \n{--------------------------------------------------------------} \n \n \nNext, we must include the parsing of the operators in procedure \nExpression: \n \n \n{--------------------------------------------------------------} \nprocedure Expression; \nbegin \n\tSignedTerm; \n\twhile IsAddop(Look) do \n\t\tcase Look of \n\t\t\t'+': Add; \n\t\t\t'-': Subtract; \n\t\t\t'|': _Or; \n\t\t\t'~': _Xor; \n\t\tend; \nend; \n{--------------------------------------------------------------} \n \n \n(The underscores are needed, of course, because \"or\" and \"xor\" are \nreserved words in Turbo Pascal.) \n \nNext, the procedures _Or and _Xor: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Boolean Or Operation } \n \nprocedure _Or; \nbegin \n\tMatch('|'); \n\tPush; \n\tTerm; \n\tPopOr; \nend; \n \n{--------------------------------------------------------------} \n{ Parse and Translate an Exclusive Or Operation } \n \nprocedure _Xor; \nbegin \n\tMatch('~'); \n\tPush; \n\tTerm; \n\tPopXor; \nend; \n{--------------------------------------------------------------} \n \nAnd, finally, the new code generator procedures: \n \n \n\n\n{--------------------------------------------------------------} \n{ Or TOS with Primary } \n \nprocedure PopOr; \nbegin \n\tEmitLn('OR (SP)+,D0'); \nend; \n \n{--------------------------------------------------------------} \n{ Exclusive-Or TOS with Primary } \n \nprocedure PopXor; \nbegin \n\tEmitLn('EOR (SP)+,D0'); \nend; \n{--------------------------------------------------------------} \n \nNow, let's test the translator (you might want to change the call \nin Main back to a call to 
Expression, just to avoid having to type \n\"x=\" for an assignment every time). \n \nSo far, so good.  The parser nicely handles expressions of the \nform: \n \n\tx|y~z \n \nUnfortunately, it also does nothing to protect us from mixing \nBoolean and arithmetic algebra.  It will merrily generate code \nfor: \n \n\t(a+b)*(c~d) \n \nWe've talked about this a bit, in the past.  In general the rules \nfor what operations are legal or not cannot be enforced by the \nparser itself, because they are not part of the syntax of the \nlanguage, but rather its semantics.  A compiler that doesn't allow \nmixed-mode expressions of this sort must recognize that c and d \nare Boolean variables, rather than numeric ones, and balk at \nmultiplying them in the next step. But this \"policing\" can't be \ndone by the parser; it must be handled somewhere between the \nparser and the code generator. We aren't in a position to enforce \nsuch rules yet, because we haven't got either a way of declaring \ntypes, or a symbol table to store the types in.  So, for what \nwe've got to work with at the moment, the parser is doing \nprecisely what it's supposed to do. \n \nAnyway, are we sure that we DON'T want to allow mixed-type \noperations?  We made the decision some time ago (or, at least, I \ndid) to adopt the value 0000 as a Boolean \"false,\" and -1, or \nFFFFh, as a Boolean \"true.\"  The nice part about this choice is \nthat bitwise operations work exactly the same way as logical ones.  \nIn other words, when we do an operation on one bit of a logical \nvariable, we do it on all of them.  This means that we don't need \nto distinguish between logical and bitwise operations, as is done \nin C with the operators & and &&, and | and ||.  Reducing the \nnumber of operators by half certainly doesn't seem all bad. 
\n \nFrom the point of view of the data in storage, of course, the \ncomputer and compiler couldn't care less whether the number FFFFh \nrepresents the logical TRUE, or the numeric -1.  Should we?  I \nsort of think not.  I can think of many examples (though they \nmight be frowned upon as \"tricky\" code) where the ability to mix \nthe types might come in handy.  Example, the Dirac delta function, \nwhich could be coded in one simple line: \n \n\t-(x=0) \n \nor the absolute value function (DEFINITELY tricky code!): \n \n\tx*(1+2*(x<0)) \n \nPlease note, I'm not advocating coding like this as a way of life.  \nI'd almost certainly write these functions in more readable form,  \nusing IFs, just to keep from confusing later maintainers.  Still, \na moral question arises:  Do we have the right to ENFORCE our \nideas of good coding practice on the programmer, by writing the \nlanguage so he can't do anything else?  That's what Niklaus Wirth \ndid, in many places in Pascal, and Pascal has been criticized for \nit -- for not being as \"forgiving\" as C.   \n \nAn interesting parallel presents itself in the example of the \nMotorola 68000 design.  Though Motorola brags loudly about the \northogonality of their instruction set, the fact is that it's far \nfrom orthogonal.  For example, you can read a variable from its \naddress:\n \n\tMOVE X,D0 (where X is the name of a variable) \n \nbut you can't write in the same way.  To write, you must load an \naddress register with the address of X.  The same is true for PC-\nrelative addressing:\n \n\tMOVE X(PC),D0\t(legal) \n\tMOVE D0,X(PC)\t(illegal) \n \nWhen you begin asking how such non-orthogonal behavior came about, \nyou find that someone in Motorola had some theories about how \nsoftware should be written.  Specifically, in this case, they \ndecided that self-modifying code, which you can implement using \nPC-relative writes, is a Bad Thing.  Therefore, they designed the \nprocessor to prohibit it.  
Unfortunately, in the process they also \nprohibited _ALL_ writes of the forms shown above, however benign.  \nNote that this was not something done by default.  Extra design \nwork had to be done, and extra gates added, to destroy the natural \northogonality of the instruction set. \n \nOne of the lessons I've learned from life: If you have two \nchoices, and can't decide which one to take, sometimes the best \nthing to do is nothing.  Why add extra gates to a processor to \nenforce some stranger's idea of good programming practice?  Leave \nthe instructions in, and let the programmers debate what good \nprogramming practice is.  Similarly, why should we add extra code \nto our parser, to test for and prevent conditions that the user \nmight prefer to do, anyway?  I'd rather leave the compiler simple, \nand let the software experts debate whether the practices should \nbe used or not. \n \nAll of which serves as rationalization for my decision as to how \nto prevent mixed-type arithmetic:  I won't.  For a language \nintended for systems programming, the fewer rules, the better. If \nyou don't agree, and want to test for such conditions, we can do \nit once we have a symbol table. \n \nBOOLEAN \"AND\" \n \nWith that bit of philosophy out of the way, we can press on to the \n\"and\" operator, which goes into procedure Term. 
By now, you can \nprobably do this without me, but here's the code, anyway: \n \nIn Scanner, \n \n{--------------------------------------------------------------} \nfunction IsMulop(c: char): boolean; \nbegin \n\tIsMulop := c in ['*','/', '&']; \nend; \n{--------------------------------------------------------------} \n \nIn Parser, \n \n \n{--------------------------------------------------------------} \nprocedure Term; \nbegin \n\tFactor; \n\twhile IsMulop(Look) do \n\t\tcase Look of \n\t\t\t'*': Multiply; \n\t\t\t'/': Divide; \n\t\t\t'&': _And; \n\t\tend; \nend; \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Boolean And Operation } \n \nprocedure _And; \nbegin \n\tMatch('&'); \n\tPush; \n\tFactor; \n\tPopAnd; \nend; \n{--------------------------------------------------------------} \n \nand in CodeGen, \n \n \n{--------------------------------------------------------------} \n{ And Primary with TOS } \n \nprocedure PopAnd; \nbegin \n\tEmitLn('AND (SP)+,D0'); \nend; \n{--------------------------------------------------------------} \n \nYour parser should now be able to process almost any sort of logical \nexpression, and (should you be so inclined), mixed-mode expressions as \nwell. \n \nWhy not \"all sorts of logical expressions\"?  Because, so far, we haven't \ndealt with the logical \"not\" operator, and this is where it gets tricky.  \nThe logical \"not\" operator seems, at first glance, to be identical in \nits behavior to the unary minus, so my first thought was to let the \nexclusive or operator, '~', double as the unary \"not.\"  That didn't \nwork. In my first attempt, procedure SignedTerm simply ate my '~', \nbecause the character passed the test for an addop, but SignedTerm \nignores all addops except '-'.  It would have been easy enough to add \nanother line to SignedTerm, but that would still not solve the problem, \nbecause note that Expression only accepts a signed term for the _FIRST_ \nargument.  
 \n \nMathematically, an expression like: \n \n\t-a * -b \n \nmakes little or no sense, and the parser should flag it as an error.  \nBut the same expression, using a logical \"not,\" makes perfect sense: \n \n\tnot a and not b \n \nIn the case of these unary operators, choosing to make them act the same \nway seems an artificial force fit, sacrificing reasonable behavior on \nthe altar of implementational ease.  While I'm all for keeping the \nimplementation as simple as possible, I don't think we should do so at \nthe expense of reasonableness.  Patching like this would be missing the \nmain point, which is that the logical \"not\" is simply NOT the same kind \nof animal as the unary minus.  Consider the exclusive or, which is most \nnaturally written as:   \n \n\ta~b ::= (a and not b) or (not a and b) \n \nIf we allow the \"not\" to modify the whole term, the last term in \nparentheses would be interpreted as: \n \n\tnot(a and b) \n \nwhich is not the same thing at all.  So it's clear that the logical \n\"not\" must be thought of as connected to the FACTOR, not the term. \n \nThe idea of overloading the '~' operator also makes no sense from a \nmathematical point of view.  The implication of the unary minus is that \nit's equivalent to a subtraction from zero: \n \n\t-x <=> 0-x \n \nIn fact, in one of my more simple-minded versions of Expression, I \nreacted to a leading addop by simply preloading a zero, then processing \nthe operator as though it were a binary operator.  But a \"not\" is not \nequivalent to an exclusive or with zero ... that would just give back \nthe original number.  Instead, it's an exclusive or with FFFFh, or -1. \n \nIn short, the seeming parallel between the unary \"not\" and the unary \nminus falls apart under closer scrutiny. \"not\" modifies the factor, not \nthe term, and it is not related to either the unary minus nor the \nexclusive or.  Therefore, it deserves a symbol to call its own. 
What \nbetter symbol than the obvious one, also used by C, the '!' character?  \nUsing the rules about the way we think the \"not\" should behave, we \nshould be able to code the exclusive or (assuming we'd ever need to), in \nthe very natural form: \n \n\ta & !b | !a & b \n \nNote that no parentheses are required -- the precedence levels we've \nchosen automatically take care of things. \n \nIf you're keeping score on the precedence levels, this definition puts \nthe '!' at the top of the heap.  The levels become: \n \n1.\t! \n2.\t- (unary) \n3.\t*, /, & \n4.\t+, -, |, ~ \n \nLooking at this list, it's certainly not hard to see why we had trouble \nusing '~' as the \"not\" symbol! \n \nSo how do we mechanize the rules?  In the same way as we did with \nSignedTerm, but at the factor level.  We'll define a procedure \nNotFactor: \n \n \n{--------------------------------------------------------------} \n{ Parse and Translate a Factor with Optional \"Not\" } \n \nprocedure NotFactor; \nbegin \n\tif Look ='!' then begin \n\t\tMatch('!'); \n\t\tFactor; \n\t\tNotit; \n\t\tend \n\telse \n\t\tFactor; \nend; \n{--------------------------------------------------------------} \n \n \nand call it from all the places where we formerly called Factor, i.e., \nfrom Term, Multiply, Divide, and _And.  Note the new code generation \nprocedure: \n \n \n{--------------------------------------------------------------} \n{ Bitwise Not Primary } \n \nprocedure NotIt; \nbegin \n\tEmitLn('EOR #-1,D0'); \nend; \n \n{--------------------------------------------------------------} \n \n \nTry this now, with a few simple cases. 
In fact, try that exclusive or \nexample, \n \n\ta&!b|!a&b \n \n \nYou should get the code (without the comments, of course): \n \n MOVE A(PC),D0\t\t; load a \n MOVE D0,-(SP)\t\t; push it \n MOVE B(PC),D0\t\t; load b \n EOR #-1,D0\t\t; not it \n AND (SP)+,D0\t\t; and with a \n MOVE D0,-(SP)\t\t; push result \n MOVE A(PC),D0\t\t; load a \n EOR #-1,D0\t\t; not it \n MOVE D0,-(SP)\t\t; push it \n MOVE B(PC),D0\t\t; load b \n AND (SP)+,D0\t\t; and with !a \n OR (SP)+,D0\t\t; or with first term \n \nThat's precisely what we'd like to get.  So, at least for both \narithmetic and logical operators, our new precedence and new, slimmer \nsyntax hang together.  Even the peculiar, but legal, expression with \nleading addop: \n \n\t~x \n \nmakes sense.  SignedTerm ignores the leading '~', as it should, since \nthe expression is equivalent to: \n \n\t0~x, \n \nwhich is equal to x. \n \nWhen we look at the BNF we've created, we find that our boolean algebra \nnow adds only one extra line: \n \n \n<not_factor> \t::= [!] <factor> \n<factor>      \t::= <variable> | <constant> | '(' <expression> ')'\t \n<signed_term> \t::= [<addop>] <term> \n<term>        \t::= <not_factor> (<mulop> <not_factor>)*\t \n<expression>  \t::= <signed_term> (<addop> <term>)* \n<assignment>  \t::= <variable> '=' <expression> \n \n \nThat's a big improvement over earlier efforts.  Will our luck continue \nto hold when we get to relational operators?  We'll find out soon, but \nit will have to wait for the next installment. We're at a good stopping \nplace, and I'm anxious to get this installment into your hands.  It's \nalready been a year since the release of Installment 15.  I blush to \nadmit that all of this current installment has been ready for almost as \nlong, with the exception of relational operators.  But the information \ndoes you no good at all, sitting on my hard disk, and by holding it back \nuntil the relational operations were done, I've kept it out of your \nhands for that long.  
It's time for me to let go of it and get it out \nwhere you can get value from it. Besides, there are quite a number of \nserious philosophical questions associated with the relational \noperators, as well, and I'd rather save them for a separate installment \nwhere I can do them justice. \n \nHave fun with the new, leaner arithmetic and logical parsing, and I'll \nsee you soon with relationals. \n \n\n\n***************************************************************** \n*                                                               * \n*                        COPYRIGHT NOTICE                       * \n*                                                               * \n*   Copyright (C) 1995 Jack W. Crenshaw. All rights reserved.   *   \n*                                                               * \n*                                                               * \n***************************************************************** \n \n\n"
  },
  {
    "path": "2/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "2/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n\n\nvoid GetChar() \n{\n    Look = getchar();\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\n\nint IsAlpha(char c)\n{\n    return (UPCASE(c) >= 'A') && (UPCASE(c) <= 'Z');\n} \n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nchar GetName()\n{\n    char c = Look;\n\n    if( !IsAlpha(Look)) {\n        sprintf(tmp, \"Name\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return UPCASE(c);\n}\n\n\nchar GetNum()\n{\n    char c = Look;\n\n    if( !IsDigit(Look)) {\n        sprintf(tmp, \"Integer\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return c;\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    GetChar();\n}\n\n"
  },
  {
    "path": "2/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n#define UPCASE(C) ((1<<6)| (C))\n\n#define MAX_BUF 100\nchar tmp[MAX_BUF];\n\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\n\nchar GetName();\nchar GetNum();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\n\n#endif\n"
  },
  {
    "path": "2/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\nvoid Term();\nvoid Expression();\nvoid Add();\nvoid Substract();\nvoid Factor();\n\n\nvoid Multiply()\n{\n    Match('*');\n    Factor();\n    EmitLn(\"imull (%esp), %eax\");\n    /* push of the stack */\n    EmitLn(\"addl $4, %esp\");\n} \n\nvoid Divide()\n{\n    Match('/');\n    Factor();\n\n    /* for a expersion like a/b we have eax=b and %(esp)=a\n     * but we need eax=a, and b on the stack \n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n\n    EmitLn(\"pushl %eax\");\n\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extesnion */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    EmitLn(\"addl $4, %esp\");\n\n}\n\nvoid Factor()\n{\n\n    if(Look == '(') {\n\n        Match('(');\n        Expression();\n        Match(')');\n     } else if(IsAddop(Look)) {\n\n        Match('-');\n        sprintf(tmp,\"movl $%c, %%eax\", GetNum());\n        EmitLn(tmp);\n        EmitLn(\"negl %eax\");\n\n    } else {\n\n        sprintf(tmp,\"movl $%c, %%eax\", GetNum());\n        EmitLn(tmp);\n    }\n}\n\nvoid Term()\n{\n    Factor();\n    while (strchr(\"*/\", Look)) {\n\n        EmitLn(\"pushl %eax\");\n\n        switch(Look)\n        {\n            case '*':\n                Multiply();\n                break;\n            case '/':\n                Divide();\n                break;\n            default:\n                Expected(\"Mulop\");\n        }\n    }\n}\n\nvoid Expression()\n{\n    if(IsAddop(Look))\n        EmitLn(\"xor %eax, %eax\");\n    else\n        Term();\n\n    while (strchr(\"+-\", Look)) {\n\n        EmitLn(\"pushl %eax\");\n\n        switch(Look)\n        {\n            case '+':\n                Add();\n                break;\n            case '-':\n                Substract();\n                break;\n            default:\n                Expected(\"Addop\");\n        }\n    }\n}\n\n\nvoid Add()\n{\n    
Match('+');\n    Term();\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n    \n}\n\n\nvoid Substract()\n{\n    Match('-');\n    Term();\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"negl %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n\nint main()\n{\n\n    Init();\n    EmitLn(\".text\");\n    EmitLn(\".global _start\");\n    EmitLn(\"_start:\");\n    Expression();\n\n    /* return the result */\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n    return 0;\n}\n"
  },
  {
    "path": "2/tutor2.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           24 July 1988\n\n\n                   Part II: EXPRESSION PARSING\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nGETTING STARTED\n\nIf you've read the introduction document to this series, you will\nalready know what  we're  about.    You will also have copied the\ncradle software  into your Turbo Pascal system, and have compiled\nit.  So you should be ready to go.\n\n\nThe purpose of this article is for us to learn  how  to parse and\ntranslate mathematical expressions.  What we would like to see as\noutput is a series of assembler-language statements  that perform\nthe desired actions.    For purposes of definition, an expression\nis the right-hand side of an equation, as in\n\n               x = 2*y + 3/(4*z)\n\nIn the early going, I'll be taking things in _VERY_  small steps.\nThat's  so  that  the beginners among you won't get totally lost.\nThere are also  some  very  good  lessons to be learned early on,\nthat will serve us well later.  For the more experienced readers:\nbear with me.  
We'll get rolling soon enough.\n\nSINGLE DIGITS\n\nIn keeping with the whole theme of this series (KISS, remember?),\nlet's start with the absolutely most simple case we can think of.\nThat, to me, is an expression consisting of a single digit.\n\nBefore starting to code, make sure you have a  baseline  copy  of\nthe  \"cradle\" that I gave last time.  We'll be using it again for\nother experiments.  Then add this code:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Expression }\n\nprocedure Expression;\nbegin\n   EmitLn('MOVE #' + GetNum + ',D0')\nend;\n{---------------------------------------------------------------}\n\n\nAnd add the  line  \"Expression;\"  to  the main program so that it\nreads:\n                              \n\n{---------------------------------------------------------------}\nbegin\n   Init;\n   Expression;\nend.\n{---------------------------------------------------------------}\n\n\nNow  run  the  program. Try any single-digit number as input. You\nshould get a single line of assembler-language output.    Now try\nany  other character as input, and you'll  see  that  the  parser\nproperly reports an error.\n\n\nCONGRATULATIONS! You have just written a working translator!\n\nOK, I grant you that it's pretty limited. But don't brush  it off\ntoo  lightly.  This little \"compiler\" does,  on  a  very  limited\nscale,  exactly  what  any larger compiler does:    it  correctly\nrecognizes legal  statements in the input \"language\" that we have\ndefined for it, and  it  produces  correct,  executable assembler\ncode,  suitable  for  assembling  into  object  format.  Just  as\nimportantly,  it correctly  recognizes  statements  that  are NOT\nlegal, and gives a  meaningful  error message.  Who could ask for\nmore?  As we expand our  parser,  we'd better make sure those two\ncharacteristics always hold true.\n\nThere  are  some  other  features  of  this  tiny  program  worth\nmentioning.    
First,  you  can  see that we don't separate  code\ngeneration from parsing ...  as  soon as the parser knows what we\nwant  done, it generates the object code directly.    In  a  real\ncompiler, of course, the reads in GetChar would be  from  a  disk\nfile, and the writes to another  disk  file, but this way is much\neasier to deal with while we're experimenting.\n\nAlso note that an expression must leave a result somewhere.  I've\nchosen the  68000  register  D0.    I  could have made some other\nchoices, but this one makes sense.\n\n\nBINARY EXPRESSIONS\n\nNow that we have that under our belt,  let's  branch  out  a bit.\nAdmittedly, an \"expression\" consisting of only  one  character is\nnot going to meet our needs for long, so let's see what we can do\nto extend it. Suppose we want to handle expressions of the form:\n\n                         1+2\n     or                  4-3\n     or, in general, <term> +/- <term>\n\n(That's a bit of Backus-Naur Form, or BNF.)\n                              \nTo do this we need a procedure that recognizes a term  and leaves\nits   result   somewhere,  and  another   that   recognizes   and\ndistinguishes  between   a  '+'  and  a  '-'  and  generates  the\nappropriate code.  But if Expression is going to leave its result\nin D0, where should Term leave its result?    Answer:    the same\nplace.  We're  going  to  have  to  save the first result of Term\nsomewhere before we get the next one.\n\nOK, basically what we want to  do  is have procedure Term do what\nExpression was doing before.  
So just RENAME procedure Expression\nas Term, and enter the following new version of Expression:\n\n\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   Term;\n   EmitLn('MOVE D0,D1');\n   case Look of\n    '+': Add;\n    '-': Subtract;\n   else Expected('Addop');\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNext, just above Expression enter these two procedures:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   EmitLn('ADD D1,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   EmitLn('SUB D1,D0');\nend;\n{-------------------------------------------------------------}\n                              \n\nWhen you're finished with that,  the order of the routines should\nbe:\n\n o Term (The OLD Expression)\n o Add\n o Subtract\n o Expression\n\nNow run the program.  Try any combination you can think of of two\nsingle digits,  separated  by  a  '+' or a '-'.  You should get a\nseries of four assembler-language instructions out  of  each run.\nNow  try  some  expressions with deliberate errors in them.  Does\nthe parser catch the errors?\n\nTake  a  look  at the object  code  generated.    There  are  two\nobservations we can make.  First, the code generated is  NOT what\nwe would write ourselves.  The sequence\n\n        MOVE #n,D0\n        MOVE D0,D1\n\nis inefficient.  If we were  writing  this code by hand, we would\nprobably just load the data directly to D1.\n\nThere is a  message  here:  code  generated by our parser is less\nefficient  than the code we would write by hand.  Get used to it.\nThat's going to be true throughout this series.  It's true of all\ncompilers to some extent.  
Computer scientists have devoted whole\nlifetimes to the issue of code optimization, and there are indeed\nthings that can be done to improve the quality  of  code  output.\nSome compilers do quite well, but  there  is a heavy price to pay\nin complexity, and it's  a  losing  battle  anyway ... there will\nprobably never come a time when  a  good  assembler-language pro-\ngrammer can't out-program a compiler.    Before  this  session is\nover, I'll briefly mention some ways that we can do a  little op-\ntimization,  just  to  show you that we can indeed improve things\nwithout too much trouble.  But remember, we're here to learn, not\nto see how tight we can make  the  object  code.    For  now, and\nreally throughout  this  series  of  articles,  we'll  studiously\nignore optimization and  concentrate  on  getting  out  code that\nworks.\n\nSpeaking of which: ours DOESN'T!  The code is _WRONG_!  As things\nare working  now, the subtraction process subtracts D1 (which has\nthe FIRST argument in it) from D0 (which has the second).  That's\nthe wrong way, so we end up with the wrong  sign  for the result.\nSo let's fix up procedure Subtract with a  sign-changer,  so that\nit reads\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   EmitLn('SUB D1,D0');\n   EmitLn('NEG D0');\nend;\n{-------------------------------------------------------------}\n\n\nNow  our  code  is even less efficient, but at least it gives the\nright answer!  Unfortunately, the  rules that give the meaning of\nmath expressions require that the terms in an expression come out\nin an inconvenient  order  for  us.    Again, this is just one of\nthose facts of life you learn to live with.   This  one will come\nback to haunt us when we get to division.\n\nOK,  at this point we have a parser that can recognize the sum or\ndifference of two digits.    
Earlier,  we  could only recognize a\nsingle digit.  But  real  expressions can have either form (or an\ninfinity of others).  For kicks, go back and run the program with\nthe single input line '1'.\n\nDidn't work, did it?   And  why  should  it?    We  just finished\ntelling  our  parser  that the only kinds of expressions that are\nlegal are those  with  two  terms.    We  must  rewrite procedure\nExpression to be a lot more broadminded, and this is where things\nstart to take the shape of a real parser.\n\n\n\n\nGENERAL EXPRESSIONS\n\nIn the  REAL  world,  an  expression  can  consist of one or more\nterms, separated  by  \"addops\"  ('+'  or  '-').   In BNF, this is\nwritten\n\n          <expression> ::= <term> [<addop> <term>]*\n\n\nWe  can  accommodate this definition of an  expression  with  the\naddition of a simple loop to procedure Expression:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   Term;\n   while Look in ['+', '-'] do begin\n      EmitLn('MOVE D0,D1');\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      else Expected('Addop');\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNOW we're getting somewhere!   This version handles any number of\nterms, and it only cost us two extra lines of code.  As we go on,\nyou'll discover that this is characteristic  of  top-down parsers\n... it only takes a few lines of code to accommodate extensions to\nthe  language.    That's  what  makes  our  incremental  approach\npossible.  Notice, too, how well the code of procedure Expression\nmatches the BNF definition.   That, too, is characteristic of the\nmethod.  As you get proficient in the approach, you'll  find that\nyou can turn BNF into parser code just about as  fast  as you can\ntype!\n\nOK, compile the new version of our parser, and give it a try.  
As\nusual,  verify  that  the  \"compiler\"   can   handle   any  legal\nexpression,  and  will  give a meaningful error  message  for  an\nillegal one.  Neat, eh?  You might note that in our test version,\nany error message comes  out  sort of buried in whatever code had\nalready been  generated. But remember, that's just because we are\nusing  the  CRT  as  our  \"output  file\"  for   this   series  of\nexperiments.  In a production version, the two  outputs  would be\nseparated ... one to the output file, and one to the screen.\n\n\nUSING THE STACK\n\nAt  this  point  I'm going to  violate  my  rule  that  we  don't\nintroduce any complexity until  it's  absolutely  necessary, long\nenough to point out a problem with the code we're generating.  As\nthings stand now, the parser  uses D0 for the \"primary\" register,\nand D1 as  a place to store the partial sum.  That works fine for\nnow,  because  as  long as we deal with only the \"addops\" '+' and\n'-', any new term can be added in as soon as it is found.  But in\ngeneral that isn't true.  Consider, for example, the expression\n\n               1+(2-(3+(4-5)))\n                              \nIf we put the '1' in D1, where  do  we  put  the  '2'?    Since a\ngeneral expression can have any degree of complexity, we're going\nto run out of registers fast!\n\nFortunately,  there's  a  simple  solution.    Like  every modern\nmicroprocessor, the 68000 has a stack, which is the perfect place\nto save a variable number of items. So instead of moving the term\nin D0 to  D1, let's just push it onto the stack.  For the benefit\nof  those unfamiliar with 68000 assembler  language,  a  push  is\nwritten\n\n               -(SP)\n\nand a pop,     (SP)+ .\n\n\nSo let's change the EmitLn in Expression to read:\n\n               EmitLn('MOVE D0,-(SP)');\n\nand the two lines in Add and Subtract to\n\n               EmitLn('ADD (SP)+,D0')\n\nand            EmitLn('SUB (SP)+,D0'),\n\nrespectively.  
Now try the parser again and make sure  we haven't\nbroken it.\n\nOnce again, the generated code is less efficient than before, but\nit's a necessary step, as you'll see.\n\n\nMULTIPLICATION AND DIVISION\n\nNow let's get down to some REALLY serious business.  As  you  all\nknow,  there  are  other  math   operators   than   \"addops\"  ...\nexpressions can also have  multiply  and  divide operations.  You\nalso  know  that  there  is  an implied operator  PRECEDENCE,  or\nhierarchy, associated with expressions, so that in  an expression\nlike\n\n                    2 + 3 * 4,\n\nwe know that we're supposed to multiply FIRST, then  add.    (See\nwhy we needed the stack?)\n\nIn the early days of compiler technology, people used some rather\ncomplex techniques to ensure that the  operator  precedence rules\nwere  obeyed.    It turns out,  though,  that  none  of  this  is\nnecessary ... the rules can be accommodated quite  nicely  by our\ntop-down  parsing technique.  Up till now,  the  only  form  that\nwe've considered for a term is that of a  single  decimal  digit.\n\nMore generally, we  can  define  a  term as a PRODUCT of FACTORS;\ni.e.,\n\n          <term> ::= <factor>  [ <mulop> <factor> ]*\n\nWhat  is  a factor?  For now, it's what a term used to be  ...  a\nsingle digit.\n\nNotice the symmetry: a  term  has the same form as an expression.\nAs a matter of fact, we can  add  to  our  parser  with  a little\njudicious  copying and renaming.  But  to  avoid  confusion,  the\nlisting below is the complete set of parsing routines.  
(Note the\nway we handle the reversal of operands in Divide.)\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Factor;\nbegin\n   EmitLn('MOVE #' + GetNum + ',D0')\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Match('*');\n   Factor;\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Match('/');\n   Factor;\n   EmitLn('MOVE (SP)+,D1');\n   EmitLn('DIVS D1,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   Factor;\n   while Look in ['*', '/'] do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '*': Multiply;\n       '/': Divide;\n      else Expected('Mulop');\n      end;\n   end;\nend;\n\n\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   Term;\n   while Look in ['+', '-'] do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      else Expected('Addop');\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nHot dog!  A NEARLY functional parser/translator, in only 55 lines\nof Pascal!  
The output is starting to look really useful,  if you\ncontinue to overlook the inefficiency,  which  I  hope  you will.\nRemember, we're not trying to produce tight code here.\n\n\nPARENTHESES\n\nWe  can  wrap  up this part of the parser with  the  addition  of\nparentheses with  math expressions.  As you know, parentheses are\na  mechanism to force a desired operator  precedence.    So,  for\nexample, in the expression\n\n               2*(3+4) ,\n\nthe parentheses force the addition  before  the  multiply.   Much\nmore importantly, though, parentheses  give  us  a  mechanism for\ndefining expressions of any degree of complexity, as in\n\n               (1+2)/((3+4)+(5-6))\n\nThe  key  to  incorporating  parentheses  into our parser  is  to\nrealize that  no matter how complicated an expression enclosed by\nparentheses may be,  to  the  rest  of  the world it looks like a\nsimple factor.  That is, one of the forms for a factor is:\n\n          <factor> ::= (<expression>)\n\nThis is where the recursion comes in. 
An expression can contain a\nfactor which contains another expression which contains a factor,\netc., ad infinitum.\n\nComplicated or not, we can take care of this by adding just a few\nlines of Pascal to procedure Factor:\n                             \n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else\n      EmitLn('MOVE #' + GetNum + ',D0');\nend;\n{--------------------------------------------------------------}\n\n\nNote again how easily we can extend the parser, and how  well the\nPascal code matches the BNF syntax.\n\nAs usual, compile the new version and make sure that it correctly\nparses  legal sentences, and flags illegal  ones  with  an  error\nmessage.\n\n\nUNARY MINUS\n\nAt  this  point,  we have a parser that can handle just about any\nexpression, right?  OK, try this input sentence:\n\n                         -1\n\nWOOPS!  It doesn't work, does it?   Procedure  Expression expects\neverything to start with an integer, so it coughs up  the leading\nminus  sign.  You'll find that +3 won't  work  either,  nor  will\nsomething like\n\n                    -(3-2) .\n\nThere  are  a  couple of ways to fix the problem.    The  easiest\n(although not necessarily the best)  way is to stick an imaginary\nleading zero in  front  of  expressions  of this type, so that -3\nbecomes 0-3.  
We can easily patch this into our  existing version\nof Expression:\n\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   if IsAddop(Look) then\n      EmitLn('CLR D0')\n   else\n      Term;\n   while IsAddop(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      else Expected('Addop');\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n \n\nI TOLD you that making changes  was  easy!   This time it cost us\nonly  three  new lines of Pascal.   Note  the  new  reference  to\nfunction IsAddop.  Since the test for an addop appeared  twice, I\nchose  to  embed  it in the new function.  The  form  of  IsAddop\nshould be apparent from that for IsAlpha.  Here it is:\n\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n{--------------------------------------------------------------}\n\n\nOK, make these changes to the program and recompile.   You should\nalso include IsAddop in your baseline copy of the cradle.   We'll\nbe needing  it  again  later.   Now try the input -1 again.  Wow!\nThe efficiency of the code is  pretty  poor ... six lines of code\njust for loading a simple constant ... but at least it's correct.\nRemember, we're not trying to replace Turbo Pascal here.\n\nAt this point we're just about finished with the structure of our\nexpression parser.   This version of the program should correctly\nparse and compile just about any expression you care to  throw at\nit.    It's still limited in that  we  can  only  handle  factors\ninvolving single decimal digits.    But I hope that by now you're\nstarting  to  get  the  message  that we can  accommodate further\nextensions  with  just  some  minor  changes to the parser.   
You\nprobably won't be  surprised  to  hear  that a variable or even a\nfunction call is just another kind of a factor.\n                             \nIn  the next session, I'll show you just how easy it is to extend\nour parser to take care of  these  things too, and I'll also show\nyou just how easily we can accommodate multicharacter numbers and\nvariable names.  So you see,  we're  not  far at all from a truly\nuseful parser.\n\n\n\n\nA WORD ABOUT OPTIMIZATION\n\nEarlier in this session, I promised to give you some hints  as to\nhow we can improve the quality of the generated code.  As I said,\nthe  production of tight code is not the  main  purpose  of  this\nseries of articles.  But you need to at least know that we aren't\njust  wasting our time here ... that we  can  indeed  modify  the\nparser further to  make  it produce better code, without throwing\naway everything we've done to date.  As usual, it turns  out that\nSOME optimization is not that difficult to do ... it simply takes\nsome extra code in the parser.\n\nThere are two basic approaches we can take:\n\n  o Try to fix up the code after it's generated\n\n    This is  the concept of \"peephole\" optimization.  The general\n    idea is that we  know  what  combinations of instructions the\n    compiler  is  going  to generate, and we also know which ones\n    are pretty bad (such as the code for -1, above).    So all we\n    do  is  to   scan   the  produced  code,  looking  for  those\n    combinations, and replacing  them  by better ones.  It's sort\n    of   a   macro   expansion,   in   reverse,   and   a  fairly\n    straightforward  exercise  in   pattern-matching.   The  only\n    complication,  really, is that there may be  a  LOT  of  such\n    combinations to look for.  It's called  peephole optimization\n    simply because it only looks at a small group of instructions\n    at a time.  
Peephole  optimization can have a dramatic effect\n    on  the  quality  of the code,  with  little  change  to  the\n    structure of the compiler  itself.   There is a price to pay,\n    though,  in  both  the  speed,   size, and complexity of  the\n    compiler.  Looking for all those combinations calls for a lot\n    of IF tests, each one of which is a source of error.  And, of\n    course, it takes time.\n\n     In  the  classical  implementation  of a peephole optimizer,\n    it's done as a second pass to the compiler.  The  output code\n    is  written  to  disk,  and  then  the  optimizer  reads  and\n    processes the disk file again.  As a matter of fact,  you can\n    see that the optimizer could  even be a separate PROGRAM from\n    the compiler proper.  Since the optimizer only  looks  at the\n    code through a  small  \"window\"  of  instructions  (hence the\n    name), a better implementation would be to simply buffer up a\n    few lines of output, and scan the buffer after each EmitLn.\n\n  o Try to generate better code in the first place\n                             \n    This approach calls for us to look for  special  cases BEFORE\n    we Emit them.  As a trivial example,  we  should  be  able to\n    identify a constant zero,  and  Emit a CLR instead of a load,\n    or even do nothing at all, as in an add of zero, for example.\n    Closer to home, if we had chosen to recognize the unary minus\n    in Factor  instead of in Expression, we could treat constants\n    like -1 as ordinary constants,  rather  than  generating them\n    from  positive  ones.   None of these things are difficult to\n    deal with ... they only add extra tests in the code, which is\n    why  I  haven't  included them in our program.  The way I see\n    it, once we get to the point that we have a working compiler,\n    generating useful code  that  executes, we can always go back\n    and tweak the thing to tighten up the code produced.   
That's\n    why there are Release 2.0's in the world.\n\nThere IS one more type  of  optimization  worth  mentioning, that\nseems to promise pretty tight code without too much hassle.  It's\nmy \"invention\" in the  sense  that I haven't seen it suggested in\nprint anywhere, though I have  no  illusions  that  it's original\nwith me.\n\nThis  is to avoid such a heavy use of the stack, by making better\nuse of the CPU registers.  Remember back when we were  doing only\naddition  and  subtraction,  that we used registers  D0  and  D1,\nrather than the stack?  It worked, because with  only  those  two\noperations, the \"stack\" never needs more than two entries.\n\nWell,  the 68000 has eight data registers.  Why not use them as a\nprivately managed stack?  The key is to recognize  that,  at  any\npoint in its processing,  the  parser KNOWS how many items are on\nthe  stack, so it can indeed manage it properly.  We can define a\nprivate \"stack pointer\" that keeps  track  of  which  stack level\nwe're at, and addresses the  corresponding  register.   Procedure\nFactor,  for  example,  would  not  cause data to be loaded  into\nregister  D0,  but   into  whatever  the  current  \"top-of-stack\"\nregister happened to be.\n\nWhat we're doing in effect is to replace the CPU's RAM stack with\na  locally  managed  stack  made  up  of  registers.    For  most\nexpressions, the stack level  will  never  exceed eight, so we'll\nget pretty good code out.  Of course, we also  have  to deal with\nthose  odd cases where the stack level  DOES  exceed  eight,  but\nthat's no problem  either.    We  simply let the stack spill over\ninto the CPU  stack.    For  levels  beyond eight, the code is no\nworse  than  what  we're generating now, and for levels less than\neight, it's considerably better.\n\nFor the record, I  have  implemented  this  concept, just to make\nsure  it  works  before  I  mentioned  it to you.  It does.    
In\npractice, it turns out that you can't really use all eight levels\n... you need at least one register free to  reverse  the  operand\norder for division  (sure  wish  the  68000 had an XTHL, like the\n8080!).  For expressions  that  include  function calls, we would\nalso need a register reserved for them. Still, there  is  a  nice\nimprovement in code size for most expressions.\n\nSo, you see, getting  better  code  isn't  that difficult, but it\ndoes add  complexity  to  our translator ...  complexity  we  can\ndo without at this point.  For that reason,  I  STRONGLY  suggest\nthat we continue to ignore efficiency issues for the rest of this\nseries,  secure  in  the knowledge that we can indeed improve the\ncode quality without throwing away what we've done.\n\nNext lesson, I'll show you how to deal with variable  factors and\nfunction calls.  I'll also show you just how easy it is to handle\nmulticharacter tokens and embedded white space.\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n \n\n\n\n"
  },
  {
    "path": "3/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "3/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n\nvoid GetChar() \n{\n    Look = getchar();\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n        SkipWhite();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n} \n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAlNum(char c)\n{\n    return IsAlpha(c) || IsDigit(c);\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nint IsWhite(char c)\n{\n    return (c == ' ') || (c == '\\t');\n}\n\nchar* GetName()\n{\n    char *token = token_buf;\n\n    if( !IsAlNum(Look)) {\n        Expected(\"Name\");\n    }\n    while (IsAlNum(Look)) {\n        *token = Look;\n        token++;\n\n        GetChar();\n    }\n\n    SkipWhite();\n\n    *token = '\\0';\n    return token_buf;\n}\n\n\nchar* GetNum()\n{\n    char *value = token_buf;\n\n    if( !IsAlNum(Look)) {\n        Expected(\"Integer\");\n    }\n    while (IsDigit(Look)) {\n        *value = Look;\n        value++;\n\n        GetChar();\n    }\n\n    SkipWhite();\n\n    *value = '\\0';\n    return token_buf;\n}\n\nvoid SkipWhite()\n{\n    while (IsWhite(Look)) {\n        GetChar();\n    }\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    GetChar();\n    SkipWhite();\n}\n\n"
  },
  {
    "path": "3/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#define MAX_BUF 100\nchar tmp[MAX_BUF];\nchar token_buf[MAX_BUF];\n\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\nint IsAlNum(char c);\nint IsWhite(char c);\n\nchar *GetName();\nchar *GetNum();\n\nvoid SkipWhite();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\n\n#endif\n"
  },
  {
    "path": "3/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\nvoid Term();\nvoid Expression();\nvoid Add();\nvoid Substract();\nvoid Factor();\nvoid Ident();\nvoid Assignment();\n\n\nvoid Multiply()\n{\n    Match('*');\n    Factor();\n    EmitLn(\"imull (%esp), %eax\");\n    /* push of the stack */\n    EmitLn(\"addl $4, %esp\");\n} \n\nvoid Divide()\n{\n    Match('/');\n    Factor();\n\n    /* for a expersion like a/b we have eax=b and %(esp)=a\n     * but we need eax=a, and b on the stack \n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n\n    EmitLn(\"pushl %eax\");\n\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extesnion */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    EmitLn(\"addl $4, %esp\");\n\n}\n\nvoid Ident()\n{\n    char *name = GetName();\n    if (Look == '(') {\n        Match('(');\n        Match(')');\n        sprintf(tmp, \"call %s\", name);\n        EmitLn(tmp);\n    } else {\n        sprintf(tmp, \"movl %s, %%eax\", name);\n        EmitLn(tmp);\n    }\n}\n\nvoid Factor()\n{\n    if(Look == '(') {\n        Match('(');\n        Expression();\n        Match(')');\n     } else if(IsAddop(Look)) {\n        Match('-');\n        sprintf(tmp,\"movl $%s, %%eax\", GetNum());\n        EmitLn(tmp);\n        EmitLn(\"negl %eax\");\n    } else if (IsAlpha(Look)) {\n        Ident();\n    } else {\n        sprintf(tmp,\"movl $%s, %%eax\", GetNum());\n        EmitLn(tmp);\n    }\n}\n\nvoid Term()\n{\n    Factor();\n    while (strchr(\"*/\", Look)) {\n\n        EmitLn(\"pushl %eax\");\n\n        switch(Look)\n        {\n            case '*':\n                Multiply();\n                break;\n            case '/':\n                Divide();\n                break;\n            default:\n                Expected(\"Mulop\");\n        }\n    }\n}\n\nvoid Expression()\n{\n    if(IsAddop(Look))\n        EmitLn(\"xor %eax, %eax\");\n    else\n        Term();\n\n    while 
(strchr(\"+-\", Look)) {\n\n        EmitLn(\"pushl %eax\");\n\n        switch(Look)\n        {\n            case '+':\n                Add();\n                break;\n            case '-':\n                Substract();\n                break;\n            default:\n                Expected(\"Addop\");\n        }\n    }\n}\n\n\nvoid Add()\n{\n    Match('+');\n    Term();\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n    \n}\n\n\nvoid Substract()\n{\n    Match('-');\n    Term();\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"negl %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\nvoid Assignment()\n{\n    /* GetName() returns a pointer into the shared token buffer, which\n     * Expression() overwrites, so keep a private copy of the name.\n     */\n    char *name = strdup(GetName());\n    if (name == NULL) {\n        Abort(\"Out of memory\");\n    }\n    Match('=');\n    Expression();\n    sprintf(tmp, \"lea %s, %%ebx\", name);\n    EmitLn(tmp);\n    EmitLn(\"movl %eax, (%ebx)\");\n    free(name);\n}\n\nint main()\n{\n\n    Init();\n    EmitLn(\".text\");\n    EmitLn(\".global _start\");\n    EmitLn(\"_start:\");\n    /* Expression(); */\n    Assignment();\n    if (Look != '\\n') {\n        Expected(\"NewLine\");\n    }\n\n\n    /* return the result */\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n    return 0;\n}\n"
  },
  {
    "path": "3/tutor3.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                            4 Aug 1988\n\n\n                    Part III: MORE EXPRESSIONS\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn the last installment, we examined the techniques used to parse\nand  translate a general math expression.  We  ended  up  with  a\nsimple parser that  could handle arbitrarily complex expressions,\nwith two restrictions:\n\n  o No variables were allowed, only numeric factors\n\n  o The numeric factors were limited to single digits\n\nIn this installment, we'll get  rid of those restrictions.  We'll\nalso extend what  we've  done  to  include  assignment statements\nfunction  calls  and.    Remember,   though,   that   the  second\nrestriction was  mainly self-imposed  ... a choice of convenience\non our part, to make life easier and to let us concentrate on the\nfundamental concepts.    
As  you'll  see  in  a bit, it's an easy\nrestriction to get rid of, so don't get  too  hung  up  about it.\nWe'll use the trick when it serves us to do so, confident that we\ncan discard it when we're ready to.\n\n\nVARIABLES\n\nMost expressions  that we see in practice involve variables, such\nas\n\n               b * b + 4 * a * c\n\nNo  parser is much good without being able  to  deal  with  them.\nFortunately, it's also quite easy to do.\n\nRemember that in our parser as it currently stands, there are two\nkinds of  factors  allowed:  integer  constants  and  expressions\nwithin parentheses.  In BNF notation,\n\n     <factor> ::= <number> | (<expression>)\n\nThe '|' stands for \"or\", meaning of course that either form  is a\nlegal form for a factor.   Remember,  too, that we had no trouble\nknowing which was which  ...  the  lookahead  character is a left\nparen '(' in one case, and a digit in the other.\n                              \nIt probably won't come as too much of a surprise that  a variable\nis just another kind of factor.    So  we extend the BNF above to\nread:\n\n\n     <factor> ::= <number> | (<expression>) | <variable>\n\n\nAgain, there is no  ambiguity:  if  the  lookahead character is a\nletter,  we  have  a variable; if a digit, we have a number. Back\nwhen we translated the number, we just issued code  to  load  the\nnumber,  as immediate data, into D0.  Now we do the same, only we\nload a variable.\n\nA minor complication in the  code generation arises from the fact\nthat most  68000 operating systems, including the SK*DOS that I'm\nusing, require the code to be  written  in \"position-independent\"\nform, which  basically means that everything is PC-relative.  The\nformat for a load in this language is\n\n               MOVE X(PC),D0\n\nwhere X is, of course, the variable name.  
Armed with that, let's\nmodify the current version of Factor to read:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      EmitLn('MOVE ' + GetName + '(PC),D0')\n   else\n      EmitLn('MOVE #' + GetNum + ',D0');\nend;\n{--------------------------------------------------------------}\n\n\nI've  remarked before how easy it is to  add  extensions  to  the\nparser, because of  the  way  it's  structured.  You can see that\nthis  still  holds true here.  This time it cost us  all  of  two\nextra lines of code.  Notice, too, how the if-else-else structure\nexactly parallels the BNF syntax equation.\n\nOK, compile and test this new version of the parser.  That didn't\nhurt too badly, did it?\n                              \n\nFUNCTIONS\n\nThere is only one  other  common kind of factor supported by most\nlanguages: the function call.  It's really too early  for  us  to\ndeal with functions well,  because  we  haven't yet addressed the\nissue of parameter passing.  What's more, a \"real\" language would\ninclude a mechanism to  support  more than one type, one of which\nshould be a function type.  We haven't gotten there  yet, either.\nBut I'd still like to deal with functions  now  for  a  couple of\nreasons.    First,  it  lets  us  finally  wrap  up the parser in\nsomething very close to its final form, and second, it  brings up\na new issue which is very much worth talking about.\n\nUp  till  now,  we've  been  able  to  write  what  is  called  a\n\"predictive parser.\"  That  means  that at any point, we can know\nby looking at the current  lookahead character exactly what to do\nnext.  That isn't the case when we add functions.  
Every language\nhas some naming rules  for  what  constitutes a legal identifier.\nFor the present, ours is simply that it  is  one  of  the letters\n'a'..'z'.  The  problem  is  that  a variable name and a function\nname obey  the  same  rules.   So how can we tell which is which?\nOne way is to require that they each be declared before  they are\nused.    Pascal  takes that approach.  The other is that we might\nrequire a function to be followed by a (possibly empty) parameter\nlist.  That's the rule used in C.\n\nSince  we  don't  yet have a mechanism for declaring types, let's\nuse the C  rule for now.  Since we also don't have a mechanism to\ndeal  with parameters, we can only handle  empty  lists,  so  our\nfunction calls will have the form\n\n                    x()  .\n\nSince  we're  not  dealing  with  parameter lists yet,  there  is\nnothing  to do but to call the function, so we need only to issue\na BSR (call) instead of a MOVE.\n\nNow that there are two  possibilities for the \"If IsAlpha\" branch\nof the test in Factor, let's treat them in a  separate procedure.\nModify Factor to read:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      Ident\n   else\n      EmitLn('MOVE #' + GetNum + ',D0');\nend;\n{--------------------------------------------------------------}\n\n\nand insert before it the new procedure\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Identifier }\n\nprocedure Ident;\nvar Name: char;\nbegin\n   Name := GetName;\n   if Look = '(' then begin\n      Match('(');\n      Match(')');\n      EmitLn('BSR ' + Name);\n      end\n   else\n      EmitLn('MOVE ' + Name + 
'(PC),D0')\nend;\n{---------------------------------------------------------------}\n\n\nOK, compile and  test  this  version.  Does  it  parse  all legal\nexpressions?  Does it correctly flag badly formed ones?\n\nThe important thing to notice is that even though  we  no  longer\nhave  a predictive parser, there is  little  or  no  complication\nadded with the recursive descent approach that we're  using.   At\nthe point where  Factor  finds an identifier (letter), it doesn't\nknow whether it's a variable name or a function name, nor does it\nreally care.  It simply passes it on to Ident and leaves it up to\nthat procedure to figure it out.  Ident, in  turn,  simply  tucks\naway the identifier and then reads one more  character  to decide\nwhich kind of identifier it's dealing with.\n\nKeep this approach in mind.  It's a very powerful concept, and it\nshould be used  whenever  you  encounter  an  ambiguous situation\nrequiring further lookahead.   Even  if  you  had to look several\ntokens ahead, the principle would still work.\n\n\nMORE ON ERROR HANDLING\n\nAs long as we're talking  philosophy,  there's  another important\nissue to point out:  error  handling.    Notice that although the\nparser correctly rejects (almost)  every malformed  expression we\ncan  throw at it, with a meaningful  error  message,  we  haven't\nreally had to  do much work to make that happen.  In fact, in the\nwhole parser per se (from  Ident  through  Expression)  there are\nonly two calls to the error routine, Expected.  Even those aren't\nnecessary ... if you'll look again in Term and Expression, you'll\nsee that those statements can't be reached.  I put them  in early\non as a  bit  of  insurance,  but  they're no longer needed.  Why\ndon't you delete them now?\n\nSo how did we get this nice error handling  virtually  for  free?\nIt's simply  that  I've  carefully  avoided  reading  a character\ndirectly  using  GetChar.  
Instead,  I've  relied  on  the  error\nhandling in GetName,  GetNum,  and  Match  to  do  all  the error\nchecking for me.    Astute  readers  will notice that some of the\ncalls to Match (for example, the ones in Add  and  Subtract)  are\nalso unnecessary ... we already know what the character is by the\ntime  we get there ... but it maintains  a  certain  symmetry  to\nleave them in, and  the  general rule to always use Match instead\nof GetChar is a good one.\n\nI mentioned an \"almost\" above.   There  is a case where our error\nhandling  leaves a bit to be desired.  So far we haven't told our\nparser what an   end-of-line  looks  like,  or  what  to  do with\nembedded  white  space.  So  a  space  character  (or  any  other\ncharacter not part of the recognized character set) simply causes\nthe parser to terminate, ignoring the unrecognized characters.\n\nIt  could  be  argued  that  this is reasonable behavior at  this\npoint.  In a \"real\"  compiler, there is usually another statement\nfollowing the one we're working on, so any characters not treated\nas part of our expression will either be used for or  rejected as\npart of the next one.\n\nBut  it's  also a very easy thing to fix up, even  if  it's  only\ntemporary.   All  we  have  to  do  is assert that the expression\nshould end with an end-of-line, i.e., a carriage return.\n\nTo see what I'm talking about, try the input line\n\n               1+2 <space> 3+4\n\nSee  how the space was treated as a terminator?  Now, to make the\ncompiler properly flag this, add the line\n\n               if Look <> CR then Expected('Newline');\n\nin the main  program,  just  after  the call to Expression.  That\ncatches anything left over in the input stream.  Don't  forget to\ndefine CR in the const statement:\n\n               CR = ^M;\n\nAs usual, recompile the program and verify that it does what it's\nsupposed to.\n\n\nASSIGNMENT STATEMENTS\n\nOK,  at  this  point we have a parser that works very nicely. 
I'd\nlike to  point  out  that  we  got  it  using  only  88  lines of\nexecutable code, not  counting  what  was  in  the  cradle.   The\ncompiled  object  file  is  a  whopping  4752  bytes.   Not  bad,\nconsidering we weren't trying very  hard  to  save  either source\ncode or object size.  We just stuck to the KISS principle.\n\nOf course, parsing an expression  is not much good without having\nsomething to do with it afterwards.  Expressions USUALLY (but not\nalways) appear in assignment statements, in the form\n\n          <Ident> = <Expression>\n\nWe're only a breath  away  from being able to parse an assignment\nstatement, so let's take that  last  step.  Just  after procedure\nExpression, add the following new procedure:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := GetName;\n   Match('=');\n   Expression;\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)')\nend;\n{--------------------------------------------------------------}\n\n\nNote again that the  code  exactly parallels the BNF.  And notice\nfurther that  the error checking was painless, handled by GetName\nand Match.\n\nThe reason for the two  lines  of  assembler  has  to  do  with a\npeculiarity in the  68000,  which requires this kind of construct\nfor PC-relative code.\n\nNow change the call to Expression, in the main program, to one to\nAssignment.  That's all there is to it.\n\nSon of a gun!  We are actually  compiling  assignment statements.\nIf those were the only kind of statements in a language, all we'd\nhave to  do  is  put  this in a loop and we'd have a full-fledged\ncompiler!\n\nWell, of course they're not the only kind.  There are also little\nitems  like  control  statements  (IFs  and  loops),  procedures,\ndeclarations, etc.  But cheer  up.    
The  arithmetic expressions\nthat we've been dealing with are among the most challenging  in a\nlanguage.      Compared  to  what  we've  already  done,  control\nstatements  will be easy.  I'll be covering  them  in  the  fifth\ninstallment.  And the other statements will all fall in  line, as\nlong as we remember to KISS.\n\n\nMULTI-CHARACTER TOKENS\n\nThroughout  this   series,   I've   been   carefully  restricting\neverything  we  do  to  single-character  tokens,  all  the while\nassuring  you  that  it wouldn't be difficult to extend to multi-\ncharacter ones.    I  don't  know if you believed me or not ... I\nwouldn't  really blame you if you were a  bit  skeptical.    I'll\ncontinue  to use  that approach in  the  sessions  which  follow,\nbecause it helps keep complexity away.    But I'd like to back up\nthose  assurances, and wrap up this portion  of  the  parser,  by\nshowing you  just  how  easy  that  extension  really is.  In the\nprocess, we'll also provide for embedded white space.  Before you\nmake  the  next  few changes, though, save the current version of\nthe parser away under another name.  I have some more uses for it\nin  the  next  installment, and we'll be working with the single-\ncharacter version.\n\nMost compilers separate out the handling of the input stream into\na separate module called  the  lexical scanner.  The idea is that\nthe scanner deals with all the character-by-character  input, and\nreturns the separate units  (tokens)  of  the  stream.  There may\ncome a time when we'll want  to  do something like that, too, but\nfor  now  there  is  no  need. We can handle the  multi-character\ntokens that we need by very slight and  very  local modifications\nto GetName and GetNum.\n\nThe usual definition of an identifier is that the first character\nmust be a letter, but the rest can be  alphanumeric  (letters  or\nnumbers).  
To  deal  with  this,  we  need  one  other recognizer\nfunction\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alphanumeric }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n{--------------------------------------------------------------}\n\n\nAdd this function to your parser.  I put mine just after IsDigit.\nWhile you're  at  it,  might  as  well  include it as a permanent\nmember of Cradle, too.\n                              \nNow, we need  to  modify  function  GetName  to  return  a string\ninstead of a character:\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: string;\nvar Token: string;\nbegin\n   Token := '';\n   if not IsAlpha(Look) then Expected('Name');\n   while IsAlNum(Look) do begin\n      Token := Token + UpCase(Look);\n      GetChar;\n   end;\n   GetName := Token;\nend;\n{--------------------------------------------------------------}\n\n\nSimilarly, modify GetNum to read:\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: string;\nvar Value: string;\nbegin\n   Value := '';\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n      Value := Value + Look;\n      GetChar;\n   end;\n   GetNum := Value;\nend;\n{--------------------------------------------------------------}\n\n\nAmazingly enough, that  is  virtually all the changes required to\nthe  parser!  The local variable Name  in  procedures  Ident  and\nAssignment was originally declared as  \"char\",  and  must  now be\ndeclared string[8].  (Clearly,  we  could  make the string length\nlonger if we chose, but most assemblers limit the length anyhow.)\nMake  this  change,  and  then  recompile and test. _NOW_ do  you\nbelieve that it's a simple change?\n\n\nWHITE SPACE\n\nBefore we leave this parser for awhile, let's  address  the issue\nof  white  space.   
As it stands now, the parser  will  barf  (or\nsimply terminate) on a single space  character  embedded anywhere\nin  the input stream.  That's pretty  unfriendly  behavior.    So\nlet's \"productionize\" the thing  a  bit  by eliminating this last\nrestriction.\n\nThe  key  to easy handling of white space is to come  up  with  a\nsimple rule for how the parser should treat the input stream, and\nto  enforce that rule everywhere.  Up  till  now,  because  white\nspace wasn't permitted, we've been able to assume that after each\nparsing action, the lookahead character  Look  contains  the next\nmeaningful  character,  so  we could test it  immediately.    Our\ndesign was based upon this principle.\n\nIt still sounds like a good rule to me, so  that's  the one we'll\nuse.    This  means  that  every routine that advances the  input\nstream must skip over white space, and leave  the  next non-white\ncharacter in Look.   Fortunately,  because  we've been careful to\nuse GetName, GetNum, and Match  for most of our input processing,\nit is  only  those  three  routines  (plus  Init) that we need to\nmodify.\n\nNot  surprisingly,  we  start  with  yet  another  new recognizer\nroutine:\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB];\nend;\n{--------------------------------------------------------------}\n\n\nWe  also need a routine that  will  eat  white-space  characters,\nuntil it finds a non-white one:\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nNow,  add calls to SkipWhite to Match,  GetName,  and  GetNum  as\nshown below:\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character 
}\n\nprocedure Match(x: char);\nbegin\n   if Look <> x then Expected('''' + x + '''')\n   else begin\n      GetChar;\n      SkipWhite;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: string;\nvar Token: string;\nbegin\n   Token := '';\n   if not IsAlpha(Look) then Expected('Name');\n   while IsAlNum(Look) do begin\n      Token := Token + UpCase(Look);\n      GetChar;\n   end;\n   GetName := Token;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: string;\nvar Value: string;\nbegin\n   Value := '';\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n      Value := Value + Look;\n      GetChar;\n   end;\n   GetNum := Value;\n   SkipWhite;\nend;\n{--------------------------------------------------------------}\n\n(Note  that  I  rearranged  Match  a  bit,  without changing  the\nfunctionality.)\n\nFinally, we need to skip over leading blanks where we  \"prime the\npump\" in Init:\n                             \n{--------------------------------------------------------------}\n{ Initialize }\n\nprocedure Init;\nbegin\n   GetChar;\n   SkipWhite;\nend;\n{--------------------------------------------------------------}\n\n\nMake these changes and recompile the program.  You will find that\nyou will have to move Match below SkipWhite, to  avoid  an  error\nmessage from the Pascal compiler.  
Test the program as  always to\nmake sure it works properly.\n\nSince we've made quite  a  few  changes  during this session, I'm\nreproducing the entire parser below:\n\n\n{--------------------------------------------------------------}\nprogram parse;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n       CR = ^M;\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look: char;              { Lookahead Character }\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n                             \nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := UpCase(c) in ['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alphanumeric }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', 
'-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n                             \nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   if Look <> x then Expected('''' + x + '''')\n   else begin\n      GetChar;\n      SkipWhite;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: string;\nvar Token: string;\nbegin\n   Token := '';\n   if not IsAlpha(Look) then Expected('Name');\n   while IsAlNum(Look) do begin\n      Token := Token + UpCase(Look);\n      GetChar;\n   end;\n   GetName := Token;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: string;\nvar Value: string;\nbegin\n   Value := '';\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n      Value := Value + Look;\n      GetChar;\n   end;\n   GetNum := Value;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Identifier }\n\nprocedure Ident;\nvar Name: string[8];\nbegin\n   Name:= GetName;\n   if Look = '(' then begin\n      Match('(');\n      Match(')');\n      EmitLn('BSR ' + Name);\n      end\n   else\n     
 EmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      Ident\n   else\n      EmitLn('MOVE #' + GetNum + ',D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Match('*');\n   Factor;\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Match('/');\n   Factor;\n   EmitLn('MOVE (SP)+,D1');\n   EmitLn('EXT.L D0');\n   EmitLn('DIVS D1,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   Factor;\n   while Look in ['*', '/'] do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '*': Multiply;\n       '/': Divide;\n      end;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   if IsAddop(Look) then\n      EmitLn('CLR D0')\n   else\n      Term;\n   while IsAddop(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      end;\n   
end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: string[8];\nbegin\n   Name := GetName;\n   Match('=');\n   Expression;\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)')\nend;\n\n\n{--------------------------------------------------------------}\n{ Initialize }\n                             \nprocedure Init;\nbegin\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   Assignment;\n   If Look <> CR then Expected('NewLine');\nend.\n{--------------------------------------------------------------}\n\n\nNow the parser is complete.  It's got every feature we can put in\na  one-line \"compiler.\"  Tuck it away in a safe place.  Next time\nwe'll move on to a new subject, but we'll still be  talking about\nexpressions for quite awhile.  Next installment, I plan to talk a\nbit about interpreters as opposed  to compilers, and show you how\nthe structure of the parser changes a bit as we change  what sort\nof action has to be taken.  The information we pick up there will\nserve  us in good stead later on, even if you have no interest in\ninterpreters.  See you next time.\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\n\n\n"
  },
  {
    "path": "4/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "4/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    return (c & 0xDD);\n}\n\nvoid GetChar() \n{\n    Look = getchar();\n    /* printf(\"Getchar: %c\\n\", Look); */\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\nvoid Newline()\n{\n    if (Look == '\\r') {\n        GetChar();\n        if (Look == '\\n') {\n            GetChar();\n        }\n    } else if (Look == '\\n') {\n        GetChar();\n    }\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n} \n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nchar GetName()\n{\n    char c = Look;\n\n    if( !IsAlpha(Look)) {\n        sprintf(tmp, \"Name\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return uppercase(c);\n}\n\n\nint GetNum()\n{\n    int value = 0;\n    if( !IsDigit(Look)) {\n        sprintf(tmp, \"Integer\");\n        Expected(tmp);\n    }\n\n    while (IsDigit(Look)) {\n        value = value * 10 + Look - '0';\n        GetChar();\n    }\n\n    return value;\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    InitTable();\n    GetChar();\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < TABLE_SIZE; i++) {\n        Table[i] = 0;\n    }\n\n}\n"
  },
  {
    "path": "4/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#define MAX_BUF 100\n#define TABLE_SIZE 26\nchar tmp[MAX_BUF];\n\nchar Look;\nint Table[TABLE_SIZE];\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\n\nvoid Newline();\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\n\nchar GetName();\nint GetNum();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\n\n#endif\n"
  },
  {
    "path": "4/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\nint Term();\nint Expression();\nvoid Add();\nvoid Substract();\nint Factor();\nvoid Ident();\nvoid Assignment();\n\n\n/* Not used in Chapter 4 */\nvoid Multiply()\n{\n    Match('*');\n    Factor();\n    EmitLn(\"imull (%esp), %eax\");\n    /* push of the stack */\n    EmitLn(\"addl $4, %esp\");\n} \n\n/* Not used in Chapter 4 */\nvoid Divide()\n{\n    Match('/');\n    Factor();\n\n    /* for a expersion like a/b we have eax=b and %(esp)=a\n     * but we need eax=a, and b on the stack \n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n\n    EmitLn(\"pushl %eax\");\n\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extesnion */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    EmitLn(\"addl $4, %esp\");\n\n}\n\nvoid Ident()\n{\n    char name = GetName();\n    if (Look == '(') {\n        Match('(');\n        Match(')');\n        sprintf(tmp, \"call %c\", name);\n        EmitLn(tmp);\n    } else {\n        sprintf(tmp, \"movl %c, %%eax\", name);\n        EmitLn(tmp);\n    }\n}\n\nint Factor()\n{\n    int factor;\n    if (Look == '(') {\n        Match('(');\n        factor = Expression();\n        Match(')');\n    } else if (IsAlpha(Look)) {\n        factor = Table[GetName() - 'A'];\n    } else {\n        factor = GetNum();\n    }\n\n    return factor;\n}\n\nint Term()\n{\n    int value = Factor();\n    while (strchr(\"*/\", Look)) {\n        switch(Look)\n        {\n            case '*':\n                Match('*');\n                value *= Factor();\n                break;\n            case '/':\n                Match('/');\n                value /= Factor();\n                break;\n            default:\n                Expected(\"Mulop\");\n        }\n    }\n\n    return value;\n}\n\nint Expression()\n{\n    int value;\n    if(IsAddop(Look))\n        value = 0;\n    else\n        value = Term();\n\n    while 
(IsAddop(Look)) {\n        switch(Look)\n        {\n            case '+':\n                Match('+');\n                value += Term();\n                break;\n            case '-':\n                Match('-');\n                value -= Term();\n                break;\n            default:\n                Expected(\"Addop\");\n        }\n    }\n\n    return value;\n}\n\n\n/* Not used in Chapter 4 */\nvoid Add()\n{\n    Match('+');\n    Term();\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n    \n}\n\n\n/* Not used in Chapter 4 */\nvoid Substract()\n{\n    Match('-');\n    Term();\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"negl %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\nvoid Assignment()\n{\n    char name = GetName();\n    Match('=');\n    Table[name - 'A'] = Expression();\n}\n\n/* Input Routine\n * This differs slightly from the original article: the input\n * syntax here is \"?<variable name><expression>\". */\nvoid Input()\n{\n    Match('?');\n    char name = GetName();\n    Table[name - 'A'] = Expression();\n}\n\n/* Output Routine */\nvoid Output()\n{\n    Match('!');\n    sprintf(tmp, \"%d\", Table[GetName() - 'A']);\n    EmitLn(tmp);\n}\n\nint main()\n{\n\n    Init();\n    do\n    {\n        switch(Look) {\n        case '?':\n            Input();\n            break;\n        case '!':\n            Output();\n            break;\n        default:\n            Assignment();\n        }\n\n        Newline();\n    } while (Look != '.');\n    return 0;\n}\n"
  },
  {
    "path": "4/tutor4.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           24 July 1988\n\n\n                      Part IV: INTERPRETERS\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn the first three installments of this series,  we've  looked at\nparsing and  compiling math expressions, and worked our way grad-\nually and methodically from dealing  with  very  simple one-term,\none-character \"expressions\" up through more general ones, finally\narriving at a very complete parser that could parse and translate\ncomplete  assignment  statements,  with  multi-character  tokens,\nembedded white space, and function calls.  This  time,  I'm going\nto walk you through the process one more time, only with the goal\nof interpreting rather than compiling object code.\n\nSince this is a series on compilers, why should  we  bother  with\ninterpreters?  Simply because I want you to see how the nature of\nthe  parser changes as we change the goals.  I also want to unify\nthe concepts of the two types of translators, so that you can see\nnot only the differences, but also the similarities.\n\nConsider the assignment statement\n\n               x = 2 * y + 3\n\nIn a compiler, we want the target CPU to execute  this assignment\nat EXECUTION time.  The translator itself doesn't  do  any arith-\nmetic ... 
it only issues the object code that will cause  the CPU\nto do it when the code is executed.  For  the  example above, the\ncompiler would issue code to compute the expression and store the\nresults in variable x.\n\nFor an interpreter,  on  the  other  hand, no object code is gen-\nerated.   Instead, the arithmetic is computed immediately, as the\nparsing is going on.  For the example, by the time parsing of the\nstatement is complete, x will have a new value.\n\nThe approach we've been  taking  in  this  whole series is called\n\"syntax-driven translation.\"  As you are aware by now, the struc-\nture of the  parser  is  very  closely  tied to the syntax of the\nproductions we parse.  We  have built Pascal procedures that rec-\nognize every language  construct.   Associated with each of these\nconstructs (and procedures) is  a  corresponding  \"action,\" which\ndoes  whatever  makes  sense to do  once  a  construct  has  been\nrecognized.    In  our  compiler  so far, every  action  involves\nemitting object code, to be executed later at execution time.  In\nan interpreter, every action  involves  something  to be done im-\nmediately.\n\nWhat I'd like you to see here is that the  layout  ... the struc-\nture ... of  the  parser  doesn't  change.  It's only the actions\nthat change.   So  if  you  can  write an interpreter for a given\nlanguage, you can also write a compiler, and vice versa.  Yet, as\nyou  will  see,  there  ARE  differences,  and  significant ones.\nBecause the actions are different,  the  procedures  that  do the\nrecognizing end up being written differently.    Specifically, in\nthe interpreter  the recognizing procedures end up being coded as\nFUNCTIONS that return numeric values to their callers.    None of\nthe parsing routines for our compiler did that.\n\nOur compiler, in fact,  is  what we might call a \"pure\" compiler.\nEach time a construct is recognized, the object  code  is emitted\nIMMEDIATELY.  
(That's one reason the code is not very efficient.)\nThe interpreter we'll be building  here is a pure interpreter, in\nthe sense that there is  no  translation,  such  as \"tokenizing,\"\nperformed on the source code.  These represent  the  two extremes\nof translation.  In  the  real  world,  translators are rarely so\npure, but tend to have bits of each technique.\n\nI can think of  several  examples.    I've already mentioned one:\nmost interpreters, such as Microsoft BASIC,  for  example, trans-\nlate the source code (tokenize it) into an  intermediate  form so\nthat it'll be easier to parse real time.\n\nAnother example is an assembler.  The purpose of an assembler, of\ncourse, is to produce object code, and it normally does that on a\none-to-one basis: one object instruction per line of source code.\nBut almost every assembler also permits expressions as arguments.\nIn this case, the expressions  are  always  constant expressions,\nand  so the assembler isn't supposed to  issue  object  code  for\nthem.  Rather,  it  \"interprets\" the expressions and computes the\ncorresponding constant result, which is what it actually emits as\nobject code.\n\nAs a matter of fact, we  could  use  a bit of that ourselves. The\ntranslator we built in the  previous  installment  will dutifully\nspit out object code  for  complicated  expressions,  even though\nevery term in  the  expression  is  a  constant.  In that case it\nwould be far better if the translator behaved a bit more  like an\ninterpreter, and just computed the equivalent constant result.\n\nThere is  a concept in compiler theory called \"lazy\" translation.\nThe  idea is that you typically don't just  emit  code  at  every\naction.  In fact, at the extreme you don't emit anything  at all,\nuntil  you  absolutely  have to.  To accomplish this, the actions\nassociated with the parsing routines  typically  don't  just emit\ncode.  
Sometimes  they  do,  but  often  they  simply  return in-\nformation back to the caller.  Armed with  such  information, the\ncaller can then make a better choice of what to do.\n\nFor example, given the statement\n\n               x = x + 3 - 2 - (5 - 4)  ,\n\nour compiler will dutifully spit  out a stream of 18 instructions\nto load each parameter into  registers,  perform  the arithmetic,\nand store the result.  A lazier evaluation  would  recognize that\nthe arithmetic involving constants can  be  evaluated  at compile\ntime, and would reduce the expression to\n\n               x = x + 0  .\n\nAn  even  lazier  evaluation would then be smart enough to figure\nout that this is equivalent to\n\n               x = x  ,\n\nwhich  calls  for  no  action  at  all.   We could reduce 18  in-\nstructions to zero!\n\nNote that there is no chance of optimizing this way in our trans-\nlator as it stands, because every action takes place immediately.\n\nLazy  expression  evaluation  can  produce  significantly  better\nobject code than  we  have  been  able  to  so  far.  I warn you,\nthough: it complicates the parser code considerably, because each\nroutine now has to make decisions as to whether  to  emit  object\ncode or not.  Lazy evaluation is certainly not named that because\nit's easier on the compiler writer!\n\nSince we're operating mainly on  the KISS principle here, I won't\ngo  into much more depth on this subject.  I just want you to  be\naware  that  you  can get some code optimization by combining the\ntechniques of compiling and  interpreting.    In  particular, you\nshould know that the parsing  routines  in  a  smarter translator\nwill generally  return  things  to  their  caller,  and sometimes\nexpect things as  well.    
That's  the main reason for going over\ninterpretation in this installment.\n\n\nTHE INTERPRETER\n\nOK, now that you know WHY we're going into all this, let's do it.\nJust to give you practice, we're going to start over with  a bare\ncradle and build up the translator all over again.  This time, of\ncourse, we can go a bit faster.\n\nSince we're now going  to  do arithmetic, the first thing we need\nto do is to change function GetNum, which up till now  has always\nreturned a character  (or  string).    Now, it's better for it to\nreturn an integer.    MAKE  A  COPY of the cradle (for goodness's\nsake, don't change the version  in  Cradle  itself!!)  and modify\nGetNum as follows:\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: integer;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   GetNum := Ord(Look) - Ord('0');\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nNow, write the following version of Expression:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nfunction Expression: integer;\nbegin\n   Expression := GetNum;\nend;\n{--------------------------------------------------------------}\n\n\nFinally, insert the statement\n\n\n   Writeln(Expression);\n\n\nat the end of the main program.  Now compile and test.\n\nAll this program  does  is  to  \"parse\"  and  translate  a single\ninteger  \"expression.\"    As always, you should make sure that it\ndoes that with the digits 0..9, and gives an  error  message  for\nanything else.  Shouldn't take you very long!\n\nOK, now let's extend this to include addops.    
Change Expression\nto read:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nfunction Expression: integer;\nvar Value: integer;\nbegin\n   if IsAddop(Look) then\n      Value := 0\n   else\n      Value := GetNum;\n   while IsAddop(Look) do begin\n      case Look of\n       '+': begin\n               Match('+');\n               Value := Value + GetNum;\n            end;\n       '-': begin\n               Match('-');\n               Value := Value - GetNum;\n            end;\n      end;\n   end;\n   Expression := Value;\nend;\n{--------------------------------------------------------------}\n\n\nThe structure of Expression, of  course,  parallels  what  we did\nbefore,  so  we  shouldn't have too much  trouble  debugging  it.\nThere's  been  a  SIGNIFICANT  development, though, hasn't there?\nProcedures Add and Subtract went away!  The reason  is  that  the\naction to be taken  requires  BOTH arguments of the operation.  I\ncould have chosen to retain the procedures and pass into them the\nvalue of the expression to date,  which  is Value.  But it seemed\ncleaner to me to  keep  Value as strictly a local variable, which\nmeant that the code for Add and Subtract had to be moved in line.\nThis result suggests  that,  while the structure we had developed\nwas nice and  clean  for our simple-minded translation scheme, it\nprobably  wouldn't do for use with lazy  evaluation.    That's  a\nlittle tidbit we'll probably want to keep in mind for later.\n\nOK,  did the translator work?  Then let's  take  the  next  step.\nIt's not hard to  figure  out what procedure Term should now look\nlike.  
Change every call to GetNum in function  Expression  to  a\ncall to Term, and then enter the following form for Term:\n\n\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nfunction Term: integer;\nvar Value: integer;\nbegin\n   Value := GetNum;\n   while Look in ['*', '/'] do begin\n      case Look of\n       '*': begin\n               Match('*');\n               Value := Value * GetNum;\n            end;\n       '/': begin\n               Match('/');\n               Value := Value div GetNum;\n            end;\n      end;\n   end;\n   Term := Value;\nend;\n{--------------------------------------------------------------}\n\nNow, try it out.    Don't forget two things: first, we're dealing\nwith integer division, so, for example, 1/3 should come out zero.\nSecond, even  though we can output multi-digit results, our input\nis still restricted to single digits.\n\nThat seems like a silly restriction at this point, since  we have\nalready  seen how easily function GetNum can  be  extended.    So\nlet's go ahead and fix it right now.  The new version is\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: integer;\nvar Value: integer;\nbegin\n   Value := 0;\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n      Value := 10 * Value + Ord(Look) - Ord('0');\n      GetChar;\n   end;\n   GetNum := Value;\nend;\n{--------------------------------------------------------------}\n\n\nIf you've compiled and  tested  this  version of the interpreter,\nthe  next  step  is to install function Factor, complete with pa-\nrenthesized  expressions.  We'll hold off a  bit  longer  on  the\nvariable  names.    First, change the references  to  GetNum,  in\nfunction Term, so that they call Factor instead.   
Now  code  the\nfollowing version of Factor:\n\n\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nfunction Expression: integer; Forward;\n\nfunction Factor: integer;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Factor := Expression;\n      Match(')');\n      end\n   else\n       Factor := GetNum;\nend;\n{---------------------------------------------------------------}\n\nThat was pretty easy, huh?  We're rapidly closing in on  a useful\ninterpreter.\n\n\nA LITTLE PHILOSOPHY\n\nBefore going any further, there's something I'd like  to  call to\nyour attention.  It's a concept that we've been making use  of in\nall these sessions, but I haven't explicitly mentioned it up till\nnow.  I think it's time, because it's a concept so useful, and so\npowerful,  that  it  makes all the difference  between  a  parser\nthat's trivially easy, and one that's too complex to deal with.\n\nIn the early days of compiler technology, people  had  a terrible\ntime  figuring  out  how to deal with things like operator prece-\ndence  ...  the  way  that  multiply  and  divide operators  take\nprecedence over add and subtract, etc.  I remember a colleague of\nsome  thirty years ago, and how excited he was to find out how to\ndo it.  The technique used involved building two  stacks,    upon\nwhich you pushed each operator  or operand.  Associated with each\noperator was a precedence level,  and the rules required that you\nonly actually performed an operation  (\"reducing\"  the  stack) if\nthe precedence level showing on top of the stack was correct.  To\nmake life more interesting,  an  operator  like ')' had different\nprecedence levels, depending  upon  whether or not it was already\non the stack.  You  had to give it one value before you put it on\nthe stack, and another to decide when to take it  off.   
Just for\nthe experience, I worked all of  this  out for myself a few years\nago, and I can tell you that it's very tricky.\n\nWe haven't  had  to  do  anything like that.  In fact, by now the\nparsing of an arithmetic statement should seem like child's play.\nHow did we get so lucky?  And where did the precedence stacks go?\n\nA similar thing is going on  in  our interpreter above.  You just\nKNOW that in  order  for  it  to do the computation of arithmetic\nstatements (as opposed to the parsing of them), there have  to be\nnumbers pushed onto a stack somewhere.  But where is the stack?\n\nFinally,  in compiler textbooks, there are  a  number  of  places\nwhere  stacks  and  other structures are discussed.  In the other\nleading parsing method (LR), an explicit stack is used.  In fact,\nthe technique is very  much  like the old way of doing arithmetic\nexpressions.  Another concept  is  that of a parse tree.  Authors\nlike to draw diagrams  of  the  tokens  in a statement, connected\ninto a tree with  operators  at the internal nodes.  Again, where\nare the trees and stacks in our technique?  We haven't seen any.\nThe answer in all cases is that the structures are  implicit, not\nexplicit.    In  any computer language, there is a stack involved\nevery  time  you  call  a  subroutine.  Whenever a subroutine  is\ncalled, the return address is pushed onto the CPU stack.   At the\nend of the subroutine, the address is popped back off and control\nis  transferred  there.   In a recursive language such as Pascal,\nthere can also be local data pushed onto the stack, and  it, too,\nreturns when it's needed.\n\nFor example,  function  Expression  contains  a  local  parameter\ncalled  Value, which it fills by a call to Term.  Suppose, in its\nnext call to  Term  for  the  second  argument,  that  Term calls\nFactor, which recursively  calls  Expression  again.    
That \"in-\nstance\" of Expression gets another value for its  copy  of Value.\nWhat happens  to  the  first  Value?    Answer: it's still on the\nstack, and  will  be  there  again  when  we return from our call\nsequence.\n\nIn other words, the reason things look so simple  is  that  we've\nbeen making maximum use of the resources of the  language.    The\nhierarchy levels  and  the  parse trees are there, all right, but\nthey're hidden within the  structure  of  the parser, and they're\ntaken care of by the order with which the various  procedures are\ncalled.  Now that you've seen how we do it, it's probably hard to\nimagine doing it  any other way.  But I can tell you that it took\na lot of years for compiler writers to get that smart.  The early\ncompilers  were too complex  to  imagine.    Funny how things get\neasier with a little practice.\n\nThe reason  I've  brought  all  this up is as both a lesson and a\nwarning.  The lesson: things can be easy when you do  them right.\nThe warning: take a look at what you're doing.  If, as you branch\nout on  your  own,  you  begin to find a real need for a separate\nstack or tree structure, it may be time to ask yourself if you're\nlooking at things the right way.  Maybe you just aren't using the\nfacilities of the language as well as you could be.\n\n\nThe next step is to add variable names.  Now,  though,  we have a\nslight problem.  For  the  compiler, we had no problem in dealing\nwith variable names ... we just issued the names to the assembler\nand let the rest  of  the program take care of allocating storage\nfor  them.  Here, on the other hand, we need to be able to  fetch\nthe values of the variables and return them as the  return values\nof Factor.  We need a storage mechanism for these variables.\n\nBack in the early days of personal computing,  Tiny  BASIC lived.\nIt had  a  grand  total  of  26  possible variables: one for each\nletter of the  alphabet.    
This  fits nicely with our concept of\nsingle-character tokens, so we'll  try  the  same  trick.  In the\nbeginning of your  interpreter,  just  after  the  declaration of\nvariable Look, insert the line:\n\n               Table: Array['A'..'Z'] of integer;\n\nWe also need to initialize the array, so add this procedure:\n\n\n\n\n{---------------------------------------------------------------}\n{ Initialize the Variable Area }\n\nprocedure InitTable;\nvar i: char;\nbegin\n   for i := 'A' to 'Z' do\n      Table[i] := 0;\nend;\n{---------------------------------------------------------------}\n\n\nYou must also insert a call to InitTable, in procedure Init.\nDON'T FORGET to do that, or the results may surprise you!\n\nNow that we have an array  of  variables, we can modify Factor to\nuse it.  Since we don't have a way (so far) to set the variables,\nFactor  will always return zero values for  them,  but  let's  go\nahead and extend it anyway.  Here's the new version:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nfunction Expression: integer; Forward;\n\nfunction Factor: integer;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Factor := Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      Factor := Table[GetName]\n   else\n       Factor := GetNum;\nend;\n{---------------------------------------------------------------}\n\n\nAs always, compile and test this version of the  program.    Even\nthough all the variables are now zeros, at least we can correctly\nparse the complete expressions, as well as catch any badly formed\nexpressions.\n\nI suppose you realize the next step: we need to do  an assignment\nstatement so we can  put  something INTO the variables.  
For now,\nlet's  stick  to  one-liners,  though  we will soon  be  handling\nmultiple statements.\n\nThe assignment statement parallels what we did before:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n                             \n\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := GetName;\n   Match('=');\n   Table[Name] := Expression;\nend;\n{--------------------------------------------------------------}\n\n\nTo test this,  I  added  a  temporary write statement in the main\nprogram,  to  print out the value of A.  Then I  tested  it  with\nvarious assignments to it.\n\nOf course, an interpretive language that can only accept a single\nline of program  is not of much value.  So we're going to want to\nhandle multiple statements.  This  merely  means  putting  a loop\naround  the  call  to Assignment.  So let's do that now. But what\nshould be the loop exit criterion?  Glad you  asked,  because  it\nbrings up a point we've been able to ignore up till now.\n\nOne of the most tricky things  to  handle in any translator is to\ndetermine when to bail out of  a  given construct and go look for\nsomething else.  This hasn't been a problem for us so far because\nwe've only allowed for  a  single kind of construct ... either an\nexpression  or an assignment statement.   When  we  start  adding\nloops and different kinds of statements, you'll find that we have\nto be very careful that things terminate properly.  If we put our\ninterpreter in a loop, we need a way to quit.    Terminating on a\nnewline is no good, because that's what sends us back for another\nline.  We could always let an unrecognized character take us out,\nbut that would cause every run to end in an error  message, which\ncertainly seems uncool.\n\nWhat we need  is  a  termination  character.  I vote for Pascal's\nending period ('.').   
A  minor  complication  is that Turbo ends\nevery normal line  with  TWO characters, the carriage return (CR)\nand line feed (LF).   At  the  end  of  each line, we need to eat\nthese characters before processing the next one.   A  natural way\nto do this would  be  with  procedure  Match, except that Match's\nerror  message  prints  the character, which of course for the CR\nand/or  LF won't look so great.  What we need is a special proce-\ndure for this, which we'll no doubt be using over and over.  Here\nit is:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Skip Over a Newline }\n\nprocedure NewLine;\nbegin\n   if Look = CR then begin\n      GetChar;\n      if Look = LF then\n         GetChar;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nInsert this procedure at any convenient spot ... I put  mine just\nafter Match.  Now, rewrite the main program to look like this:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   repeat\n      Assignment;\n      NewLine;\n   until Look = '.';\nend.\n{--------------------------------------------------------------}\n\n\nNote that the  test for a CR is now gone, and that there are also\nno  error tests within NewLine itself.   That's  OK,  though  ...\nwhatever is left over in terms of bogus characters will be caught\nat the beginning of the next assignment statement.\n\nWell, we now have a functioning interpreter.  It doesn't do  us a\nlot of  good,  however,  since  we have no way to read data in or\nwrite it out.  Sure would help to have some I/O!\n\nLet's wrap this session  up,  then,  by  adding the I/O routines.\nSince we're  sticking to single-character tokens, I'll use '?' to\nstand for a read statement, and  '!'  
for a write, with the char-\nacter  immediately  following  them  to  be used as  a  one-token\n\"parameter list.\"  Here are the routines:\n\n{--------------------------------------------------------------}\n{ Input Routine }\n\nprocedure Input;\nbegin\n   Match('?');\n   Read(Table[GetName]);\nend;\n\n\n{--------------------------------------------------------------}\n{ Output Routine }\n\nprocedure Output;\nbegin\n   Match('!');\n   WriteLn(Table[GetName]);\nend;\n{--------------------------------------------------------------}\n\nThey aren't very fancy, I admit ... no prompt character on input,\nfor example ... but they get the job done.\n\nThe corresponding changes in  the  main  program are shown below.\nNote that we use the usual  trick  of a case statement based upon\nthe current lookahead character, to decide what to do.\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   repeat\n      case Look of\n       '?': Input;\n       '!': Output;\n       else Assignment;\n      end;\n      NewLine;\n   until Look = '.';\nend.\n{--------------------------------------------------------------}\n\n\nYou have now completed a  real, working interpreter.  It's pretty\nsparse, but it works just like the \"big boys.\"  It includes three\nkinds of program statements  (and  can  tell the difference!), 26\nvariables,  and  I/O  statements.  The only things that it lacks,\nreally, are control statements,  subroutines,    and some kind of\nprogram editing function.  The program editing part, I'm going to\npass on.  After all, we're  not  here  to build a product, but to\nlearn  things.    The control statements, we'll cover in the next\ninstallment, and the subroutines soon  after.  
I'm anxious to get\non with that, so we'll leave the interpreter as it stands.\n\nI hope that by  now  you're convinced that the limitation of sin-\ngle-character names  and the processing of white space are easily\ntaken  care  of, as we did in the last session.   This  time,  if\nyou'd like to play around with these extensions, be my  guest ...\nthey're  \"left as an exercise for the student.\"    See  you  next\ntime.\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\n\n"
  },
  {
    "path": "5/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "5/README.md",
    "content": "# Some Notes on the original Article\n\n## The IF statement\nThe BNF shown on this chapter had some bugs. The author said that `<program>`\nand `<block>` are as follows:\n\n```\n<program> ::= <block> END\n<block> ::= [ <statement> ]*\n```\n\nThat means a `<block>` can contain zero or more statments. However, there is\nno way to decide when to terminate a `<block>`. When dealing with \"IF\" without\n\"ELSE\", the problem is not yet obvious:\n\n```\nIF <condition> <block> ENDIF\n```\n\nBecause the author imply that `<block>` should not be ended with token \"END\",\nnow we can treat token \"ENDIF\" which is represent by character 'e' as the\ntermination of block. That means source code like \"aibced\" is allowed, aka\n\"bc\" this two statements are treated as one block.\n\nWhen it comes to \"IF\" statement with \"ELSE\", things just change.\n\n```\nIF <condition> <block> [ ELSE <block>] ENDIF\n```\n\nWhile we are dealing with the first `<block>`, there is no way to quit\nrecognizing `<block>` and match token \"ELSE\". Especially when the author\npropose test case \"aiblcede\". It was expected to recognize \"b\" as a single\nblock and \"l\" as token \"ELSE\". Which however is not possible because there is\nno way to determine the exit of `<block>`.\n\n```\nprocedure Block;\nbegin\n   while not(Look in ['e']) do begin\n      case Look of\n       'i': DoIf;\n       'o': Other;\n      end;\n   end;\nend;\n```\n\nThe source code implies that every block should end with token \"END\" which is\ncharacter 'e'. One way to fix this is to explicitly state that a `<block>`\nends with token \"END\". 
So the test case should be \"aibelceede\" and the BNF\nshould explicitly be:\n\n```\n<program> ::= <block>\n<block> ::= [ <statement> ]* END\n```\n\nAnother way is to tell the block matcher to recognize the \"ELSE\" token as the\nauthor did in the next section.\n\n```\nprocedure Block;\nbegin\n   while not(Look in ['e', 'l']) do begin\n      case Look of\n       'i': DoIf;\n       'w': DoWhile;\n       else Other;\n      end;\n   end;\nend;\n```\n\nAlso note the `else` branch instead of the original 'o' branch.\n"
  },
  {
    "path": "5/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n\n#define TABLE_SIZE 26\nstatic int LCount = 0;\nstatic char labelName[MAX_BUF];\n\nstatic int Table[TABLE_SIZE];\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    return (c & 0xDF);\n}\n\nvoid GetChar() \n{\n    Look = getchar();\n    /* printf(\"Getchar: %c\\n\", Look); */\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\nvoid Newline()\n{\n    if (Look == '\\r') {\n        GetChar();\n        if (Look == '\\n') {\n            GetChar();\n        }\n    } else if (Look == '\\n') {\n        GetChar();\n    }\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n} \n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nchar GetName()\n{\n    char c = Look;\n\n    if( !IsAlpha(Look)) {\n        sprintf(tmp, \"Name\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return uppercase(c);\n}\n\n\nint GetNum()\n{\n    int value = 0;\n    if( !IsDigit(Look)) {\n        sprintf(tmp, \"Integer\");\n        Expected(tmp);\n    }\n\n    while (IsDigit(Look)) {\n        value = value * 10 + Look - '0';\n        GetChar();\n    }\n\n    return value;\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    LCount = 0;\n\n    InitTable();\n    GetChar();\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < TABLE_SIZE; i++) {\n        Table[i] = 0;\n    }\n\n}\n\nchar *NewLabel()\n{\n    sprintf(labelName, \"L%02d\", LCount);\n    LCount ++;\n    return 
labelName;\n}\n\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n"
  },
  {
    "path": "5/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#define MAX_BUF 100\nstatic char tmp[MAX_BUF];\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\n\nvoid Newline();\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\n\nchar GetName();\nint GetNum();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\n\nchar *NewLabel();\nvoid PostLabel(char *label);\n#endif\n"
  },
  {
    "path": "5/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\n#ifdef DEBUG\n#define dprint(fmt, ...) printf(fmt, __VA_ARGS__);\n#else\n#define dprint(fmt, ...)\n#endif\n\nvoid Other();\nvoid Block(char *L);\nvoid Condition();\nvoid DoProgram();\nvoid DoIf(char *L);\nvoid DoWhile();\nvoid DoLoop();\nvoid DoRepeat();\nvoid DoFor();\nvoid Expression();\nvoid DoDo();\nvoid DoBreak(char *L);\n\nvoid Other()\n{\n    sprintf(tmp, \"%c\", GetName());\n    EmitLn(tmp);\n}\n\nvoid Block(char *L)\n{\n    while (! strchr(\"elu\", Look)) {\n        dprint(\"Block: get Look = %c\\n\", Look);\n        switch (Look) {\n            case 'i':\n                DoIf(L);\n                break;\n            case 'w':\n                DoWhile();\n                break;\n            case 'p':\n                DoLoop();\n                break;\n            case 'r':\n                DoRepeat();\n                break;\n            case 'f':\n                DoFor();\n                break;\n            case 'd':\n                DoDo();\n                break;\n            case 'b':\n                DoBreak(L);\n            default:\n                Other();\n                break;\n        }\n        /* this is for convinent, otherwise newline character will\n        cause an error */\n        Newline();\n    }\n}\n\nvoid Condition()\n{\n    EmitLn(\"<codition>\");\n}\n\nvoid DoProgram()\n{\n    Block(NULL);\n    if (Look != 'e') {\n        Expected(\"End\");\n    }\n    EmitLn(\"END\");\n}\n\nvoid DoIf(char *L)\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    strcpy(L1, NewLabel());\n    strcpy(L2, L1);\n\n    Match('i');\n    Condition();\n\n    sprintf(tmp, \"jz %s\", L1);\n    EmitLn(tmp);\n\n    Block(L);\n    dprint(\"DoIf: Got Look = %c\\n\", Look);\n\n    if (Look == 'l') {\n        /* match *else* statement */\n        Match('l');\n        strcpy(L2, NewLabel());\n\n        sprintf(tmp, \"jmp %s\", L2);\n        EmitLn(tmp);\n\n    
    PostLabel(L1);\n\n        Block(L);\n    }\n\n    Match('e');\n    PostLabel(L2);\n}\n\nvoid DoWhile()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n\n    Match('w');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    Condition();\n    sprintf(tmp, \"jz %s\", L2);\n    EmitLn(tmp);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\nvoid DoLoop()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    Match('p');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\nvoid DoRepeat()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    Match('r');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    Block(L2);\n    Match('u');\n    Condition();\n\n    sprintf(tmp, \"jz %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\n/* I haven't tested the actual generated x86 code here, so feel free to\n * let me know if there are bugs. 
:) */\nvoid DoFor()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n\n    Match('f');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    char name = GetName();\n    Match('=');\n    Expression();\n    EmitLn(\"subl $1, %eax\");  /* SUBQ #1, D0 (AT&T order: source, destination) */\n    sprintf(tmp, \"lea %c, %%edx\", name);\n    EmitLn(tmp);\n    EmitLn(\"movl %eax, (%edx)\");\n    Expression();\n    EmitLn(\"push %eax\"); /* save the result of the limit expression */\n    PostLabel(L1);\n    sprintf(tmp, \"lea %c, %%edx\", name);\n    EmitLn(tmp);\n    EmitLn(\"movl (%edx), %eax\");\n    EmitLn(\"addl $1, %eax\");\n    EmitLn(\"movl %eax, (%edx)\");\n    EmitLn(\"cmp (%esp), %eax\");\n    sprintf(tmp, \"jg %s\", L2);\n    EmitLn(tmp);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n    EmitLn(\"pop %eax\");\n}\n\nvoid Expression()\n{\n    EmitLn(\"<expression>\");\n}\n\nvoid DoDo()\n{\n    Match('d');\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    Expression();\n    EmitLn(\"subl $1, %eax\");\n    EmitLn(\"movl %eax, %ecx\");\n    PostLabel(L1);\n    EmitLn(\"pushl %ecx\");\n    Block(L2);\n    Match('e'); /* consume the block terminator, as the other loops do */\n    EmitLn(\"popl %ecx\");\n    sprintf(tmp, \"loop %s\", L1);\n    EmitLn(tmp);\n    EmitLn(\"pushl %ecx\");\n    PostLabel(L2);\n    EmitLn(\"popl %ecx\");\n}\n\nvoid DoBreak(char *L)\n{\n    Match('b');\n    if (L != NULL) {\n        sprintf(tmp, \"jmp %s\", L);\n        EmitLn(tmp);\n    } else {\n        Abort(\"No loop to break from\");\n    }\n}\n\nint main()\n{\n    Init();\n    DoProgram();\n    return 0;\n}\n"
  },
  {
    "path": "5/tutor5.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                          19 August 1988\n\n\n                    Part V: CONTROL CONSTRUCTS\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn  the  first  four  installments  of  this  series, we've  been\nconcentrating on the parsing of math  expressions  and assignment\nstatements.  In  this  installment,  we'll  take off on a new and\nexciting  tangent:  that   of  parsing  and  translating  control\nconstructs such as IF statements.\n\nThis subject is dear to my heart, because it represents a turning\npoint  for  me.    I  had  been  playing  with  the   parsing  of\nexpressions, just as  we  have  done  in this series, but I still\nfelt that I was a LONG way from being able  to  handle a complete\nlanguage.  After all, REAL  languages have branches and loops and\nsubroutines and all that.  Perhaps you've shared some of the same\nthoughts.    Awhile  back,  though,  I  had  to  produce  control\nconstructs for a structured assembler preprocessor I was writing.\nImagine my surprise to  discover  that it was far easier than the\nexpression  parsing  I  had  already  been through.   I  remember\nthinking, \"Hey! 
This is EASY!\" After we've finished this session,\nI'll bet you'll be thinking so, too.\n\n\nTHE PLAN\n\nIn what follows, we'll be starting over again with a bare cradle,\nand as we've done twice before now, we'll build things up  one at\na time.  We'll also  be retaining the concept of single-character\ntokens that has served us so well to date.   This  means that the\n\"code\" will look a little funny, with 'i' for IF, 'w'  for WHILE,\netc.  But it helps us  get  the concepts down pat without fussing\nover  lexical  scanning.    Fear  not  ...  eventually we'll  see\nsomething looking like \"real\" code.\n\nI also don't  want  to  have  us  get bogged down in dealing with\nstatements other than branches, such as the assignment statements\nwe've  been  working  on.  We've already demonstrated that we can\nhandle them, so there's no point carrying them  around  as excess\nbaggage during this exercise.  So what I'll do instead is  to use\nan  anonymous  statement,  \"other\", to take the place of the non-\ncontrol statements and serve as a place-holder for them.  We have\nto generate some kind of object code for them  (we're  back  into\ncompiling, not interpretation), so for want of anything else I'll\njust echo the character input.\n\nOK, then, starting with  yet  another  copy  of the cradle, let's\ndefine the procedure:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an \"Other\" }\n\nprocedure Other;\nbegin\n   EmitLn(GetName);\nend;\n{--------------------------------------------------------------}\n\n\nNow include a call to it in the main program, thus:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   Other;\nend.\n{--------------------------------------------------------------}\n\n\nRun  the program and see what you get.  
Not very exciting, is it?\nBut hang in there, it's a start, and things will get better.\n\nThe first thing we need is the ability to deal with more than one\nstatement, since a single-line branch  is pretty limited.  We did\nthat in the last session on interpreting, but this time let's get\na little more formal.  Consider the following BNF:\n\n          <program> ::= <block> END\n\n          <block> ::= [ <statement> ]*\n\nThis says that, for our purposes here, a program is defined  as a\nblock, followed by an END statement.  A block, in  turn, consists\nof zero or more statements.  We only have one kind  of statement,\nso far.\n\nWhat signals the end of a block?  It's  simply any construct that\nisn't an \"other\"  statement.    For  now, that means only the END\nstatement.\n\nArmed with these ideas, we can proceed to build  up  our  parser.\nThe code for a program (we  have  to call it DoProgram, or Pascal\nwill complain), is:\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Program }\n\nprocedure DoProgram;\nbegin\n   Block;\n   if Look <> 'e' then Expected('End');\n   EmitLn('END')\nend;\n{--------------------------------------------------------------}\n\n\nNotice  that  I've  arranged to emit  an  \"END\"  command  to  the\nassembler, which sort of  punctuates  the  output code, and makes\nsense considering that we're parsing a complete program here.\n\nThe code for Block is:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block;\nbegin\n   while not(Look in ['e']) do begin\n      Other;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\n(From the form of the procedure, you just KNOW we're going  to be\nadding to it in a bit!)\n\nOK, enter these routines into your program.  Replace the  call to\nOther in the main program, by  a  call  to DoProgram.  Now try it\nand  see  how  it works.  
Well, it's still not  much,  but  we're\ngetting closer.\n\n\nSOME GROUNDWORK\n\nBefore we begin to define the various control constructs, we need\nto  lay a bit more groundwork.  First, a word of warning: I won't\nbe using the same syntax  for these constructs as you're familiar\nwith  from Pascal or C.  For example, the Pascal syntax for an IF\nis:\n\n\n     IF <condition> THEN <statement>\n\n\n(where the statement, of course, may be compound).\n\nThe C version is similar:\n\n\n     IF ( <condition> ) <statement>\n\n\nInstead, I'll be using something that looks more like Ada:\n\n\n     IF <condition> <block> ENDIF\n\n\nIn  other  words,  the IF construct has  a  specific  termination\nsymbol.  This avoids  the  dangling-else of Pascal and C and also\nprecludes the need for the brackets {} or begin-end.   The syntax\nI'm showing you here, in fact, is that of the language  KISS that\nI'll be detailing in  later  installments.   The other constructs\nwill also be  slightly  different.    That  shouldn't  be  a real\nproblem for you.  Once you see how it's done, you'll realize that\nit  really  doesn't  matter  so  much  which  specific syntax  is\ninvolved.  Once the syntax is defined, turning it  into  code  is\nstraightforward.\n\nNow, all of the  constructs  we'll  be  dealing with here involve\ntransfer of control, which at the assembler-language  level means\nconditional  and/or  unconditional branches.   For  example,  the\nsimple IF statement\n\n\n          IF <condition> A ENDIF B ....\n\nmust get translated into\n\n          Branch if NOT condition to L\n          A\n     L:   B\n          ...\n\n\nIt's clear, then, that we're going to need  some  more procedures\nto  help  us  deal with these branches.  I've defined two of them\nbelow.  Procedure NewLabel generates unique labels.  This is done\nvia the simple expedient of calling every label  'Lnn',  where nn\nis a label number starting from zero.   
Procedure  PostLabel just\noutputs the labels at the proper place.\n\nHere are the two routines:\n\n\n{--------------------------------------------------------------}\n{ Generate a Unique Label }\n\nfunction NewLabel: string;\nvar S: string;\nbegin\n   Str(LCount, S);\n   NewLabel := 'L' + S;\n   Inc(LCount);\nend;\n\n\n{--------------------------------------------------------------}\n{ Post a Label To Output }\n\nprocedure PostLabel(L: string);\nbegin\n   WriteLn(L, ':');\nend;\n{--------------------------------------------------------------}\n\n\nNotice that we've added  a  new  global  variable, LCount, so you\nneed to change the VAR declarations at the top of the  program to\nlook like this:\n\n\nvar Look  : char;              { Lookahead Character }\n    Lcount: integer;           { Label Counter }\n\n\nAlso, add the following extra initialization to Init:\n\n\n   LCount := 0;\n\n(DON'T forget that, or your labels can look really strange!)\n\n\nAt this point I'd also like to show you a  new  kind of notation.\nIf  you  compare  the form of the IF statement above with the as-\nsembler code that must be produced, you can see  that  there  are\ncertain  actions  associated  with each of the  keywords  in  the\nstatement:\n\n\n     IF:  First, get the condition and issue the code for it.\n          Then, create a unique label and emit a branch if false.\n\n     ENDIF: Emit the label.\n\n\nThese actions can be shown very concisely if we write  the syntax\nthis way:\n                              \n\n     IF\n     <condition>    { Condition;\n                      L = NewLabel;\n                      Emit(Branch False to L); }\n     <block>\n     ENDIF          { PostLabel(L) }\n\n\nThis is an example  of  syntax-directed  translation.  We've been\ndoing it all along ... we've just never written it down  this way\nbefore.  The stuff in curly brackets represents the ACTIONS to be\ntaken.  
The nice part about this representation is  that  it  not\nonly shows what  we  have  to  recognize, but also the actions we\nhave to perform, and in which  order.   Once we have this syntax,\nthe code almost writes itself.\n\nAbout  the  only thing left to do is to be a  bit  more  specific\nabout what we mean by \"Branch if false.\"\n\nI'm assuming that there will  be  code  executed  for <condition>\nthat  will  perform  Boolean algebra and compute some result.  It\nshould also set the condition flags corresponding to that result.\nNow, the usual convention  for  a Boolean variable is to let 0000\nrepresent \"false,\" and  anything  else (some use FFFF, some 0001)\nrepresent \"true.\"\n\nOn the 68000  the  condition  flags  are set whenever any data is\nmoved or calculated.  If the  data  is a 0000 (corresponding to a\nfalse condition, remember), the zero flag will be set.   The code\nfor \"Branch on zero\" is BEQ.  So for our purposes here,\n\n\n               BEQ  <=> Branch if false\n               BNE  <=> Branch if true\n\n\nIt's the nature of the beast that most  of  the  branches  we see\nwill  be  BEQ's  ...  we'll  be branching AROUND the code  that's\nsupposed to be executed when the condition is true.\n\n\nTHE IF STATEMENT\n\nWith that bit of explanation out of the way, we're  finally ready\nto begin coding the IF-statement parser.  In  fact,  we've almost\nalready  done  it!   As usual, I'll be using our single-character\napproach, with the character 'i' for IF, and 'e'  for  ENDIF  (as\nwell  as END ... that dual nature causes  no  confusion).    
I'll\nalso, for now, skip completely  the character for the branch con-\ndition, which we still have to define.\n\nThe code for DoIf is:\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block; Forward;\n\n\nprocedure DoIf;\nvar L: string;\nbegin\n   Match('i');\n   L := NewLabel;\n   Condition;\n   EmitLn('BEQ ' + L);\n   Block;\n   Match('e');\n   PostLabel(L);\nend;\n{--------------------------------------------------------------}\n\n\nAdd this routine to your program, and change  Block  to reference\nit as follows:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block;\nbegin\n   while not(Look in ['e']) do begin\n      case Look of\n       'i': DoIf;\n       'o': Other;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNotice the reference to procedure Condition.    Eventually, we'll\nwrite a routine that  can  parse  and  translate any Boolean con-\ndition we care to give it.  But  that's  a  whole  installment by\nitself (the next one, in fact).    For  now, let's just make it a\ndummy that emits some text.  Write the following routine:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Boolean Condition }\n{ This version is a dummy }\n\nProcedure Condition;\nbegin\n   EmitLn('<condition>');\nend;\n{--------------------------------------------------------------}\n\n\nInsert this procedure in your program just before DoIf.   Now run\nthe program.  Try a string like\n\n     aibece\n\nAs you can see,  the  parser seems to recognize the construct and\ninserts the object code at the  right  places.   
Now try a set of\nnested IF's, like\n\n     aibicedefe\n\nIt's starting to look real, eh?\n\nNow that we  have  the  general  idea  (and the tools such as the\nnotation and the procedures NewLabel and PostLabel), it's a piece\nof cake to extend the parser to include other  constructs.    The\nfirst (and also one of the  trickiest)  is to add the ELSE clause\nto IF.  The BNF is\n\n\n     IF <condition> <block> [ ELSE <block>] ENDIF\n\n\nThe tricky part arises simply  because there is an optional part,\nwhich doesn't occur in the other constructs.\n\nThe corresponding output code should be\n\n\n          <condition>\n          BEQ L1\n          <block>\n          BRA L2\n     L1:  <block>\n     L2:  ...\n\n\nThis leads us to the following syntax-directed translation:\n\n\n     IF\n     <condition>    { L1 = NewLabel;\n                      L2 = NewLabel;\n                      Emit(BEQ L1) }\n     <block>\n     ELSE           { Emit(BRA L2);\n                      PostLabel(L1) }\n     <block>\n     ENDIF          { PostLabel(L2) }\n\n\nComparing this with the case for an ELSE-less IF gives us  a clue\nas to how to handle both situations.   The  code  below  does it.\n(Note that I  use  an  'l'  for  the ELSE, since 'e' is otherwise\noccupied):\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   Match('i');\n   Condition;\n   L1 := NewLabel;\n   L2 := L1;\n   EmitLn('BEQ ' + L1);\n   Block;\n   if Look = 'l' then begin\n      Match('l');\n      L2 := NewLabel;\n      EmitLn('BRA ' + L2);\n      PostLabel(L1);\n      Block;\n   end;\n   Match('e');\n   PostLabel(L2);\nend;\n{--------------------------------------------------------------}\n\n\nThere you have it.  A complete IF parser/translator, in  19 lines\nof code.\n\nGive it a try now.  Try something like\n\n   aiblcede\n\nDid it work?  
Now, just  to  be  sure we haven't broken the ELSE-\nless case, try\n\n   aibece\n\nNow try some nested IF's.  Try anything you like,  including some\nbadly formed statements.   Just  remember that 'e' is not a legal\n\"other\" statement.\n\n\nTHE WHILE STATEMENT\n\nThe next type of statement should be easy, since we  already have\nthe process  down  pat.    The  syntax  I've chosen for the WHILE\nstatement is\n\n\n          WHILE <condition> <block> ENDWHILE\n\n\nI know,  I  know,  we  don't  REALLY  need separate kinds of ter-\nminators for each construct ... you can see that by the fact that\nin our one-character version, 'e' is used for all of them.  But I\nalso remember  MANY debugging sessions in Pascal, trying to track\ndown a wayward END that the compiler obviously thought I meant to\nput  somewhere  else.   It's been my experience that specific and\nunique  keywords,  although  they add to the  vocabulary  of  the\nlanguage,  give  a  bit of error-checking that is worth the extra\nwork for the compiler writer.\n\nNow,  consider  what  the  WHILE  should be translated into.   
It\nshould be:\n\n\n     L1:  <condition>\n          BEQ L2\n          <block>\n          BRA L1\n     L2:\n\n\n\n\nAs before, comparing the two representations gives us the actions\nneeded at each point.\n\n\n     WHILE          { L1 = NewLabel;\n                      L2 = NewLabel;\n                      PostLabel(L1) }\n     <condition>    { Emit(BEQ L2) }\n     <block>\n     ENDWHILE       { Emit(BRA L1);\n                      PostLabel(L2) }\n\n\nThe code follows immediately from the syntax:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a WHILE Statement }\n\nprocedure DoWhile;\nvar L1, L2: string;\nbegin\n   Match('w');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   Condition;\n   EmitLn('BEQ ' + L2);\n   Block;\n   Match('e');\n   EmitLn('BRA ' + L1);\n   PostLabel(L2);\nend;\n{--------------------------------------------------------------}\n\n\nSince  we've  got a new statement, we have to add a  call  to  it\nwithin procedure Block:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block;\nbegin\n   while not(Look in ['e', 'l']) do begin\n      case Look of\n       'i': DoIf;\n       'w': DoWhile;\n       else Other;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNo other changes are necessary.\n\nOK, try the new program.  Note that this  time,  the  <condition>\ncode is INSIDE the upper label, which is just where we wanted it.\nTry some nested loops.  Try some loops within IF's, and some IF's\nwithin loops.  If you get  a  bit  confused as to what you should\ntype, don't be discouraged:  you  write  bugs in other languages,\ntoo, don't you?  It'll look a lot  more  meaningful  when  we get\nfull keywords.\n\nI hope by now that you're beginning to  get  the  idea  that this\nreally  IS easy.  All we have to do to accommodate a new construct\nis to work out  the  syntax-directed translation of it.  
The code\nalmost falls out  from  there,  and  it doesn't affect any of the\nother routines.  Once you've gotten the feel of the thing, you'll\nsee that you  can  add  new  constructs  about as fast as you can\ndream them up.\n\n\nTHE LOOP STATEMENT\n\nWe could stop right here, and  have  a language that works.  It's\nbeen  shown  many  times that a high-order language with only two\nconstructs, the IF and the WHILE, is sufficient  to  write struc-\ntured  code.   But we're on a roll now, so let's  richen  up  the\nrepertoire a bit.\n\nThis construct is even easier, since it has no condition  test at\nall  ... it's an infinite loop.  What's the point of such a loop?\nNot much, by  itself,  but  later  on  we're going to add a BREAK\ncommand,  that  will  give us a way out.  This makes the language\nconsiderably richer than Pascal, which  has  no  break,  and also\navoids the funny  WHILE(1) or WHILE TRUE of C and Pascal.\n\nThe syntax is simply\n\n     LOOP <block> ENDLOOP\n\nand the syntax-directed translation is:\n\n\n     LOOP           { L = NewLabel;\n                      PostLabel(L) }\n     <block>\n     ENDLOOP        { Emit(BRA L) }\n\n\nThe corresponding code is shown below.  Since  I've  already used\n'l'  for  the  ELSE, I've used  the  last  letter,  'p',  as  the\n\"keyword\" this time.\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a LOOP Statement }\n\nprocedure DoLoop;\nvar L: string;\nbegin\n   Match('p');\n   L := NewLabel;\n   PostLabel(L);\n   Block;\n   Match('e');\n   EmitLn('BRA ' + L);\nend;\n{--------------------------------------------------------------}\n                             \n\nWhen you insert this routine, don't forget to add a line in Block\nto call it.\n\n\n\n\nREPEAT-UNTIL\n\nHere's one construct that I lifted right from Pascal.  
The syntax\nis\n\n\n     REPEAT <block> UNTIL <condition>  ,\n\n\nand the syntax-directed translation is:\n\n\n     REPEAT         { L = NewLabel;\n                      PostLabel(L) }\n     <block>\n     UNTIL\n     <condition>    { Emit(BEQ L) }\n\n\nAs usual, the code falls out pretty easily:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a REPEAT Statement }\n\nprocedure DoRepeat;\nvar L: string;\nbegin\n   Match('r');\n   L := NewLabel;\n   PostLabel(L);\n   Block;\n   Match('u');\n   Condition;\n   EmitLn('BEQ ' + L);\nend;\n{--------------------------------------------------------------}\n\n\nAs  before, we have to add the call  to  DoRepeat  within  Block.\nThis time, there's a difference, though.  I decided  to  use  'r'\nfor REPEAT (naturally), but I also decided to use 'u'  for UNTIL.\nThis means that the 'u' must be added to the set of characters in\nthe while-test.  These  are  the  characters  that signal an exit\nfrom the current  block  ... the \"follow\" characters, in compiler\njargon.\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block;\nbegin\n   while not(Look in ['e', 'l', 'u']) do begin\n      case Look of\n       'i': DoIf;\n       'w': DoWhile;\n       'p': DoLoop;\n       'r': DoRepeat;\n       else Other;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nTHE FOR LOOP\n\nThe FOR loop  is a very handy one to have around, but it's a bear\nto translate.  That's not so much because the construct itself is\nhard ... it's only a loop  after  all ... but simply because it's\nhard to implement  in  assembler  language.    
Once  the  code is\nfigured out, the translation is straightforward enough.\n\nC fans love  the  FOR-loop  of  that language (and, in fact, it's\neasier to code), but I've chosen instead a syntax very  much like\nthe one from good ol' BASIC:\n\n\n     FOR <ident> = <expr1> TO <expr2> <block> ENDFOR\n\n\nThe translation of a FOR loop  can  be just about as difficult as\nyou choose  to  make  it,  depending  upon  the way you decide to\ndefine  the rules as to how to handle the limits.  Does expr2 get\nevaluated  every time through the loop, for  example,  or  is  it\ntreated as a constant limit?   Do  you always go through the loop\nat least once,  as  in  FORTRAN,  or  not? It gets simpler if you\nadopt the point of view that the construct is equivalent to:\n\n\n     <ident> = <expr1>\n     TEMP = <expr2>\n     WHILE <ident> <= TEMP\n     <block>\n     ENDWHILE\n\n\nNotice that with this definition of the loop, <block> will not be\nexecuted at all if <expr1> is initially larger than <expr2>.\n                             \nThe 68000 code needed to do this is trickier than  anything we've\ndone so far.  I had a couple  of  tries  at  it, putting both the\ncounter  and  the    upper limit on the stack, both in registers,\netc.  I  finally  arrived  at  a hybrid arrangement, in which the\nloop counter is in memory (so that it can be accessed  within the\nloop), and the upper limit is on the stack.  
The  translated code\ncame out like this:\n\n\n          <ident>             get name of loop counter\n          <expr1>             get initial value\n          LEA <ident>(PC),A0  address the loop counter\n          SUBQ #1,D0          predecrement it\n          MOVE D0,(A0)        save it\n          <expr2>             get upper limit\n          MOVE D0,-(SP)       save it on stack\n\n     L1:  LEA <ident>(PC),A0  address loop counter\n          MOVE (A0),D0        fetch it to D0\n          ADDQ #1,D0          bump the counter\n          MOVE D0,(A0)        save new value\n          CMP (SP),D0         check for range\n          BGT L2              skip out if D0 > (SP)\n          <block>\n          BRA L1              loop for next pass\n     L2:  ADDQ #2,SP          clean up the stack\n\n\nWow!    That  seems like a lot of code ...  the  line  containing\n<block> seems to almost get lost.  But that's the best I could do\nwith it.   I guess it helps to keep in mind that it's really only\nsixteen  words,  after  all.  
If  anyone else can  optimize  this\nbetter, please let me know.\n\nStill, the parser  routine  is  pretty  easy now that we have the\ncode:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a FOR Statement }\n\nprocedure DoFor;\nvar L1, L2: string;\n    Name: char;\nbegin\n   Match('f');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   Name := GetName;\n   Match('=');\n   Expression;\n   EmitLn('SUBQ #1,D0');\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)');\n   Expression;\n   EmitLn('MOVE D0,-(SP)');\n   PostLabel(L1);\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE (A0),D0');\n   EmitLn('ADDQ #1,D0');\n   EmitLn('MOVE D0,(A0)');\n   EmitLn('CMP (SP),D0');\n   EmitLn('BGT ' + L2);\n   Block;\n   Match('e');\n   EmitLn('BRA ' + L1);\n   PostLabel(L2);\n   EmitLn('ADDQ #2,SP');\nend;\n{--------------------------------------------------------------}\n\n\nSince we don't have  expressions  in this parser, I used the same\ntrick as for Condition, and wrote the routine\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Expression }\n{ This version is a dummy }\n\nProcedure Expression;\nbegin\n   EmitLn('<expr>');\nend;\n{--------------------------------------------------------------}\n\n\nGive it a try.  Once again,  don't  forget  to  add  the  call in\nBlock.    Since  we don't have any input for the dummy version of\nExpression, a typical input line would look something like\n\n     afi=bece\n\nWell, it DOES generate a lot of code, doesn't it?    But at least\nit's the RIGHT code.\n\n\nTHE DO STATEMENT\n\nAll this made me wish for a simpler version of the FOR loop.  The\nreason for all the code  above  is  the  need  to  have  the loop\ncounter accessible as a variable within the loop.  
If all we need\nis a counting loop to make us go through  something  a  specified\nnumber of times, but  don't  need  access  to the counter itself,\nthere is a much easier solution.  The 68000 has a  \"decrement and\nbranch nonzero\" instruction built in which is ideal for counting.\nFor good measure, let's add this construct, too.   This  will  be\nthe last of our loop structures.\n                             \nThe syntax and its translation are:\n\n\n     DO\n     <expr>         { Emit(SUBQ #1,D0);\n                      L = NewLabel;\n                      PostLabel(L);\n                      Emit(MOVE D0,-(SP)) }\n     <block>\n     ENDDO          { Emit(MOVE (SP)+,D0);\n                      Emit(DBRA D0,L) }\n\n\nThat's quite a bit simpler!  The loop will execute  <expr> times.\nHere's the code:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a DO Statement }\n\nprocedure DoDo;\nvar L: string;\nbegin\n   Match('d');\n   L := NewLabel;\n   Expression;\n   EmitLn('SUBQ #1,D0');\n   PostLabel(L);\n   EmitLn('MOVE D0,-(SP)');\n   Block;\n   Match('e');\n   EmitLn('MOVE (SP)+,D0');\n   EmitLn('DBRA D0,' + L);\nend;\n{--------------------------------------------------------------}\n\n\nI think you'll have to agree, that's a whole lot simpler than the\nclassical FOR.  Still, each construct has its place.\n\n\nTHE BREAK STATEMENT\n\nEarlier I promised you a BREAK statement to accompany LOOP.  This\nis  one  I'm sort of proud of.  On the face of it a  BREAK  seems\nreally  tricky.  My first approach was to just use it as an extra\nterminator to Block, and split all the loops into two parts, just\nas  I did with the ELSE half of an IF.  That  turns  out  not  to\nwork, though, because the BREAK statement is almost certainly not\ngoing to show  up at the same level as the loop itself.  The most\nlikely place for a BREAK is right after an IF, which  would cause\nit to exit to the IF  construct,  not the enclosing loop.  
WRONG.\nThe  BREAK  has  to exit the inner LOOP, even if it's nested down\ninto several levels of IFs.\n                             \nMy next thought was that I would just store away, in  some global\nvariable, the ending label of the innermost loop.    That doesn't\nwork  either, because there may be a break  from  an  inner  loop\nfollowed by a break from an outer one.  Storing the label for the\ninner loop would clobber the label for the  outer  one.    So the\nglobal variable turned into a stack.  Things were starting to get\nmessy.\n\nThen  I  decided  to take my own advice.  Remember  in  the  last\nsession when  I  pointed  out  how  well  the implicit stack of a\nrecursive descent parser was  serving  our needs?  I said that if\nyou begin to  see  the  need  for  an external stack you might be\ndoing  something  wrong.   Well, I was.  It is indeed possible to\nlet the recursion built into  our parser take care of everything,\nand the solution is so simple that it's surprising.\n\nThe secret is  to  note  that  every BREAK statement has to occur\nwithin a block ... there's no place else for it to be.  So all we\nhave  to  do  is to pass into  Block  the  exit  address  of  the\ninnermost loop.  Then it can pass the address to the routine that\ntranslates the  break instruction.  Since an IF statement doesn't\nchange the loop level, procedure DoIf doesn't need to do anything\nexcept  pass the label into ITS blocks (both  of  them).    Since\nloops DO change the level,  each  loop  construct  simply ignores\nwhatever label is above it and passes its own exit label along.\n\nAll  this  is easier to show you than it is to  describe.    
I'll\ndemonstrate with the easiest loop, which is LOOP:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a LOOP Statement }\n\nprocedure DoLoop;\nvar L1, L2: string;\nbegin\n   Match('p');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   Block(L2);\n   Match('e');\n   EmitLn('BRA ' + L1);\n   PostLabel(L2);\nend;\n{--------------------------------------------------------------}\n\n\nNotice that DoLoop now has TWO labels, not just one.   The second\nis to give the BREAK instruction a target to jump  to.   If there\nis no BREAK within  the  loop, we've wasted a label and cluttered\nup things a bit, but there's no harm done.\n\nNote also that Block now has a parameter, which  for  loops  will\nalways be the exit address.  The new version of Block is:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block(L: string);\nbegin\n   while not(Look in ['e', 'l', 'u']) do begin\n      case Look of\n       'i': DoIf(L);\n       'w': DoWhile;\n       'p': DoLoop;\n       'r': DoRepeat;\n       'f': DoFor;\n       'd': DoDo;\n       'b': DoBreak(L);\n       else Other;\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nAgain,  notice  that  all Block does with the label is to pass it\ninto DoIf and  DoBreak.    
The  loop  constructs  don't  need it,\nbecause they are going to pass their own label anyway.\n\nThe new version of DoIf is:\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block(L: string); Forward;\n\n\nprocedure DoIf(L: string);\nvar L1, L2: string;\nbegin\n   Match('i');\n   Condition;\n   L1 := NewLabel;\n   L2 := L1;\n   EmitLn('BEQ ' + L1);\n   Block(L);\n   if Look = 'l' then begin\n      Match('l');\n      L2 := NewLabel;\n      EmitLn('BRA ' + L2);\n      PostLabel(L1);\n      Block(L);\n   end;\n   Match('e');\n   PostLabel(L2);\nend;\n{--------------------------------------------------------------}\n\n\nHere,  the  only  thing  that  changes  is  the addition  of  the\nparameter to procedure Block.  An IF statement doesn't change the\nloop  nesting level, so DoIf just passes the  label  along.    No\nmatter how many levels of IF nesting we have, the same label will\nbe used.\n\nNow, remember that DoProgram also calls Block, so it now needs to\npass it a label.  An  attempt  to  exit the outermost block is an\nerror, so DoProgram  passes  a  null  label  which  is  caught by\nDoBreak:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a BREAK }\n\nprocedure DoBreak(L: string);\nbegin\n   Match('b');\n   if L <> '' then\n      EmitLn('BRA ' + L)\n   else Abort('No loop to break from');\nend;\n\n\n{--------------------------------------------------------------}\n\n{ Parse and Translate a Program }\n\nprocedure DoProgram;\nbegin\n   Block('');\n   if Look <> 'e' then Expected('End');\n   EmitLn('END')\nend;\n{--------------------------------------------------------------}\n\n\nThat  ALMOST takes care of everything.  Give it a try, see if you\ncan \"break\" it <pun>.  Careful, though.  By this time  we've used\nso many letters, it's hard to think of characters that aren't now\nrepresenting  reserved  words.    
Remember:  before  you  try the\nprogram, you're going to have to edit every occurrence of Block in\nthe other loop constructs to include the new parameter.    Do  it\njust like I did for LOOP.\n\nI  said ALMOST above.  There is one slight problem: if you take a\nhard  look  at  the code generated for DO, you'll see that if you\nbreak  out  of  this loop, the value of the loop counter is still\nleft on the stack.  We're going to have to fix that!  A shame ...\nthat was one  of  our  smaller  routines, but it can't be helped.\nHere's a version that doesn't have the problem:\n\n\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a DO Statement }\n\nprocedure DoDo;\nvar L1, L2: string;\nbegin\n   Match('d');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   Expression;\n   EmitLn('SUBQ #1,D0');\n   PostLabel(L1);\n   EmitLn('MOVE D0,-(SP)');\n   Block(L2);\n   Match('e');\n   EmitLn('MOVE (SP)+,D0');\n   EmitLn('DBRA D0,' + L1);\n   EmitLn('SUBQ #2,SP');\n   PostLabel(L2);\n   EmitLn('ADDQ #2,SP');\nend;\n{--------------------------------------------------------------}\n\n\nThe  two  extra  instructions,  the  SUBQ and ADDQ, take care  of\nleaving the stack in the right shape.\n                             \n\nCONCLUSION\n\nAt this point we have created a number of control  constructs ...\na richer set, really, than that provided by almost any other pro-\ngramming language.  And,  except  for the FOR loop, it was pretty\neasy to do.  Even that one was tricky only because it's tricky in\nassembler language.\n\nI'll conclude this session here.  To wrap the thing up with a red\nribbon, we really  should  have  a  go  at  having  real keywords\ninstead of these mickey-mouse  single-character  things.   You've\nalready seen that  the  extension to multi-character words is not\ndifficult, but in this case it will make a big difference  in the\nappearance of our input code.  I'll save that little bit  for the\nnext installment.  
In that installment we'll also address Boolean\nexpressions, so we can get rid of the dummy version  of Condition\nthat we've used here.  See you then.\n\nFor reference purposes, here is  the  completed  parser  for this\nsession:\n\n\n\n\n{--------------------------------------------------------------}\nprogram Branch;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n      CR  = ^M;\n\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look  : char;              { Lookahead Character }\n    Lcount: integer;           { Label Counter }\n\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   if Look = x then GetChar\n   else Expected('''' + x + '''');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := UpCase(c) in ['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n                             
\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: char;\nbegin\n   if not IsAlpha(Look) then Expected('Name');\n   GetName := UpCase(Look);\n   GetChar;\nend;\n\n\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: char;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   GetNum := Look;\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Generate a Unique Label }\n\nfunction NewLabel: string;\nvar S: string;\nbegin\n   Str(LCount, S);\n   NewLabel := 'L' + S;\n   Inc(LCount);\nend;\n\n\n{--------------------------------------------------------------}\n{ Post a Label To Output }\n\nprocedure PostLabel(L: string);\nbegin\n   WriteLn(L, ':');\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Boolean Condition }\n\nprocedure Condition;\nbegin\n   EmitLn('<condition>');\nend;\n\n                             \n\n\n{--------------------------------------------------------------}\n{ 
Parse and Translate a Math Expression }\n\nprocedure Expression;\nbegin\n   EmitLn('<expr>');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block(L: string); Forward;\n\n\nprocedure DoIf(L: string);\nvar L1, L2: string;\nbegin\n   Match('i');\n   Condition;\n   L1 := NewLabel;\n   L2 := L1;\n   EmitLn('BEQ ' + L1);\n   Block(L);\n   if Look = 'l' then begin\n      Match('l');\n      L2 := NewLabel;\n      EmitLn('BRA ' + L2);\n      PostLabel(L1);\n      Block(L);\n   end;\n   Match('e');\n   PostLabel(L2);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a WHILE Statement }\n\nprocedure DoWhile;\nvar L1, L2: string;\nbegin\n   Match('w');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   Condition;\n   EmitLn('BEQ ' + L2);\n   Block(L2);\n   Match('e');\n   EmitLn('BRA ' + L1);\n   PostLabel(L2);\nend;\n                             \n\n{--------------------------------------------------------------}\n{ Parse and Translate a LOOP Statement }\n\nprocedure DoLoop;\nvar L1, L2: string;\nbegin\n   Match('p');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   Block(L2);\n   Match('e');\n   EmitLn('BRA ' + L1);\n   PostLabel(L2);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a REPEAT Statement }\n\nprocedure DoRepeat;\nvar L1, L2: string;\nbegin\n   Match('r');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   PostLabel(L1);\n   Block(L2);\n   Match('u');\n   Condition;\n   EmitLn('BEQ ' + L1);\n   PostLabel(L2);\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a FOR Statement }\n\nprocedure DoFor;\nvar L1, L2: string;\n    Name: char;\nbegin\n   Match('f');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   Name := GetName;\n   Match('=');\n   Expression;\n   EmitLn('SUBQ #1,D0');\n   EmitLn('LEA ' + Name + '(PC),A0');\n 
  EmitLn('MOVE D0,(A0)');\n   Expression;\n   EmitLn('MOVE D0,-(SP)');\n   PostLabel(L1);\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE (A0),D0');\n   EmitLn('ADDQ #1,D0');\n   EmitLn('MOVE D0,(A0)');\n   EmitLn('CMP (SP),D0');\n   EmitLn('BGT ' + L2);\n   Block(L2);\n   Match('e');\n   EmitLn('BRA ' + L1);\n   PostLabel(L2);\n   EmitLn('ADDQ #2,SP');\nend;\n\n\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a DO Statement }\n\nprocedure Dodo;\nvar L1, L2: string;\nbegin\n   Match('d');\n   L1 := NewLabel;\n   L2 := NewLabel;\n   Expression;\n   EmitLn('SUBQ #1,D0');\n   PostLabel(L1);\n   EmitLn('MOVE D0,-(SP)');\n   Block(L2);\n   EmitLn('MOVE (SP)+,D0');\n   EmitLn('DBRA D0,' + L1);\n   EmitLn('SUBQ #2,SP');\n   PostLabel(L2);\n   EmitLn('ADDQ #2,SP');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a BREAK }\n\nprocedure DoBreak(L: string);\nbegin\n   Match('b');\n   if L <> '' then\n      EmitLn('BRA ' + L)\n   else Abort('No loop to break from');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an \"Other\" }\n\nprocedure Other;\nbegin\n   EmitLn(GetName);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block(L: string);\nbegin\n   while not(Look in ['e', 'l', 'u']) do begin\n      case Look of\n       'i': DoIf(L);\n       'w': DoWhile;\n       'p': DoLoop;\n       'r': DoRepeat;\n       'f': DoFor;\n       'd': DoDo;\n       'b': DoBreak(L);\n       else Other;\n      end;\n   end;\nend;\n\n\n\n\n{--------------------------------------------------------------}\n\n{ Parse and Translate a Program }\n\nprocedure DoProgram;\nbegin\n   Block('');\n   if Look <> 'e' then Expected('End');\n   EmitLn('END')\nend;\n\n\n{--------------------------------------------------------------}\n\n{ Initialize }\n\nprocedure Init;\nbegin\n   LCount := 0;\n   
GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   DoProgram;\nend.\n{--------------------------------------------------------------}\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\n\n\n"
  },
  {
    "path": "6/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "6/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#define TABLE_SIZE 26\nstatic int LCount = 0;\nstatic char labelName[MAX_BUF];\n\nstatic int Table[TABLE_SIZE];\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    return (c & 0xDF);\n}\n\nvoid GetChar() \n{\n    Look = getchar();\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\nvoid Newline()\n{\n    if (Look == '\\r') {\n        GetChar();\n        if (Look == '\\n') {\n            GetChar();\n        }\n    } else if (Look == '\\n') {\n        GetChar();\n    }\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n} \n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nint IsBoolean(char c)\n{\n    return strchr(\"TF\", uppercase(c)) != NULL;\n}\n\nchar GetName()\n{\n    char c = Look;\n\n    if( !IsAlpha(Look)) {\n        sprintf(tmp, \"Name\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return uppercase(c);\n}\n\n\nint GetNum()\n{\n    int value = 0;\n    if( !IsDigit(Look)) {\n        sprintf(tmp, \"Integer\");\n        Expected(tmp);\n    }\n\n    while (IsDigit(Look)) {\n        value = value * 10 + Look - '0';\n        GetChar();\n    }\n\n    return value;\n}\n\nint GetBoolean()\n{\n    if (!IsBoolean(Look)) {\n        Expected(\"Boolean Literal\");\n    }\n    int ret = uppercase(Look) == 'T';\n    GetChar();\n    return ret;\n}\n\nint IsOrop(char c)\n{\n    return strchr(\"|~\", c) != NULL;\n}\n\nint IsRelop(char c)\n{\n    return strchr(\"=#<>\", c) != NULL; \n}\n\nvoid Emit(char *s)\n{\n    
printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    LCount = 0;\n\n    InitTable();\n    GetChar();\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < TABLE_SIZE; i++) {\n        Table[i] = 0;\n    }\n\n}\n\nchar *NewLabel()\n{\n    sprintf(labelName, \"L%02d\", LCount);\n    LCount ++;\n    return labelName;\n}\n\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n\nvoid Fin()\n{\n    if (Look == '\\r') {\n        GetChar();\n    }\n    if (Look == '\\n') {\n        GetChar();\n    }\n}\n"
  },
  {
    "path": "6/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#define MAX_BUF 100\nstatic char tmp[MAX_BUF];\nchar Look;\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\n\nvoid Newline();\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\nint IsBoolean(char c);\nint IsOrop(char c);\nint IsRelop(char c);\n\nchar GetName();\nint GetNum();\nint GetBoolean();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\n\nchar *NewLabel();\nvoid PostLabel(char *label);\n\nvoid Fin();\n#endif\n"
  },
  {
    "path": "6/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\n#ifdef DEBUG\n#define dprint(fmt, ...) printf(fmt, __VA_ARGS__);\n#else\n#define dprint(fmt, ...)\n#endif\n\n\nvoid Other();\nvoid Block(char *L);\nvoid DoProgram();\nvoid DoIf(char *L);\nvoid DoWhile();\nvoid DoLoop();\nvoid DoRepeat();\nvoid DoFor();\nvoid Expression();\nvoid DoDo();\nvoid DoBreak(char *L);\n\n/* Added in chap6 */\nvoid BoolFactor();\nvoid NotFactor();\nvoid BoolTerm();\nvoid BoolExpression();\nvoid BoolOr();\nvoid BoolXor();\nvoid Relation();\nvoid Equals();\nvoid NotEquals();\nvoid Less();\nvoid Greater();\nvoid Ident();\nvoid Factor();\nvoid SignedFactor();\nvoid Multiply();\nvoid Divide();\nvoid Term();\nvoid Add();\nvoid Subtract();\nvoid Expression();\nvoid Assignment();\n\nvoid Other()\n{\n    sprintf(tmp, \"%c\", GetName());\n    EmitLn(tmp);\n}\n\nvoid Block(char *L)\n{\n    while (! strchr(\"elu\", Look)) {\n        dprint(\"Block: get Look = %c\\n\", Look);\n        switch (Look) {\n            case 'i':\n                DoIf(L);\n                break;\n            case 'w':\n                DoWhile();\n                break;\n            case 'p':\n                DoLoop();\n                break;\n            case 'r':\n                DoRepeat();\n                break;\n            case 'f':\n                DoFor();\n                break;\n            case 'd':\n                DoDo();\n                break;\n            case 'b':\n                DoBreak(L);\n            default:\n                Assignment();\n                break;\n        }\n        /* this is for convinent, otherwise newline character will\n        cause an error */\n        /*Newline();*/\n        Fin();\n    }\n}\n\nvoid DoProgram()\n{\n    Block(NULL);\n    if (Look != 'e') {\n        Expected(\"End\");\n    }\n    EmitLn(\"END\");\n}\n\nvoid DoIf(char *L)\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    strcpy(L1, NewLabel());\n    strcpy(L2, 
L1);\n\n    Match('i');\n    BoolExpression();\n\n    sprintf(tmp, \"jz %s\", L1);\n    EmitLn(tmp);\n\n    Block(L);\n    dprint(\"DoIf: Got Look = %c\\n\", Look);\n\n    if (Look == 'l') {\n        /* match *else* statement */\n        Match('l');\n        strcpy(L2, NewLabel());\n\n        sprintf(tmp, \"jmp %s\", L2);\n        EmitLn(tmp);\n\n        PostLabel(L1);\n\n        Block(L);\n    }\n\n    Match('e');\n    PostLabel(L2);\n}\n\nvoid DoWhile()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n\n    Match('w');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    BoolExpression();\n    sprintf(tmp, \"jz %s\", L2);\n    EmitLn(tmp);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\nvoid DoLoop()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    Match('p');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\nvoid DoRepeat()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    Match('r');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    Block(L2);\n    Match('u');\n    BoolExpression();\n\n    sprintf(tmp, \"jz %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\n/* I haven't tested the actual generated x86 code here, so you're free to\n * inform me if there are bugs. 
:) */\nvoid DoFor()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n\n    Match('f');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    char name = GetName();\n    Match('=');\n    Expression();\n    EmitLn(\"subl $1, %eax\");  /* SUBQ #1, D0 */\n    sprintf(tmp, \"lea %c, %%edx\", name);\n    EmitLn(tmp);\n    EmitLn(\"movl %eax, (%edx)\");\n    Expression();\n    EmitLn(\"push %eax\"); /* save the upper limit on the stack */\n    PostLabel(L1);\n    sprintf(tmp, \"lea %c, %%edx\", name);\n    EmitLn(tmp);\n    EmitLn(\"movl (%edx), %eax\");\n    EmitLn(\"addl $1, %eax\");\n    EmitLn(\"movl %eax, (%edx)\");\n    EmitLn(\"cmp (%esp), %eax\");\n    sprintf(tmp, \"jg %s\", L2);\n    EmitLn(tmp);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n    EmitLn(\"pop %eax\");\n}\n\nvoid DoDo()\n{\n    Match('d');\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    Expression();\n    EmitLn(\"subl $1, %eax\");\n    EmitLn(\"movl %eax, %ecx\");\n    PostLabel(L1);\n    EmitLn(\"pushl %ecx\");\n    Block(L2);\n    EmitLn(\"popl %ecx\");\n    sprintf(tmp, \"loop %s\", L1);\n    EmitLn(tmp);\n    EmitLn(\"pushl %ecx\");\n    PostLabel(L2);\n    EmitLn(\"popl %ecx\");\n}\n\nvoid DoBreak(char *L)\n{\n    Match('b');\n    if (L != NULL) {\n        sprintf(tmp, \"jmp %s\", L);\n        EmitLn(tmp);\n    } else {\n        Abort(\"No loop to break from\");\n    }\n}\n\nvoid BoolFactor()\n{\n    if (IsBoolean(Look)) {\n        if (GetBoolean()) {\n            EmitLn(\"movl $-1, %eax\");\n        } else {\n            EmitLn(\"xor %eax, %eax\");\n        }\n    } else {\n        Relation();\n    }\n}\n\nvoid Relation()\n{\n    Expression();\n    if (IsRelop(Look)) {\n        EmitLn(\"pushl %eax\");\n        switch (Look) {\n            case '=':\n                Equals();\n                break;\n            case '#':\n                NotEquals();\n             
   break;\n            case '<':\n                Less();\n                break;\n            case '>':\n                Greater();\n                break;\n        }\n    }\n    EmitLn(\"test %eax, %eax\");\n}\n\nvoid NotFactor()\n{\n    if (Look == '!') {\n        Match('!');\n        BoolFactor();\n        EmitLn(\"xor $-1, %eax\");\n    } else {\n        BoolFactor();\n    }\n}\n\nvoid BoolTerm()\n{\n    NotFactor();\n    while(Look == '&') {\n        EmitLn(\"pushl %eax\");\n        Match('&');\n        NotFactor();\n        EmitLn(\"and (%esp), %eax\");\n        EmitLn(\"addl $4, %esp\");\n    }\n}\n\nvoid BoolExpression()\n{\n    BoolTerm();\n    while (IsOrop(Look)) {\n        EmitLn(\"pushl %eax\");\n        switch (Look) {\n            case '|':\n                BoolOr();\n                break;\n            case '~':\n                BoolXor();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\nvoid BoolOr()\n{\n    Match('|');\n    BoolTerm();\n    EmitLn(\"or (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");    /* recover the stack */\n}\n\nvoid BoolXor()\n{\n    Match('~');\n    BoolTerm();\n    EmitLn(\"xor (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");    /* recover the stack */\n}\n\nvoid Equals()\n{\n    Match('=');\n    Expression();\n    EmitLn(\"cmp (%esp), %eax\");\n    /* Note that the 80386 has setcc, which corresponds to the 68000's SETCC.\n     * However, it only takes 8-bit registers */\n    EmitLn(\"sete %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid NotEquals()\n{\n    Match('#');\n    Expression();\n    EmitLn(\"cmp (%esp), %eax\");\n    EmitLn(\"setne %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid Less()\n{\n    Match('<');\n    Expression();\n    EmitLn(\"cmp %eax, (%esp)\");\n    EmitLn(\"setl %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid Greater()\n{\n    Match('>');\n    Expression();\n    EmitLn(\"cmp 
%eax, (%esp)\");\n    EmitLn(\"setg %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid Ident()\n{\n    char c = GetName();\n    if (Look == '(') {\n        Match('(');\n        Match(')');\n        sprintf(tmp, \"call %c\", c);\n        EmitLn(tmp);\n    } else {\n        sprintf(tmp, \"movl %c, %%eax\", c);\n        EmitLn(tmp);\n    }\n}\n\nvoid Factor()\n{\n    if (Look == '(') {\n        Match('(');\n        Expression();\n        Match(')');\n    } else if (IsAlpha(Look)) {\n        Ident();\n    } else {\n        sprintf(tmp, \"movl $%d, %%eax\", GetNum());\n        EmitLn(tmp);\n    }\n}\n\nvoid SignedFactor()\n{\n    if (Look == '+') {\n        GetChar();\n        Factor();\n    } else if (Look == '-') {\n        GetChar();\n        if (IsDigit(Look)) {\n            sprintf(tmp, \"movl $-%d, %%eax\", GetNum());\n            EmitLn(tmp);\n        } else {\n            Factor();\n            EmitLn(\"neg %eax\");\n        }\n    } else {\n        Factor();\n    }\n}\n\nvoid Multiply()\n{\n    Match('*');\n    Factor();\n    EmitLn(\"imull (%esp), %eax\");\n    /* pop the saved operand off the stack */\n    EmitLn(\"addl $4, %esp\");\n}\n\nvoid Divide()\n{\n    Match('/');\n    Factor();\n\n    /* for an expression like a/b we have eax=b and (%esp)=a,\n     * but we need eax=a, and b on the stack\n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n\n    EmitLn(\"pushl %eax\");\n\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extension */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    EmitLn(\"addl $4, %esp\");\n\n}\n\nvoid Term()\n{\n    SignedFactor();\n    while (strchr(\"*/\", Look)) {\n        EmitLn(\"pushl %eax\");\n        switch(Look)\n        {\n            case '*':\n                Multiply();\n                break;\n            case '/':\n                Divide();\n                break;\n            default:\n                Expected(\"Mulop\");\n        }\n    }\n}\n\nvoid Add()\n{\n    
Match('+');\n    Term();\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n\nvoid Subtract()\n{\n    Match('-');\n    Term();\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"negl %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\nvoid Expression()\n{\n    Term();\n    while(IsAddop(Look)) {\n        EmitLn(\"pushl %eax\");\n        switch (Look) {\n            case '+':\n                Add();\n                break;\n            case '-':\n                Subtract();\n                break;\n            default:\n                Expected(\"Addop\");\n        }\n    }\n}\n\nvoid Assignment()\n{\n    char c = GetName();\n    Match('=');\n    BoolExpression();\n    sprintf(tmp, \"lea %c, %%ebx\", c);\n    EmitLn(tmp);\n    EmitLn(\"movl %eax, (%ebx)\");\n}\n\nint main()\n{\n    Init();\n    DoProgram();\n    return 0;\n}\n"
  },
  {
    "path": "6/tutor6.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                          31 August 1988\n\n\n                   Part VI: BOOLEAN EXPRESSIONS\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn Part V of this series,  we  took a look at control constructs,\nand developed parsing  routines  to  translate  them  into object\ncode.    We  ended  up  with  a  nice,  relatively  rich  set  of\nconstructs.\n\nAs we left  the  parser,  though,  there  was one big hole in our\ncapabilities:  we  did  not  address  the  issue  of  the  branch\ncondition.  To fill the void,  I  introduced to you a dummy parse\nroutine called Condition, which only served as a place-keeper for\nthe real thing.\n\nOne of the things we'll do in this session is  to  plug that hole\nby expanding Condition into a true parser/translator.\n\n\nTHE PLAN\n\nWe're going to  approach  this installment a bit differently than\nany of the others.    In those other installments, we started out\nimmediately with experiments  using the Pascal compiler, building\nup the parsers from  very  rudimentary  beginnings to their final\nforms, without spending much time in planning  beforehand. That's\ncalled coding without specs, and it's usually frowned  upon.   We\ncould get away with it before because the rules of arithmetic are\npretty well established ...  
we  know what a '+' sign is supposed\nto mean without having to discuss it at length.  The same is true\nfor branches and  loops.    But  the  ways  in  which programming\nlanguages  implement  logic  vary quite a bit  from  language  to\nlanguage.  So before we begin serious coding,  we'd  better first\nmake up our minds what it is we want.  And the way to do  that is\nat the level of the BNF syntax rules (the GRAMMAR).\n\n\nTHE GRAMMAR\n\nFor some time  now,  we've been implementing BNF syntax equations\nfor arithmetic expressions, without  ever  actually  writing them\ndown all in one place.  It's time that we did so.  They are:\n\n\n     <expression> ::= <unary op> <term> [<addop> <term>]*\n     <term>       ::= <factor> [<mulop> <factor>]*\n     <factor>     ::= <integer> | <variable> | ( <expression> )\n\n(Remember, the nice thing about  this grammar is that it enforces\nthe operator precedence hierarchy  that  we  normally  expect for\nalgebra.)\n\nActually,  while we're on the subject, I'd  like  to  amend  this\ngrammar a bit right now.   The  way we've handled the unary minus\nis  a  bit  awkward.  I've found that it's better  to  write  the\ngrammar this way:\n\n\n  <expression>    ::= <term> [<addop> <term>]*\n  <term>          ::= <signed factor> [<mulop> <factor>]*\n  <signed factor> ::= [<addop>] <factor>\n  <factor>        ::= <integer> | <variable> | (<expression>)\n\n\nThis puts the job of handling the unary minus onto  Factor, which\nis where it really belongs.\n\nThis  doesn't  mean  that  you  have  to  go  back and recode the\nprograms you've already written, although you're free to do so if\nyou like.  But I will be using the new syntax from now on.\n\nNow, it probably won't come as  a  shock  to you to learn that we\ncan define an analogous grammar for Boolean algebra.    
A typical\nset of rules is:\n\n\n <b-expression>::= <b-term> [<orop> <b-term>]*\n <b-term>      ::= <not-factor> [AND <not-factor>]*\n <not-factor>  ::= [NOT] <b-factor>\n <b-factor>    ::= <b-literal> | <b-variable> | (<b-expression>)\n\n\nNotice that in this  grammar,  the  operator  AND is analogous to\n'*',  and  OR  (and exclusive OR) to '+'.  The  NOT  operator  is\nanalogous to a unary  minus.    This  hierarchy is not absolutely\nstandard ...  some  languages,  notably  Ada,  treat  all logical\noperators  as  having  the same precedence level ... but it seems\nnatural.\n\nNotice also the slight difference between the way the NOT and the\nunary  minus  are  handled.    In  algebra,  the unary  minus  is\nconsidered to go with the whole term, and so  never  appears  but\nonce in a given term. So an expression like\n\n                    a * -b\n\nor worse yet,\n                    a - -b\n\nis not allowed.  In Boolean algebra, though, the expression\n\n                    a AND NOT b\n\nmakes perfect sense, and the syntax shown allows for that.\n\n\nRELOPS\n\nOK, assuming that you're willing to accept the grammar I've shown\nhere,  we  now  have syntax rules for both arithmetic and Boolean\nalgebra.    The  sticky part comes in when we have to combine the\ntwo.  Why do we have to do that?  Well, the whole subject came up\nbecause of the  need  to  process  the  \"predicates\" (conditions)\nassociated with control statements such as the IF.  The predicate\nis required to have a Boolean value; that is, it must evaluate to\neither TRUE or FALSE.  The branch is  then  taken  or  not taken,\ndepending  on  that  value.  What we expect to see  going  on  in\nprocedure  Condition,  then,  is  the  evaluation  of  a  Boolean\nexpression.\n\nBut there's more to it than that.  A pure Boolean  expression can\nindeed be the predicate of a control statement ... 
things like\n\n\n          IF a AND NOT b THEN ....\n\n\nBut more often, we see Boolean algebra show up in such things as\n\n\n     IF (x >= 0) and (x <= 100) THEN ...\n\n\nHere,  the  two  terms in parens are Boolean expressions, but the\nindividual terms being compared:  x,  0, and 100,  are NUMERIC in\nnature.  The RELATIONAL OPERATORS >= and <= are the  catalysts by\nwhich the  Boolean  and  the  arithmetic  ingredients  get merged\ntogether.\n\nNow,  in the example above, the terms  being  compared  are  just\nthat:  terms.    However,  in  general  each  side  can be a math\nexpression.  So we can define a RELATION to be:\n\n\n     <relation> ::= <expression> <relop> <expression>  ,\n\n\nwhere  the  expressions  we're  talking  about here are  the  old\nnumeric type, and the relops are any of the usual symbols\n\n\n               =, <> (or !=), <, >, <=, and >=\n\n\nIf you think about it a  bit,  you'll agree that, since this kind\nof predicate has a single Boolean value, TRUE or  FALSE,  as  its\nresult, it is  really  just  another  kind  of factor.  So we can\nexpand the definition of a Boolean factor above to read:\n\n\n    <b-factor> ::=    <b-literal>\n                    | <b-variable>\n                    | (<b-expression>)\n                    | <relation>\n\n\nTHAT's the connection!  The relops and the  relation  they define\nserve to wed the two kinds of algebra.  It  is  worth noting that\nthis implies a hierarchy  where  the  arithmetic expression has a\nHIGHER precedence than  a  Boolean factor, and therefore than all\nthe  Boolean operators.    
If you write out the precedence levels\nfor all the operators, you arrive at the following list:\n\n\n          Level   Syntax Element     Operator\n\n          0       factor             literal, variable\n          1       signed factor      unary minus\n          2       term               *, /\n          3       expression         +, -\n          4       b-factor           literal, variable, relop\n          5       not-factor         NOT\n          6       b-term             AND\n          7       b-expression       OR, XOR\n\n\nIf  we're willing to accept that  many  precedence  levels,  this\ngrammar seems reasonable.  Unfortunately,  it  won't  work!   The\ngrammar may be great in theory,  but  it's  no good at all in the\npractice of a top-down parser.  To see the problem,  consider the\ncode fragment:\n\n\n     IF ((((((A + B + C) < 0 ) AND ....\n\n\nWhen the parser is parsing this code, it knows after it  sees the\nIF token that a Boolean expression is supposed to be next.  So it\ncan set up to begin evaluating such an expression.  But the first\nexpression in the example is an ARITHMETIC expression, A + B + C.\nWhat's worse, at the point that the parser has read this  much of\nthe input line:\n\n\n     IF ((((((A   ,\n\n\nit  still has no way of knowing which  kind  of  expression  it's\ndealing  with.  That won't do, because  we  must  have  different\nrecognizers  for the two cases.  The  situation  can  be  handled\nwithout  changing  any  of  our  definitions, but only  if  we're\nwilling to accept an arbitrary amount of backtracking to work our\nway out of bad guesses.  
No compiler  writer  in  his  right mind\nwould agree to that.\n\nWhat's going  on  here  is  that  the  beauty and elegance of BNF\ngrammar  has  met  face  to  face with the realities of  compiler\ntechnology.\n\nTo  deal  with  this situation, compiler writers have had to make\ncompromises  so  that  a  single  parser can handle  the  grammar\nwithout backtracking.\n\n\nFIXING THE GRAMMAR\n\nThe  problem  that  we've  encountered  comes   up   because  our\ndefinitions of both arithmetic and Boolean factors permit the use\nof   parenthesized  expressions.    Since  the  definitions   are\nrecursive,  we  can  end  up  with  any  number   of   levels  of\nparentheses, and the  parser  can't know which kind of expression\nit's dealing with.\n\nThe  solution is simple, although it  ends  up  causing  profound\nchanges to our  grammar.    We  can only allow parentheses in one\nkind  of factor.  The way to do  that  varies  considerably  from\nlanguage  to  language.  This is one  place  where  there  is  NO\nagreement or convention to help us.\n\nWhen Niklaus Wirth designed Pascal, the desire was  to  limit the\nnumber of levels of precedence (fewer parse routines, after all).\nSo the OR  and  exclusive  OR  operators are treated just like an\nAddop  and  processed   at   the  level  of  a  math  expression.\nSimilarly, the AND is  treated  like  a  Mulop and processed with\nTerm.  The precedence levels are\n\n\n          Level   Syntax Element     Operator\n\n          0       factor             literal, variable\n          1       signed factor      unary minus, NOT\n          2       term               *, /, AND\n          3       expression         +, -, OR\n\n\nNotice that there is only ONE set of syntax  rules,  applying  to\nboth  kinds  of  operators.    According to this  grammar,  then,\nexpressions like\n\n     x + (y AND NOT z) DIV 3\n\nare perfectly legal.  And, in  fact,  they  ARE ... as far as the\nparser  is  concerned.    
Pascal  doesn't  allow  the  mixing  of\narithmetic and Boolean variables, and things like this are caught\nat the SEMANTIC level, when it comes time to  generate  code  for\nthem, rather than at the syntax level.\n\nThe authors of C took  a  diametrically  opposite  approach: they\ntreat the operators as  different,  and  have something much more\nakin  to our seven levels of precedence.  In fact, in C there are\nno fewer than 17 levels!  That's because C also has the operators\n'=', '+=' and its kin, '<<', '>>', '++', '--', etc.   Ironically,\nalthough in C the  arithmetic  and  Boolean operators are treated\nseparately, the variables are  NOT  ...  there  are no Boolean or\nlogical variables in  C,  so  a  Boolean  test can be made on any\ninteger value.\n\nWe'll do something that's  sort  of  in-between.   I'm tempted to\nstick  mostly  with  the Pascal approach, since  that  seems  the\nsimplest from an implementation point  of view, but it results in\nsome funnies that I never liked very much, such as the fact that,\nin the expression\n\n     IF (c >= 'A') and (c <= 'Z') then ...\n\nthe  parens  above  are REQUIRED.  I never understood why before,\nand  neither my compiler nor any human  ever  explained  it  very\nwell, either.  
But now, we  can  all see that the 'and' operator,\nhaving the precedence of a multiply, has a higher  one  than  the\nrelational operators, so without  the  parens  the  expression is\nequivalent to\n\n     IF c >= ('A' and c) <= 'Z' then\n\nwhich doesn't make sense.\n\nIn  any  case,  I've  elected  to  separate  the  operators  into\ndifferent levels, although not as many as in C.\n\n\n <b-expression> ::= <b-term> [<orop> <b-term>]*\n <b-term>       ::= <not-factor> [AND <not-factor>]*\n <not-factor>   ::= [NOT] <b-factor>\n <b-factor>     ::= <b-literal> | <b-variable> | <relation>\n <relation>     ::= <expression> [<relop> <expression>]\n <expression>   ::= <term> [<addop> <term>]*\n <term>         ::= <signed factor> [<mulop> <factor>]*\n <signed factor>::= [<addop>] <factor>\n <factor>       ::= <integer> | <variable> | (<b-expression>)\n\n\nThis grammar  results  in  the  same  set  of seven levels that I\nshowed earlier.  Really, it's almost the same grammar ...  I just\nremoved the option of parenthesized b-expressions  as  a possible\nb-factor, and added the relation as a legal form of b-factor.\n\nThere is one subtle but crucial difference, which  is  what makes\nthe  whole  thing  work.    Notice  the  square brackets  in  the\ndefinition  of a relation.  This means that  the  relop  and  the\nsecond expression are OPTIONAL.\n\nA strange consequence of this grammar (and one shared  by  C)  is\nthat EVERY expression  is  potentially a Boolean expression.  The\nparser will always be looking  for a Boolean expression, but will\n\"settle\" for an arithmetic one.  To be honest,  that's  going  to\nslow down the parser, because it has to wade through  more layers\nof procedure calls.  That's  one reason why Pascal compilers tend\nto compile faster than C compilers.  If it's raw speed  you want,\nstick with the Pascal syntax.\n\n\nTHE PARSER\n\nNow that we've gotten through the decision-making process, we can\npress on with development of a parser.  
You've done this  with me\nseveral times now, so you know  the  drill: we begin with a fresh\ncopy of the cradle, and begin  adding  procedures one by one.  So\nlet's do it.\n\nWe begin, as we did in the arithmetic case, by dealing  only with\nBoolean literals rather than variables.  This gives us a new kind\nof input token, so we're also going to need a new recognizer, and\na  new procedure to read instances of that  token  type.    Let's\nstart by defining the two new procedures:\n\n\n{--------------------------------------------------------------}\n{ Recognize a Boolean Literal }\n\nfunction IsBoolean(c: char): Boolean;\nbegin\n   IsBoolean := UpCase(c) in ['T', 'F'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Boolean Literal }\n\nfunction GetBoolean: Boolean;\nvar c: char;\nbegin\n   if not IsBoolean(Look) then Expected('Boolean Literal');\n   GetBoolean := UpCase(Look) = 'T';\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\n\nType  these routines into your program.  You  can  test  them  by\nadding into the main program the print statement\n\n\n   WriteLn(GetBoolean);\n\n\n\n\nOK, compile the program and test it.   As  usual,  it's  not very\nimpressive so far, but it soon will be.\n\nNow, when we were dealing with numeric data we had to  arrange to\ngenerate code to load the values into D0.  We need to do the same\nfor Boolean data.   The  usual way to encode Boolean variables is\nto let 0 stand for FALSE,  and  some  other value for TRUE.  Many\nlanguages, such as C, use an  integer  1  to represent it.  But I\nprefer FFFF hex  (or  -1),  because  a bitwise NOT also becomes a\nBoolean  NOT.  So now we need to emit the right assembler code to\nload  those  values.    
The  first cut at the Boolean  expression\nparser (BoolExpression, of course) is:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Expression }\n\nprocedure BoolExpression;\nbegin\n   if not IsBoolean(Look) then Expected('Boolean Literal');\n   if GetBoolean then\n      EmitLn('MOVE #-1,D0')\n   else\n      EmitLn('CLR D0');\nend;\n{---------------------------------------------------------------}\n\n\nAdd  this procedure to your parser, and call  it  from  the  main\nprogram (replacing the  print  statement you had just put there).\nAs you  can  see,  we  still don't have much of a parser, but the\noutput code is starting to look more realistic.\n\nNext, of course, we have to expand the definition  of  a  Boolean\nexpression.  We already have the BNF rule:\n\n\n <b-expression> ::= <b-term> [<orop> <b-term>]*\n\n\nI prefer the Pascal versions of the \"orops\",  OR  and  XOR.   But\nsince we are keeping to single-character tokens here, I'll encode\nthose with '|' and  '~'.  
The  next  version of BoolExpression is\nalmost a direct copy of the arithmetic procedure Expression:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Boolean OR }\n\nprocedure BoolOr;\nbegin\n   Match('|');\n   BoolTerm;\n   EmitLn('OR (SP)+,D0');\nend;\n\n\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate an Exclusive Or }\n\nprocedure BoolXor;\nbegin\n   Match('~');\n   BoolTerm;\n   EmitLn('EOR (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Expression }\n\nprocedure BoolExpression;\nbegin\n   BoolTerm;\n   while IsOrOp(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '|': BoolOr;\n       '~': BoolXor;\n      end;\n   end;\nend;\n{---------------------------------------------------------------}\n\n\nNote the new recognizer  IsOrOp,  which is also a copy, this time\nof IsAddOp:\n\n\n{--------------------------------------------------------------}\n{ Recognize a Boolean Orop }\n\nfunction IsOrop(c: char): Boolean;\nbegin\n   IsOrop := c in ['|', '~'];\nend;\n{--------------------------------------------------------------}\n\nOK, rename the old  version  of  BoolExpression to BoolTerm, then\nenter  the  code  above.  Compile and test this version.  At this\npoint, the  output  code  is  starting  to  look pretty good.  Of\ncourse, it doesn't make much sense to do a lot of Boolean algebra\non  constant values, but we'll soon be  expanding  the  types  of\nBooleans we deal with.\n\nYou've  probably  already  guessed  what  the next step  is:  The\nBoolean version of Term.\n\nRename the current procedure BoolTerm to NotFactor, and enter the\nfollowing new version of BoolTerm.  
Note that it is  much simpler\nthan  the  numeric  version,  since  there  is  no equivalent  of\ndivision.\n\n\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Term }\n\nprocedure BoolTerm;\nbegin\n   NotFactor;\n   while Look = '&' do begin\n      EmitLn('MOVE D0,-(SP)');\n      Match('&');\n      NotFactor;\n      EmitLn('AND (SP)+,D0');\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nNow,  we're  almost  home.  We are  translating  complex  Boolean\nexpressions, although only for constant values.  The next step is\nto allow for the NOT.  Write the following procedure:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Boolean Factor with NOT }\n\nprocedure NotFactor;\nbegin\n   if Look = '!' then begin\n      Match('!');\n      BoolFactor;\n      EmitLn('EOR #-1,D0');\n      end\n   else\n      BoolFactor;\nend;\n{--------------------------------------------------------------}\n\nAnd  rename  the  earlier procedure to BoolFactor.  Now try that.\nAt this point  the  parser  should  be able to handle any Boolean\nexpression you care to throw at it.  Does it?  Does it trap badly\nformed expressions?\n\nIf you've  been  following  what  we  did  in the parser for math\nexpressions, you know  that  what  we  did next was to expand the\ndefinition of a factor to include variables and parens.  We don't\nhave  to do that for the Boolean  factor,  because  those  little\nitems get taken care of by the next step.  
It  takes  just  a one\nline addition to BoolFactor to take care of relations:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Boolean Factor }\n\nprocedure BoolFactor;\nbegin\n   if IsBoolean(Look) then\n      if GetBoolean then\n         EmitLn('MOVE #-1,D0')\n      else\n         EmitLn('CLR D0')\n      else Relation;\nend;\n{--------------------------------------------------------------}\n\n\nYou  might be wondering when I'm going  to  provide  for  Boolean\nvariables and parenthesized Boolean expressions.  The  answer is,\nI'm NOT!   Remember,  we  took  those out of the grammar earlier.\nRight now all I'm  doing  is  encoding  the grammar we've already\nagreed  upon.    The compiler itself can't  tell  the  difference\nbetween a Boolean variable  or  expression  and an arithmetic one\n... all of those will be handled by Relation, either way.\n\n\nOf course, it would help to have some code for Relation.  I don't\nfeel comfortable, though,  adding  any  more  code  without first\nchecking out what we already have.  So for now let's just write a\ndummy  version  of  Relation  that  does nothing except  eat  the\ncurrent character, and write a little message:\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Relation }\n\nprocedure Relation;\nbegin\n   WriteLn('<Relation>');\n   GetChar;\nend;\n{--------------------------------------------------------------}\n\nOK, key  in  this  code  and  give  it a try.  All the old things\nshould still work ... you should be able to generate the code for\nANDs, ORs, and  NOTs.    In  addition, if you type any alphabetic\ncharacter you should get a little <Relation>  place-holder, where\na  Boolean factor should be.  Did you get that?  Fine, then let's\nmove on to the full-blown version of Relation.\n\nTo  get  that,  though, there is a bit of groundwork that we must\nlay first.  
Recall that a relation has the form\n\n\n <relation>     ::= <expression> [<relop> <expression>]\n\n\nSince  we have a new kind of operator, we're also going to need a\nnew Boolean function to  recognize  it.    That function is shown\nbelow.  Because of the single-character limitation,  I'm sticking\nto the four operators  that  can be encoded with such a character\n(the \"not equals\" is encoded by '#').\n\n\n{--------------------------------------------------------------}\n{ Recognize a Relop }\n\nfunction IsRelop(c: char): Boolean;\nbegin\n   IsRelop := c in ['=', '#', '<', '>'];\nend;\n{--------------------------------------------------------------}\n\n\nNow, recall  that  we're  using  a zero or a -1 in register D0 to\nrepresent  a Boolean value, and also  that  the  loop  constructs\nexpect the flags to be set to correspond.   In  implementing  all\nthis on the 68000, things get a little bit tricky.\n\nSince the loop constructs operate only on the flags, it  would be\nnice (and also quite  efficient)  just to set up those flags, and\nnot load  anything  into  D0  at all.  This would be fine for the\nloops  and  branches,  but remember that the relation can be used\nANYWHERE a Boolean factor could be  used.   We may be storing its\nresult to a Boolean variable.  Since we can't know at  this point\nhow the result is going to be used, we must allow for BOTH cases.\n\nComparing numeric data  is  easy  enough  ...  the  68000  has an\noperation  for  that ... but it sets  the  flags,  not  a  value.\nWhat's more,  the  flags  will  always  be  set the same (zero if\nequal, etc.), while we need the zero flag set differently for\neach of the different relops.\n\nThe solution is found in the 68000 instruction Scc, which  sets a\nbyte value to 0000 or FFFF (funny how that works!) depending upon\nthe  result  of  the  specified   condition.    
If  we  make  the\ndestination byte to be D0, we get the Boolean value needed.\n\nUnfortunately,  there's one  final  complication:  unlike  almost\nevery other instruction in the 68000 set, Scc does NOT  reset the\ncondition flags to match the data being stored.  So we have to do\none last step, which is to test D0 and set the flags to match it.\nIt must seem to be a trip around the moon to get what we want: we\nfirst perform the test, then test the flags to set data  into D0,\nthen test D0 to set the flags again.  It  is  sort of roundabout,\nbut it's the most straightforward way to get the flags right, and\nafter all it's only a couple of instructions.\n\nI  might  mention  here that this area is, in my opinion, the one\nthat represents the biggest difference between the  efficiency of\nhand-coded assembler language and  compiler-generated  code.   We\nhave  seen  already  that  we  lose   efficiency   in  arithmetic\noperations, although later I plan to show you how to improve that\na  bit.    We've also seen that the control constructs themselves\ncan be done quite efficiently  ... it's usually very difficult to\nimprove  on  the  code generated for an  IF  or  a  WHILE.    But\nvirtually every compiler I've ever seen generates  terrible code,\ncompared to assembler, for the computation of a Boolean function,\nand particularly for relations.    The  reason  is just what I've\nhinted at above.  When I'm writing code in assembler, I  go ahead\nand perform the test the most convenient way I can, and  then set\nup the branch so that it goes the way it should.    In  effect, I\n\"tailor\"  every  branch  to the situation.  The compiler can't do\nthat (practically), and it also can't know that we don't  want to\nstore the result of the test as a Boolean variable.    
So it must\ngenerate  the  code  in a very strict order, and it often ends up\nloading  the  result  as  a  Boolean  that  never gets  used  for\nanything.\n\nIn  any  case,  we're now ready to look at the code for Relation.\nIt's shown below with its companion procedures:\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Equals\" }\n\nprocedure Equals;\nbegin\n   Match('=');\n   Expression;\n   EmitLn('CMP (SP)+,D0');\n   EmitLn('SEQ D0');\nend;\n\n\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Not Equals\" }\n\nprocedure NotEquals;\nbegin\n   Match('#');\n   Expression;\n   EmitLn('CMP (SP)+,D0');\n   EmitLn('SNE D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Less Than\" }\n\nprocedure Less;\nbegin\n   Match('<');\n   Expression;\n   EmitLn('CMP (SP)+,D0');\n   EmitLn('SGT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Relational \"Greater Than\" }\n\nprocedure Greater;\nbegin\n   Match('>');\n   Expression;\n   EmitLn('CMP (SP)+,D0');\n   EmitLn('SLT D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Relation }\n\nprocedure Relation;\nbegin\n   Expression;\n   if IsRelop(Look) then begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '=': Equals;\n       '#': NotEquals;\n       '<': Less;\n       '>': Greater;\n      end;\n   EmitLn('TST D0');\n   end;\nend;\n{---------------------------------------------------------------}\n\nNow, that call to  Expression  looks familiar!  Here is where the\neditor of your system comes in handy.  We have  already generated\ncode  for  Expression  and its buddies in previous sessions.  You\ncan  copy  them  into your file now.  Remember to use the single-\ncharacter  versions.  
Just to be  certain,  I've  duplicated  the\narithmetic procedures below.  If  you're  observant,  you'll also\nsee that I've changed them a little to make  them  correspond  to\nthe latest version of the syntax.  This change is  NOT necessary,\nso  you  may  prefer  to  hold  off  on  that  until you're  sure\n\n\neverything is working.\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Identifier }\n\nprocedure Ident;\nvar Name: char;\nbegin\n   Name:= GetName;\n   if Look = '(' then begin\n      Match('(');\n      Match(')');\n      EmitLn('BSR ' + Name);\n      end\n   else\n      EmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      Ident\n   else\n      EmitLn('MOVE #' + GetNum + ',D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate the First Math Factor }\n\n\nprocedure SignedFactor;\nbegin\n   if Look = '+' then\n      GetChar;\n   if Look = '-' then begin\n      GetChar;\n      if IsDigit(Look) then\n         EmitLn('MOVE #-' + GetNum + ',D0')\n      else begin\n         Factor;\n         EmitLn('NEG D0');\n      end;\n   end\n   else Factor;\nend;\n\n\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Match('*');\n   Factor;\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Match('/');\n   Factor;\n   EmitLn('MOVE (SP)+,D1');\n   EmitLn('EXS.L D0');\n   EmitLn('DIVS D1,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ 
Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   SignedFactor;\n   while Look in ['*', '/'] do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '*': Multiply;\n       '/': Divide;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG D0');\nend;\n\n\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   Term;\n   while IsAddop(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      end;\n   end;\nend;\n{---------------------------------------------------------------}\n\n\nThere you have it ... a parser that can  handle  both  arithmetic\nAND Boolean algebra, and things  that combine the two through the\nuse of relops.   I suggest you file away a copy of this parser in\na safe place for future reference, because in our next step we're\ngoing to be chopping it up.\n\n\nMERGING WITH CONTROL CONSTRUCTS\n\nAt this point, let's go back to the file we had  previously built\nthat parses control  constructs.    Remember  those  little dummy\nprocedures called Condition and  Expression?    Now you know what\ngoes in their places!\n\nI  warn you, you're going to have to  do  some  creative  editing\nhere, so take your time and get it right.  What you need to do is\nto copy all of  the  procedures from the logic parser, from Ident\nthrough  BoolExpression, into the parser for control  constructs.\nInsert  them  at  the current location of Condition.  Then delete\nthat  procedure,  as  well as the dummy Expression.  
Next, change\nevery call  to  Condition  to  refer  to  BoolExpression instead.\nFinally, copy the procedures IsMulop, IsOrOp, IsRelop, IsBoolean,\nand GetBoolean into place.  That should do it.\n\nCompile  the  resulting program and give it  a  try.    Since  we\nhaven't  used  this  program in awhile, don't forget that we used\nsingle-character tokens for IF,  WHILE,  etc.   Also don't forget\nthat any letter not a keyword just gets echoed as a block.\n\nTry\n\n     ia=bxlye\n\nwhich stands for \"IF a=b X ELSE Y ENDIF\".\n\nWhat do you think?  Did it work?  Try some others.\n\n\nADDING ASSIGNMENTS\n\nAs long as we're this far,  and  we already have the routines for\nexpressions in place, we might  as well replace the \"blocks\" with\nreal assignment statements.    We've already done that before, so\nit won't be too hard.   Before  taking that step, though, we need\nto fix something else.\n\n\n\nWe're soon going to find  that the one-line \"programs\" that we're\nhaving to write here will really cramp our style.  At  the moment\nwe  have  no  cure for that, because our parser doesn't recognize\nthe end-of-line characters, the carriage return (CR) and the line\nfeed (LF).  So before going any further let's plug that hole.\n\nThere are  a  couple  of  ways to deal with the CR/LFs.  One (the\nC/Unix approach) is just to  treat them as additional white space\ncharacters  and  ignore  them.    That's actually not such a  bad\napproach,  but  it  does  sort  of produce funny results for  our\nparser as  it  stands  now.   If it were reading its input from a\nsource file as  any  self-respecting  REAL  compiler  does, there\nwould be no problem.  But we're reading input from  the keyboard,\nand we're sort of conditioned  to expect something to happen when\nwe hit the return key.  It won't, if we just skip over the CR and\nLF  (try it).  So I'm going to use a different method here, which\nis NOT necessarily the  best  approach in the long run.  
Consider\nit a temporary kludge until we're further along.\n\nInstead of skipping the CR/LF,  we'll let the parser go ahead and\ncatch them, then  introduce  a  special  procedure,  analogous to\nSkipWhite, that skips them only in specified \"legal\" spots.\n\nHere's the procedure:\n\n\n{--------------------------------------------------------------}\n{ Skip a CRLF }\n\nprocedure Fin;\nbegin\n   if Look = CR then GetChar;\n   if Look = LF then GetChar;\nend;\n\n{--------------------------------------------------------------}\n\n\nNow, add two calls to Fin in procedure Block, like this:\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block(L: string);\nbegin\n   while not(Look in ['e', 'l', 'u']) do begin\n      Fin;\n      case Look of\n       'i': DoIf(L);\n       'w': DoWhile;\n       'p': DoLoop;\n       'r': DoRepeat;\n       'f': DoFor;\n       'd': DoDo;\n       'b': DoBreak(L);\n       else Other;\n      end;\n      Fin;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\n\nNow, you'll find that you  can use multiple-line \"programs.\"  The\nonly restriction is that you can't separate an IF or  WHILE token\nfrom its predicate.\n\nNow we're ready to include  the  assignment  statements.   Simply\nchange  that  call  to  Other  in  procedure  Block  to a call to\nAssignment, and add  the  following procedure, copied from one of\nour  earlier  programs.     
Note   that   Assignment   now  calls\nBoolExpression, so that we can assign Boolean variables.\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := GetName;\n   Match('=');\n   BoolExpression;\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)');\nend;\n{--------------------------------------------------------------}\n\n\nWith  that change, you should now be  able  to  write  reasonably\nrealistic-looking  programs,  subject  only  to our limitation on\nsingle-character tokens.  My original intention was to get rid of\nthat limitation for you, too.  However, that's going to require a\nfairly major change to what we've  done  so  far.  We need a true\nlexical scanner, and that requires some structural changes.  They\nare not BIG changes that require us to  throw  away  all  of what\nwe've done so far ... with care, it can be done with very minimal\nchanges, in fact.  But it does require that care.\n\nThis installment  has already gotten pretty long, and it contains\nsome pretty heavy stuff, so I've decided to leave that step until\nnext  time, when you've had a little more  time  to  digest  what\nwe've done and are ready to start fresh.\n\nIn the next installment, then,  we'll build a lexical scanner and\neliminate the single-character  barrier  once and for all.  We'll\nalso write our first complete  compiler, based on what we've done\nin this session.  See you then.\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   
*\n*                                                               *\n*****************************************************************\n\n\n"
  },
  {
    "path": "7/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "7/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <stdbool.h>\n\n#define TABLE_SIZE 26\nstatic int LCount = 0;\nstatic char labelName[MAX_BUF];\n/*static char identifier[MAX_BUF];*/\nstatic int Table[TABLE_SIZE];\nchar tmp[MAX_BUF];\n\n\n/* Keywords symbol table */\nconst char * const KWList[] = {\n    \"IF\",\n    \"ELSE\",\n    \"ENDIF\",\n    \"END\",\n};\nconst char KWCode[] = \"xilee\";\nconst int KWNum = sizeof(KWList)/sizeof(*KWList);\n\nchar Token;      /* current token */\nchar Value[MAX_BUF];     /* string token of Look */\n\n/* Table Lookup\n * If the input string matches a table entry, return the entry index, else\n * return -1.\n * *n* is the size of the table */\nint Lookup(const char  * const table[], const char *string, int n)\n{\n    int i;\n    bool found = false;\n\n    for (i = 0; i < n; ++i) {\n        if (strcmp(table[i], string) == 0) {\n            found = true;\n            break;\n        }\n    }\n    return found ? 
i : -1;\n}\n\n\nvoid Scan()\n{ \n    /* On Unix/Linux a line ends with LF, rather than the CR LF used by MS-DOS */\n    SkipWhite();   \n    while(Look == '\\n') {\n        Fin();\n    }\n\n    GetName();\n    int index = Lookup(KWList, Value, KWNum);\n    Token = KWCode[index+1];\n}\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    if (IsAlpha(c)) {\n        return (c & 0xDF);\n    } else {\n        return c;\n    }\n}\n\nvoid GetChar()\n{\n    Look = getchar();\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    sprintf(tmp, \"%s Expected\", s);\n    Abort(tmp);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\nvoid MatchString(char *str)\n{\n    if (strcmp(Value, str) != 0) {\n        sprintf(tmp, \"\\\"%s\\\"\", Value);\n        Expected(tmp);\n    }\n}\n\nvoid Newline()\n{\n    if (Look == '\\r') {\n        GetChar();\n        if (Look == '\\n') {\n            GetChar();\n        }\n    } else if (Look == '\\n') {\n        GetChar();\n    }\n}\n\nint IsWhite(char c)\n{\n    return strchr(\" \\t\\r\\n\", c) != NULL;\n}\n\nint IsOp(char c)\n{\n    return strchr(\"+-*/<>:=\", c) != NULL;\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n}\n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nint IsBoolean(char c)\n{\n    return strchr(\"TF\", uppercase(c)) != NULL;\n}\n\nint IsAlNum(char c)\n{\n    return IsAlpha(c) || IsDigit(c);\n}\n\nvoid GetName()\n{\n    SkipWhite();   \n    while(Look == '\\n') {\n        Fin();\n    }\n\n    char *p = Value;\n    if (!IsAlpha(Look)) {\n        Expected(\"Name\");\n    }\n\n    while(IsAlNum(Look)) {\n        *p++ = uppercase(Look);\n        GetChar();\n    }\n    *p = '\\0';\n}\n\nvoid 
GetNum()\n{\n    SkipWhite();\n    char *p = Value;\n    if( !IsDigit(Look)) {\n        Expected(\"Integer\");\n    }\n\n    while (IsDigit(Look)) {\n        *p++ = Look;\n        GetChar();\n    }\n    *p = '\\0';\n    Token = '#';\n}\n\nint GetBoolean()\n{\n    if (!IsBoolean(Look)) {\n        Expected(\"Boolean Literal\");\n    }\n    int ret = uppercase(Look) == 'T';\n    GetChar();\n    return ret;\n}\n\nint IsOrop(char c)\n{\n    return strchr(\"|~\", c) != NULL;\n}\n\nint IsRelop(char c)\n{\n    return strchr(\"=#<>\", c) != NULL;\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    LCount = 0;\n\n    InitTable();\n    GetChar();\n}\n\nvoid SkipWhite()\n{\n    while (IsWhite(Look)) {\n        GetChar();\n    }\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < TABLE_SIZE; i++) {\n        Table[i] = 0;\n    }\n\n}\n\nchar *NewLabel()\n{\n    sprintf(labelName, \"L%02d\", LCount);\n    LCount ++;\n    return labelName;\n}\n\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n\nvoid Fin()\n{\n    if (Look == '\\r') {\n        GetChar();\n    }\n    if (Look == '\\n') {\n        GetChar();\n    }\n}\n\n"
  },
  {
    "path": "7/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#define MAX_BUF 100\nchar Look;\nextern char Token;      /* current token */\nextern char Value[MAX_BUF];     /* string token of Look */\nextern char tmp[MAX_BUF];\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\nvoid MatchString(char *str);\n\nvoid Newline();\n\nint IsWhite(char c);\nint IsOp(char c);\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAlNum(char c);\nint IsAddop(char c);\nint IsBoolean(char c);\nint IsOrop(char c);\nint IsRelop(char c);\n\nvoid GetName();\nvoid GetNum();\nvoid GetOp();\nint GetBoolean();\n\nvoid Scan();\nvoid SkipWhite();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\n\nchar *NewLabel();\nvoid PostLabel(char *label);\n\nvoid Fin();\n#endif\n"
  },
  {
    "path": "7/main.c",
    "content": "/* not that only IF statement is supported for merging lexer and parser in\n * chapter 7.\n */\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <stdbool.h>\n\n#include \"cradle.h\"\n\n#ifdef DEBUG\n#define dprint(fmt, ...) printf(fmt, __VA_ARGS__);\n#else\n#define dprint(fmt, ...)\n#endif\n\nvoid Other();\nvoid Block();\nvoid DoProgram();\nvoid DoIf();\nvoid DoWhile();\nvoid DoLoop();\nvoid DoRepeat();\nvoid DoFor();\nvoid Expression();\nvoid DoDo();\nvoid DoBreak(char *L);\n\n/* Added in chap6 */\nvoid BoolFactor();\nvoid NotFactor();\nvoid BoolTerm();\nvoid BoolExpression();\nvoid BoolOr();\nvoid BoolXor();\nvoid Relation();\nvoid Equals();\nvoid NotEquals();\nvoid Less();\nvoid Greater();\nvoid Ident();\nvoid Factor();\nvoid SignedFactor();\nvoid Multiply();\nvoid Divide();\nvoid Term();\nvoid Add();\nvoid Subtract();\nvoid Expression();\nvoid Assignment();\nvoid Condition();\n\nvoid Block()\n{\n    Scan();\n    while (! strchr(\"el\", Token)) {\n        dprint(\"Block: get Look = %c\\n\", Look);\n        switch (Token) {\n            case 'i':\n                DoIf();\n                break;\n            case '\\n':\n                while(Look == '\\n') {\n                    Fin();\n                }\n                break;\n            default:\n                Assignment();\n                break;\n        }\n        Scan();\n    }\n}\n\nvoid DoProgram()\n{\n    Block();\n    MatchString(\"END\");\n    EmitLn(\"END\");\n}\n\nvoid DoIf()\n{\n    Condition();\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    strcpy(L1, NewLabel());\n    strcpy(L2, L1);\n\n    sprintf(tmp, \"jz %s\", L1);\n    EmitLn(tmp);\n\n    Block();\n\n    if (Token == 'l') {\n        /* match *else* statement */\n        strcpy(L2, NewLabel());\n\n        sprintf(tmp, \"jmp %s\", L2);\n        EmitLn(tmp);\n\n        PostLabel(L1);\n\n        Block();\n    }\n\n    PostLabel(L2);\n    MatchString(\"ENDIF\");\n}\n\nvoid DoWhile()\n{\n    char 
L1[MAX_BUF];\n    char L2[MAX_BUF];\n\n    Match('w');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    BoolExpression();\n    sprintf(tmp, \"jz %s\", L2);\n    EmitLn(tmp);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\nvoid DoLoop()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    Match('p');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    Block(L2);\n    Match('e');\n    sprintf(tmp, \"jmp %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\nvoid DoRepeat()\n{\n    char L1[MAX_BUF];\n    char L2[MAX_BUF];\n    Match('r');\n    strcpy(L1, NewLabel());\n    strcpy(L2, NewLabel());\n    PostLabel(L1);\n    Block(L2);\n    Match('u');\n    BoolExpression();\n\n    sprintf(tmp, \"jz %s\", L1);\n    EmitLn(tmp);\n    PostLabel(L2);\n}\n\n\nvoid DoBreak(char *L)\n{\n    Match('b');\n    if (L != NULL) {\n        sprintf(tmp, \"jmp %s\", L);\n        EmitLn(tmp);\n    } else {\n        Abort(\"No loop to break from\");\n    }\n}\n\nvoid BoolFactor()\n{\n    if (IsBoolean(Look)) {\n        if (GetBoolean()) {\n            EmitLn(\"movl $-1, %eax\");\n        } else {\n            EmitLn(\"xor %eax, %eax\");\n        }\n    } else {\n        Relation();\n    }\n}\n\nvoid Relation()\n{\n    Expression();\n    if (IsRelop(Look)) {\n        EmitLn(\"pushl %eax\");\n        switch (Look) {\n            case '=':\n                Equals();\n                break;\n            case '#':\n                NotEquals();\n                break;\n            case '<':\n                Less();\n                break;\n            case '>':\n                Greater();\n                break;\n        }\n    }\n    EmitLn(\"test %eax, %eax\");\n}\n\nvoid NotFactor()\n{\n    if (Look == '!') {\n        Match('!');\n        BoolFactor();\n        EmitLn(\"xor $-1, %eax\");\n    } else {\n        BoolFactor();\n    }\n}\n\nvoid BoolTerm()\n{\n    
NotFactor();\n    while(Look == '&') {\n        EmitLn(\"pushl %eax\");\n        Match('&');\n        NotFactor();\n        EmitLn(\"and (%esp), %eax\");\n        EmitLn(\"addl $4, %esp\");\n    }\n}\n\nvoid BoolExpression()\n{\n    BoolTerm();\n    while (IsOrop(Look)) {\n        EmitLn(\"pushl %eax\");\n        switch (Look) {\n            case '|':\n                BoolOr();\n                break;\n            case '~':\n                BoolXor();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\nvoid BoolOr()\n{\n    Match('|');\n    BoolTerm();\n    EmitLn(\"or (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");    /* recover the stack */\n}\n\nvoid BoolXor()\n{\n    Match('~');\n    BoolTerm();\n    EmitLn(\"xor (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");    /* recover the stack */\n}\n\nvoid Equals()\n{\n    Match('=');\n    Expression();\n    EmitLn(\"cmp (%esp), %eax\");\n    /* Note that the 80386 has setcc, which corresponds to the 68000's SETCC.\n     * However, it only takes 8-bit registers */\n    EmitLn(\"sete %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid NotEquals()\n{\n    Match('#');\n    Expression();\n    EmitLn(\"cmp (%esp), %eax\");\n    EmitLn(\"setne %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid Less()\n{\n    Match('<');\n    Expression();\n    EmitLn(\"cmp %eax, (%esp)\");\n    EmitLn(\"setl %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid Greater()\n{\n    Match('>');\n    Expression();\n    EmitLn(\"cmp %eax, (%esp)\");\n    EmitLn(\"setg %al\");\n    EmitLn(\"addl $4, %esp\");     /* recover the stack */\n}\n\nvoid Ident()\n{\n    GetName();\n    if (Look == '(') {\n        Match('(');\n        Match(')');\n        sprintf(tmp, \"call %s\", Value);\n        EmitLn(tmp);\n    } else {\n        sprintf(tmp, \"movl %s, %%eax\", Value);\n        EmitLn(tmp);\n    }\n}\n\nvoid Factor()\n{\n    if (Look == '(') {\n    
    Match('(');\n        Expression();\n        Match(')');\n    } else if (IsAlpha(Look)) {\n        Ident();\n    } else {\n        GetNum();\n        sprintf(tmp, \"movl $%s, %%eax\", Value);\n        EmitLn(tmp);\n    }\n}\n\nvoid SignedFactor()\n{\n    bool negative = Look == '-';\n\n    /* consume a leading '+' or '-' so that Factor() sees the operand */\n    if (IsAddop(Look)) {\n        GetChar();\n        SkipWhite();\n    } \n\n    Factor();\n\n    if (negative) {\n        EmitLn(\"neg %eax\");\n    }\n}\n\nvoid Multiply()\n{\n    Match('*');\n    Factor();\n    EmitLn(\"imull (%esp), %eax\");\n    /* pop the operand off the stack */\n    EmitLn(\"addl $4, %esp\");\n}\n\nvoid Divide()\n{\n    Match('/');\n    Factor();\n\n    /* for an expression like a/b we have eax=b and (%esp)=a\n     * but we need eax=a, and b on the stack\n     */\n    EmitLn(\"movl (%esp), %edx\");\n    EmitLn(\"addl $4, %esp\");\n\n    EmitLn(\"pushl %eax\");\n\n    EmitLn(\"movl %edx, %eax\");\n\n    /* sign extension */\n    EmitLn(\"sarl $31, %edx\");\n    EmitLn(\"idivl (%esp)\");\n    EmitLn(\"addl $4, %esp\");\n\n}\n\nvoid Term1()\n{\n    while (strchr(\"*/\", Look)) {\n        EmitLn(\"pushl %eax\");\n        switch(Look)\n        {\n            case '*':\n                Multiply();\n                break;\n            case '/':\n                Divide();\n                break;\n            default:\n                Expected(\"Mulop\");\n        }\n    }\n}\n\nvoid Term()\n{\n    Factor();\n    Term1();\n}\n\nvoid FirstTerm()\n{\n    SignedFactor();\n    Term1();\n}\n\nvoid Add()\n{\n    Match('+');\n    Term();\n    EmitLn(\"addl (%esp), %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\n\nvoid Subtract()\n{\n    Match('-');\n    Term();\n    EmitLn(\"subl (%esp), %eax\");\n    EmitLn(\"negl %eax\");\n    EmitLn(\"addl $4, %esp\");\n}\n\nvoid Expression()\n{\n    FirstTerm();\n    while(IsAddop(Look)) {\n        EmitLn(\"pushl %eax\");\n        switch (Look) {\n            case '+':\n                Add();\n                break;\n            case '-':\n           
     Subtract();\n                break;\n            default:\n                Expected(\"Addop\");\n        }\n    }\n}\n\n/* This version of 'condition' is dummy */\nvoid Condition()\n{\n    EmitLn(\"Condition\");\n}\n\nvoid Assignment()\n{\n    char name[MAX_BUF];\n    strcpy(name, Value);\n    Match('=');\n    Expression();\n    sprintf(tmp, \"lea %s, %%ebx\", name);\n    EmitLn(tmp);\n    EmitLn(\"movl %eax, (%ebx)\");\n}\n\nint main()\n{\n    Init();\n    DoProgram();\n    return 0;\n}\n"
  },
  {
    "path": "7/tutor7.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                         7 November 1988\n\n\n                    Part VII: LEXICAL SCANNING\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn the last installment, I left you with a  compiler  that  would\nALMOST  work,  except  that  we  were  still  limited to  single-\ncharacter tokens.  The purpose of  this  session is to get rid of\nthat restriction, once and for all.  This means that we must deal\nwith the concept of the lexical scanner.\n\nMaybe I should mention why we  need  a lexical scanner at all ...\nafter all, we've been able to manage all right  without  one,  up\ntill now, even when we provided for multi-character tokens.\n\nThe ONLY reason, really, has to do with keywords.  It's a fact of\ncomputer life that the syntax for a keyword has the same  form as\nthat  for  any  other identifier.  We can't tell until we get the\ncomplete word whether or not it  IS  a keyword.  For example, the\nvariable IFILE and the keyword IF look just alike, until  you get\nto the third character.  In the examples to date, we  were always\nable to make  a  decision  based  upon the first character of the\ntoken, but that's  no  longer possible when keywords are present.\nWe  need to know that a given string is a keyword BEFORE we begin\nto process it.  
And that's why we need a scanner.\n\nIn the last session, I also promised that  we  would  be  able to\nprovide for normal tokens  without  making  wholesale  changes to\nwhat we have  already done.  I didn't lie ... we can, as you will\nsee later.  But every time I set out to install these elements of\nthe software into  the  parser  we  have already built, I had bad\nfeelings about it.  The whole thing felt entirely too much like a\nband-aid.  I finally figured out what was causing the  problem: I\nwas installing lexical scanning software without first explaining\nto you what scanning is all about, and what the alternatives are.\nUp  till  now, I have studiously avoided  giving  you  a  lot  of\ntheory,  and  certainly  not  alternatives.    I  generally don't\nrespond well to the textbooks that give you twenty-five different\nways  to do something, but no clue as to which way best fits your\nneeds.  I've tried to avoid that pitfall by just showing  you ONE\nmethod, that WORKS.\n\nBut  this is an important area.  While  the  lexical  scanner  is\nhardly the most  exciting  part  of  a compiler, it often has the\nmost  profound  effect  on  the  general  \"look  & feel\"  of  the\nlanguage, since after all it's the  part  closest to the user.  I\nhave a particular structure in mind for the scanner  to  be  used\nwith  KISS.    It fits the look &  feel  that  I  want  for  that\nlanguage.  But it may not work at  all  for  the  language YOU'RE\ncooking  up,  so  in this one case I feel that it's important for\nyou to know your options.\n\nSo I'm going to depart, again, from my  usual  format.    In this\nsession we'll be getting  much  deeper  than usual into the basic\ntheory of languages and  grammars.    I'll  also be talking about\nareas OTHER than compilers in  which  lexical  scanning  plays an\nimportant role.  Finally, I will show you  some  alternatives for\nthe structure of the lexical scanner.  
Then, and only  then, will\nwe get back to our parser  from  the last installment.  Bear with\nme ... I think you'll find it's worth the wait.    In fact, since\nscanners have many applications  outside  of  compilers,  you may\nwell find this to be the most useful session for you.\n\n\nLEXICAL SCANNING\n\nLexical scanning is the process of scanning the  stream  of input\ncharacters and separating it  into  strings  called tokens.  Most\ncompiler  texts  start  here,  and  devote  several  chapters  to\ndiscussing various ways to build scanners.  This approach has its\nplace, but as you have already  seen,  there  is a lot you can do\nwithout ever even addressing the issue, and in  fact  the scanner\nwe'll  end  up with here won't look  much  like  what  the  texts\ndescribe.  The reason?    Compiler  theory and, consequently, the\nprograms resulting from it, must  deal with the most general kind\nof parsing rules.  We don't.  In the real  world,  it is possible\nto specify the language syntax in such a way that a pretty simple\nscanner will suffice.  And as always, KISS is our motto.\n\nTypically, lexical scanning is  done  in  a  separate part of the\ncompiler, so that the parser per  se  sees only a stream of input\ntokens.  Now, theoretically it  is not necessary to separate this\nfunction from the rest of the parser.  There is  only  one set of\nsyntax equations that define the  whole language, so in theory we\ncould write the whole parser in one module.\n\nWhy  the  separation?      The  answer  has  both  practical  and\ntheoretical bases.\n\nIn  1956,  Noam  Chomsky  defined  the  \"Chomsky   Hierarchy\"  of\ngrammars.  
They are:\n\n     o Type 0:  Unrestricted (e.g., English)\n\n     o Type 1:  Context-Sensitive\n\n     o Type 2:  Context-Free\n\n     o Type 3:  Regular\n\nA few features of the typical programming  language (particularly\nthe older ones, such as FORTRAN) are Type  1,  but  for  the most\npart  all  modern  languages can be described using only the last\ntwo types, and those are all we'll be dealing with here.\n\nThe  neat  part about these two types  is  that  there  are  very\nspecific ways to parse them.  It has been shown that  any regular\ngrammar can be parsed using a particular form of abstract machine\ncalled the state machine (finite  automaton).    We  have already\nimplemented state machines in some of our recognizers.\n\nSimilarly, Type 2 (context-free) grammars  can  always  be parsed\nusing  a  push-down  automaton (a state machine  augmented  by  a\nstack).  We have  also  implemented  these  machines.  Instead of\nimplementing  a literal stack, we have  relied  on  the  built-in\nstack associated with recursive coding to do the job, and that in\nfact is the preferred approach for top-down parsing.\n\nNow, it happens that in  real, practical grammars, the parts that\nqualify as  regular expressions tend to be the lower-level parts,\nsuch as the definition of an identifier:\n\n     <ident> ::= <letter> [ <letter> | <digit> ]*\n\nSince it takes a different kind of abstract machine to  parse the\ntwo  types  of  grammars, it makes sense to separate these lower-\nlevel functions into  a  separate  module,  the  lexical scanner,\nwhich is built around the idea of a state machine. The idea is to\nuse the simplest parsing technique needed for the job.\n\nThere is another, more practical  reason  for  separating scanner\nfrom  parser.   We like to think of the input source  file  as  a\nstream  of characters, which we process  left  to  right  without\nbacktracking.  In practice that  isn't  possible.    
Almost every\nlanguage has certain keywords such as  IF,  WHILE, and END.  As I\nmentioned  earlier,    we  can't  really  know  whether  a  given\ncharacter string is a keyword, until we've reached the end of it,\nas defined by a space or other delimiter.  So  in  that sense, we\nMUST  save  the  string long enough to find out whether we have a\nkeyword or not.  That's a limited form of backtracking.\n\nSo the structure of a conventional compiler involves splitting up\nthe functions of  the  lower-level and higher-level parsing.  The\nlexical  scanner  deals  with  things  at  the  character  level,\ncollecting characters into strings, etc., and passing  them along\nto the parser proper as indivisible tokens.  It's also considered\nnormal to let the scanner have the job of identifying keywords.\n\n\nSTATE MACHINES AND ALTERNATIVES\n\nI  mentioned  that  the regular expressions can be parsed using a\nstate machine.   In  most  compiler  texts,  and  indeed  in most\ncompilers as well, you will find this taken literally.   There is\ntypically  a  real  implementation  of  the  state  machine, with\nintegers used to define the current state, and a table of actions\nto  take   for  each  combination  of  current  state  and  input\ncharacter.  If you  write  a compiler front end using the popular\nUnix tools LEX and YACC, that's  what  you'll get.  The output of\nLEX is a state machine implemented in C, plus a table  of actions\ncorresponding to the input grammar given to LEX.  The YACC output\nis  similar  ...  a canned table-driven parser,  plus  the  table\ncorresponding to the language syntax.\n\nThat  is  not  the  only  choice,  though.     In   our  previous\ninstallments, you have seen over and over that it is  possible to\nimplement  parsers  without  dealing  specifically  with  tables,\nstacks, or state variables.    
In fact, in Installment V I warned\nyou that if you  find  yourself needing these things you might be\ndoing something wrong, and not taking advantage of  the  power of\nPascal.  There are basically two ways to define a state machine's\nstate: explicitly, with  a  state number or code, and implicitly,\nsimply by virtue of the fact that I'm at a  certain  place in the\ncode  (if  it's  Tuesday,  this  must be Belgium).  We've  relied\nheavily on the implicit approaches  before,  and  I  think you'll\nfind that they work well here, too.\n\nIn practice, it may not even be necessary to HAVE  a well-defined\nlexical scanner.  This isn't our first experience at dealing with\nmulti-character tokens.   In  Installment  III,  we  extended our\nparser to provide  for  them,  and  we didn't even NEED a lexical\nscanner.    That  was  because  in that narrow context, we  could\nalways tell, just  by  looking at the single lookahead character,\nwhether  we  were  dealing  with  a  number,  a variable,  or  an\noperator.  In effect, we  built  a  distributed  lexical scanner,\nusing procedures GetName and GetNum.\n\nWith keywords present,  we  can't know anymore what we're dealing\nwith, until the entire token is  read.    This leads us to a more\nlocalized  scanner; although,  as you will see,  the  idea  of  a\ndistributed scanner still has its merits.\n\n\nSOME EXPERIMENTS IN SCANNING\n\nBefore  getting  back  to our compiler,  it  will  be  useful  to\nexperiment a bit with the general concepts.\n\nLet's  begin with the two definitions most  often  seen  in  real\nprogramming languages:\n\n     <ident> ::= <letter> [ <letter> | <digit> ]*\n     <number> ::= [<digit>]+\n\n(Remember, the '*' indicates zero or more occurrences of the terms\nin brackets, and the '+', one or more.)\n\nWe  have already dealt with similar  items  in  Installment  III.\nLet's begin (as usual) with a bare cradle.  
Not  surprisingly, we\nare going to need a new recognizer:\n                              \n\n{--------------------------------------------------------------}\n{ Recognize an Alphanumeric Character }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n{--------------------------------------------------------------}\n\n\nUsing this let's write the following two routines, which are very\nsimilar to those we've used before:\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: string;\nvar x: string[8];\nbegin\n   x := '';\n   if not IsAlpha(Look) then Expected('Name');\n   while IsAlNum(Look) do begin\n     x := x + UpCase(Look);\n     GetChar;\n   end;\n   GetName := x;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: string;\nvar x: string[16];\nbegin\n   x := '';\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n     x := x + Look;\n     GetChar;\n   end;\n   GetNum := x;\nend;\n{--------------------------------------------------------------}\n\n\n(Notice  that this version of GetNum returns  a  string,  not  an\ninteger as before.)\n\nYou  can  easily  verify that these routines work by calling them\nfrom the main program, as in\n\n     WriteLn(GetName);\n\nThis  program  will  print any legal name typed in (maximum eight\ncharacters, since that's what we told GetName).   It  will reject\nanything else.\n\nTest the other routine similarly.\n\n\nWHITE SPACE\n\nWe  also  have  dealt with embedded white space before, using the\ntwo  routines  IsWhite  and  SkipWhite.    
Make  sure that  these\nroutines are in your  current  version of the cradle, and add the\nline\n\n     SkipWhite;\n\nat the end of both GetName and GetNum.\n\nNow, let's define the new procedure:\n\n\n{--------------------------------------------------------------}\n{ Lexical Scanner }\n\nFunction Scan: string;\nbegin\n   if IsAlpha(Look) then\n      Scan := GetName\n   else if IsDigit(Look) then\n      Scan := GetNum\n   else begin\n      Scan := Look;\n      GetChar;\n   end;\n   SkipWhite;\nend;\n{--------------------------------------------------------------}\n\n\nWe can call this from the new main program:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\n\nbegin\n   Init;\n   repeat\n      Token := Scan;\n      writeln(Token);\n   until Token = CR;\nend.\n{--------------------------------------------------------------}\n\n\n(You will have to add the declaration of the string Token  at the\nbeginning of the program.  Make it any convenient length,  say 16\ncharacters.)\n\nNow,  run the program.  Note how the  input  string  is,  indeed,\nseparated into distinct tokens.\n\n\nSTATE MACHINES\n\nFor  the  record,  a  parse  routine  like  GetName  does  indeed\nimplement a state machine.  The state is implicit in  the current\nposition in the code.  
A very useful trick for visualizing what's\ngoing on is  the  syntax  diagram,  or  \"railroad-track\" diagram.\nIt's a little difficult to draw  one  in this medium, so I'll use\nthem very sparingly, but  the  figure  below  should give you the\nidea:\n\n\n           |-----> Other---------------------------> Error\n           |\n   Start -------> Letter ---------------> Other -----> Finish\n           ^                        V\n           |                        |\n           |<----- Letter <---------|\n           |                        |\n           |<----- Digit  <----------\n\n\nAs  you  can  see,  this  diagram  shows  how  the logic flows as\ncharacters  are  read.    Things  begin, of course, in the  start\nstate, and end when  a  character  other  than an alphanumeric is\nfound.  If  the  first  character  is not alpha, an error occurs.\nOtherwise the machine will continue looping until the terminating\ndelimiter is found.\n\nNote  that at any point in the flow,  our  position  is  entirely\ndependent on the past  history  of the input characters.  At that\npoint, the action to be taken depends only on the  current state,\nplus the current input character.  That's what makes this a state\nmachine.\n\nBecause of the difficulty of drawing  railroad-track  diagrams in\nthis medium, I'll continue to  stick to syntax equations from now\non.  But I highly recommend the diagrams to you for  anything you\ndo that involves parsing.  After a little practice you  can begin\nto  see  how  to  write  a  parser  directly from  the  diagrams.\nParallel paths get coded into guarded actions (guarded by IF's or\nCASE statements),  serial  paths  into  sequential  calls.   It's\nalmost like working from a schematic.\n\nWe didn't even discuss SkipWhite, which  was  introduced earlier,\nbut it also is a simple state machine, as is GetNum.  So is their\nparent procedure, Scan.  
Little machines make big machines.\n\nThe neat thing that I'd like  you  to note is how painlessly this\nimplicit approach creates these  state  machines.    I personally\nprefer it a lot over the table-driven approach.  It  also results\nin a small, tight, and fast scanner.\n\n\nNEWLINES\n\nMoving right along, let's modify  our scanner to handle more than\none line.  As I mentioned last time, the most straightforward way\nto  do  this  is to simply treat the newline characters, carriage\nreturn  and line feed, as white space.  This is, in fact, the way\nthe  C  standard  library  routine,  isspace, works.   We  didn't\nactually try this  before.  I'd like to do it now, so you can get\na feel for the results.\n\nTo do this, simply modify the single executable  line  of IsWhite\nto read:\n\n\n   IsWhite := c in [' ', TAB, CR, LF];\n\n\nWe need to give the main  program  a new stop condition, since it\nwill never see a CR.  Let's just use:\n\n\n   until Token = '.';\n\n\nOK, compile this  program  and  run  it.   Try a couple of lines,\nterminated by the period.  I used:\n\n\n     now is the time\n     for all good men.\n\nHey,  what  happened?   When I tried it, I didn't  get  the  last\ntoken, the period.  The program didn't halt.  What's more, when I\npressed the  'enter'  key  a  few  times,  I still didn't get the\nperiod.\n\nIf you're still stuck in your program, you'll find that  typing a\nperiod on a new line will terminate it.\n\nWhat's going on here?  The answer is  that  we're  hanging  up in\nSkipWhite.  A quick look at  that  routine will show that as long\nas we're typing null lines, we're going to just continue to loop.\nAfter SkipWhite encounters an LF,  it tries to execute a GetChar.\nBut since the input buffer is now empty, GetChar's read statement\ninsists  on  having  another  line.    
Procedure  Scan  gets  the\nterminating period, all right,  but  it  calls SkipWhite to clean\nup, and SkipWhite won't return until it gets a non-null line.\n\nThis kind of behavior is not quite as bad as it seems.  In a real\ncompiler,  we'd  be  reading  from  an input file instead of  the\nconsole, and as long  as  we have some procedure for dealing with\nend-of-files, everything will come out  OK.  But for reading data\nfrom the console, the behavior is just too bizarre.  The  fact of\nthe matter is that the C/Unix convention is  just  not compatible\nwith the structure of  our  parser,  which  calls for a lookahead\ncharacter.    The  code that the Bell  wizards  have  implemented\ndoesn't use that convention, which is why they need 'ungetc'.\n\nOK, let's fix the problem.  To do that, we need to go back to the\nold definition of IsWhite (delete the CR and  LF  characters) and\nmake  use  of  the procedure Fin that I introduced last time.  If\nit's not in your current version of the cradle, put it there now.\n\nAlso, modify the main program to read:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\n\nbegin\n   Init;\n   repeat\n      Token := Scan;\n      writeln(Token);\n      if Token = CR then Fin;\n   until Token = '.';\nend.\n{--------------------------------------------------------------}\n\n\nNote the \"guard\"  test  preceding  the  call to Fin.  That's what\nmakes the whole thing work, and ensures that we don't try to read\na line ahead.\n                             \nTry the code now. I think you'll like it better.\n\nIf you refer to the code  we  did in the last installment, you'll\nfind that I quietly sprinkled calls to Fin  throughout  the code,\nwherever  a line break was appropriate.  This  is  one  of  those\nareas that really affects the look  &  feel that I mentioned.  At\nthis  point  I  would  urge  you  to  experiment  with  different\narrangements  and  see  how  you  like  them.    
If you want your\nlanguage  to  be  truly  free-field,  then  newlines   should  be\ntransparent.   In  this  case,  the  best  approach is to put the\nfollowing lines at the BEGINNING of Scan:\n\n\n          while Look = CR do\n             Fin;\n\n\nIf, on the other  hand,  you  want  a line-oriented language like\nAssembler, BASIC, or FORTRAN  (or  even  Ada...  note that it has\ncomments terminated by newlines),  then  you'll  need for Scan to\nreturn CR's as tokens.  It  must  also  eat the trailing LF.  The\nbest way to do that is to use this line,  again  at the beginning\nof Scan:\n\n          if Look = LF then Fin;\n\n\nFor other conventions, you'll  have  to  use  other arrangements.\nIn my example  of  the  last  session, I allowed newlines only at\nspecific places, so I was somewhere in the middle ground.  In the\nrest of these sessions, I'll be picking ways  to  handle newlines\nthat I happen to like, but I want you to know how to choose other\nways for yourselves.\n\n\nOPERATORS\n\nWe  could  stop now and have a  pretty  useful  scanner  for  our\npurposes.  In the fragments of KISS that we've built so  far, the\nonly tokens that have multiple characters are the identifiers and\nnumbers.    All  operators  were  single  characters.   The  only\nexception I can think of is the relops <=, >=,  and  <>, but they\ncould be dealt with as special cases.\n\nStill, other languages have  multi-character  operators,  such as\nthe ':=' of  Pascal or the '++' and '>>' of C.  So  while  we may\nnot need multi-character operators, it's  nice to know how to get\nthem if necessary.\n\nNeedless to say, we  can  handle operators very much the same way\nas the other tokens.  
Let's start with a recognizer:\n                             \n\n{--------------------------------------------------------------}\n{ Recognize Any Operator }\n\nfunction IsOp(c: char): boolean;\nbegin\n   IsOp := c in ['+', '-', '*', '/', '<', '>', ':', '='];\nend;\n{--------------------------------------------------------------}\n\n\nIt's important to  note  that  we  DON'T  have  to  include every\npossible  operator in this list.   For  example,  the parentheses\naren't  included, nor is the terminating  period.    The  current\nversion of Scan handles single-character operators  just  fine as\nit is.  The list above includes only those  characters  that  can\nappear in multi-character operators.  (For specific languages, of\ncourse, the list can always be edited.)\n\nNow, let's modify Scan to read:\n\n\n{--------------------------------------------------------------}\n{ Lexical Scanner }\n\nFunction Scan: string;\nbegin\n   while Look = CR do\n      Fin;\n   if IsAlpha(Look) then\n      Scan := GetName\n   else if IsDigit(Look) then\n      Scan := GetNum\n   else if IsOp(Look) then\n      Scan := GetOp\n   else begin\n      Scan := Look;\n      GetChar;\n   end;\n   SkipWhite;\nend;\n{--------------------------------------------------------------}\n\n\nTry the program now.  You  will  find that any code fragments you\ncare  to throw at it will be neatly  broken  up  into  individual\ntokens.\n\n\nLISTS, COMMAS AND COMMAND LINES\n\nBefore getting back to the main thrust of our study, I'd  like to\nget on my soapbox for a moment.\n                             \nHow many times have you worked with a program or operating system\nthat had rigid rules about how you must separate items in a list?\n(Try,  the  last  time  you  used MSDOS!)  Some programs  require\nspaces as delimiters, and  some  require  commas.   Worst of all,\nsome  require  both,  in  different  places.    
Most  are  pretty\nunforgiving about violations of their rules.\n\nI think this is inexcusable.  It's too  easy  to  write  a parser\nthat will handle  both  spaces  and  commas  in  a  flexible way.\nConsider the following procedure:\n\n\n{--------------------------------------------------------------}\n{ Skip Over a Comma }\n\nprocedure SkipComma;\nbegin\n   SkipWhite;\n   if Look = ',' then begin\n      GetChar;\n      SkipWhite;\n   end;\nend;\n{--------------------------------------------------------------}\n\n\nThis eight-line procedure will skip over  a  delimiter consisting\nof any number (including zero)  of spaces, with zero or one comma\nembedded in the string.\n\nTEMPORARILY, change the call to SkipWhite in Scan to  a  call  to\nSkipComma,  and  try  inputting some lists.   Works  nicely,  eh?\nDon't you wish more software authors knew about SkipComma?\n\nFor the record, I found that adding the  equivalent  of SkipComma\nto my Z80 assembler-language programs took all of  6  (six) extra\nbytes of  code.    Even  in a 64K machine, that's not a very high\nprice to pay for user-friendliness!\n\nI  think  you can see where I'm going here.  Even  if  you  never\nwrite a line of  compiler  code in your life, there are places in\nevery program where  you  can  use  the concepts of parsing.  Any\nprogram that processes a command line needs them.   In  fact,  if\nyou  think  about  it for a bit, you'll have to conclude that any\ntime  you  write  a program that processes  user  inputs,  you're\ndefining a  language.  People communicate with languages, and the\nsyntax implicit in your program  defines that language.  The real\nquestion  is:  are  you  going  to  define  it  deliberately  and\nexplicitly, or just let it turn out to be  whatever  the  program\nends up parsing?\n\nI claim that you'll have  a better, more user-friendly program if\nyou'll take the time to define the syntax explicitly.  
Write down\nthe syntax equations or  draw  the  railroad-track  diagrams, and\ncode the parser using the techniques I've shown you here.  You'll\nend  up with a better program, and it will be easier to write, to\nboot.\n\n\nGETTING FANCY\n\nOK, at this point we have a pretty nice lexical scanner that will\nbreak  an  input stream up into tokens.  We could use  it  as  it\nstands and have a serviceable compiler.  But there are some other\naspects of lexical scanning that we need to cover.\n\nThe main consideration is <shudder> efficiency.  Remember when we\nwere dealing  with  single-character  tokens,  every  test  was a\ncomparison of a single character, Look, with a byte constant.  We\nalso used the Case statement heavily.\n\nWith the multi-character tokens being returned by Scan, all those\ntests now become string comparisons.  Much slower.  And  not only\nslower, but more awkward, since  there is no string equivalent of\nthe  Case  statement  in Pascal.  It seems especially wasteful to\ntest for what used to be single characters ... the '=',  '+', and\nother operators ... using string comparisons.\n\nUsing string comparison is not  impossible ... Ron Cain used just\nthat approach in writing Small C.  Since we're  sticking  to  the\nKISS principle here, we would  be truly justified in settling for\nthis  approach.    But then I would have failed to tell you about\none of the key approaches used in \"real\" compilers.\n\nYou have to remember: the lexical scanner is going to be called a\n_LOT_!   Once for every token in the  whole  source  program,  in\nfact.   Experiments  have  indicated  that  the  average compiler\nspends  anywhere  from 20% to 40% of  its  time  in  the  scanner\nroutines.  If there were ever a place  where  efficiency deserves\nreal consideration, this is it.\n\nFor this reason, most compiler writers ask the lexical scanner to\ndo  a  little  more work, by \"tokenizing\"  the input stream.  
The\nidea  is  to  match every token  against  a  list  of  acceptable\nkeywords  and operators, and return unique  codes  for  each  one\nrecognized.  In the case of ordinary variable  names  or numbers,\nwe  just return a code that says what kind of token they are, and\nsave the actual string somewhere else.\n\nOne  of the first things we're going to need is a way to identify\nkeywords.  We can always do  it  with successive IF tests, but it\nsurely would be nice  if  we  had  a general-purpose routine that\ncould compare a given string with  a  table of keywords.  (By the\nway, we're also going  to  need such a routine later, for dealing\nwith symbol tables.)  This  usually presents a problem in Pascal,\nbecause standard Pascal  doesn't  allow  for  arrays  of variable\nlengths.   It's  a  real  bother  to  have to declare a different\nsearch routine for every table.    Standard  Pascal  also doesn't\nallow for initializing arrays, so you tend to see code like\n\n     Table[1] := 'IF';\n     Table[2] := 'ELSE';\n     .\n     .\n     Table[n] := 'END';\n\nwhich can get pretty old if there are many keywords.\n\nFortunately, Turbo Pascal 4.0 has extensions that  eliminate both\nof  these  problems.   Constant arrays can be declared using TP's\n\"typed constant\" facility, and  the  variable  dimensions  can be\nhandled with its C-like extensions for pointers.\n\nFirst, modify your declarations like this:\n\n\n{--------------------------------------------------------------}\n{ Type Declarations  }\n\ntype Symbol = string[8];\n\n     SymTab = array[1..1000] of Symbol;\n\n     TabPtr = ^SymTab;\n{--------------------------------------------------------------}\n\n\n(The dimension  used  in  SymTab  is  not  real ... 
no storage is\nallocated by the declaration itself,  and the number need only be\n\"big enough.\")\n\nNow, just beneath those declarations, add the following:\n\n\n{--------------------------------------------------------------}\n{ Definition of Keywords and Token Types }\n\nconst KWlist: array [1..4] of Symbol =\n              ('IF', 'ELSE', 'ENDIF', 'END');\n\n{--------------------------------------------------------------}\n\n\nNext, insert the following new function:\n\n\n{--------------------------------------------------------------}\n{ Table Lookup }\n\n{ If the input string matches a table entry, return the entry\n  index.  If not, return a zero.  }\n                             \nfunction Lookup(T: TabPtr; s: string; n: integer): integer;\nvar i: integer;\n    found: boolean;\nbegin\n   found := false;\n   i := n;\n   while (i > 0) and not found do\n      if s = T^[i] then\n         found := true\n      else\n         dec(i);\n   Lookup := i;\nend;\n{--------------------------------------------------------------}\n\n\nTo test it,  you  can  temporarily  change  the  main  program as\nfollows:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\n\nbegin\n   ReadLn(Token);\n   WriteLn(Lookup(Addr(KWList), Token, 4));\nend.\n{--------------------------------------------------------------}\n\n\nNotice how Lookup is called: The Addr function sets up  a pointer\nto KWList, which gets passed to Lookup.\n\nOK, give this  a  try.    Since we're bypassing Scan here, you'll\nhave to type the keywords in upper case to get any matches.\n\nNow that we can recognize keywords, the next thing is  to arrange\nto return codes for them.\n\nSo what kind of code should we return?  There are really only two\nreasonable choices.  This seems like an ideal application for the\nPascal enumerated type.   
For  example,  you can define something\nlike\n\n     SymType = (IfSym, ElseSym, EndifSym, EndSym, Ident, Number,\n                    Operator);\n\nand arrange to return a variable of this type.   Let's  give it a\ntry.  Insert the line above into your type definitions.\n\nNow, add the two variable declarations:\n                             \n\n    Token: Symtype;          { Current Token  }\n    Value: String[16];       { String Token of Look }\n\n\nModify the scanner to read:\n\n\n{--------------------------------------------------------------}\n{ Lexical Scanner }\n\nprocedure Scan;\nvar k: integer;\nbegin\n   while Look = CR do\n      Fin;\n   if IsAlpha(Look) then begin\n      Value := GetName;\n      k := Lookup(Addr(KWlist), Value, 4);\n      if k = 0 then\n         Token := Ident\n      else\n         Token := SymType(k - 1);\n      end\n   else if IsDigit(Look) then begin\n      Value := GetNum;\n      Token := Number;\n      end\n   else if IsOp(Look) then begin\n      Value := GetOp;\n      Token := Operator;\n      end\n   else begin\n      Value := Look;\n      Token := Operator;\n      GetChar;\n   end;\n   SkipWhite;\nend;\n{--------------------------------------------------------------}\n\n\n(Notice that Scan is now a procedure, not a function.)\n\n\nFinally, modify the main program to read:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   repeat\n      Scan;\n      case Token of\n        Ident: write('Ident ');\n        Number: Write('Number ');\n        Operator: Write('Operator ');\n        IfSym, ElseSym, EndifSym, EndSym: Write('Keyword ');\n      end;\n      Writeln(Value);\n   until Token = EndSym;\nend.\n{--------------------------------------------------------------}\n\n\nWhat we've done here is to replace the string Token  used earlier\nwith an enumerated type. 
Scan returns the type in variable Token,\nand returns the string itself in the new variable Value.\n\nOK, compile this and give it a whirl.  If everything  goes right,\nyou should see that we are now recognizing keywords.\n\nWhat  we  have  now is working right, and it was easy to generate\nfrom what  we  had  earlier.    However,  it still seems a little\n\"busy\" to me.  We can  simplify  things a bit by letting GetName,\nGetNum, GetOp, and Scan be  procedures  working  with  the global\nvariables Token and Value, thereby eliminating the  local copies.\nIt  also seems a little cleaner to move  the  table  lookup  into\nGetName.  The new form for the four procedures is, then:\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nprocedure GetName;\nvar k: integer;\nbegin\n   Value := '';\n   if not IsAlpha(Look) then Expected('Name');\n   while IsAlNum(Look) do begin\n     Value := Value + UpCase(Look);\n     GetChar;\n   end;\n   k := Lookup(Addr(KWlist), Value, 4);\n   if k = 0 then\n      Token := Ident\n   else\n      Token := SymType(k-1);\nend;\n                             \n{--------------------------------------------------------------}\n{ Get a Number }\n\nprocedure GetNum;\nbegin\n   Value := '';\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n     Value := Value + Look;\n     GetChar;\n   end;\n   Token := Number;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Operator }\n\nprocedure GetOp;\nbegin\n   Value := '';\n   if not IsOp(Look) then Expected('Operator');\n   while IsOp(Look) do begin\n     Value := Value + Look;\n     GetChar;\n   end;\n   Token := Operator;\nend;\n\n\n{--------------------------------------------------------------}\n{ Lexical Scanner }\n\nprocedure Scan;\nvar k: integer;\nbegin\n   while Look = CR do\n      Fin;\n   if IsAlpha(Look) then\n      GetName\n   else if IsDigit(Look) then\n      GetNum\n   else if 
IsOp(Look) then\n      GetOp\n   else begin\n      Value := Look;\n      Token := Operator;\n      GetChar;\n   end;\n   SkipWhite;\nend;\n{--------------------------------------------------------------}\n                             \n\nRETURNING A CHARACTER\n\nEssentially  every scanner I've ever seen  that  was  written  in\nPascal  used  the  mechanism of an enumerated type that I've just\ndescribed.  It is certainly  a workable mechanism, but it doesn't\nseem the simplest approach to me.\n\nFor one thing, the  list  of possible symbol types can get pretty\nlong. Here, I've used just one symbol, \"Operator,\"  to  stand for\nall of the operators, but I've seen other  designs  that actually\nreturn different codes for each one.\n\nThere is, of course, another simple type that can be  returned as\na  code: the character.  Instead  of  returning  the  enumeration\nvalue 'Operator' for a '+' sign, what's wrong with just returning\nthe character itself?  A character is just as good a variable for\nencoding the different  token  types,  it  can  be  used  in case\nstatements  easily, and it's sure a lot easier  to  type.    What\ncould be simpler?\n\nBesides, we've already  had  experience with the idea of encoding\nkeywords as single characters.  Our previous programs are already\nwritten  that  way,  so  using  this approach will  minimize  the\nchanges to what we've already done.\n\nSome of you may feel that this idea of returning  character codes\nis too mickey-mouse.  I must  admit  it gets a little awkward for\nmulti-character operators like '<='.   If you choose to stay with\nthe  enumerated  type,  fine.  For the rest, I'd like to show you\nhow to change what we've done above to support that approach.\n\nFirst, you can delete the SymType declaration now ... we won't be\nneeding that.  
And you can change the type of Token to char.\n\nNext, to replace SymType, add the following constant string:\n\n\n   const KWcode: string[5] = 'xilee';\n\n\n(I'll be encoding all idents with the single character 'x'.)\n\n\nLastly, modify Scan and its relatives as follows:\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nprocedure GetName;\nbegin\n   Value := '';\n   if not IsAlpha(Look) then Expected('Name');\n   while IsAlNum(Look) do begin\n     Value := Value + UpCase(Look);\n     GetChar;\n   end;\n   Token := KWcode[Lookup(Addr(KWlist), Value, 4) + 1];\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nprocedure GetNum;\nbegin\n   Value := '';\n   if not IsDigit(Look) then Expected('Integer');\n   while IsDigit(Look) do begin\n     Value := Value + Look;\n     GetChar;\n   end;\n   Token := '#';\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Operator }\n\nprocedure GetOp;\nbegin\n   Value := '';\n   if not IsOp(Look) then Expected('Operator');\n   while IsOp(Look) do begin\n     Value := Value + Look;\n     GetChar;\n   end;\n   if Length(Value) = 1 then\n      Token := Value[1]\n   else\n      Token := '?';\nend;\n\n\n{--------------------------------------------------------------}\n{ Lexical Scanner }\n\nprocedure Scan;\nvar k: integer;\nbegin\n   while Look = CR do\n      Fin;\n   if IsAlpha(Look) then\n      GetName\n   else if IsDigit(Look) then\n      GetNum\n   else if IsOp(Look) then\n      GetOp\n   else begin\n      Value := Look;\n      Token := '?';\n      GetChar;\n   end;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\n\nbegin\n   Init;\n   repeat\n      Scan;\n      case Token of\n        'x': write('Ident ');\n        '#': Write('Number ');\n        'i', 'l', 'e': Write('Keyword ');\n        else Write('Operator ');\n      end;\n      
Writeln(Value);\n   until Value = 'END';\nend.\n{--------------------------------------------------------------}\n\n\nThis program should  work  the  same  as the previous version.  A\nminor  difference  in  structure,  maybe,  but   it   seems  more\nstraightforward to me.\n\n\nDISTRIBUTED vs CENTRALIZED SCANNERS\n\nThe structure for the lexical scanner that I've just shown you is\nvery conventional, and  about  99% of all compilers use something\nvery  close  to it.  This is  not,  however,  the  only  possible\nstructure, or even always the best one.\n                             \nThe problem with the  conventional  approach  is that the scanner\nhas no knowledge of context.  For example,  it  can't distinguish\nbetween the assignment operator '=' and  the  relational operator\n'=' (perhaps that's why both C and Pascal  use  different strings\nfor the  two).    All  the scanner can do is to pass the operator\nalong  to  the  parser, which can hopefully tell from the context\nwhich operator is meant.    Similarly, a keyword like 'IF' has no\nplace in the middle of a  math  expression, but if one happens to\nappear there, the scanner  will  see no problem with it, and will\nreturn it to the parser, properly encoded as an 'IF'.\n\nWith this  kind  of  approach,  we  are  not really using all the\ninformation at our disposal.  In the middle of an expression, for\nexample, the parser  \"knows\"  that  there  is no need to look for\nkeywords,  but it has no way of telling the scanner that.  So the\nscanner  continues to do so.  This, of  course,  slows  down  the\ncompilation.\n\nIn real-world compilers, the  designers  often  arrange  for more\ninformation  to be passed between parser  and  scanner,  just  to\navoid  this  kind of problem.  
But  that  can  get  awkward,  and\ncertainly destroys a lot of the modularity of the structure.\n\nThe  alternative  is  to seek some  way  to  use  the  contextual\ninformation that comes from knowing where we are  in  the parser.\nThis leads us  back  to  the  notion of a distributed scanner, in\nwhich various portions  of  the scanner are called depending upon\nthe context.\n\nIn KISS, as  in  most  languages,  keywords  ONLY  appear  at the\nbeginning of a statement.  In places like  expressions,  they are\nnot allowed.  Also, with one minor exception (the multi-character\nrelops)  that  is  easily  handled,  all  operators   are  single\ncharacters, which means that we don't need GetOp at all.\n\nSo it turns out  that  even  with  multi-character tokens, we can\nstill always tell from the  current  lookahead  character exactly\nwhat kind of token is coming,  except  at the very beginning of a\nstatement.\n\nEven at that point, the ONLY  kind  of  token we can accept is an\nidentifier.  We need only to determine if that  identifier  is  a\nkeyword or the target of an assignment statement.\n\nWe end up, then, still needing only GetName and GetNum, which are\nused very much as we've used them in earlier installments.\n\nIt may seem  at first to you that this is a step backwards, and a\nrather  primitive  approach.   In fact, it is an improvement over\nthe classical scanner, since we're  using  the  scanning routines\nonly where they're really needed.  In places  where  keywords are\nnot allowed, we don't slow things down by looking for them.\n\n\nMERGING SCANNER AND PARSER\n\nNow that we've covered  all  of the theory and general aspects of\nlexical scanning that we'll be needing, I'm FINALLY ready to back\nup my claim that  we can  accommodate multi-character tokens with\nminimal change to our previous work.  
To keep  things  short  and\nsimple I will restrict myself here to a subset of what we've done\nbefore; I'm allowing only one control construct (the  IF)  and no\nBoolean expressions.  That's enough to demonstrate the parsing of\nboth keywords and expressions.  The extension to the full  set of\nconstructs should be  pretty  apparent  from  what  we've already\ndone.\n\nAll  the  elements  of  the  program to parse this subset,  using\nsingle-character tokens, exist  already in our previous programs.\nI built it by judicious copying of these files,  but  I  wouldn't\ndare try to lead you through that process.  Instead, to avoid any\nconfusion, the whole program is shown below:\n\n\n{--------------------------------------------------------------}\nprogram KISS;\n\n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n      CR  = ^M;\n      LF  = ^J;\n\n{--------------------------------------------------------------}\n{ Type Declarations  }\n\ntype Symbol = string[8];\n\n     SymTab = array[1..1000] of Symbol;\n\n     TabPtr = ^SymTab;\n\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look  : char;              { Lookahead Character }\n    Lcount: integer;           { Label Counter       }\n\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n                             \n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: 
string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := UpCase(c) in ['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an AlphaNumeric Character }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n                             \n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Mulop }\n\nfunction IsMulop(c: char): boolean;\nbegin\n   IsMulop := c in ['*', '/'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   if Look <> x then Expected('''' + x + '''');\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip a CRLF }\n\nprocedure Fin;\nbegin\n   if Look = CR then GetChar;\n   if Look = LF then GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier }\n\nfunction GetName: char;\nbegin\n   while Look = CR do\n      Fin;\n   if not IsAlpha(Look) then 
Expected('Name');\n   Getname := UpCase(Look);\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nfunction GetNum: char;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   GetNum := Look;\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Generate a Unique Label }\n\nfunction NewLabel: string;\nvar S: string;\nbegin\n   Str(LCount, S);\n   NewLabel := 'L' + S;\n   Inc(LCount);\nend;\n\n\n{--------------------------------------------------------------}\n{ Post a Label To Output }\n\nprocedure PostLabel(L: string);\nbegin\n   WriteLn(L, ':');\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n                             \nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Identifier }\n\nprocedure Ident;\nvar Name: char;\nbegin\n   Name := GetName;\n   if Look = '(' then begin\n      Match('(');\n      Match(')');\n      EmitLn('BSR ' + Name);\n      end\n   else\n      EmitLn('MOVE ' + Name + '(PC),D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      Ident\n   else\n      EmitLn('MOVE #' + GetNum + ',D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate the First Math Factor }\n\n\nprocedure SignedFactor;\nvar s: boolean;\nbegin\n   s := Look = '-';\n   if IsAddop(Look) then 
begin\n      GetChar;\n      SkipWhite;\n   end;\n   Factor;\n   if s then\n      EmitLn('NEG D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Match('*');\n   Factor;\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Match('/');\n   Factor;\n   EmitLn('MOVE (SP)+,D1');\n   EmitLn('EXT.L D0');\n   EmitLn('DIVS D1,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Completion of Term Processing (called by Term and FirstTerm) }\n\nprocedure Term1;\nbegin\n   while IsMulop(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '*': Multiply;\n       '/': Divide;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   Factor;\n   Term1;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term with Possible Leading Sign }\n\nprocedure FirstTerm;\nbegin\n   SignedFactor;\n   Term1;\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   FirstTerm;\n   while IsAddop(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      end;\n   
end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Condition }\n{ This version is a dummy }\n\nProcedure Condition;\nbegin\n   EmitLn('Condition');\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block;\n Forward;\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   Match('i');\n   Condition;\n   L1 := NewLabel;\n   L2 := L1;\n   EmitLn('BEQ ' + L1);\n   Block;\n   if Look = 'l' then begin\n      Match('l');\n      L2 := NewLabel;\n      EmitLn('BRA ' + L2);\n      PostLabel(L1);\n      Block;\n   end;\n   PostLabel(L2);\n   Match('e');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: char;\nbegin\n   Name := GetName;\n   Match('=');\n   Expression;\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)');\nend;\n                             \n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block;\nbegin\n   while not(Look in ['e', 'l']) do begin\n      case Look of\n       'i': DoIf;\n       CR: while Look = CR do\n              Fin;\n       else Assignment;\n      end;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Program }\n\nprocedure DoProgram;\nbegin\n   Block;\n   if Look <> 'e' then Expected('END');\n   EmitLn('END')\nend;\n\n\n{--------------------------------------------------------------}\n\n{ Initialize }\n\nprocedure Init;\nbegin\n   LCount := 0;\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   DoProgram;\nend.\n{--------------------------------------------------------------}\n\n\nA couple of comments:\n\n (1) The form for the expression parser,  using  FirstTerm, 
etc.,\n     is  a  little  different from what you've seen before.  It's\n     yet another variation on the same theme.  Don't let it throw\n     you ... the change is not required for what follows.\n\n (2) Note that, as usual, I had to add calls to Fin  at strategic\n     spots to allow for multiple lines.\n\nBefore we proceed to adding the scanner, first copy this file and\nverify that it does indeed  parse things correctly.  Don't forget\nthe \"codes\": 'i' for IF, 'l' for ELSE, and 'e' for END or ENDIF.\n\nIf the program works, then let's press on.  In adding the scanner\nmodules to the program, it helps  to  have a systematic plan.  In\nall  the  parsers  we've  written  to  date,  we've  stuck  to  a\nconvention that the current lookahead character should  always be\na non-blank character.  We  preload  the  lookahead  character in\nInit, and keep the \"pump primed\"  after  that.  To keep the thing\nworking right at newlines, we had to modify this a bit  and treat\nthe newline as a legal token.\n\nIn the  multi-character version, the rule is similar: The current\nlookahead character should always be left at the BEGINNING of the\nnext token, or at a newline.\n\nThe multi-character version is shown next.  To get it,  I've made\nthe following changes:\n\n\n o Added the variables Token  and Value, and the type definitions\n   needed by Lookup.\n\n o Added the definitions of KWList and KWcode.\n\n o Added Lookup.\n\n o Replaced GetName and GetNum by their multi-character versions.\n   (Note that the call  to  Lookup has been moved out of GetName,\n   so  that  it  will  not   be  executed  for  calls  within  an\n   expression.)\n\n o Created a new,  vestigial  Scan that calls GetName, then scans\n   for keywords.\n\n o Created  a  new  procedure,  MatchString,  that  looks  for  a\n   specific keyword.  Note that, unlike  Match,  MatchString does\n   NOT read the next keyword.\n\n o Modified Block to call Scan.\n\n o Changed the calls  to  Fin  a  bit.   
Fin is now called within\n   GetName.\n\nHere is the program in its entirety:\n\n\n{--------------------------------------------------------------}\nprogram KISS;\n                             \n{--------------------------------------------------------------}\n{ Constant Declarations }\n\nconst TAB = ^I;\n      CR  = ^M;\n      LF  = ^J;\n\n{--------------------------------------------------------------}\n{ Type Declarations  }\n\ntype Symbol = string[8];\n\n     SymTab = array[1..1000] of Symbol;\n\n     TabPtr = ^SymTab;\n\n\n{--------------------------------------------------------------}\n{ Variable Declarations }\n\nvar Look  : char;              { Lookahead Character }\n    Token : char;              { Encoded Token       }\n    Value : string[16];        { Unencoded Token     }\n    Lcount: integer;           { Label Counter       }\n\n\n{--------------------------------------------------------------}\n{ Definition of Keywords and Token Types }\n\nconst KWlist: array [1..4] of Symbol =\n              ('IF', 'ELSE', 'ENDIF', 'END');\n\nconst KWcode: string[5] = 'xilee';\n\n\n{--------------------------------------------------------------}\n{ Read New Character From Input Stream }\n\nprocedure GetChar;\nbegin\n   Read(Look);\nend;\n\n{--------------------------------------------------------------}\n{ Report an Error }\n\nprocedure Error(s: string);\nbegin\n   WriteLn;\n   WriteLn(^G, 'Error: ', s, '.');\nend;\n\n\n{--------------------------------------------------------------}\n{ Report Error and Halt }\n\nprocedure Abort(s: string);\nbegin\n   Error(s);\n   Halt;\nend;\n\n\n{--------------------------------------------------------------}\n{ Report What Was Expected }\n\nprocedure Expected(s: string);\nbegin\n   Abort(s + ' Expected');\nend;\n\n{--------------------------------------------------------------}\n{ Recognize an Alpha Character }\n\nfunction IsAlpha(c: char): boolean;\nbegin\n   IsAlpha := UpCase(c) in 
['A'..'Z'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Decimal Digit }\n\nfunction IsDigit(c: char): boolean;\nbegin\n   IsDigit := c in ['0'..'9'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an AlphaNumeric Character }\n\nfunction IsAlNum(c: char): boolean;\nbegin\n   IsAlNum := IsAlpha(c) or IsDigit(c);\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize an Addop }\n\nfunction IsAddop(c: char): boolean;\nbegin\n   IsAddop := c in ['+', '-'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize a Mulop }\n\nfunction IsMulop(c: char): boolean;\nbegin\n   IsMulop := c in ['*', '/'];\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize White Space }\n\nfunction IsWhite(c: char): boolean;\nbegin\n   IsWhite := c in [' ', TAB];\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip Over Leading White Space }\n\nprocedure SkipWhite;\nbegin\n   while IsWhite(Look) do\n      GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Match a Specific Input Character }\n\nprocedure Match(x: char);\nbegin\n   if Look <> x then Expected('''' + x + '''');\n   GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Skip a CRLF }\n\nprocedure Fin;\nbegin\n   if Look = CR then GetChar;\n   if Look = LF then GetChar;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Table Lookup }\n\nfunction Lookup(T: TabPtr; s: string; n: integer): integer;\nvar i: integer;\n    found: boolean;\nbegin\n   found := false;\n   i := n;\n   while (i > 0) and not found do\n      if s = T^[i] then\n         found := true\n      else\n         dec(i);\n   Lookup := i;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an 
Identifier }\n\nprocedure GetName;\nbegin\n   while Look = CR do\n      Fin;\n   if not IsAlpha(Look) then Expected('Name');\n   Value := '';\n   while IsAlNum(Look) do begin\n     Value := Value + UpCase(Look);\n     GetChar;\n   end;\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get a Number }\n\nprocedure GetNum;\nbegin\n   if not IsDigit(Look) then Expected('Integer');\n   Value := '';\n   while IsDigit(Look) do begin\n     Value := Value + Look;\n     GetChar;\n   end;\n   Token := '#';\n   SkipWhite;\nend;\n\n\n{--------------------------------------------------------------}\n{ Get an Identifier and Scan it for Keywords }\n\nprocedure Scan;\nbegin\n   GetName;\n   Token := KWcode[Lookup(Addr(KWlist), Value, 4) + 1];\nend;\n                             \n\n{--------------------------------------------------------------}\n{ Match a Specific Input String }\n\nprocedure MatchString(x: string);\nbegin\n   if Value <> x then Expected('''' + x + '''');\nend;\n\n\n{--------------------------------------------------------------}\n{ Generate a Unique Label }\n\nfunction NewLabel: string;\nvar S: string;\nbegin\n   Str(LCount, S);\n   NewLabel := 'L' + S;\n   Inc(LCount);\nend;\n\n\n{--------------------------------------------------------------}\n{ Post a Label To Output }\n\nprocedure PostLabel(L: string);\nbegin\n   WriteLn(L, ':');\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab }\n\nprocedure Emit(s: string);\nbegin\n   Write(TAB, s);\nend;\n\n\n{--------------------------------------------------------------}\n{ Output a String with Tab and CRLF }\n\nprocedure EmitLn(s: string);\nbegin\n   Emit(s);\n   WriteLn;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Identifier }\n\nprocedure Ident;\nbegin\n   GetName;\n   if Look = '(' then begin\n      Match('(');\n      Match(')');\n      EmitLn('BSR ' + Value);\n  
    end\n   else\n      EmitLn('MOVE ' + Value + '(PC),D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Factor }\n\nprocedure Expression; Forward;\n\nprocedure Factor;\nbegin\n   if Look = '(' then begin\n      Match('(');\n      Expression;\n      Match(')');\n      end\n   else if IsAlpha(Look) then\n      Ident\n   else begin\n      GetNum;\n      EmitLn('MOVE #' + Value + ',D0');\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate the First Math Factor }\n\nprocedure SignedFactor;\nvar s: boolean;\nbegin\n   s := Look = '-';\n   if IsAddop(Look) then begin\n      GetChar;\n      SkipWhite;\n   end;\n   Factor;\n   if s then\n      EmitLn('NEG D0');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Multiply }\n\nprocedure Multiply;\nbegin\n   Match('*');\n   Factor;\n   EmitLn('MULS (SP)+,D0');\nend;\n\n\n{-------------------------------------------------------------}\n{ Recognize and Translate a Divide }\n\nprocedure Divide;\nbegin\n   Match('/');\n   Factor;\n   EmitLn('MOVE (SP)+,D1');\n   EmitLn('EXT.L D0');\n   EmitLn('DIVS D1,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Completion of Term Processing (called by Term and FirstTerm) }\n\nprocedure Term1;\nbegin\n   while IsMulop(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '*': Multiply;\n       '/': Divide;\n      end;\n   end;\nend;\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term }\n\nprocedure Term;\nbegin\n   Factor;\n   Term1;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Math Term with Possible Leading Sign }\n\nprocedure FirstTerm;\nbegin\n   SignedFactor;\n   Term1;\nend;\n\n\n{---------------------------------------------------------------}\n{ 
Recognize and Translate an Add }\n\nprocedure Add;\nbegin\n   Match('+');\n   Term;\n   EmitLn('ADD (SP)+,D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate a Subtract }\n\nprocedure Subtract;\nbegin\n   Match('-');\n   Term;\n   EmitLn('SUB (SP)+,D0');\n   EmitLn('NEG D0');\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate an Expression }\n\nprocedure Expression;\nbegin\n   FirstTerm;\n   while IsAddop(Look) do begin\n      EmitLn('MOVE D0,-(SP)');\n      case Look of\n       '+': Add;\n       '-': Subtract;\n      end;\n   end;\nend;\n\n\n{---------------------------------------------------------------}\n{ Parse and Translate a Boolean Condition }\n{ This version is a dummy }\n\nProcedure Condition;\nbegin\n   EmitLn('Condition');\nend;\n\n\n{---------------------------------------------------------------}\n{ Recognize and Translate an IF Construct }\n\nprocedure Block; Forward;\n\n\nprocedure DoIf;\nvar L1, L2: string;\nbegin\n   Condition;\n   L1 := NewLabel;\n   L2 := L1;\n   EmitLn('BEQ ' + L1);\n   Block;\n   if Token = 'l' then begin\n      L2 := NewLabel;\n      EmitLn('BRA ' + L2);\n      PostLabel(L1);\n      Block;\n   end;\n   PostLabel(L2);\n   MatchString('ENDIF');\nend;\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate an Assignment Statement }\n\nprocedure Assignment;\nvar Name: string;\nbegin\n   Name := Value;\n   Match('=');\n   Expression;\n   EmitLn('LEA ' + Name + '(PC),A0');\n   EmitLn('MOVE D0,(A0)');\nend;\n\n\n{--------------------------------------------------------------}\n{ Recognize and Translate a Statement Block }\n\nprocedure Block;\nbegin\n   Scan;\n   while not (Token in ['e', 'l']) do begin\n      case Token of\n       'i': DoIf;\n       else Assignment;\n      end;\n      Scan;\n   end;\nend;\n\n\n{--------------------------------------------------------------}\n\n{ Parse and 
Translate a Program }\n\nprocedure DoProgram;\nbegin\n   Block;\n   MatchString('END');\n   EmitLn('END')\nend;\n\n\n{--------------------------------------------------------------}\n\n{ Initialize }\n\nprocedure Init;\nbegin\n   LCount := 0;\n   GetChar;\nend;\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   DoProgram;\nend.\n{--------------------------------------------------------------}\n\n\nCompare this program with its  single-character  counterpart.   I\nthink you will agree that the differences are minor.\n\n\nCONCLUSION\n\nAt this point, you have learned how to parse  and  generate  code\nfor expressions,  Boolean  expressions,  and  control structures.\nYou have now learned how to develop lexical scanners, and  how to\nincorporate their elements into a translator.  You have still not\nseen ALL the elements combined into one program, but on the basis\nof  what  we've  done before you should find it a straightforward\nmatter to extend our earlier programs to include scanners.\n\nWe are very  close  to  having  all  the elements that we need to\nbuild a real, functional compiler.  There are still a  few things\nmissing, notably procedure  calls  and type definitions.  We will\ndeal with  those  in  the  next  few  sessions.  Before doing so,\nhowever, I thought it  would  be fun to turn the translator above\ninto a true compiler.  That's what we'll  be  doing  in  the next\ninstallment.\n\nUp till now, we've taken  a rather bottom-up approach to parsing,\nbeginning with low-level constructs and working our way  up.   
In\nthe next installment,  I'll  also  be  taking a look from the top\ndown,  and  we'll  discuss how the structure of the translator is\naltered by changes in the language definition.\n\nSee you then.\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1988 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\n\n\n"
  },
  {
    "path": "8/tutor8.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                           2 April 1989\n\n\n                  Part VIII: A LITTLE PHILOSOPHY\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nThis is going to be a  different  kind of session than the others\nin our series on  parsing  and  compiler  construction.  For this\nsession, there won't be  any  experiments to do or code to write.\nThis  once,  I'd  like  to  just  talk  with  you  for  a  while.\nMercifully, it will be a short  session,  and then we can take up\nwhere we left off, hopefully with renewed vigor.\n\nWhen  I  was  in college, I found that I could  always  follow  a\nprof's lecture a lot better if I knew where he was going with it.\nI'll bet you were the same.\n\nSo I thought maybe it's about  time  I told you where we're going\nwith this series: what's coming up in future installments, and in\ngeneral what all  this  is  about.   I'll also share some general\nthoughts concerning the usefulness of what we've been doing.\n\n\nTHE ROAD HOME\n\nSo far, we've  covered  the parsing and translation of arithmetic\nexpressions,  Boolean expressions, and combinations connected  by\nrelational  operators.    We've also done the  same  for  control\nconstructs.    
In  all of this we've leaned heavily on the use of\ntop-down, recursive  descent  parsing,  BNF  definitions  of  the\nsyntax, and direct generation of assembly-language code.  We also\nlearned the value of  such  tricks  as single-character tokens to\nhelp  us  see  the  forest  through  the  trees.    In  the  last\ninstallment  we dealt with lexical scanning,  and  I  showed  you\nsimple but powerful ways to remove the single-character barriers.\n\nThroughout the whole study, I've emphasized  the  KISS philosophy\n... Keep It Simple, Sidney ... and I hope by now  you've realized\njust  how  simple  this stuff can really be.  While there are for\nsure areas of compiler  theory  that  are truly intimidating, the\nultimate message of this series is that in practice you  can just\npolitely  sidestep   many  of  these  areas.    If  the  language\ndefinition  cooperates  or,  as in this series, if you can define\nthe language as you go, it's possible to write down  the language\ndefinition in BNF with reasonable ease.  And, as we've  seen, you\ncan crank out parse procedures from the BNF just about as fast as\nyou can type.\n\nAs our compiler has taken form, it's gotten more parts,  but each\npart  is  quite small and simple, and  very  much  like  all  the\nothers.\n\nAt this point, we have many  of  the makings of a real, practical\ncompiler.  As a matter of  fact,  we  already have all we need to\nbuild a toy  compiler  for  a  language as powerful as, say, Tiny\nBASIC.  
In the next couple of installments, we'll  go  ahead  and\ndefine that language.\n\nTo round out  the  series,  we  still  have a few items to cover.\nThese include:\n\n   o Procedure calls, with and without parameters\n\n   o Local and global variables\n\n   o Basic types, such as character and integer types\n\n   o Arrays\n\n   o Strings\n\n   o User-defined types and structures\n\n   o Tree-structured parsers and intermediate languages\n\n   o Optimization\n\nThese will all be  covered  in  future  installments.  When we're\nfinished, you'll have all the tools you need to design  and build\nyour own languages, and the compilers to translate them.\n\nI can't  design  those  languages  for  you,  but I can make some\ncomments  and  recommendations.    I've  already  sprinkled  some\nthroughout past installments.    You've  seen,  for  example, the\ncontrol constructs I prefer.\n\nThese constructs are going  to  be part of the languages I build.\nI  have  three  languages in mind at this point, two of which you\nwill see in installments to come:\n\nTINY - A  minimal,  but  usable  language  on the order  of  Tiny\n       BASIC or Tiny C.  It won't be very practical, but  it will\n       have enough power to let you write and  run  real programs\n       that do something worthwhile.\n\nKISS - The  language  I'm  building for my  own  use.    KISS  is\n       intended to be  a  systems programming language.  It won't\n       have strong typing  or  fancy data structures, but it will\n       support most of  the  things  I  want to do with a higher-\n       order language (HOL), except perhaps writing compilers.\n                              \nI've also  been  toying  for  years  with  the idea of a HOL-like\nassembler,  with  structured  control  constructs   and  HOL-like\nassignment statements.  That, in  fact, was the impetus behind my\noriginal foray into the jungles of compiler theory.  
This one may\nnever be built, simply  because  I've  learned that it's actually\neasier to implement a language like KISS, that only uses a subset\nof the CPU instructions.    As you know, assembly language can be\nbizarre  and  irregular  in the extreme, and a language that maps\none-for-one onto it can be a real challenge.  Still,  I've always\nfelt that the syntax used  in conventional assemblers is dumb ...\nwhy is\n\n     MOVE.L A,B\n\nbetter, or easier to translate, than\n\n     B=A ?\n\nI  think  it  would  be  an  interesting  exercise to  develop  a\n\"compiler\" that  would give the programmer complete access to and\ncontrol over the full complement  of the CPU instruction set, and\nwould allow you to generate  programs  as  efficient  as assembly\nlanguage, without the pain  of  learning a set of mnemonics.  Can\nit be done?  I don't  know.  The  real question may be, \"Will the\nresulting language be any  easier  to  write  than assembly\"?  If\nnot, there's no point in it.  I think that it  can  be  done, but\nI'm not completely sure yet how the syntax should look.\n\nPerhaps you have some  comments  or suggestions on this one.  I'd\nlove to hear them.\n\nYou probably won't be surprised to learn that I've already worked\nahead in most  of the areas that we will cover.  I have some good\nnews:  Things  never  get  much  harder than they've been so far.\nIt's  possible  to  build a complete, working compiler for a real\nlanguage, using nothing  but  the same kinds of techniques you've\nlearned so far.  And THAT brings up some interesting questions.\n\n\nWHY IS IT SO SIMPLE?\n\nBefore embarking  on this series, I always thought that compilers\nwere just naturally complex computer  programs  ...  the ultimate\nchallenge.  Yet the things we have done here have  usually turned\nout to be quite simple, sometimes even trivial.\n\nFor a while, I thought it  was simply because I hadn't yet gotten\ninto the meat  of  the  subject.    
I had only covered the simple\nparts.  I will freely admit  to  you  that, even when I began the\nseries,  I  wasn't  sure how far we would be able  to  go  before\nthings got too complex to deal with in the ways  we  have so far.\nBut at this point I've already  been  down the road far enough to\nsee the end of it.  Guess what?\n                              \n\n                     THERE ARE NO HARD PARTS!\n\n\nThen, I thought maybe it was because we were not  generating very\ngood object  code.    Those  of  you  who have been following the\nseries and trying sample compiles know that, while the code works\nand  is  rather  foolproof,  its  efficiency is pretty awful.   I\nfigured that if we were  concentrating on turning out tight code,\nwe would soon find all that missing complexity.\n\nTo  some  extent,  that one is true.  In particular, my first few\nefforts at trying to improve efficiency introduced  complexity at\nan alarming rate.  But since then I've been tinkering around with\nsome simple optimizations and I've found some that result in very\nrespectable code quality, WITHOUT adding a lot of complexity.\n\nFinally, I thought that  perhaps  the  saving  grace was the \"toy\ncompiler\" nature of the study.   I  have made no pretense that we\nwere  ever  going  to be able to build a compiler to compete with\nBorland and Microsoft.  And yet, again, as I get deeper into this\nthing the differences are starting to fade away.\n\nJust  to make sure you get the message here, let me state it flat\nout:\n\n   USING THE TECHNIQUES WE'VE USED  HERE,  IT  IS  POSSIBLE TO\n   BUILD A PRODUCTION-QUALITY, WORKING COMPILER WITHOUT ADDING\n   A LOT OF COMPLEXITY TO WHAT WE'VE ALREADY DONE.\n\n\nSince  the series began I've received  some  comments  from  you.\nMost of them echo my own thoughts:  \"This is easy!    
Why  do the\ntextbooks make it seem so hard?\"  Good question.\n\nRecently, I've gone back and looked at some of those texts again,\nand even bought and read some new ones.  Each  time,  I come away\nwith the same feeling: These guys have made it seem too hard.\n\nWhat's going on here?  Why does the whole thing seem difficult in\nthe texts, but easy to us?    Are  we that much smarter than Aho,\nUllman, Brinch Hansen, and all the rest?\n\nHardly.  But we  are  doing some things differently, and more and\nmore  I'm  starting  to appreciate the value of our approach, and\nthe way that  it  simplifies  things.    Aside  from  the obvious\nshortcuts that I outlined in Part I, like single-character tokens\nand console I/O, we have  made some implicit assumptions and done\nsome things differently from those who have designed compilers in\nthe past. As it turns out, our approach makes life a lot easier.\n\nSo why didn't all those other guys use it?\n\nYou have to remember the context of some of the  earlier compiler\ndevelopment.  These people were working with very small computers\nof  limited  capacity.      Memory  was  very  limited,  the  CPU\ninstruction  set  was  minimal, and programs ran  in  batch  mode\nrather  than  interactively.   As it turns out, these caused some\nkey design decisions that have  really  complicated  the designs.\nUntil recently,  I hadn't realized how much of classical compiler\ndesign was driven by the available hardware.\n\nEven in cases where these  limitations  no  longer  apply, people\nhave  tended  to  structure their programs in the same way, since\nthat is the way they were taught to do it.\n\nIn  our case, we have started with a blank sheet of paper.  There\nis a danger there, of course,  that  you will end up falling into\ntraps that other people have long since learned to avoid.  
But it\nalso has allowed us to  take different approaches that, partly by\ndesign  and partly by pure dumb luck, have  allowed  us  to  gain\nsimplicity.\n\nHere are the areas that I think have  led  to  complexity  in the\npast:\n\n  o  Limited RAM Forcing Multiple Passes\n\n     I  just  read  \"Brinch  Hansen  on  Pascal   Compilers\"  (an\n     excellent book, BTW).  He  developed a Pascal compiler for a\n     PC, but he started the effort in 1981 with a 64K system, and\n     so almost every design decision  he made was aimed at making\n     the compiler fit  into  RAM.    To do this, his compiler has\n     three passes, one of which is the lexical scanner.  There is\n     no way he could, for  example, use the distributed scanner I\n     introduced  in  the last installment,  because  the  program\n     structure wouldn't allow it.  He also required  not  one but\n     two intermediate  languages,  to  provide  the communication\n     between phases.\n\n     All the early compiler writers  had to deal with this issue:\n     Break the compiler up into enough parts so that it  will fit\n     in memory.  When  you  have multiple passes, you need to add\n     data structures to support the  information  that  each pass\n     leaves behind for the next.   That adds complexity, and ends\n     up driving the  design.    Lee's  book,  \"The  Anatomy  of a\n     Compiler,\"  mentions a FORTRAN compiler developed for an IBM\n     1401.  It had no fewer than 63 separate passes!  Needless to\n     say,  in a compiler like this  the  separation  into  phases\n     would dominate the design.\n\n     Even in  situations  where  RAM  is  plentiful,  people have\n     tended  to  use  the same techniques because  that  is  what\n     they're familiar with.   
It  wasn't  until Turbo Pascal came\n     along that we found how simple a compiler could  be  if  you\n     started with different assumptions.\n\n\n  o  Batch Processing\n                              \n     In the early days, batch  processing was the only choice ...\n     there was no interactive computing.   Even  today, compilers\n     run in essentially batch mode.\n\n     In a mainframe compiler as  well  as  many  micro compilers,\n     considerable effort is expended on error recovery ... it can\n     consume as much as 30-40%  of  the  compiler  and completely\n     drive the design.  The idea is to avoid halting on the first\n     error, but rather to keep going at all costs,  so  that  you\n     can  tell  the  programmer about as many errors in the whole\n     program as possible.\n\n     All of that harks back to the days of the  early mainframes,\n     where turnaround time was measured  in hours or days, and it\n     was important to squeeze every last ounce of information out\n     of each run.\n\n     In this series, I've been very careful to avoid the issue of\n     error recovery, and instead our compiler  simply  halts with\n     an error message on  the  first error.  I will frankly admit\n     that it was mostly because I wanted to take the easy way out\n     and keep things simple.   But  this  approach,  pioneered by\n     Borland in Turbo Pascal, also has a lot going for it anyway.\n     Aside from keeping the  compiler  simple,  it also fits very\n     well  with   the  idea  of  an  interactive  system.    When\n     compilation is  fast, and especially when you have an editor\n     such as Borland's that  will  take you right to the point of\n     the error, then it makes a  lot  of sense to stop there, and\n     just restart the compilation after the error is fixed.\n\n\n  o  Large Programs\n\n     Early compilers were designed to handle  large  programs ...\n     essentially infinite ones.    
In those days there was little\n     choice;  the  ideas of  subroutine  libraries  and  separate\n     compilation  were  still  in  the  future.      Again,  this\n     assumption led to  multi-pass designs and intermediate files\n     to hold the results of partial processing.\n\n     Brinch Hansen's  stated goal was that the compiler should be\n     able to compile itself.   Again, because of his limited RAM,\n     this drove him to a multi-pass design.  He needed  as little\n     resident compiler code as possible,  so  that  the necessary\n     tables and other data structures would fit into RAM.\n\n     I haven't stated this one yet, because there  hasn't  been a\n     need  ... we've always just read and  written  the  data  as\n     streams, anyway.  But  for  the  record,  my plan has always\n     been that, in  a  production compiler, the source and object\n     data should all coexist  in  RAM with the compiler, a la the\n     early Turbo Pascals.  That's why I've been  careful  to keep\n     routines like GetChar  and  Emit  as  separate  routines, in\n     spite of their small size.   It  will be easy to change them\n     to read from and write to memory.\n\n\n  o  Emphasis on Efficiency\n\n     John  Backus has stated that, when  he  and  his  colleagues\n     developed the original FORTRAN compiler, they KNEW that they\n     had to make it produce tight code.  In those days, there was\n     a strong sentiment against HOLs  and  in  favor  of assembly\n     language, and  efficiency was the reason.  If FORTRAN didn't\n     produce very good  code  by  assembly  standards,  the users\n     would simply refuse to use it.  For the record, that FORTRAN\n     compiler turned out to  be  one  of  the most efficient ever\n     built, in terms of code quality.  But it WAS complex!\n\n     Today,  we have CPU power and RAM size  to  spare,  so  code\n     efficiency is not  so  much  of  an  issue.    
By studiously\n     ignoring this issue, we  have  indeed  been  able to Keep It\n     Simple.    Ironically,  though, as I have said, I have found\n     some optimizations that we can  add  to  the  basic compiler\n     structure, without having to add a lot of complexity.  So in\n     this  case we get to have our cake and eat it too:  we  will\n     end up with reasonable code quality, anyway.\n\n\n  o  Limited Instruction Sets\n\n     The early computers had primitive instruction sets.   Things\n     that  we  take  for granted, such as  stack  operations  and\n     indirect addressing, came only with great difficulty.\n\n     Example: In most compiler designs, there is a data structure\n     called the literal pool.  The compiler  typically identifies\n     all literals used in the program, and collects  them  into a\n     single data structure.    All references to the literals are\n     done  indirectly  to  this  pool.    At  the   end   of  the\n     compilation, the  compiler  issues  commands  to  set  aside\n     storage and initialize the literal pool.\n\n     We haven't had to address that  issue  at all.  When we want\n     to load a literal, we just do it, in line, as in\n\n          MOVE #3,D0\n\n     There is something to be said for the use of a literal pool,\n     particularly on a machine like  the 8086 where data and code\n     can  be separated.  Still, the whole  thing  adds  a  fairly\n     large amount of complexity with little in return.\n\n     Of course, without the stack we would be lost.  In  a micro,\n     both  subroutine calls and temporary storage depend  heavily\n     on the stack, and  we  have used it even more than necessary\n     to ease expression parsing.\n\n\n  o  Desire for Generality\n\n     Much of the content of the typical compiler text is taken up\n     with issues we haven't addressed here at all ... things like\n     automated  translation  of  grammars,  or generation of LALR\n     parse tables.  
This is not simply because  the  authors want\n     to impress you.  There are good, practical  reasons  why the\n     subjects are there.\n\n     We have been concentrating on the use of a recursive-descent\n     parser to parse a  deterministic  grammar,  i.e.,  a grammar\n     that is not ambiguous and, therefore, can be parsed with one\n     level of lookahead.  I haven't made much of this limitation,\n     but  the  fact  is  that  this represents a small subset  of\n     possible grammars.  In fact,  there is an infinite number of\n     grammars that we can't parse using our techniques.    The LR\n     technique is a more powerful one, and can deal with grammars\n     that we can't.\n\n     In compiler theory, it's important  to know how to deal with\n     these  other  grammars,  and  how  to  transform  them  into\n     grammars  that  are  easier to deal with.  For example, many\n     (but not all) ambiguous  grammars  can  be  transformed into\n     unambiguous ones.  The way to do this is not always obvious,\n     though, and so many people  have  devoted  years  to develop\n     ways to transform them automatically.\n\n     In practice, these  issues  turn out to be considerably less\n     important.  Modern languages tend  to be designed to be easy\n     to parse, anyway.   That  was a key motivation in the design\n     of Pascal.   Sure,  there are pathological grammars that you\n     would be hard pressed to write unambiguous BNF  for,  but in\n     the  real  world  the best answer is probably to avoid those\n     grammars!\n\n     In  our  case,  of course, we have sneakily let the language\n     evolve  as  we  go, so we haven't painted ourselves into any\n     corners here.  You may not always have that luxury.   Still,\n     with a little  care  you  should  be able to keep the parser\n     simple without having to resort to automatic  translation of\n     the grammar.\n\n\nWe have taken  a  vastly  different  approach in this series.  
We\nstarted with a clean sheet  of  paper,  and  developed techniques\nthat work in the context that  we  are in; that is, a single-user\nPC  with  rather  ample CPU power and RAM space.  We have limited\nourselves to reasonable grammars that  are easy to parse, we have\nused the instruction set of the CPU to advantage, and we have not\nconcerned ourselves with efficiency.  THAT's why it's been easy.\n\nDoes this mean that we are forever doomed  to  be  able  to build\nonly toy compilers?   No, I don't think so.  As I've said, we can\nadd  certain   optimizations   without   changing   the  compiler\nstructure.  If we want to process large files, we can  always add\nfile  buffering  to do that.  These  things  do  not  affect  the\noverall program design.\n\nAnd I think  that's  a  key  factor.   By starting with small and\nlimited  cases,  we  have been able to concentrate on a structure\nfor  the  compiler  that is natural  for  the  job.    Since  the\nstructure naturally fits the job, it is almost bound to be simple\nand transparent.   Adding  capability doesn't have to change that\nbasic  structure.    We  can  simply expand things like the  file\nstructure or add an optimization layer.  I guess  my  feeling  is\nthat, back when resources were tight, the structures people ended\nup  with  were  artificially warped to make them work under those\nconditions, and weren't optimum  structures  for  the  problem at\nhand.\n\n\nCONCLUSION\n\nAnyway, that's my arm-waving  guess  as to how we've been able to\nkeep things simple.  We started with something simple and  let it\nevolve  naturally,  without  trying  to   force   it   into  some\ntraditional mold.\n\nWe're going to  press on with this.  I've given you a list of the\nareas  we'll  be  covering in future installments.    With  those\ninstallments, you  should  be  able  to  build  complete, working\ncompilers for just about any occasion, and build them simply.  
If\nyou REALLY want to build production-quality compilers,  you'll be\nable to do that, too.\n\nFor those of you who are chafing at the bit for more parser code,\nI apologize for this digression.  I just thought  you'd  like  to\nhave things put  into  perspective  a  bit.  Next time, we'll get\nback to the mainstream of the tutorial.\n\nSo far, we've only looked at pieces of compilers,  and  while  we\nhave  many  of  the  makings  of a complete language, we  haven't\ntalked about how to put  it  all  together.    That  will  be the\nsubject of our next  two  installments.  Then we'll press on into\nthe new subjects I listed at the beginning of this installment.\n\nSee you then.\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n"
  },
  {
    "path": "9/Makefile",
    "content": "IN=main.c cradle.c\nOUT=main\nFLAGS=-Wall -Werror\n\nall:\n\tgcc -o $(OUT) $(IN) $(FLAGS)\n\nrun:\n\t./$(OUT)\n\n.PHONY: clean\nclean:\n\trm $(OUT)\n"
  },
  {
    "path": "9/cradle.c",
    "content": "#include \"cradle.h\"\n#include <stdio.h>\n#include <stdlib.h>\n\n#define TABLE_SIZE 26\nstatic int LCount = 0;\nstatic char labelName[MAX_BUF];\nchar tmp[MAX_BUF];\nchar Look;  /* the single definition; cradle.h declares it extern */\n\nstatic int Table[TABLE_SIZE];\n\n/* Helper Functions */\nchar uppercase(char c)\n{\n    return (c & 0xDF);\n}\n\nvoid GetChar() \n{\n    Look = getchar();\n    /* printf(\"Getchar: %c\\n\", Look); */\n}\n\n\nvoid Error(char *s)\n{\n    printf(\"\\nError: %s.\", s);\n}\n\nvoid Abort(char *s)\n{\n    Error(s);\n    exit(1);\n}\n\n\nvoid Expected(char *s)\n{\n    /* Callers usually pass tmp itself, and sprintf must not read from the\n       buffer it is writing to, so build the message in a separate buffer. */\n    char msg[MAX_BUF + 16];\n\n    snprintf(msg, sizeof(msg), \"%s Expected\", s);\n    Abort(msg);\n}\n\n\nvoid Match(char x)\n{\n    if(Look == x) {\n        GetChar();\n    } else {\n        sprintf(tmp, \"' %c ' \",  x);\n        Expected(tmp);\n    }\n}\n\nvoid Newline()\n{\n    if (Look == '\\r') {\n        GetChar();\n        if (Look == '\\n') {\n            GetChar();\n        }\n    } else if (Look == '\\n') {\n        GetChar();\n    }\n}\n\nint IsAlpha(char c)\n{\n    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');\n} \n\nint IsDigit(char c)\n{\n    return (c >= '0') && (c <= '9');\n}\n\nint IsAddop(char c)\n{\n    return (c == '+') || (c == '-');\n}\n\nchar GetName()\n{\n    char c = Look;\n\n    if( !IsAlpha(Look)) {\n        sprintf(tmp, \"Name\");\n        Expected(tmp);\n    }\n\n    GetChar();\n\n    return uppercase(c);\n}\n\n\nint GetNum()\n{\n    int value = 0;\n    if( !IsDigit(Look)) {\n        sprintf(tmp, \"Integer\");\n        Expected(tmp);\n    }\n\n    while (IsDigit(Look)) {\n        value = value * 10 + Look - '0';\n        GetChar();\n    }\n\n    return value;\n}\n\nvoid Emit(char *s)\n{\n    printf(\"\\t%s\", s);\n}\n\nvoid EmitLn(char *s)\n{\n    Emit(s);\n    printf(\"\\n\");\n}\n\nvoid Init()\n{\n    LCount = 0;\n\n    InitTable();\n    GetChar();\n}\n\nvoid InitTable()\n{\n    int i;\n    for (i = 0; i < TABLE_SIZE; i++) {\n        Table[i] = 0;\n    }\n\n}\n\nchar *NewLabel()\n{\n    sprintf(labelName, \"L%02d\", LCount);\n    LCount ++;\n    return labelName;\n}\n\nvoid PostLabel(char *label)\n{\n    printf(\"%s:\\n\", label);\n}\n"
  },
  {
    "path": "9/cradle.h",
    "content": "#ifndef _CRADLE_H\n#define _CRADLE_H\n\n#define MAX_BUF 100\nextern char tmp[MAX_BUF];\nextern char Look;   /* defined once in cradle.c */\n\nvoid GetChar();\n\nvoid Error(char *s);\nvoid Abort(char *s);\nvoid Expected(char *s);\nvoid Match(char x);\n\nvoid Newline();\n\nint IsAlpha(char c);\nint IsDigit(char c);\nint IsAddop(char c);\n\nchar GetName();\nint GetNum();\n\nvoid Emit(char *s);\nvoid EmitLn(char *s);\n\nvoid Init();\nvoid InitTable();\n\nchar *NewLabel();\nvoid PostLabel(char *label);\n#endif\n"
  },
  {
    "path": "9/main.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n\n#include \"cradle.h\"\n\n#ifdef DEBUG\n#define dprint(fmt, ...) printf(fmt, __VA_ARGS__);\n#else\n#define dprint(fmt, ...)\n#endif\n\nvoid Prolog(char name);\nvoid Epilog(char name);\nvoid Prog();\nvoid DoBlock(char name);\nvoid Declarations();\nvoid Labels();\nvoid Constants();\nvoid Types();\nvoid Variables();\nvoid DoProcedure();\nvoid DoFunction();\nvoid Statements();\n\nvoid Prog()\n{\n    Match('p');     /* handles program header part */\n    char name = GetName();\n    Prolog(name);\n    DoBlock(name);\n    Match('.');\n    Epilog(name);\n}\n\nvoid Prolog(char name)\n{\n    EmitLn(\".text\");\n    EmitLn(\".global _start\");\n    EmitLn(\"_start:\");\n}\n\nvoid Epilog(char name)\n{\n    EmitLn(\"movl %eax, %ebx\");\n    EmitLn(\"movl $1, %eax\");\n    EmitLn(\"int $0x80\");\n}\nvoid DoBlock(char name)\n{\n    Declarations();\n    sprintf(tmp, \"%c\", name);\n    PostLabel(tmp);\n    Statements();\n}\n\nvoid Declarations()\n{\n    while(strchr(\"lctvpf\", Look) != NULL) {\n        switch(Look) {\n            case 'l':\n                Labels();\n                break;\n            case 'c':\n                Constants();\n                break;\n            case 't':\n                Types();\n                break;\n            case 'v':\n                Variables();\n                break;\n            case 'p':\n                DoProcedure();\n                break;\n            case 'f':\n                DoFunction();\n                break;\n            default:\n                break;\n        }\n    }\n}\n\nvoid Labels()\n{\n    Match('l');\n}\n\nvoid Constants()\n{\n    Match('c');\n}\n\nvoid Types()\n{\n    Match('t');\n}\n\nvoid Variables()\n{\n    Match('v');\n}\n\nvoid DoProcedure()\n{\n    Match('p');\n}\n\nvoid DoFunction()\n{\n    Match('f');\n}\n\nvoid Statements()\n{\n    Match('b');\n    while(Look != 'e') {\n        GetChar();\n    }\n    Match('e');\n}\n\nint main()\n{\n    Init();\n   
 Prog();\n    return 0;\n}\n"
  },
  {
    "path": "9/tutor9.txt",
    "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                     LET'S BUILD A COMPILER!\n\n                                By\n\n                     Jack W. Crenshaw, Ph.D.\n\n                          16 April 1989\n\n\n                       Part IX: A TOP VIEW\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n\nINTRODUCTION\n\nIn  the  previous  installments,  we  have  learned  many of  the\ntechniques required to  build  a full-blown compiler.  We've done\nboth  assignment   statements   (with   Boolean   and  arithmetic\nexpressions),  relational operators, and control constructs.   We\nstill haven't  addressed procedure or function calls, but even so\nwe  could  conceivably construct a  mini-language  without  them.\nI've  always  thought  it would be fun to see just  how  small  a\nlanguage  one  could  build  that  would still be useful.   We're\nALMOST in a position to do that now.  The  problem  is: though we\nknow  how  to  parse and translate the constructs, we still don't\nknow quite how to put them all together into a language.\n\nIn those earlier installments, the  development  of  our programs\nhad  a decidedly bottom-up flavor.  In  the  case  of  expression\nparsing,  for  example,  we  began  with  the  very lowest  level\nconstructs, the individual constants  and  variables,  and worked\nour way up to more complex expressions.\n\nMost people regard  the  top-down design approach as being better\nthan  the  bottom-up  one.  
I do too,  but  the  way  we  did  it\ncertainly seemed natural enough for the kinds of  things  we were\nparsing.\n\nYou mustn't get  the  idea, though, that the incremental approach\nthat  we've  been  using  in  all these tutorials  is  inherently\nbottom-up.  In  this  installment  I'd  like to show you that the\napproach can work just as well when applied from the top down ...\nmaybe better.  We'll consider languages such as C and Pascal, and\nsee how complete compilers can be built starting from the top.\n\nIn the next installment, we'll  apply the same technique to build\na  complete  translator  for a subset of the KISS language, which\nI'll be  calling  TINY.    But one of my goals for this series is\nthat you will  not only be able to see how a compiler for TINY or\nKISS  works,  but  that you will also be able to design and build\ncompilers for your own languages.  The C and Pascal examples will\nhelp.    One  thing I'd like you  to  see  is  that  the  natural\nstructure of the compiler depends very much on the language being\ntranslated, so the simplicity and  ease  of  construction  of the\ncompiler  depends  very  much  on  letting the language  set  the\nprogram structure.\n                              \nIt's  a bit much to produce a full C or Pascal compiler here, and\nwe won't try.   But we can flesh out the top levels far enough so\nthat you can see how it goes.\n\nLet's get started.\n\n\nTHE TOP LEVEL\n\nOne of the biggest  mistakes  people make in a top-down design is\nfailing  to start at the true top.  They think they know what the\noverall structure of the  design  should be, so they go ahead and\nwrite it down.\n\nWhenever  I  start a new design, I always like to do  it  at  the\nabsolute beginning.   
In  program design language (PDL), this top\nlevel looks something like:\n\n\n     begin\n        solve the problem\n     end\n\n\nOK, I grant  you that this doesn't give much of a hint as to what\nthe next level is, but I  like  to  write it down anyway, just to\ngive me that warm feeling that I am indeed starting at the top.\n\nFor our problem, the overall function of a compiler is to compile\na complete program.  Any definition of the  language,  written in\nBNF,  begins here.  What does the top level BNF look like?  Well,\nthat depends quite a bit on the language to be translated.  Let's\ntake a look at Pascal.\n\n\nTHE STRUCTURE OF PASCAL\n\nMost  texts  for  Pascal  include  a   BNF   or  \"railroad-track\"\ndefinition of the language.  Here are the first few lines of one:\n\n\n     <program> ::= <program-header> <block> '.'\n\n     <program-header> ::= PROGRAM <ident>\n\n     <block> ::= <declarations> <statements>\n\n\nWe can write recognizers  to  deal  with  each of these elements,\njust as we've done before.  For each one, we'll use  our familiar\nsingle-character tokens to represent the input, then flesh things\nout a little at a time.    
Let's begin with the first recognizer:\nthe program itself.\n                              \nTo translate this, we'll  start  with a fresh copy of the Cradle.\nSince we're back to single-character  names, we'll just use a 'p'\nto stand for 'PROGRAM.'\n\nTo a fresh copy of the cradle, add the following code, and insert\na call to it from the main program:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate A Program }\n\nprocedure Prog;\nvar  Name: char;\nbegin\n   Match('p');            { Handles program header part }\n   Name := GetName;\n   Prolog;\n   Match('.');\n   Epilog(Name);\nend;\n{--------------------------------------------------------------}\n\n\nThe procedures  Prolog and Epilog perform whatever is required to\nlet the program interface with the operating system,  so  that it\ncan execute as a program.  Needless to  say,  this  part  will be\nVERY OS-dependent.  Remember, I've been emitting code for a 68000\nrunning under the OS I use, which is SK*DOS.   I  realize most of\nyou are using PC's  and  would rather see something else, but I'm\nin this thing too deep to change now!\n\nAnyhow, SK*DOS is a  particularly  easy OS to interface to.  Here\nis the code for Prolog and Epilog:\n\n\n{--------------------------------------------------------------}\n{ Write the Prolog }\n\nprocedure Prolog;\nbegin\n   EmitLn('WARMST EQU $A01E');\nend;\n\n\n{--------------------------------------------------------------}\n{ Write the Epilog }\n\nprocedure Epilog(Name: char);\nbegin\n   EmitLn('DC WARMST');\n   EmitLn('END ' + Name);\nend;\n{--------------------------------------------------------------}\n                              \nAs usual, add  this  code  and  try  out the \"compiler.\"  At this\npoint, there is only one legal input:\n\n\n     px.   
(where x is any single letter, the program name)\n\n\nWell,  as  usual  our first effort is rather unimpressive, but by\nnow  I'm sure you know that things  will  get  more  interesting.\nThere is one important thing to  note:   THE OUTPUT IS A WORKING,\nCOMPLETE, AND EXECUTABLE PROGRAM (at least after it's assembled).\n\nThis  is  very  important.  The  nice  feature  of  the  top-down\napproach is that at any stage you can  compile  a  subset  of the\ncomplete language and get  a  program that will run on the target\nmachine.    From here on, then, we  need  only  add  features  by\nfleshing out the language constructs.  It's all  very  similar to\nwhat we've been doing all along, except that we're approaching it\nfrom the other end.\n\n\nFLESHING IT OUT\n\nTo flesh out  the  compiler,  we  only have to deal with language\nfeatures  one by one.  I like to start with a stub procedure that\ndoes  nothing, then add detail in  incremental  fashion.    Let's\nbegin  by  processing  a block, in accordance with its PDL above.\nWe can do this in two stages.  First, add the null procedure:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Pascal Block }\n\nprocedure DoBlock(Name: char);\nbegin\nend;\n{--------------------------------------------------------------}\n\n\nand modify Prog to read:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate A Program }\n\nprocedure Prog;\nvar  Name: char;\nbegin\n   Match('p');\n   Name := GetName;\n   Prolog;\n   DoBlock(Name);\n   Match('.');\n   Epilog(Name);\nend;\n{--------------------------------------------------------------}\n\n\nThat certainly  shouldn't change the behavior of the program, and\nit doesn't.  But now the  definition  of Prog is complete, and we\ncan proceed to flesh out DoBlock.  
That's done right from its BNF\ndefinition:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate a Pascal Block }\n\nprocedure DoBlock(Name: char);\nbegin\n   Declarations;\n   PostLabel(Name);\n   Statements;\nend;\n{--------------------------------------------------------------}\n\n\nThe  procedure  PostLabel  was  defined  in  the  installment  on\nbranches.  Copy it into your cradle.\n\nI probably need to  explain  the  reason  for inserting the label\nwhere I have.  It has to do with the operation of SK*DOS.  Unlike\nsome OS's,  SK*DOS allows the entry point to the main  program to\nbe  anywhere in the program.  All you have to do is to give  that\npoint a name.  The call  to  PostLabel puts that name just before\nthe first executable statement  in  the  main  program.  How does\nSK*DOS know which of the many labels is the entry point, you ask?\nIt's the one that matches the END statement  at  the  end  of the\nprogram.\n\nOK,  now  we  need  stubs  for  the  procedures Declarations  and\nStatements.  Make them null procedures as we did before.\n\nDoes the program  still run the same?  Then we can move on to the\nnext stage.\n\n\nDECLARATIONS\n\nThe BNF for Pascal declarations is:\n\n\n     <declarations> ::= ( <label list>    |\n                          <constant list> |\n                          <type list>     |\n                          <variable list> |\n                          <procedure>     |\n                          <function>         )*\n                              \n\n(Note  that  I'm  using the more liberal definition used by Turbo\nPascal.  In the standard Pascal definition, each  of  these parts\nmust be in a specific order relative to the rest.)\n\nAs  usual,  let's  let a single character represent each of these\ndeclaration types.  
The new form of Declarations is:\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate the Declaration Part }\n\nprocedure Declarations;\nbegin\n   while Look in ['l', 'c', 't', 'v', 'p', 'f'] do\n      case Look of\n       'l': Labels;\n       'c': Constants;\n       't': Types;\n       'v': Variables;\n       'p': DoProcedure;\n       'f': DoFunction;\n      end;\nend;\n{--------------------------------------------------------------}\n\n\nOf course, we need stub  procedures for each of these declaration\ntypes.  This time,  they  can't  quite  be null procedures, since\notherwise we'll end up with an infinite While loop.  At  the very\nleast, each recognizer must  eat  the  character that invokes it.\nInsert the following procedures:\n\n\n{--------------------------------------------------------------}\n{ Process Label Statement }\n\nprocedure Labels;\nbegin\n   Match('l');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process Const Statement }\n\nprocedure Constants;\nbegin\n   Match('c');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process Type Statement }\nprocedure Types;\nbegin\n   Match('t');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process Var Statement }\n\nprocedure Variables;\nbegin\n   Match('v');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process Procedure Definition }\n\nprocedure DoProcedure;\nbegin\n   Match('p');\nend;\n\n\n{--------------------------------------------------------------}\n{ Process Function Definition }\n\nprocedure DoFunction;\nbegin\n   Match('f');\nend;\n{--------------------------------------------------------------}\n\n\nNow try out the  compiler  with a few representative inputs.  You\ncan  mix  the  declarations any way you like, as long as the last\ncharacter  in  the  program is'.' to  indicate  the  end  of  the\nprogram.  
Of course,  none  of  the declarations actually declare\nanything, so you don't need  (and can't use) any characters other\nthan those standing for the keywords.\n\nWe can flesh out the statement  part  in  a similar way.  The BNF\nfor it is:\n\n\n     <statements> ::= <compound statement>\n\n     <compound statement> ::= BEGIN <statement>\n                                   (';' <statement>)* END\n\n\nNote that statements can  begin  with  any identifier except END.\nSo the first stub form of procedure Statements is:\n                              \n\n{--------------------------------------------------------------}\n{ Parse and Translate the Statement Part }\n\nprocedure Statements;\nbegin\n   Match('b');\n   while Look <> 'e' do\n      GetChar;\n   Match('e');\nend;\n{--------------------------------------------------------------}\n\n\nAt  this  point  the  compiler   will   accept   any   number  of\ndeclarations, followed by the  BEGIN  block  of the main program.\nThis  block  itself  can contain any characters at all (except an\nEND), but it must be present.\n\nThe simplest form of input is now:\n\n     'pxbe.'\n\nTry  it.    Also  try  some  combinations  of  this.   Make  some\ndeliberate errors and see what happens.\n\nAt this point you should be beginning to see the drill.  We begin\nwith a stub translator to process a program, then  we  flesh  out\neach procedure in turn,  based  upon its BNF definition.  Just as\nthe lower-level BNF definitions add detail and elaborate upon the\nhigher-level ones, the lower-level  recognizers  will  parse more\ndetail  of  the  input  program.    When  the  last stub has been\nexpanded,  the  compiler  will  be  complete.    That's  top-down\ndesign/implementation in its purest form.\n\nYou might note that even though we've been adding procedures, the\noutput of the program hasn't changed.  That's as  it  should  be.\nAt these  top  levels  there  is  no  emitted code required.  
The\nrecognizers are  functioning as just that: recognizers.  They are\naccepting input sentences, catching bad ones, and channeling good\ninput to the right places, so  they  are  doing their job.  If we\nwere to pursue this a bit longer, code would start to appear.\n\nThe  next  step  in our expansion should  probably  be  procedure\nStatements.  The Pascal definition is:\n\n\n    <statement> ::= <simple statement> | <structured statement>\n\n    <simple statement> ::= <assignment> | <procedure call> | null\n\n    <structured statement> ::= <compound statement> |\n                               <if statement>       |\n                               <case statement>     |\n                               <while statement>    |\n                               <repeat statement>   |\n                               <for statement>      |\n                               <with statement>\n\n\nThese  are  starting  to look familiar.  As a matter of fact, you\nhave already gone  through  the process of parsing and generating\ncode for both assignment statements and control structures.  This\nis where the top level meets our bottom-up  approach  of previous\nsessions.  The constructs will be a little  different  from those\nwe've  been  using  for KISS, but the differences are nothing you\ncan't handle.\n\nI  think  you can get the picture now as to the  procedure.    We\nbegin with a complete BNF  description of the language.  Starting\nat  the  top  level, we code  up  the  recognizer  for  that  BNF\nstatement, using stubs  for  the next-level recognizers.  Then we\nflesh those lower-level statements out one by one.\n\nAs it happens, the definition of Pascal is  very  compatible with\nthe  use of BNF, and BNF descriptions  of  the  language  abound.\nArmed  with  such   a   description,  you  will  find  it  fairly\nstraightforward to continue the process we've begun.\n\nYou  might  have  a go at fleshing a few of these constructs out,\njust  to get a feel for it.  
I don't expect you  to  be  able  to\ncomplete a Pascal compiler  here  ...  there  are too many things\nsuch  as  procedures  and types that we haven't addressed yet ...\nbut  it  might  be helpful to try some of the more familiar ones.\nIt will do  you  good  to  see executable programs coming out the\nother end.\n\nIf I'm going to address those issues that we haven't covered yet,\nI'd rather  do  it  in  the context of KISS.  We're not trying to\nbuild a complete Pascal  compiler  just yet, so I'm going to stop\nthe expansion of Pascal here.    Let's  take  a  look  at  a very\ndifferent language.\n\n\nTHE STRUCTURE OF C\n\nThe C language is quite another matter, as you'll see.   Texts on\nC  rarely  include  a BNF definition of  the  language.  Probably\nthat's because the language is quite hard to write BNF for.\n\nOne reason I'm showing you these structures now is so that  I can\nimpress upon you these two facts:\n\n (1) The definition of  the  language drives the structure of the\n     compiler.  What works for one language may be a disaster for\n     another.    It's  a very bad idea to try to  force  a  given\n     structure upon the compiler.  Rather, you should let the BNF\n     drive the structure, as we have done here.\n                             \n (2) A language that is hard to write BNF for  will  probably  be\n     hard  to  write  a compiler for, as well.  C  is  a  popular\n     language,  and  it  has  a  reputation  for  letting you  do\n     virtually  anything that is possible to  do.    Despite  the\n     success of Small C, C is _NOT_ an easy language to parse.\n\n\nA C program has  less  structure than its Pascal counterpart.  At\nthe top level, everything in C is a static declaration, either of\ndata or of a function.  
We can capture this thought like this:\n\n\n     <program> ::= ( <global declaration> )*\n\n     <global declaration> ::= <data declaration>  |\n                              <function>\n\nIn Small C, functions  can  only have the default type int, which\nis not declared.  This makes  the  input easy to parse: the first\ntoken is either \"int,\" \"char,\" or the name  of  a  function.   In\nSmall  C, the preprocessor commands are  also  processed  by  the\ncompiler proper, so the syntax becomes:\n\n\n     <global declaration> ::= '#' <preprocessor command>  |\n                              'int' <data list>           |\n                              'char' <data list>          |\n                              <ident> <function body>     |\n\n\nAlthough we're really more interested in full C  here,  I'll show\nyou the  code corresponding to this top-level structure for Small\nC.\n\n\n{--------------------------------------------------------------}\n{ Parse and Translate A Program }\n\nprocedure Prog;\nbegin\n   while Look <> ^Z do begin\n      case Look of\n       '#': PreProc;\n       'i': IntDecl;\n       'c': CharDecl;\n      else DoFunction(Int);\n      end;\n   end;\nend;\n{--------------------------------------------------------------}\n\nNote that I've had to use a ^Z to indicate the end of the source.\nC has no keyword such as END or the '.' to otherwise indicate the\nend.\n                             \nWith full C,  things  aren't  even  this easy.  The problem comes\nabout because in full C, functions can also have types.   So when\nthe compiler sees a  keyword  like  \"int,\"  it still doesn't know\nwhether to expect a  data  declaration  or a function definition.\nThings get more  complicated  since  the  next token may not be a\nname  ... 
it may start with an '*' or '(', or combinations of the\ntwo.\n\nMore specifically, the BNF for full C begins with:\n\n\n     <program> ::= ( <top-level decl> )*\n\n     <top-level decl> ::= <function def> | <data decl>\n\n     <data decl> ::= [<class>] <type> <decl-list>\n\n     <function def> ::= [<class>] [<type>] <function decl>\n\n\nYou  can  now  see the problem:   The  first  two  parts  of  the\ndeclarations for data and functions can be the same.   Because of\nthe  ambiguity  in  the grammar as  written  above,  it's  not  a\nsuitable  grammar  for  a  recursive-descent  parser.     Can  we\ntransform it into one that is suitable?  Yes, with a little work.\nSuppose we write it this way:\n\n\n     <top-level decl> ::= [<class>] <decl>\n\n     <decl> ::= <type> <typed decl> | <function decl>\n\n     <typed decl> ::= <data list> | <function decl>\n\n\nWe  can  build  a  parsing  routine  for  the   class   and  type\ndefinitions, and have them store away their findings  and  go on,\nwithout their ever having to \"know\" whether a function or  a data\ndeclaration is being processed.\n\nTo begin, key in the following version of the main program:\n\n\n{--------------------------------------------------------------}\n{ Main Program }\n\nbegin\n   Init;\n   while Look <> ^Z do begin\n      GetClass;\n      GetType;\n      TopDecl;\n   end;\nend.\n\n{--------------------------------------------------------------}\n\n\nFor the first round, just make the three procedures stubs that do\nnothing _BUT_ call GetChar.\n\nDoes this program work?  Well, it would be hard put NOT to, since\nwe're not really asking it to do anything.  It's been said that a\nC compiler will accept virtually any input without choking.  It's\ncertainly true of THIS  compiler,  since in effect all it does is\nto eat input characters until it finds a ^Z.\n\nNext, let's make  GetClass  do something worthwhile.  
Declare the\nglobal variable\n\n\n     var Class: char;\n\n\nand change GetClass to do the following:\n\n\n{--------------------------------------------------------------}\n{  Get a Storage Class Specifier }\n\nProcedure GetClass;\nbegin\n   if Look in ['a', 'x', 's'] then begin\n      Class := Look;\n      GetChar;\n      end\n   else Class := 'a';\nend;\n{--------------------------------------------------------------}\n\n\nHere, I've used three  single  characters  to represent the three\nstorage classes \"auto,\" \"extern,\"  and  \"static.\"   These are not\nthe only three possible classes ... there are also \"register\" and\n\"typedef,\" but this should  give  you the picture.  Note that the\ndefault class is \"auto.\"\n\nWe  can  do  a  similar  thing  for  types.   Enter the following\nprocedure next:\n\n\n{--------------------------------------------------------------}\n{  Get a Type Specifier }\n\nprocedure GetType;\nbegin\n   Typ := ' ';\n   if Look = 'u' then begin\n      Sign := 'u';\n      Typ := 'i';\n      GetChar;\n      end\n   else Sign := 's';\n   if Look in ['i', 'l', 'c'] then begin\n      Typ := Look;\n      GetChar;\n   end;\nend;\n{--------------------------------------------------------------}\n\nNote that you must add two more global variables, Sign and Typ.\n\nWith these two procedures in place, the compiler will process the\nclass and type definitions and store away their findings.  We can\nnow process the rest of the declaration.\n\nWe  are by no means out of the woods yet, because there are still\nmany complexities just in the definition of the  type,  before we\neven get to the actual data or function names.  Let's pretend for\nthe moment that we have passed all those gates, and that the next\nthing in the  input stream is a name.  If the name is followed by\na left paren, we have a function declaration.  
If not, we have at\nleast one data item,  and  possibly a list, each element of which\ncan have an initializer.\n\nInsert the following version of TopDecl:\n\n\n{--------------------------------------------------------------}\n{ Process a Top-Level Declaration }\n\nprocedure TopDecl;\nvar Name: char;\nbegin\n   Name := Getname;\n   if Look = '(' then\n      DoFunc(Name)\n   else\n      DoData(Name);\nend;\n{--------------------------------------------------------------}\n\n\n(Note that, since we have already read the name, we must  pass it\nalong to the appropriate routine.)\n\nFinally, add the two procedures DoFunc and DoData:\n\n\n{--------------------------------------------------------------}\n{ Process a Function Definition }\n\nprocedure DoFunc(n: char);\nbegin\n   Match('(');\n   Match(')');\n   Match('{');\n   Match('}');\n   if Typ = ' ' then Typ := 'i';\n   Writeln(Class, Sign, Typ, ' function ', n);\nend;\n\n{--------------------------------------------------------------}\n{ Process a Data Declaration }\n\nprocedure DoData(n: char);\nbegin\n   if Typ = ' ' then Expected('Type declaration');\n   Writeln(Class, Sign, Typ, ' data ', n);\n   while Look = ',' do begin\n      Match(',');\n      n := GetName;\n      WriteLn(Class, Sign, Typ, ' data ', n);\n   end;\n   Match(';');\nend;\n{--------------------------------------------------------------}\n\n\nSince  we're  still  a long way from producing executable code, I\ndecided to just have these two routines tell us what they found.\n\nOK, give this program a try.    For data declarations, it's OK to\ngive a list separated by commas.  We  can't  process initializers\nas yet.  We also can't process argument lists for  the functions,\nbut the \"(){}\" characters should be there.\n\nWe're still a _VERY_ long way from having a C compiler,  but what\nwe have is starting to process the right kinds of inputs,  and is\nrecognizing both good  and  bad  inputs.    
In  the  process, the\nnatural structure of the compiler is starting to take form.\n\nCan we continue this until we have something that acts  more like\na compiler? Of course we can.  Should we?  That's another matter.\nI don't know about you, but I'm beginning to get dizzy, and we've\nstill  got  a  long  way  to  go  to  even  get  past   the  data\ndeclarations.\n\nAt  this  point,  I think you can see how the  structure  of  the\ncompiler evolves from the language  definition.    The structures\nwe've seen for our  two  examples, Pascal and C, are as different\nas night and day.  Pascal was designed at least partly to be easy\nto parse, and that's  reflected  in the compiler.  In general, in\nPascal there is more structure and we have a better idea  of what\nkinds of constructs to expect at any point.  In  C,  on the other\nhand,  the  program  is  essentially  a  list   of  declarations,\nterminated only by the end of file.\n\nWe  could  pursue  both  of  these structures much  farther,  but\nremember that our purpose here is  not  to  build a Pascal or a C\ncompiler, but rather to study compilers in general.  For those of\nyou  who DO want to deal with Pascal or C, I hope I've given  you\nenough of a start so that you can  take  it  from  here (although\nyou'll soon need some of the stuff we still haven't  covered yet,\nsuch as typing and procedure calls).    For the rest of you, stay\nwith me through the next installment.  There, I'll be leading you\nthrough the development of a complete compiler for TINY, a subset\nof KISS.\n\nSee you then.\n\n\n*****************************************************************\n*                                                               *\n*                        COPYRIGHT NOTICE                       *\n*                                                               *\n*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *\n*                                                               *\n*****************************************************************\n\n"
  },
  {
    "path": "README.md",
    "content": "# Information\nA C version of the \"Let's Build a Compiler\" by Jack Crenshaw \nhttp://compilers.iecc.com/crenshaw/\n\nThis repository is forked form: https://github.com/vtudose/Let-s-build-a-compiler\n\nAnd since the original is far from complete(only the first two chapter), and\nthe author had been inactive for quite a long time, I decided to create a new\nrepository.\n\nIf there are are any licence issue, please inform.\n\n# Comments\nBelow are some comments about the source code here and the original article.\n\n## What to do with the generated code\nThe generated assembly code is devoted to x86 architecture. The syntax is AT&T\nstyle and the experiment operating system is Linux. You may try to save the\ngenerated code to for example \"test.s\" and execute the following commands to\ncheck the output.\n```\nas --32 -o test.o test.s\nld -m elf_i386 test.o -o test\n./test\necho $?\n```\n\n## Assembly code\nIt is a C port of the original article which is written in Pascal. And the\ngenerated code are ported to x86 instead of 86000.\n\nIf you want to test the generated code, please keep in mind that the generated\ncode might be incomplete to be directly assembled. For example, when loading a\nvariable, we directly generate Assembly code \"movl X, %eax\", and variable 'X'\nmight not be declared since the article is far from mentioning \"variable types\".\nThus you'll have to type the missing parts all by yourself.\n\n## IO routines\nI am definitely NOT an assembly expert. When the author mentioned adding IO\nroutine by library, I cannot simply find the x86 alternative. The closest\nthing is C's library function \"printf\". But then I decided not no bother\nadding this kind of routine cause they are only used in two chapters.\n\nInstead, I'll save the value as the return code of a process. 
It's done by\nmoving the value into register \"%ebx\", loading the exit system call number (1)\ninto \"%eax\", and executing \"int $0x80\", as follows:\n```\nmovl var, %ebx\nmovl $1, %eax  # exit function\nint $0x80\n```\n\nNote that the shell only reports the low 8 bits of the exit status, so\n\"echo $?\" shows a value between 0 and 255.\n\n-Note from random github user-\n\nOn older operating systems, IO was performed by raising an interrupt that\njumped to an area of memory containing the instructions for performing the\noperation on your particular hardware configuration. This left the developers\nof the operating systems unable to change the memory layout without forcing\nall existing software to be modified as well.\n\nUNIX-like operating systems following the POSIX standard, as well as Windows\nand Macintosh operating systems, started including libraries for C or Pascal\nwith functions for system calls. This means you will either have to compile\nfor DOS or learn some specifics about your operating system and the format\nof its C libraries.\n\nFor anyone on Windows, I would recommend getting a used copy of Charles\nPetzold's Programming Windows, Fifth Edition, and checking out\nhttp://www.godevtool.com for some free tools and tutorials on accessing the\nWindows API from assembly.\n\n-End of note-\n\n## The Article\nThe article mainly discusses how to design a compiler top-down. It covers many\naspects of compiler design, such as lexical scanning, BNF, symbols,\nprocedures, types, and code generation.\n\nIt is a good article for beginners to follow.\n"
  },
  {
    "path": "get_chapters.sh",
    "content": "#!/bin/bash\nset -ex\n\nURL=\"http://compilers.iecc.com/crenshaw/\"\n\nfor i in $(seq 1 16); do  \n\n\tfile=\"tutor\"$i\".txt\"\n\tmkdir -p $i\n\tcd $i\n\n\tif [ ! -f $file ];\n\tthen\n\t\twget $URL\"\"$i\n\tfi\n\n\tcd ..\necho $i; done;\n"
  }
]