Let's Build A Compiler
Let's Build A Compiler
1. Introduction
2. Expression Parsing
3. More Expressions
4. Interpreters
5. Control Constructs
6. Boolean Expressions
7. Lexical Scanning
8. A Little Philosophy
9. A Top View
10. Introduction to "TINY"
11. Lexical Scan Revisited
12. Miscellany
13. Procedures
14. Types
15. Back To The Future
16. Unit Construction
Book Review :
Rating :
Part I: INTRODUCTION
INTRODUCTION
I also take a page from the work of Ron Cain, the author of the
original Small C. Whereas almost all other compiler authors have
historically used an intermediate language like P-code and
divided the compiler into two parts (a front end that produces
P-code, and a back end that processes P-code to produce
executable object code), Ron showed us that it is a
straightforward matter to make a compiler directly produce
executable object code, in the form of assembler language
statements. The code will _NOT_ be the world's tightest code ...
producing optimized code is a much more difficult job. But it
will work, and work reasonably well. Just so that I don't leave
you with the impression that our end product will be worthless, I
_DO_ intend to show you how to "soup up" the compiler with some
optimization.
THE CRADLE
Every program needs some boiler plate ... I/O routines, error
message routines, etc. The programs we develop here will be no
exceptions. I've tried to hold this stuff to an absolute
minimum, however, so that we can concentrate on the important
stuff without losing it among the trees. The code given below
represents about the minimum that we need to get anything done.
It consists of some I/O routines, an error-handling routine and a
skeleton, null main program. I call it our cradle. As we
develop other routines, we'll add them to the cradle, and add the
calls to them as we need to. Make a copy of the cradle and save
it, because we'll be using it more than once.
{--------------------------------------------------------------}
program Cradle;
{--------------------------------------------------------------}
{ Constant Declarations }
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
{ Read New Character From Input Stream }
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
{--------------------------------------------------------------}
{ Report Error and Halt }
{--------------------------------------------------------------}
{ Report What Was Expected }
{--------------------------------------------------------------}
{ Match a Specific Input Character }
{--------------------------------------------------------------}
{ Recognize an Alpha Character }
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Get a Number }
{--------------------------------------------------------------}
{ Output a String with Tab }
{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
begin
GetChar;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
end.
{--------------------------------------------------------------}
That's it for this introduction. Copy the code above into TP and
compile it. Make sure that it compiles and runs correctly. Then
proceed to the first lesson, which is on expression parsing.
Free eBook
Let's Build A Compiler
Copyright Info.
INTRODUCTION
GETTING STARTED
already know what we're about. You will also have copied the
cradle software into your Turbo Pascal system, and have compiled
x = 2*y + 3/(4*z)
That's so that the beginners among you won't get totally lost.
There are also some very good lessons to be learned early on,
that will serve us well later. For the more experienced readers:
SINGLE DIGITS
let's start with the absolutely most simple case we can think of.
the "cradle" that I gave last time. We'll be using it again for
{---------------------------------------------------------------}
procedure Expression;
begin
end;
{---------------------------------------------------------------}
reads:
{---------------------------------------------------------------}
begin
Init;
Expression;
end.
{---------------------------------------------------------------}
Now run the program. Try any single-digit number as input. You
any other character as input, and you'll see that the parser
OK, I grant you that it's pretty limited. But don't brush it off
legal, and gives a meaningful error message. Who could ask for
more? As we expand our parser, we'd better make sure those two
file, and the writes to another disk file, but this way is much
BINARY EXPRESSIONS
Now that we have that under our belt, let's branch out a bit.
not going to meet our needs for long, so let's see what we can do
1+2
or 4-3
in DO, where should Term leave its result? Answer: the same
{---------------------------------------------------------------}
{ Parse and Translate an Expression }
procedure Expression;
begin
case Look of
'+': Add;
'-': Subtract;
else Expected('Addop');
end;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure Add;
begin
Match('+');
Term;
end;
{-------------------------------------------------------------}
begin
Match('-');
Term;
EmitLn('SUB D1,D0');
end;
{-------------------------------------------------------------}
When you're finished with that, the order of the routines should
be:
o Add
o Subtract
o Expression
Now run the program. Try any combination you can think of of two
MOVE #n,D0
MOVE D0,D1
over, I'll briefly mention some ways that we can do a little op-
without too much trouble. But remember, we're here to learn, not
to see how tight we can make the object code. For now, and
works.
the FIRST argument in it) from D0 (which has the second). That's
the wrong way, so we end up with the wrong sign for the result.
it reads
{-------------------------------------------------------------}
procedure Subtract;
begin
Match('-');
Term;
end;
{-------------------------------------------------------------}
Now our code is even less efficient, but at least it gives the
those facts of life you learn to live with. This one will come
OK, at this point we have a parser that can recognize the sum or
single digit. But real expressions can have either form (or an
infinity of others). For kicks, go back and run the program with
Didn't work, did it? And why should it? We just finished
telling our parser that the only kinds of expressions that are
written
{---------------------------------------------------------------}
procedure Expression;
begin
Term;
EmitLn('MOVE D0,D1');
case Look of
'+': Add;
'-': Subtract;
else Expected('Addop');
end;
end;
end;
{--------------------------------------------------------------}
NOW we're getting somewhere! This version handles any number of
you can turn BNF into parser code just about as fast as you can
type!
OK, compile the new version of our parser, and give it a try. As
illegal one. Neat, eh? You might note that in our test version,
any error message comes out sort of buried in whatever code had
separated ... one to the output file, and one to the screen.
things stand now, the parser uses D0 for the "primary" register,
and D1 as a place to store the partial sum. That works fine for
now, because as long as we deal with only the "addops" '+' and
1+(2-(3+(4-5)))
in D0 to D1, let's just push it onto the stack. For the benefit
written
-(SP)
respectively. Now try the parser again and make sure we haven't
broken it.
Once again, the generated code is less efficient than before, but
Now let's get down to some REALLY serious business. As you all
like
2 + 3 * 4,
i.e.,
single digit.
procedure Factor;
begin
end;
{--------------------------------------------------------------}
procedure Multiply;
begin
Match('*'); POP BX
end;
{-------------------------------------------------------------}
procedure Divide;
begin
Match('/');
Factor;
EmitLn('MOVE (SP)+,D1');
EmitLn('DIVS D1,D0');
end;
{---------------------------------------------------------------}
procedure Term;
begin
case Look of
'*': Multiply;
'/': Divide;
else Expected('Mulop');
end;
end;
end;
{--------------------------------------------------------------}
procedure Add;
begin
Match('+');
Term;
EmitLn('ADD (SP)+,D0');
end;
{-------------------------------------------------------------}
procedure Subtract;
begin
Match('-');
Term;
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
procedure Expression;
begin
Term;
EmitLn('MOVE D0,-(SP)');
case Look of
'+': Add;
'-': Subtract;
else Expected('Addop');
end;
end;
end;
{--------------------------------------------------------------}
Hot dog! A NEARLY functional parser/translator, in only 55 lines
PARENTHESES
2*(3+4) ,
(1+2)/((3+4)+(5-6))
simple factor. That is, one of the forms for a factor is:
etc., ad infinitum.
procedure Factor;
begin
Match('(');
Expression;
Match(')');
end
else
end;
{--------------------------------------------------------------}
Note again how easily we can extend the parser, and how well the
As usual, compile the new version and make sure that it correctly
message.
UNARY MINUS
At this point, we have a parser that can handle just about any
-1
minus sign. You'll find that +3 won't work either, nor will
something like
-(3-2) .
becomes 0-3. We can easily patch this into our existing version
of Expression:
{---------------------------------------------------------------}
procedure Expression;
begin
if IsAddop(Look) then
EmitLn('CLR D0')
else
Term;
EmitLn('MOVE D0,-(SP)');
case Look of
'+': Add;
'-': Subtract;
else Expected('Addop');
end;
end;
end;
{--------------------------------------------------------------}
I TOLD you that making changes was easy! This time it cost us
{--------------------------------------------------------------}
{ Recognize an Addop }
begin
end;
{--------------------------------------------------------------}
OK, make these changes to the program and recompile. You should
The efficiency of the code is pretty poor ... six lines of code
just for loading a simple constant ... but at least it's correct.
At this point we're just about finished with the structure of our
parse and compile just about any expression you care to throw at
it. It's still limited in that we can only handle factors
In the next session, I'll show you just how easy it is to extend
our parser to take care of these things too, and I'll also show
variable names. So you see, we're not far at all from a truly
useful parser.
just wasting our time here ... that we can indeed modify the
are pretty bad (such as the code for -1, above). So all we
deal with ... they only add extra tests in the code, which is
seems to promise pretty tight code without too much hassle. It's
with me.
use of the CPU registers. Remember back when we were doing only
rather than the stack? It worked, because with only those two
point in its processing, the parser KNOWS how many items are on
What we're doing in effect is to replace the CPU's RAM stack with
get pretty good code out. Of course, we also have to deal with
those odd cases where the stack level DOES exceed eight, but
into the CPU stack. For levels beyond eight, the code is no
worse than what we're generating now, and for levels less than
practice, it turns out that you can't really use all eight levels
... you need at least one register free to reverse the operand
order for division (sure wish the 68000 had an XTHL, like the
Next lesson, I'll show you how to deal with variables factors and
function calls. I'll also show you just how easy it is to handle
INTRODUCTION
INTRODUCTION
restriction to get rid of, so don't get too hung up about it.
We'll use the trick when it serves us to do so, confident that we
VARIABLES
as
b*b+4*a*c
The '|' stands for "or", meaning of course that either form is a
read:
load a variable.
that most 68000 operating systems, including the SK*DOS that I'm
MOVE X(PC),D0
where X is, of course, the variable name. Armed with that, let's
{---------------------------------------------------------------}
procedure Factor;
begin
Match('(');
Expression;
Match(')');
end
else
end;
{--------------------------------------------------------------}
parser, because of the way it's structured. You can see that
this still holds true here. This time it cost us all of two
OK, compile and test this new version of the parser. That didn't
FUNCTIONS
But I'd still like to deal with functions now for a couple of
next. That isn't the case when we add functions. Every language
name obey the same rules. So how can we tell which is which?
One way is to require that they each be declared before they are
use the C rule for now. Since we also don't have a mechanism to
x() .
Now that there are two possibilities for the "If IsAlpha" branch
{---------------------------------------------------------------}
procedure Factor;
begin
Match('(');
Expression;
Match(')');
end
Ident
else
end;
{--------------------------------------------------------------}
{---------------------------------------------------------------}
procedure Ident;
begin
Name := GetName;
getip();
Match('(');
Match(')');
end
else
end;
{---------------------------------------------------------------}
OK, compile and test this version. Does it parse all legal
away the identifier and then reads one more character to decide
only two calls to the error routine, Expected. Even those aren't
necessary ... if you'll look again in Term and Expression, you'll
So how did we get this nice error handling virtually for free?
checking for me. Astute readers will notice that some of the
calls to Match (for example, the ones in Add and Subtract) are
leave them in, and the general rule to always use Match instead
following the one we're working on, so any characters not treated
But it's also a very easy thing to fix up, even if it's only
temporary. All we have to do is assert that the expression
See how the space was treated as a terminator? Now, to make the
CR = ^M;
As usual, recompile the program and verify that it does what it's
supposed to.
ASSIGNMENT STATEMENTS
OK, at this point we have a parser that works very nicely. I'd
<Ident> = <Expression>
{--------------------------------------------------------------}
procedure Assignment;
begin
Name := GetName;
Match('=');
Expression;
EmitLn('MOVE D0,(A0)')
end;
{--------------------------------------------------------------}
Note again that the code exactly parallels the BNF. And notice
and Match.
compiler!
Well, of course they're not the only kind. There are also little
that we've been dealing with are among the most challenging in a
MULTI-CHARACTER TOKENS
showing you just how easy that extension really is. In the
process, we'll also provide for embedded white space. Before you
make the next few changes, though, save the current version of
the parser away under another name. I have some more uses for it
character version.
Most compilers separate out the handling of the input stream into
come a time when we'll want to do something like that, too, but
function
{--------------------------------------------------------------}
{ Recognize an Alphanumeric }
begin
end;
{--------------------------------------------------------------}
Add this function to your parser. I put mine just after IsDigit.
instead of a character:
{--------------------------------------------------------------}
{ Get an Identifier }
begin
Token := '';
GetChar;
end;
GetName := Token;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Get a Number }
begin
Value := '';
GetChar;
end;
GetNum := Value;
end;
{--------------------------------------------------------------}
Make this change, and then recompile and test. _NOW_ do you
WHITE SPACE
Before we leave this parser for awhile, let's address the issue
restriction.
simple rule for how the parser should treat the input stream, and
space wasn't permitted, we've been able to assume that after each
parsing action, the lookahead character Look contains the next
It still sounds like a good rule to me, so that's the one we'll
use. This means that every routine that advances the input
stream must skip over white space, and leave the next non-white
use GetName, GetNum, and Match for most of our input processing,
modify.
routine:
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
shown below:
{--------------------------------------------------------------}
begin
else begin
GetChar;
SkipWhite;
end;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
begin
Token := '';
GetChar;
end;
GetName := Token;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
begin
Value := '';
GetChar;
end;
GetNum := Value;
SkipWhite;
end;
{--------------------------------------------------------------}
pump" in Init:
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
begin
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
Make these changes and recompile the program. You will find that
Since we've made quite a few changes during this session, I'm
{--------------------------------------------------------------}
program parse;
{--------------------------------------------------------------}
{ Constant Declarations }
CR = ^M;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
begin
WriteLn;
end;
{--------------------------------------------------------------}
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
{ Recognize an Alphanumeric }
begin
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
begin
if Look <> x then Expected('''' + x + '''')
else begin
GetChar;
SkipWhite;
end;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
begin
Token := '';
GetChar;
end;
GetName := Token;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
Value := '';
GetChar;
end;
GetNum := Value;
SkipWhite;
end;
{--------------------------------------------------------------}
begin
Write(TAB, s);
end;
{--------------------------------------------------------------}
begin
Emit(s);
WriteLn;
end;
{---------------------------------------------------------------}
procedure Ident;
begin
Name:= GetName;
Match('(');
Match(')');
end
else
end;
{---------------------------------------------------------------}
procedure Factor;
begin
Match('(');
Expression;
Match(')');
end
else
end;
{--------------------------------------------------------------}
procedure Multiply;
begin
Match('*');
Factor;
EmitLn('MULS (SP)+,D0');
end;
{-------------------------------------------------------------}
procedure Divide;
begin
Match('/');
Factor;
EmitLn('MOVE (SP)+,D1');
EmitLn('EXS.L D0');
EmitLn('DIVS D1,D0');
end;
{---------------------------------------------------------------}
{ Parse and Translate a Math Term }
procedure Term;
begin
Factor;
EmitLn('MOVE D0,-(SP)');
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;
{--------------------------------------------------------------}
procedure Add;
begin
Match('+');
Term;
EmitLn('ADD (SP)+,D0');
end;
{-------------------------------------------------------------}
procedure Subtract;
begin
Match('-');
Term;
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
procedure Expression;
begin
if IsAddop(Look) then
EmitLn('CLR D0')
else
Term;
EmitLn('MOVE D0,-(SP)');
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;
{--------------------------------------------------------------}
procedure Assignment;
var Name: string[8];
begin
Name := GetName;
Match('=');
Expression;
EmitLn('MOVE D0,(A0)')
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
begin
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
Assignment;
end.
{--------------------------------------------------------------}
Now the parser is complete. It's got every feature we can put in
INTRODUCTION
INTRODUCTION
parsing and compiling math expressions, and worked our way grad-
embedded white space, and function calls. This time, I'm going
to walk you through the process one more time, only with the goal
the concepts of the two types of translators, so that you can see
not only the differences, but also the similarities.
x=2*y+3
metic ... it only issues the object code that will cause the CPU
compiler would issue code to compute the expression and store the
results in variable x.
parsing is going on. For the example, by the time parsing of the
mediately.
What I'd like you to see here is that the layout ... the struc-
ture ... of the parser doesn't change. It's only the actions
that change. So if you can write an interpreter for a given
language, you can also write a compiler, and vice versa. Yet, as
object code.
The idea is that you typically don't just emit code at every
code. Sometimes they do, but often they simply return in-
x = x + 3 - 2 - (5 - 4) ,
x=x+0.
x=x,
structions to zero!
aware that you can get some code optimization by combining the
expect things as well. That's the main reason for going over
THE INTERPRETER
OK, now that you know WHY we're going into all this, let's do it.
Just to give you practice, we're going to start over with a bare
cradle and build up the translator all over again. This time, of
GetNum as follows:
{--------------------------------------------------------------}
{ Get a Number }
begin
GetChar;
end;
{--------------------------------------------------------------}
{---------------------------------------------------------------}
begin
Expression := GetNum;
end;
{--------------------------------------------------------------}
Finally, insert the statement
Writeln(Expression);
does that with the digits 0..9, and gives an error message for
to read:
{---------------------------------------------------------------}
begin
if IsAddop(Look) then
Value := 0
else
Value := GetNum;
case Look of
'+': begin
Match('+');
Value := Value + GetNum;
end;
'-': begin
Match('-');
end;
end;
end;
Expression := Value;
end;
{--------------------------------------------------------------}
Procedures Add and Subtract went away! The reason is that the
could have chosen to retain the procedures and pass into them the
meant that the code for Add and Subtract had to be moved in line.
OK, did the translator work? Then let's take the next step.
It's not hard to figure out what procedure Term should now look
call to Term, and then enter the following form for Term:
{---------------------------------------------------------------}
begin
Value := GetNum;
case Look of
'*': begin
Match('*');
end;
'/': begin
Match('/');
end;
end;
end;
Term := Value;
end;
{--------------------------------------------------------------}
Now, try it out. Don't forget two things: first, we're dealing
with integer division, so, for example, 1/3 should come out zero.
{--------------------------------------------------------------}
{ Get a Number }
begin
Value := 0;
GetChar;
end;
GetNum := Value;
end;
{--------------------------------------------------------------}
function Term, so that they call Factor instead. Now code the
{---------------------------------------------------------------}
begin
Match('(');
Factor := Expression;
Match(')');
end
else
Factor := GetNum;
end;
{---------------------------------------------------------------}
interpreter.
A LITTLE PHILOSOPHY
that's trivially easy, and one that's too complex to deal with.
time figuring out how to deal with things like operator prece-
dence ... the way that multiply and divide operators take
some thirty years ago, and how excited he was to find out how to
operator was a precedence level, and the rules required that you
on the stack. You had to give it one value before you put it on
the stack, and another to decide when to take it off. Just for
the experience, I worked all of this out for myself a few years
How did we get so lucky? And where did the precedence stacks go?
the technique is very much like the old way of doing arithmetic
are the trees and stacks in our technique? We haven't seen any.
The answer in all cases is that the structures are implicit, not
called, the return address is pushed onto the CPU stack. At the
end of the subroutine, the address is popped back off and control
there can also be local data pushed onto the stack, and it, too,
next call to Term for the second argument, that Term calls
stack, and will be there again when we return from our call
sequence.
In other words, the reason things look so simple is that we've
hierarchy levels and the parse trees are there, all right, but
taken care of by the order with which the various procedures are
called. Now that you've seen how we do it, it's probably hard to
imagine doing it any other way. But I can tell you that it took
a lot of years for compiler writers to get that smart. The early
compilers were too complex too imagine. Funny how things get
warning. The lesson: things can be easy when you do them right.
The warning: take a look at what you're doing. If, as you branch
out on your own, you begin to find a real need for a separate
looking at things the right way. Maybe you just aren't using the
with variable names ... we just issued the names to the assembler
and let the rest of the program take care of allocating storage
the values of the variables and return them as the return values
{---------------------------------------------------------------}
procedure InitTable;
var i: char;
begin
Table[i] := 0;
end;
{---------------------------------------------------------------}
use it. Since we don't have a way (so far) to set the variables,
Factor will always return zero values for them, but let's go
begin
Match('(');
Factor := Expression;
Match(')');
end
Factor := Table[GetName]
else
Factor := GetNum;
end;
{---------------------------------------------------------------}
though all the variables are now zeros, at least we can correctly
expressions.
multiple statements.
The assignment statement parallels what we did before:
{--------------------------------------------------------------}
procedure Assignment;
begin
Name := GetName;
Match('=');
Table[Name] := Expression;
end;
{--------------------------------------------------------------}
but that would cause every run to end in an error message, which
every normal line with TWO characters, the carriage return (CR)
and line feed (LF). At the end of each line, we need to eat
dure for this, which we'll no doubt be using over and over. Here
it is:
{--------------------------------------------------------------}
procedure NewLine;
begin
GetChar;
if Look = LF then
GetChar;
end;
end;
{--------------------------------------------------------------}
Insert this procedure at any convenient spot ... I put mine just
after Match. Now, rewrite the main program to look like this:
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
repeat
Assignment;
NewLine;
end.
{--------------------------------------------------------------}
Note that the test for a CR is now gone, and that there are also
Let's wrap this session up, then, by adding the I/O routines.
stand for a read statement, and '!' for a write, with the char-
{--------------------------------------------------------------}
{ Input Routine }
procedure Input;
begin
Match('?');
Read(Table[GetName]);
end;
{--------------------------------------------------------------}
{ Output Routine }
procedure Output;
begin
Match('!');
WriteLn(Table[GetName]);
end;
{--------------------------------------------------------------}
Note that we use the usual trick of a case statement based upon
the current lookahead character, to decide what to do.
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
repeat
case Look of
'?': Input;
'!': Output;
else Assignment;
end;
NewLine;
end.
{--------------------------------------------------------------}
sparse, but it works just like the "big boys." It includes three
pass on. After all, we're not here to build a product, but to
time.
GET YOUR DOMAIN:
type here .com
Get It Now >>
[ Amit Mathur Online Homepage ]
INTRODUCTION
INTRODUCTION
felt that I was a LONG way from being able to handle a complete
language. After all, REAL languages have branches and loops and
subroutines and all that. Perhaps you've shared some of the same
THE PLAN
and as we've done twice before now, we'll build things up one at
tokens that has served us so well to date. This means that the
"code" will look a little funny, with 'i' for IF, 'w' for WHILE,
etc. But it helps us get the concepts down pat without fussing
to generate some kind of object code for them (we're back into
OK, then, starting with yet another copy of the cradle, let's
procedure Other;
begin
EmitLn(GetName);
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
Other;
end.
{--------------------------------------------------------------}
Run the program and see what you get. Not very exciting, is it?
But hang in there, it's a start, and things will get better.
The first thing we need is the ability to deal with more than one
that in the last session on interpreting, but this time let's get
so far.
What signals the end of a block? It's simply any construct that
isn't an "other" statement. For now, that means only the END
statement.
{--------------------------------------------------------------}
procedure DoProgram;
begin
Block;
EmitLn('END')
end;
{--------------------------------------------------------------}
procedure Block;
begin
Other;
end;
end;
{--------------------------------------------------------------}
(From the form of the procedure, you just KNOW we're going to be
adding to it in a bit!)
OK, enter these routines into your program. Replace the call to
and see how it works. Well, it's still not much, but we're
getting closer.
SOME GROUNDWORK
is:
IF ( <condition> ) <statement>
I'm showing you here, in fact, is that of the language KISS that
problem for you. Once you see how it's done, you'll realize that
straightforward.
simple IF statement
L: B
...
It's clear, then, that we're going to need some more procedures
{--------------------------------------------------------------}
var S: string;
begin
Str(LCount, S);
NewLabel := 'L' + S;
Inc(LCount);
end;
{--------------------------------------------------------------}
WriteLn(L, ':');
end;
{--------------------------------------------------------------}
LCount := 0;
At this point I'd also like to show you a new kind of notation.
If you compare the form of the IF statement above with the as-
sembler code that must be produced, you can see that there are
statement:
IF: First, get the condition and issue the code for it.
this way:
IF
<condition> { Condition;
L = NewLabel;
<block>
ENDIF { PostLabel(L) }
doing it all along ... we've just never written it down this way
represent "false," and anything else (some use FFFF, some 0001)
represent "true."
On the 68000 the condition flags are set whenever any data is
moved or calculated. If the data is a 0000 (corresponding to a
false condition, remember), the zero flag will be set. The code
It's the nature of the beast that most of the branches we see
THE IF STATEMENT
With that bit of explanation out of the way, we're finally ready
approach, with the character 'i' for IF, and 'e' for ENDIF (as
also, for now, skip completely the character for the branch con-
{--------------------------------------------------------------}
procedure DoIf;
var L: string;
begin
Match('i');
L := NewLabel;
Condition;
Block;
Match('e');
PostLabel(L);
end;
{--------------------------------------------------------------}
it as follows:
{--------------------------------------------------------------}
procedure Block;
begin
case Look of
'i': DoIf;
'o': Other;
end;
end;
end;
{--------------------------------------------------------------}
Notice the reference to procedure Condition. Eventually, we'll
write a routine that can parse and translate any Boolean con-
itself (the next one, in fact). For now, let's just make it a
{--------------------------------------------------------------}
Procedure Condition;
begin
EmitLn('<condition>');
end;
{--------------------------------------------------------------}
Insert this procedure in your program just before DoIf. Now run
aibece
As you can see, the parser seems to recognize the construct and
inserts the object code at the right places. Now try a set of
aibicedefe
Now that we have the general idea (and the tools such as the
first (and also one of the trickiest) is to add the ELSE clause
<condition>
BEQ L1
<block>
BRA L2
L1: <block>
L2: ...
IF
<condition> { L1 = NewLabel;
L2 = NewLabel;
Emit(BEQ L1) }
<block>
PostLabel(L1) }
<block>
ENDIF { PostLabel(L2) }
(Note that I use an 'l' for the ELSE, since 'e' is otherwise
occupied):
{--------------------------------------------------------------}
procedure DoIf;
begin
Match('i');
Condition;
L1 := NewLabel;
L2 := L1;
Block;
Match('l');
L2 := NewLabel;
PostLabel(L1);
Block;
end;
Match('e');
PostLabel(L2);
end;
{--------------------------------------------------------------}
of code.
aiblcede
aibece
Now try some nested IF's. Try anything you like, including some
"other" statement.
the process down pat. The syntax I've chosen for the WHILE
statement is
minators for each construct ... you can see that by the fact that
should be:
L1: <condition>
BEQ L2
<block>
BRA L1
L2:
WHILE { L1 = NewLabel;
PostLabel(L1) }
<block>
PostLabel(L2) }
The code follows immediately from the syntax:
{--------------------------------------------------------------}
procedure DoWhile;
begin
Match('w');
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
Condition;
Block;
Match('e');
PostLabel(L2);
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
begin
case Look of
'i': DoIf;
'w': DoWhile;
else Other;
end;
end;
end;
{--------------------------------------------------------------}
OK, try the new program. Note that this time, the <condition>
code is INSIDE the upper label, which is just where we wanted it.
Try some nested loops. Try some loops within IF's, and some IF's
too, don't you? It'll look a lot more meaningful when we get
full keywords.
I hope by now that you're beginning to get the idea that this
almost falls out from there, and it doesn't affect any of the
other routines. Once you've gotten the feel of the thing, you'll
see that you can add new constructs about as fast as you can
dream them up.
We could stop right here, and have a language that works. It's
been shown many times that a high-order language with only two
repertoire a bit.
all ... it's an infinite loop. What's the point of such a loop?
command, that will give us a way out. This makes the language
LOOP { L = NewLabel;
PostLabel(L) }
<block>
ENDLOOP { Emit(BRA L }
'l' for the ELSE, I've used the last letter, 'p', as the
procedure DoLoop;
var L: string;
begin
Match('p');
L := NewLabel;
PostLabel(L);
Block;
Match('e');
end;
{--------------------------------------------------------------}
When you insert this routine, don't forget to add a line in Block
to call it.
REPEAT-UNTIL
Here's one construct that I lifted right from Pascal. The syntax
is
PostLabel(L) }
<block>
UNTIL
<condition> { Emit(BEQ L) }
{--------------------------------------------------------------}
procedure DoRepeat;
var L: string;
begin
Match('r');
L := NewLabel;
PostLabel(L);
Block;
Match('u');
Condition;
end;
{--------------------------------------------------------------}
This means that the 'u' must be added to the set of characters in
jargon.
{--------------------------------------------------------------}
procedure Block;
begin
case Look of
'i': DoIf;
'w': DoWhile;
'p': DoLoop;
'r': DoRepeat;
else Other;
end;
end;
end;
{--------------------------------------------------------------}
The FOR loop is a very handy one to have around, but it's a bear
hard ... it's only a loop after all ... but simply because it's
hard to implement in assembler language. Once the code is
easier to code), but I've chosen instead a syntax very much like
you choose to make it, depending upon the way you decide to
define the rules as to how to handle the limits. Does expr2 get
<ident> = <expr1>
TEMP = <expr2>
<block>
ENDWHILE
Notice that with this definition of the loop, <block> will not be
loop), and the upper limit is on the stack. The translated code
<block>
Wow! That seems like a lot of code ... the line containing
<block> seems to almost get lost. But that's the best I could do
with it. I guess it helps to keep in mind that it's really only
sixteen words, after all. If anyone else can optimize this
Still, the parser routine is pretty easy now that we have the
code:
{--------------------------------------------------------------}
procedure DoFor;
Name: char;
begin
Match('f');
L1 := NewLabel;
L2 := NewLabel;
Name := GetName;
Match('=');
Expression;
EmitLn('SUBQ #1,D0');
EmitLn('MOVE D0,(A0)');
Expression;
EmitLn('MOVE D0,-(SP)');
PostLabel(L1);
EmitLn('MOVE (A0),D0');
EmitLn('ADDQ #1,D0');
EmitLn('MOVE D0,(A0)');
EmitLn('CMP (SP),D0');
Block;
Match('e');
PostLabel(L2);
EmitLn('ADDQ #2,SP');
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
Procedure Expression;
begin
EmitLn('<expr>');
end;
{--------------------------------------------------------------}
Block. Since we don't have any input for the dummy version of
THE DO STATEMENT
All this made me wish for a simpler version of the FOR loop. The
reason for all the code above is the need to have the loop
For good measure, let's add this construct, too. This will be
DO
L = NewLabel;
PostLabel(L);
Emit(MOVE D0,-(SP) }
<block>
Emit(DBRA D0,L) }
That's quite a bit simpler! The loop will execute <expr> times.
Here's the code:
{--------------------------------------------------------------}
procedure Dodo;
var L: string;
begin
Match('d');
L := NewLabel;
Expression;
EmitLn('SUBQ #1,D0');
PostLabel(L);
EmitLn('MOVE D0,-(SP)');
Block;
EmitLn('MOVE (SP)+,D0');
end;
{--------------------------------------------------------------}
I think you'll have to agree, that's a whole lot simpler than the
as I did with the ELSE half of an IF. That turns out not to
going to show up at the same level as the loop itself. The most
likely place for a BREAK is right after an IF, which would cause
The BREAK has to exit the inner LOOP, even if it's nested down
My next thought was that I would just store away, in some global
followed by a break from an outer one. Storing the label for the
inner loop would clobber the label for the outer one. So the
messy.
you begin to see the need for an external stack you might be
let the recursion built into our parser take care of everything,
innermost loop. Then it can pass the address to the routine that
translates the break instruction. Since an IF statement doesn't
except pass the label into ITS blocks (both of them). Since
whatever label is above it and passes its own exit label along.
{--------------------------------------------------------------}
procedure DoLoop;
begin
Match('p');
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
Block(L2);
Match('e');
PostLabel(L2);
end;
{--------------------------------------------------------------}
Notice that DoLoop now has TWO labels, not just one. The second
Note also that Block now has a parameter, which for loops will
{--------------------------------------------------------------}
begin
case Look of
'i': DoIf(L);
'w': DoWhile;
'p': DoLoop;
'r': DoRepeat;
'f': DoFor;
'd': DoDo;
'b': DoBreak(L);
else Other;
end;
end;
end;
{--------------------------------------------------------------}
Again, notice that all Block does with the label is to pass it
into DoIf and DoBreak. The loop constructs don't need it,
because they are going to pass their own label anyway.
{--------------------------------------------------------------}
begin
Match('i');
Condition;
L1 := NewLabel;
L2 := L1;
Block(L);
Match('l');
L2 := NewLabel;
PostLabel(L1);
Block(L);
end;
Match('e');
PostLabel(L2);
end;
{--------------------------------------------------------------}
Here, the only thing that changes is the addition of the
matter how many levels of IF nesting we have, the same label will
be used.
DoBreak:
{--------------------------------------------------------------}
begin
Match('b');
EmitLn('BRA ' + L)
end;
{--------------------------------------------------------------}
procedure DoProgram;
begin
Block('');
if Look <> 'e' then Expected('End');
EmitLn('END')
end;
{--------------------------------------------------------------}
hard look at the code generated for DO, you'll see that if you
break out of this loop, the value of the loop counter is still
left on the stack. We're going to have to fix that! A shame ...
{--------------------------------------------------------------}
procedure Dodo;
begin
Match('d');
L1 := NewLabel;
L2 := NewLabel;
Expression;
EmitLn('SUBQ #1,D0');
PostLabel(L1);
EmitLn('MOVE D0,-(SP)');
Block(L2);
EmitLn('MOVE (SP)+,D0');
EmitLn('SUBQ #2,SP');
PostLabel(L2);
EmitLn('ADDQ #2,SP');
end;
{--------------------------------------------------------------}
The two extra instructions, the SUBQ and ADDQ, take care of
CONCLUSION
a richer set, really, than that provided by almost any other pro-
gramming language. And, except for the FOR loop, it was pretty
easy to do. Even that one was tricky only because it's tricky in
assembler language.
I'll conclude this session here. To wrap the thing up with a red
appearance of our input code. I'll save that little bit for the
session:
{--------------------------------------------------------------}
program Branch;
{--------------------------------------------------------------}
{ Constant Declarations }
CR = ^M;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
begin
WriteLn;
end;
{--------------------------------------------------------------}
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
begin
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
begin
IsAddop := c in ['+', '-'];
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
begin
GetName := UpCase(Look);
GetChar;
end;
{--------------------------------------------------------------}
{ Get a Number }
begin
GetNum := Look;
GetChar;
end;
{--------------------------------------------------------------}
var S: string;
begin
Str(LCount, S);
NewLabel := 'L' + S;
Inc(LCount);
end;
{--------------------------------------------------------------}
begin
WriteLn(L, ':');
end;
{--------------------------------------------------------------}
begin
Write(TAB, s);
end;
{--------------------------------------------------------------}
begin
Emit(s);
WriteLn;
end;
{--------------------------------------------------------------}
procedure Condition;
begin
EmitLn('<condition>');
end;
{--------------------------------------------------------------}
{ Parse and Translate a Math Expression }
procedure Expression;
begin
EmitLn('<expr>');
end;
{--------------------------------------------------------------}
begin
Match('i');
Condition;
L1 := NewLabel;
L2 := L1;
Block(L);
Match('l');
L2 := NewLabel;
PostLabel(L1);
Block(L);
end;
Match('e');
PostLabel(L2);
end;
{--------------------------------------------------------------}
procedure DoWhile;
begin
Match('w');
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
Condition;
Block(L2);
Match('e');
PostLabel(L2);
end;
{--------------------------------------------------------------}
procedure DoLoop;
begin
Match('p');
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
Block(L2);
Match('e');
PostLabel(L2);
end;
{--------------------------------------------------------------}
procedure DoRepeat;
begin
Match('r');
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
Block(L2);
Match('u');
Condition;
PostLabel(L2);
end;
{--------------------------------------------------------------}
{ Parse and Translate a FOR Statement }
procedure DoFor;
Name: char;
begin
Match('f');
L1 := NewLabel;
L2 := NewLabel;
Name := GetName;
Match('=');
Expression;
EmitLn('SUBQ #1,D0');
EmitLn('MOVE D0,(A0)');
Expression;
EmitLn('MOVE D0,-(SP)');
PostLabel(L1);
EmitLn('MOVE (A0),D0');
EmitLn('ADDQ #1,D0');
EmitLn('MOVE D0,(A0)');
EmitLn('CMP (SP),D0');
Block(L2);
Match('e');
EmitLn('ADDQ #2,SP');
end;
{--------------------------------------------------------------}
procedure Dodo;
begin
Match('d');
L1 := NewLabel;
L2 := NewLabel;
Expression;
EmitLn('SUBQ #1,D0');
PostLabel(L1);
EmitLn('MOVE D0,-(SP)');
Block(L2);
EmitLn('MOVE (SP)+,D0');
EmitLn('SUBQ #2,SP');
PostLabel(L2);
EmitLn('ADDQ #2,SP');
end;
{--------------------------------------------------------------}
begin
Match('b');
end;
{--------------------------------------------------------------}
procedure Other;
begin
EmitLn(GetName);
end;
{--------------------------------------------------------------}
begin
case Look of
'i': DoIf(L);
'w': DoWhile;
'p': DoLoop;
'r': DoRepeat;
'f': DoFor;
'd': DoDo;
'b': DoBreak(L);
else Other;
end;
end;
end;
{--------------------------------------------------------------}
procedure DoProgram;
begin
Block('');
EmitLn('END')
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
begin
LCount := 0;
GetChar;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
DoProgram;
end.
{--------------------------------------------------------------}
GET YOUR DOMAIN:
type here .com
Get It Now >>
[ Amit Mathur Online Homepage ]
INTRODUCTION
INTRODUCTION
constructs.
As we left the parser, though, there was one big hole in our
THE PLAN
We're going to approach this installment a bit differently than
could get away with it before because the rules of arithmetic are
THE GRAMMAR
For some time now, we've been implementing BNF syntax equations
down all in one place. It's time that we did so. They are:
algebra.)
Actually, while we're on the subject, I'd like to amend this
grammar a bit right now. The way we've handled the unary minus
This puts the job of handling the unary minus onto Factor, which
This doesn't mean that you have to go back and recode the
you like. But I will be using the new syntax from now on.
natural.
Notice also the slight difference between the way the NOT and the
a * -b
or worse yet,
a - -b
a AND NOT b
makes perfect sense, and the syntax shown allows for that.
RELOPS
OK, assuming that you're willing to accept the grammar I've shown
here, we now have syntax rules for both arithmetic and Boolean
Here, the two terms in parens are Boolean expressions, but the
nature. The RELATIONAL OPERATORS >= and <= are the catalysts by
together.
Now, in the example above, the terms being compared are just
where the expressions we're talking about here are the old
numeric type, and the relops are any of the usual symbols
| <b-variable>
| (<b-expression>)
| <relation>
THAT's the connection! The relops and the relation they define
2 term *, /
3 expression +, -
5 not-factor NOT
6 b-term AND
code fragment:
When the parser is parsing this code, it knows after it sees the
What's worse, at the point that the parser has read this much of
IF ((((((A ,
technology.
without backtracking.
When Niklaus Wirth designed Pascal, the desire was to limit the
2 term *, /, AND
3 expression +, -, OR
expressions like
are perfectly legal. And, in fact, they ARE ... as far as the
arithmetic and Boolean variables, and things like this are caught
'=', '+=' and its kin, '<<', '>>', '++', '--', etc. Ironically,
integer value.
stick mostly with the Pascal approach, since that seems the
simplest from an implementation point of view, but it results in
some funnies that I never liked very much, such as the fact that,
in the expression
well, either. But now, we can all see that the 'and' operator,
equivalent to
slow down the parser, because it has to wade through more layers
THE PARSER
several times now, so you know the drill: we begin with a fresh
let's do it.
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
var c: char;
begin
GetChar;
end;
{--------------------------------------------------------------}
Type these routines into your program. You can test them by
WriteLn(GetBoolean);
OK, compile the program and test it. As usual, it's not very
generate code to load the values into D0. We need to do the same
to let 0 stand for FALSE, and some other value for TRUE. Many
prefer FFFF hex (or -1), because a bitwise NOT also becomes a
{---------------------------------------------------------------}
procedure BoolExpression;
begin
if GetBoolean then
EmitLn('MOVE #-1,D0')
else
EmitLn('CLR D0');
end;
{---------------------------------------------------------------}
Add this procedure to your parser, and call it from the main
program (replacing the print statement you had just put there).
As you can see, we still don't have much of a parser, but the
{--------------------------------------------------------------}
procedure BoolOr;
begin
Match('|');
BoolTerm;
EmitLn('OR (SP)+,D0');
end;
{--------------------------------------------------------------}
begin
Match('~');
BoolTerm;
EmitLn('EOR (SP)+,D0');
end;
{---------------------------------------------------------------}
procedure BoolExpression;
begin
BoolTerm;
EmitLn('MOVE D0,-(SP)');
case Look of
'|': BoolOr;
'~': BoolXor;
end;
end;
end;
{---------------------------------------------------------------}
Note the new recognizer IsOrOp, which is also a copy, this time
of IsAddOp:
{--------------------------------------------------------------}
{ Recognize a Boolean Orop }
begin
end;
{--------------------------------------------------------------}
enter the code above. Compile and test this version. At this
You've probably already guessed what the next step is: The
division.
{---------------------------------------------------------------}
procedure BoolTerm;
begin
NotFactor;
while Look = '&' do begin
EmitLn('MOVE D0,-(SP)');
Match('&');
NotFactor;
EmitLn('AND (SP)+,D0');
end;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure NotFactor;
begin
Match('!');
BoolFactor;
EmitLn('EOR #-1,D0');
end
else
BoolFactor;
end;
{--------------------------------------------------------------}
And rename the earlier procedure to BoolFactor. Now try that.
expression you care to throw at it. Does it? Does it trap badly
formed expressions?
expressions, you know that what we did next was to expand the
items get taken care of by the next step. It takes just a one
{--------------------------------------------------------------}
procedure BoolFactor;
begin
if IsBoolean(Look) then
if GetBoolean then
EmitLn('MOVE #-1,D0')
else
EmitLn('CLR D0')
else Relation;
end;
{--------------------------------------------------------------}
Right now all I'm doing is encoding the grammar we've already
checking out what we already have. So for now let's just write a
{---------------------------------------------------------------}
procedure Relation;
begin
WriteLn('<Relation>');
GetChar;
end;
{--------------------------------------------------------------}
OK, key in this code and give it a try. All the old things
should still work ... you should be able to generate the code for
a Boolean factor should be. Did you get that? Fine, then let's
{--------------------------------------------------------------}
{ Recognize a Relop }
begin
end;
{--------------------------------------------------------------}
nice (and also quite efficient) just to set up those flags, and
not load anything into D0 at all. This would be fine for the
loops and branches, but remember that the relation can be used
how the result is going to be used, we must allow for BOTH cases.
operation for that ... but it sets the flags, not a value.
What's more, the flags will always be set the same (zero if
equal, etc.), while we need the zero flag set differently for the
byte value to 0000 or FFFF (funny how that works!) depending upon
every other instruction in the 68000 set, Scc does NOT reset the
one last step, which is to test D0 and set the flags to match it.
first perform the test, then test the flags to set data into D0,
but it's the most straightforward way to get the flags right, and
I might mention here that this area is, in my opinion, the one
and perform the test the most convenient way I can, and then set
anything.
In any case, we're now ready to look at the code for Relation.
{---------------------------------------------------------------}
procedure Equals;
begin
Match('=');
Expression;
EmitLn('CMP (SP)+,D0');
EmitLn('SEQ D0');
end;
{---------------------------------------------------------------}
procedure NotEquals;
begin
Match('#');
Expression;
EmitLn('CMP (SP)+,D0');
EmitLn('SNE D0');
end;
{---------------------------------------------------------------}
procedure Less;
begin
Match('<');
Expression;
EmitLn('CMP (SP)+,D0');
EmitLn('SGE D0');
end;
{---------------------------------------------------------------}
procedure Greater;
begin
Match('>');
Expression;
EmitLn('CMP (SP)+,D0');
EmitLn('SLE D0');
end;
{---------------------------------------------------------------}
procedure Relation;
begin
Expression;
EmitLn('MOVE D0,-(SP)');
case Look of
'=': Equals;
'#': NotEquals;
'<': Less;
'>': Greater;
end;
EmitLn('TST D0');
end;
end;
{---------------------------------------------------------------}
can copy them into your file now. Remember to use the single-
everything is working.
{---------------------------------------------------------------}
procedure Ident;
begin
Name:= GetName;
Match('(');
Match(')');
end
else
{---------------------------------------------------------------}
procedure Factor;
begin
Match('(');
Expression;
Match(')');
end
Ident
else
end;
{---------------------------------------------------------------}
procedure SignedFactor;
begin
GetChar;
if IsDigit(Look) then
else begin
Factor;
EmitLn('NEG D0');
end;
end
else Factor;
end;
{--------------------------------------------------------------}
procedure Multiply;
begin
Match('*');
Factor;
EmitLn('MULS (SP)+,D0');
end;
{-------------------------------------------------------------}
procedure Divide;
begin
Match('/');
Factor;
EmitLn('MOVE (SP)+,D1');
EmitLn('EXS.L D0');
EmitLn('DIVS D1,D0');
end;
{---------------------------------------------------------------}
procedure Term;
begin
SignedFactor;
EmitLn('MOVE D0,-(SP)');
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;
{---------------------------------------------------------------}
procedure Add;
begin
Match('+');
Term;
EmitLn('ADD (SP)+,D0');
end;
{---------------------------------------------------------------}
procedure Subtract;
begin
Match('-');
Term;
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
procedure Expression;
begin
Term;
EmitLn('MOVE D0,-(SP)');
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;
{---------------------------------------------------------------}
There you have it ... a parser that can handle both arithmetic
AND Boolean algebra, and things that combine the two through the
a safe place for future reference, because in our next step we're
here, so take your time and get it right. What you need to do is
to copy all of the procedures from the logic parser, from Ident
Try
ia=bxlye
ADDING ASSIGNMENTS
As long as we're this far, and we already have the routines for
We're soon going to find that the one-line "programs" that we're
having to write here will really cramp our style. At the moment
the end-of-line characters, the carriage return (CR) and the line
feed (LF). So before going any further let's plug that hole.
There are a couple of ways to deal with the CR/LFs. One (the
we hit the return key. It won't, if we just skip over the CR and
Instead of skipping the CR/LF, We'll let the parser go ahead and
{--------------------------------------------------------------}
{ Skip a CRLF }
procedure Fin;
begin
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
Fin;
case Look of
'i': DoIf(L);
'w': DoWhile;
'p': DoLoop;
'r': DoRepeat;
'f': DoFor;
'd': DoDo;
'b': DoBreak(L);
else Other;
end;
Fin;
end;
end;
{--------------------------------------------------------------}
Now, you'll find that you can use multiple-line "programs." The
{--------------------------------------------------------------}
procedure Assignment;
begin
Name := GetName;
Match('=');
BoolExpression;
EmitLn('MOVE D0,(A0)');
end;
{--------------------------------------------------------------}
are not BIG changes that require us to throw away all of what
we've done so far ... with care, it can be done with very minimal
some pretty heavy stuff, so I've decided to leave that step until
next time, when you've had a little more time to digest what
also write our first complete compiler, based on what we've done
INTRODUCTION
INTRODUCTION
that restriction, once and for all. This means that we must deal
after all, we've been able to manage all right without one, up
computer life that the syntax for a keyword has the same form as
that for any other identifier. We can't tell until we get the
variable IFILE and the keyword IF look just alike, until you get
to the third character. In the examples to date, we were always
what we have already done. I didn't lie ... we can, as you will
see later. But every time I set out to install these elements of
the software into the parser we have already built, I had bad
feelings about it. The whole thing felt entirely too much like a
to you what scanning is all about, and what the alternatives are.
needs. I've tried to avoid that pitfall by just showing you ONE
language, since after all it's the part closest to the user. I
with KISS. It fits the look & feel that I want for that
language. But it may not work at all for the language YOU'RE
cooking up, so in this one case I feel that it's important for
session we'll be getting much deeper than usual into the basic
the structure of the lexical scanner. Then, and only then, will
we get back to our parser from the last installment. Bear with
me ... I think you'll find it's worth the wait. In fact, since
LEXICAL SCANNING
place, but as you have already seen, there is a lot you can do
without ever even addressing the issue, and in fact the scanner
we'll end up with here won't look much like what the texts
programs resulting from it, must deal with the most general kind
function from the rest of the parser. There is only one set of
theoretical bases.
o Type 1: Context-Sensitive
o Type 2: Context-Free
o Type 3: Regular
the older ones, such as FORTRAN) are Type 1, but for the most
part all modern languages can be described using only the last
two types, and those are all we'll be dealing with here.
The neat part about these two types is that there are very
specific ways to parse them. It has been shown that any regular
MUST save the string long enough to find out whether we have a
Unix tools LEX and YACC, that's what you'll get. The output of
you that if you find yourself needing these things you might be
Let's begin with the two definitions most often seen in real
programming languages:
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
Using this let's write the following two routines, which are very
{--------------------------------------------------------------}
{ Get an Identifier }
var x: string[8];
begin
x := '';
x := x + UpCase(Look);
GetChar;
end;
GetName := x;
end;
{--------------------------------------------------------------}
{ Get a Number }
var x: string[16];
begin
x := '';
x := x + Look;
GetChar;
end;
GetNum := x;
end;
{--------------------------------------------------------------}
integer as before.)
You can easily verify that these routines work by calling them
WriteLn(GetName);
This program will print any legal name typed in (maximum eight
anything else.
We also have dealt with embedded white space before, using the
routines are in your current version of the cradle, and add the
the line
SkipWhite;
{--------------------------------------------------------------}
{ Lexical Scanner }
begin
if IsAlpha(Look) then
Scan := GetName
Scan := GetNum
else begin
Scan := Look;
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
We can call this from the new main program:
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
repeat
Token := Scan;
writeln(Token);
end.
{--------------------------------------------------------------}
(You will have to add the declaration of the string Token at the
characters.)
Now, run the program. Note how the input string is, indeed,
STATE MACHINES
idea:
^V
||
||
As you can see, this diagram shows how the logic flows as
delimiter is found.
plus the current input character. That's what make this a state
machine.
on. But I highly recommend the diagrams to you for anything you
do that involves parsing. After a little practice you can begin
The neat thing that I'd like you to note is how painlessly this
NEWLINES
Moving right along, let's modify our scanner to handle more than
return and line feed, as white space. This is, in fact, the way
actually try this before. I'd like to do it now, so you can get
to read:
OK, compile this program and run it. Try a couple of lines,
Hey, what happened? When I tried it, I didn't get the last
token, the period. The program didn't halt. What's more, when I
pressed the 'enter' key a few times, I still didn't get the
period.
But since the input buffer is now empty, GetChar's read statement
end-of-files, everything will come out OK. But for reading data
from the console, the behavior is just too bizarre. The fact of
it's not in your current version of the cradle, put it there now.
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
repeat
Token := Scan;
writeln(Token);
end.
{--------------------------------------------------------------}
Note the "guard" test preceding the call to Fin. That's what
makes the whole thing work, and ensures that we don't try to read
a line ahead.
areas that really affects the look & feel that I mentioned. At
arrangements and see how you like them. If you want your
while Look = CR do
Fin;
return CR's as tokens. It must also eat the trailing LF. The
of Scan:
that I happen to like, but I want you to know how to choose other
OPERATORS
We could stop now and have a pretty useful scanner for our
only tokens that have multiple characters are the identifiers and
exception I can think of is the relops <=, >=, and <>, but they
them if necessary.
Needless to say, we can handle operators very much the same way
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
It's important to note that we DON'T have to include every
it is. The list above includes only those characters that can
{--------------------------------------------------------------}
{ Lexical Scanner }
begin
while Look = CR do
Fin;
if IsAlpha(Look) then
Scan := GetName
Scan := GetNum
Scan := GetOp
else begin
Scan := Look;
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
Try the program now. You will find that any code fragments you
tokens.
Before getting back to the main thrust of our study, I'd like to
How many times have you worked with a program or operating system
that had rigid rules about how you must separate items in a list?
(Try, the last time you used MSDOS!) Some programs require
{--------------------------------------------------------------}
procedure SkipComma;
begin
SkipWhite;
if Look = ',' then begin
GetChar;
SkipWhite;
end;
end;
{--------------------------------------------------------------}
I think you can see where I'm going here. Even if you never
every program where you can use the concepts of parsing. Any
you think about it for a bit, you'll have to conclude that any
ends up parsing?
you'll take the time to define the syntax explicitly. Write down
code the parser using the techniques I've shown you here. You'll
boot.
GETTING FANCY
OK, at this point we have a pretty nice lexical scanner that will
stands and have a servicable compiler. But there are some other
tests now become string comparisons. Much slower. And not only
test for what used to be single characters ... the '=', '+', and
Using string comparison is not impossible ... Ron Cain used just
this approach. But then I would have failed to tell you about
For this reason, most compiler writers ask the lexical scanner to
keywords and operators, and return unique codes for each one
we just return a code that says what kind of token they are, and
way, we're also going to need such a routine later, for dealing
Table[2] := 'ELSE';
Table[n] := 'END';
{--------------------------------------------------------------}
{ Type Declarations }
TabPtr = ^SymTab;
{--------------------------------------------------------------}
"big enough.")
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Table Lookup }
var i: integer;
found: boolean;
begin
found := false;
i := n;
if s = T^[i] then
found := true
else
dec(i);
Lookup := i;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Main Program }
begin
ReadLn(Token);
end.
{--------------------------------------------------------------}
OK, give this a try. Since we're bypassing Scan here, you'll
So what kind of code should we return? There are really only two
like
Operator);
{--------------------------------------------------------------}
{ Lexical Scanner }
procedure Scan;
var k: integer;
begin
while Look = CR do
Fin;
Value := GetName;
if k = 0 then
Token := Ident
else
end
Value := GetNum;
Token := Number;
end
Value := GetOp;
Token := Operator;
end
else begin
Value := Look;
Token := Operator;
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
repeat
Scan;
case Token of
Writeln(Value);
end.
{--------------------------------------------------------------}
What we've done here is to replace the string Token used earlier
GetName. The new form for the four procedures is, then:
{--------------------------------------------------------------}
{ Get an Identifier }
procedure GetName;
var k: integer;
begin
Value := '';
GetChar;
end;
if k = 0 then
Token := Ident
else
Token := SymType(k-1);
end;
{--------------------------------------------------------------}
{ Get a Number }
procedure GetNum;
begin
Value := '';
GetChar;
end;
Token := Number;
end;
{--------------------------------------------------------------}
{ Get an Operator }
procedure GetOp;
begin
Value := '';
GetChar;
end;
Token := Operator;
end;
{--------------------------------------------------------------}
{ Lexical Scanner }
procedure Scan;
var k: integer;
begin
while Look = CR do
Fin;
if IsAlpha(Look) then
GetName
GetNum
GetOp
else begin
Value := Look;
Token := Operator;
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
RETURNING A CHARACTER
For one thing, the list of possible symbol types can get pretty
long. Here, I've used just one symbol, "Operator," to stand for
all of the operators, but I've seen other designs that actually
value 'Operator' for a '+' sign, what's wrong with just returning
could be simpler?
Some of you may feel that this idea of returning character codes
is too mickey-mouse. I must admit it gets a little awkward for
the enumerated type, fine. For the rest, I'd like to show you
First, you can delete the SymType declaration now ... we won't be
needing that. And you can change the type of Token to char.
{--------------------------------------------------------------}
{ Get an Identifier }
procedure GetName;
begin
Value := '';
GetChar;
end;
end;
{--------------------------------------------------------------}
{ Get a Number }
procedure GetNum;
begin
Value := '';
GetChar;
end;
Token := '#';
end;
{--------------------------------------------------------------}
{ Get an Operator }
procedure GetOp;
begin
Value := '';
GetChar;
end;
if Length(Value) = 1 then
Token := Value[1]
else
Token := '?';
end;
{--------------------------------------------------------------}
{ Lexical Scanner }
procedure Scan;
var k: integer;
begin
while Look = CR do
Fin;
if IsAlpha(Look) then
GetName
GetNum
GetOp
else begin
Value := Look;
Token := '?';
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
repeat
Scan;
case Token of
end;
Writeln(Value);
end.
{--------------------------------------------------------------}
straightforward to me.
The structure for the lexical scanner that I've just shown you is
'=' (perhaps that's why both C and Pascal use different strings
for the two). All the scanner can do is to pass the operator
along to the parser, which can hopefully tell from the context
appear there, the scanner will see no problem with it, and will
With this kind of approach, we are not really using all the
compilation.
avoid this kind of problem. But that can get awkward, and
the context.
statement.
We end up, then, still needing only GetName and GetNum, which are
Now that we've covered all of the theory and general aspects of
before; I'm allowing only one control construct (the IF) and no
Boolean expressions. That's enough to demonstrate the parsing of
done.
dare try to lead you through that process. Instead, to avoid any
{--------------------------------------------------------------}
program KISS;
{--------------------------------------------------------------}
{ Constant Declarations }
CR = ^M;
LF = ^J;
{--------------------------------------------------------------}
{ Type Declarations }
TabPtr = ^SymTab;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
begin
WriteLn;
end;
{--------------------------------------------------------------}
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
function IsAddop(c: char): boolean;
begin
end;
{--------------------------------------------------------------}
{ Recognize a Mulop }
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
begin
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Skip a CRLF }
procedure Fin;
begin
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
begin
while Look = CR do
Fin;
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
begin
GetNum := Look;
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
var S: string;
begin
Str(LCount, S);
NewLabel := 'L' + S;
Inc(LCount);
end;
{--------------------------------------------------------------}
{ Post a Label To Output }
begin
WriteLn(L, ':');
end;
{--------------------------------------------------------------}
begin
Write(TAB, s);
end;
{--------------------------------------------------------------}
begin
Emit(s);
WriteLn;
end;
{---------------------------------------------------------------}
procedure Ident;
begin
Name := GetName;
Match('(');
Match(')');
end
else
end;
{---------------------------------------------------------------}
procedure Factor;
begin
Match('(');
Expression;
Match(')');
end
Ident
else
end;
{---------------------------------------------------------------}
procedure SignedFactor;
var s: boolean;
begin
s := Look = '-';
GetChar;
SkipWhite;
end;
Factor;
if s then
EmitLn('NEG D0');
end;
{--------------------------------------------------------------}
procedure Multiply;
begin
Match('*');
Factor;
EmitLn('MULS (SP)+,D0');
end;
{-------------------------------------------------------------}
{ Recognize and Translate a Divide }
procedure Divide;
begin
Match('/');
Factor;
EmitLn('MOVE (SP)+,D1');
EmitLn('EXS.L D0');
EmitLn('DIVS D1,D0');
end;
{---------------------------------------------------------------}
procedure Term1;
begin
EmitLn('MOVE D0,-(SP)');
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;
{---------------------------------------------------------------}
procedure Term;
begin
Factor;
Term1;
end;
{---------------------------------------------------------------}
procedure FirstTerm;
begin
SignedFactor;
Term1;
end;
{---------------------------------------------------------------}
procedure Add;
begin
Match('+');
Term;
EmitLn('ADD (SP)+,D0');
end;
{---------------------------------------------------------------}
procedure Subtract;
begin
Match('-');
Term;
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
procedure Expression;
begin
FirstTerm;
EmitLn('MOVE D0,-(SP)');
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;
{---------------------------------------------------------------}
Procedure Condition;
begin
EmitLn('Condition');
end;
{---------------------------------------------------------------}
procedure Block;
Forward;
procedure DoIf;
begin
Match('i');
Condition;
L1 := NewLabel;
L2 := L1;
Block;
Match('l');
L2 := NewLabel;
PostLabel(L1);
Block;
end;
PostLabel(L2);
Match('e');
end;
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
begin
Name := GetName;
Match('=');
Expression;
EmitLn('MOVE D0,(A0)');
end;
{--------------------------------------------------------------}
procedure Block;
begin
case Look of
'i': DoIf;
Fin;
else Assignment;
end;
end;
end;
{--------------------------------------------------------------}
begin
Block;
EmitLn('END')
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
begin
LCount := 0;
GetChar;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
DoProgram;
end.
{--------------------------------------------------------------}
A couple of comments:
(1) The form for the expression parser, using FirstTerm, etc.,
Before we proceed to adding the scanner, first copy this file and
the "codes": 'i' for IF, 'l' for ELSE, and 'e' for END or ENDIF.
If the program works, then let's press on. In adding the scanner
Init, and keep the "pump primed" after that. To keep the thing
o Added the variables Token and Value, and the type definitions
needed by Lookup.
o Added Lookup.
expression.)
for keywords.
GetName.
{--------------------------------------------------------------}
program KISS;
{--------------------------------------------------------------}
{ Constant Declarations }
CR = ^M;
LF = ^J;
{--------------------------------------------------------------}
{ Type Declarations }
TabPtr = ^SymTab;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
begin
WriteLn;
end;
{--------------------------------------------------------------}
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
begin
end;
{--------------------------------------------------------------}
{ Recognize a Mulop }
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
begin
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Skip a CRLF }
procedure Fin;
begin
if Look = CR then GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Table Lookup }
var i: integer;
found: boolean;
begin
found := false;
i := n;
if s = T^[i] then
found := true
else
dec(i);
Lookup := i;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
procedure GetName;
begin
while Look = CR do
Fin;
Value := '';
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
procedure GetNum;
begin
Value := '';
GetChar;
end;
Token := '#';
SkipWhite;
end;
{--------------------------------------------------------------}
begin
GetName;
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
var S: string;
begin
Str(LCount, S);
NewLabel := 'L' + S;
Inc(LCount);
end;
{--------------------------------------------------------------}
begin
WriteLn(L, ':');
end;
{--------------------------------------------------------------}
begin
Write(TAB, s);
end;
{--------------------------------------------------------------}
begin
Emit(s);
WriteLn;
end;
{---------------------------------------------------------------}
procedure Ident;
begin
GetName;
Match('(');
Match(')');
EmitLn('BSR ' + Value);
end
else
end;
{---------------------------------------------------------------}
procedure Factor;
begin
Match('(');
Expression;
Match(')');
end
Ident
else begin
GetNum;
end;
end;
{---------------------------------------------------------------}
var s: boolean;
begin
s := Look = '-';
GetChar;
SkipWhite;
end;
Factor;
if s then
EmitLn('NEG D0');
end;
{--------------------------------------------------------------}
procedure Multiply;
begin
Match('*');
Factor;
EmitLn('MULS (SP)+,D0');
end;
{-------------------------------------------------------------}
procedure Divide;
begin
Match('/');
Factor;
EmitLn('MOVE (SP)+,D1');
EmitLn('EXS.L D0');
EmitLn('DIVS D1,D0');
end;
{---------------------------------------------------------------}
procedure Term1;
begin
EmitLn('MOVE D0,-(SP)');
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;
{---------------------------------------------------------------}
procedure Term;
begin
Factor;
Term1;
end;
{---------------------------------------------------------------}
procedure FirstTerm;
begin
SignedFactor;
Term1;
end;
{---------------------------------------------------------------}
procedure Add;
begin
Match('+');
Term;
EmitLn('ADD (SP)+,D0');
end;
{---------------------------------------------------------------}
procedure Subtract;
begin
Match('-');
Term;
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
procedure Expression;
begin
FirstTerm;
EmitLn('MOVE D0,-(SP)');
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;
{---------------------------------------------------------------}
Procedure Condition;
begin
EmitLn('Condition');
end;
{---------------------------------------------------------------}
procedure DoIf;
begin
Condition;
L1 := NewLabel;
L2 := L1;
Block;
L2 := NewLabel;
PostLabel(L1);
Block;
end;
PostLabel(L2);
MatchString('ENDIF');
end;
{--------------------------------------------------------------}
procedure Assignment;
begin
Name := Value;
Match('=');
Expression;
EmitLn('MOVE D0,(A0)');
end;
{--------------------------------------------------------------}
procedure Block;
begin
Scan;
case Token of
'i': DoIf;
else Assignment;
end;
Scan;
end;
end;
{--------------------------------------------------------------}
procedure DoProgram;
begin
Block;
MatchString('END');
EmitLn('END')
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
begin
LCount := 0;
GetChar;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
DoProgram;
end.
{--------------------------------------------------------------}
CONCLUSION
At this point, you have learned how to parse and generate code
You have now learned how to develop lexical scanners, and how to
incorporate their elements into a translator. You have still not
seen ALL the elements combined into one program, but on the basis
deal with those in the next few sessions. Before doing so,
installment.
the next installment, I'll also be taking a look from the top
INTRODUCTION
INTRODUCTION
This once, I'd like to just talk with you for a while.
prof's lecture a lot better if I knew where he was going with it.
So I thought maybe it's about time I told you where we're going
general what all this is about. I'll also share some general
thoughts concerning the usefulness of what we've been doing.
... Keep It Simple, Sidney ... and I hope by now you've realized
just how simple this stuff can really be. While there are for
the language as you go, it's possible to write down the language
can crank out parse procedures from the BNF just about as fast as
As our compiler has taken form, it's gotten more parts, but each
part is quite small and simple, and very much like all the
others.
These include:
o Arrays
o Strings
o Optimization
finished, you'll have all the tools you need to design and build
I can't design those languages for you, but I can make some
have enough power to let you write and run real programs
I've also been toying for years with the idea of a HOL-like
original foray into the jungles of compiler theory. This one may
why is
MOVE.L A,B
B=A ?
control over the full complement of the CPU instruction set, and
would allow you to generate programs as efficient as assembly
it be done? I don't know. The real question may be, "Will the
I'm not completely sure yet how the syntax should look.
ahead in most of the areas that we will cover. I have some good
news: Things never get much harder than they've been so far.
WHY IS IT SO SIMPLE?
challenge. Yet the things we have done here have usually turned
into the meat of the subject. I had only covered the simple
parts. I will freely admit to you that, even when I began the
things got too complex to deal with in the ways we have so far.
But at this point I've already been down the road far enough to
see the end of it. Guess what?
good object code. Those of you who have been following the
series and trying sample compiles know that, while the code works
an alarming rate. But since then I've been tinkering around with
some simple optimizations and I've found some that result in very
Finally, I thought that perhaps the saving grace was the "toy
Borland and Microsoft. And yet, again, as I get deeper into this
Just to make sure you get the message here, let me state it flat
out:
Since the series began I've received some comments from you.
Most of them echo my own thoughts: "This is easy! Why do the
Recently, I've gone back and looked at some of those texts again,
and even bought and read some new ones. Each time, I come away
with the same feeling: These guys have made it seem too hard.
What's going on here? Why does the whole thing seem difficult in
the texts, but easy to us? Are we that much smarter than Aho,
Hardly. But we are doing some things differently, and more and
and console I/O, we have made some implicit assumptions and done
the past. As it turns out, our approach makes life a lot easier.
traps that other people have long since learned to avoid. But it
simplicity.
Here are the areas that I think have led to complexity in the
past:
PC, but he started the effort in 1981 with a 64K system, and
between phases.
All the early compiler writers had to deal with this issue:
leaves behind for the next. That adds complexity, and ends
up driving the design. Lee's book, "The Anatomy of a
o Batch Processing
In the early days, batch processing was the only choice ...
program as possible.
of each run.
In this series, I've been very careful to avoid the issue of
that it was mostly because I wanted to take the easy way out
o Large Programs
o Emphasis on Efficiency
would simply refuse to use it. For the record, that FORTRAN
this case we get to have our cake and eat it too: we will
end up with reasonable code quality, anyway.
MOVE #3,D0
that we can't.
grammars!
corners here. You may not always have that luxury. Still,
the grammar.
that work in the context that we are in; that is, a single-user
PC with rather ample CPU power and RAM space. We have limited
used the instruction set of the CPU to advantage, and we have not
only toy compilers? No, I don't think so. As I've said, we can
for the compiler that is natural for the job. Since the
structure naturally fits the job, it is almost bound to be simple
that, back when resources were tight, the structures people ended
hand.
CONCLUSION
traditional mold.
We're going to press on with this. I've given you a list of the
compilers for just about any occasion, and build them simply. If
For those of you who are chafing at the bit for more parser code,
have things put into perspective a bit. Next time, we'll get
INTRODUCTION
INTRODUCTION
parsing.
You mustn't get the idea, though, that the incremental approach
approach can work just as well when applied from the top down ...
see how complete compilers can be built starting from the top.
that you will not only be able to see how a compiler for TINY or
KISS works, but that you will also be able to design and build
compilers for your own languages. The C and Pascal examples will
help. One thing I'd like you to see is that the natural
program structure.
we won't try. But we can flesh out the top levels far enough so
that you can see how it goes.
failing to start at the true top. They think they know what the
write it down.
begin
end
OK, I grant you that this doesn't give much of a hint as to what
the next level is, but I like to write it down anyway, just to
BNF, begins here. What does the top level BNF look like? Well,
definition of the language. Here are the first few lines of one:
just as we've done before. For each one, we'll use our familiar
To a fresh copy of the cradle, add the following code, and insert
{--------------------------------------------------------------}
procedure Prog;
begin
Name := GetName;
Prolog(Name);
Match('.');
Epilog(Name);
end;
{--------------------------------------------------------------}
you are using PC's and would rather see something else, but I'm
{--------------------------------------------------------------}
procedure Prolog;
begin
end;
{--------------------------------------------------------------}
begin
EmitLn('DC WARMST');
end;
{--------------------------------------------------------------}
As usual, add this code and try out the "compiler." At this
now I'm sure you know that things will get more interesting.
complete language and get a program that will run on the target
what we've been doing all along, except that we're approaching it
FLESHING IT OUT
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure Prog;
begin
Match('p');
Name := GetName;
Prolog;
DoBlock(Name);
Match('.');
Epilog(Name);
end;
{--------------------------------------------------------------}
can proceed to flesh out DoBlock. That's done right from its BNF
definition:
{--------------------------------------------------------------}
begin
Declarations;
PostLabel(Name);
Statements;
end;
{--------------------------------------------------------------}
some OS's, SK*DOS allows the entry point to the main program to
point a name. The call to PostLabel puts that name just before
SK*DOS know which of the many labels is the entry point, you ask?
It's the one that matches the END statement at the end of the
program.
Does the program still run the same? Then we can move on to the
next stage.
DECLARATIONS
<constant list> |
<type list> |
<variable list> |
<procedure> |
<function> )*
(Note that I'm using the more liberal definition used by Turbo
{--------------------------------------------------------------}
procedure Declarations;
begin
case Look of
'l': Labels;
'c': Constants;
't': Types;
'v': Variables;
'p': DoProcedure;
'f': DoFunction;
end;
end;
{--------------------------------------------------------------}
least, each recognizer must eat the character that invokes it.
{--------------------------------------------------------------}
procedure Labels;
begin
Match('l');
end;
{--------------------------------------------------------------}
procedure Constants;
begin
Match('c');
end;
{--------------------------------------------------------------}
procedure Types;
begin
Match('t');
end;
{--------------------------------------------------------------}
procedure Variables;
begin
Match('v');
end;
{--------------------------------------------------------------}
procedure DoProcedure;
begin
Match('p');
end;
{--------------------------------------------------------------}
begin
Match('f');
end;
{--------------------------------------------------------------}
Now try out the compiler with a few representative inputs. You
can mix the declarations any way you like, as long as the last
anything, so you don't need (and can't use) any characters other
We can flesh out the statement part in a similar way. The BNF
for it is:
Note that statements can begin with any identifier except END.
{--------------------------------------------------------------}
procedure Statements;
begin
Match('b');
while Look <> 'e' do
GetChar;
Match('e');
end;
{--------------------------------------------------------------}
'pxbe.'
the lower-level BNF definitions add detail and elaborate upon the
detail of the input program. When the last stub has been
You might note that even though we've been adding procedures, the
<if statement> |
<case statement> |
<while statement> |
<repeat statement> |
<for statement> |
<with statement>
we've been using for KISS, but the differences are nothing you
can't handle.
complete a Pascal compiler here ... there are too many things
other end.
different language.
THE STRUCTURE OF C
One reason I'm showing you these structures now is so that I can
structure upon the compiler. Rather, you should let the BNF
<function>
In Small C, functions can only have the default type int, which
is not declared. This makes the input easy to parse: the first
C.
{--------------------------------------------------------------}
procedure Prog;
begin
case Look of
'#': PreProc;
'i': IntDecl;
'c': CharDecl;
else DoFunction(Int);
end;
end;
end;
{--------------------------------------------------------------}
Note that I've had to use a ^Z to indicate the end of the source.
end.
With full C, things aren't even this easy. The problem comes
Things get more complicated since the next token may not be a
two.
You can now see the problem: The first two parts of the
definitions, and have them store away their findings and go on,
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
GetClass;
GetType;
TopDecl;
end;
end.
{--------------------------------------------------------------}
For the first round, just make the three procedures stubs that do
Does this program work? Well, it would be hard put NOT to, since
global variable
{--------------------------------------------------------------}
Procedure GetClass;
begin
Class := Look;
GetChar;
end
end;
{--------------------------------------------------------------}
the only three possible classes ... there are also "register" and
"typedef," but this should give you the picture. Note that the
procedure next:
{--------------------------------------------------------------}
procedure GetType;
begin
Typ := ' ';
Sign := 'u';
Typ := 'i';
GetChar;
end
Typ := Look;
GetChar;
end;
end;
{--------------------------------------------------------------}
Note that you must add two more global variables, Sign and Typ.
With these two procedures in place, the compiler will process the
class and type definitions and store away their findings. We can
We are by no means out of the woods yet, because there are still
even get to the actual data or function names. Let's pretend for
the moment that we have passed all those gates, and that the next
least one data item, and possibly a list, each element of which
procedure TopDecl;
begin
Name := Getname;
DoFunc(Name)
else
DoData(Name);
end;
{--------------------------------------------------------------}
(Note that, since we have already read the name, we must pass it
{--------------------------------------------------------------}
begin
Match('(');
Match(')');
Match('{');
Match('}');
if Typ = ' ' then Typ := 'i';
end;
{--------------------------------------------------------------}
begin
Match(',');
n := GetName;
end;
Match(';');
end;
{--------------------------------------------------------------}
decided to just have these two routines tell us what they found.
We're still a _VERY_ long way from having a C compiler, but what
Can we continue this until we have something that acts more like
I don't know about you, but I'm beginning to get dizzy, and we've
declarations.
At this point, I think you can see how the structure of the
we've seen for our two examples, Pascal and C, are as different
you who DO want to deal with Pascal or C, I hope I've given you
you'll soon need some of the stuff we still haven't covered yet,
such as typing and procedure calls). For the rest of you, stay
of KISS.
See you then.
GET YOUR DOMAIN:
type here .com
Get It Now >>
INTRODUCTION
In the last installment, I showed you the general idea for the
top-down development of a compiler. I gave you the first few
steps of the process for compilers for Pascal and C, but I
stopped far short of pushing it through to completion. The
reason was simple: if we're going to produce a real, functional
compiler for any language, I'd rather do it for KISS, the
language that I've been defining in this tutorial series.
Many of you may be asking at this point: Why bother starting over
from scratch? We had a working subset of KISS as the outcome of
Installment VII (lexical scanning). Why not just extend it as
needed? The answer is threefold. First of all, I have been
making a number of changes to further simplify the program ...
changes like encapsulating the code generation procedures, so
that we can convert to a different target machine more easily.
Second, I want you to see how the development can indeed be done
from the top down as outlined in the last installment. Finally,
we both need the practice. Each time I go through this exercise,
I get a little better at it, and you will, also.
GETTING STARTED
Many years ago there were languages called Tiny BASIC, Tiny
Pascal, and Tiny C, each of which was a subset of its parent full
language. Tiny BASIC, for example, had only single-character
variable names and global variables. It supported only a single
data type. Sound familiar? At this point we have almost all the
tools we need to build a compiler like that.
I've wondered just how small and simple a compiler could be made
and still be useful, if it were designed from the outset to be
both easy to use and to parse. Let's find out. This language
will just be called "TINY," period. It's a subset of KISS, which
I also haven't fully defined, so that at least makes us
consistent (!). I suppose you could call it TINY KISS. But that
opens up a whole can of worms involving cuter and cuter (and
perhaps more risque) names, so let's just stick with TINY.
The language I have in mind will share some of the good features
of Pascal, C, and Ada. Taking a lesson from the comparison of
the Pascal and C compilers in the previous installment, though,
TINY will have a decided Pascal flavor. Wherever feasible, a
language structure will be bracketed by keywords or symbols, so
that the parser will know where it's going without having to
guess.
One other ground rule: As we go, I'd like to keep the compiler
producing real, executable code. Even though it may not DO much
at the beginning, it will at least do it correctly.
Given the BNF above, let's write a parser that just recognizes
the brackets:
{--------------------------------------------------------------}
{ Parse and Translate a Program }
procedure Prog;
begin
Match('p');
Header;
Prolog;
Match('.');
Epilog;
end;
{--------------------------------------------------------------}
The procedure Header just emits the startup code required by the
assembler:
{--------------------------------------------------------------}
{ Write Header Info }
procedure Header;
begin
WriteLn('WARMST', TAB, 'EQU $A01E');
end;
{--------------------------------------------------------------}
The procedures Prolog and Epilog emit the code for identifying
the main program, and for returning to the OS:
{--------------------------------------------------------------}
{ Write the Prolog }
procedure Prolog;
begin
PostLabel('MAIN');
end;
{--------------------------------------------------------------}
{ Write the Epilog }
procedure Epilog;
begin
EmitLn('DC WARMST');
EmitLn('END MAIN');
end;
{--------------------------------------------------------------}
The main program just calls Prog, and then looks for a clean
ending:
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
Prog;
if Look <> CR then Abort('Unexpected data after ''.''');
end.
{--------------------------------------------------------------}
At this point, TINY will accept only one input "program," the
null program:
Note, though, that the compiler DOES generate correct code for
this program. It will run, and do what you'd expect the null
program to do, that is, nothing but return gracefully to the OS.
The next step is to process the code for the main program. I'll
use the Pascal BEGIN-block:
BEGIN <name>
END <name>
{--------------------------------------------------------------}
{ Parse and Translate a Program }
procedure Prog;
begin
Match('p');
Header;
Main;
Match('.');
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Parse and Translate a Main Program }
procedure Main;
begin
Match('b');
Prolog;
Match('e');
Epilog;
end;
{--------------------------------------------------------------}
DECLARATIONS
Note that since there is only one variable type, there is no need
to declare the type. Later on, for full KISS, we can easily add
a type description.
{--------------------------------------------------------------}
{ Parse and Translate a Program }
procedure Prog;
begin
Match('p');
Header;
TopDecls;
Main;
Match('.');
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Process a Data Declaration }
procedure Decl;
begin
Match('v');
GetChar;
end;
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
while Look <> 'b' do
case Look of
'v': Decl;
else Abort('Unrecognized Keyword ''' + Look + '''');
end;
end;
{--------------------------------------------------------------}
That looks pretty good, but we're still only generating the null
program for output. A real compiler would issue assembler
directives to allocate storage for the variables. It's about
time we actually produced some code.
{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }
procedure Decl;
var Name: char;
begin
Match('v');
Alloc(GetName);
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
pvxvyvzbe.
See how the storage is allocated? Simple, huh? Note also that
the entry point, "MAIN," comes out in the right place.
For the record, a "real" compiler would also have a symbol table
to record the variables being used. Normally, the symbol table
is necessary to record the type of each variable. But since in
this case all variables have the same type, we don't need a
symbol table for that reason. As it turns out, we're going to
find a symbol necessary even without different types, but let's
postpone that need until it arises.
{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }
procedure Decl;
var Name: char;
begin
Match('v');
Alloc(GetName);
while Look = ',' do begin
GetChar;
Alloc(GetName);
end;
end;
{--------------------------------------------------------------}
OK, now compile this code and give it a try. Try a number of
lines of VAR declarations, try a list of several variables on one
line, and try combinations of the two. Does it work?
INITIALIZERS
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
OK, try this version of TINY and verify that you can, indeed,
give the variables initial values.
Before leaving this section, I should point out that we've used
two versions of function GetNum. One, the earlier one, returns a
character value, a single digit. The other accepts a multi-digit
integer and returns an integer value. Either one will work here,
since WriteLn will handle either type. But there's no reason to
limit ourselves to single-digit values here, so the correct
version to use is the one that returns an integer. Here it is:
{--------------------------------------------------------------}
{ Get a Number }
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
pvavavabe.
Here we've declared the variable A three times. As you can see,
the compiler will cheerfully accept that, and generate three
identical labels. Not good.
{--------------------------------------------------------------}
{ Look for Symbol in Table }
var i: char;
begin
for i := 'A' to 'Z' do
ST[i] := ' ';
...
EXECUTABLE STATEMENTS
At this point, we can generate a null program that has some data
variables declared and possibly initialized. But so far we
haven't arranged to generate the first line of executable code.
Believe it or not, though, we almost have a usable language!
What's missing is the executable code that must go into the main
program. But that code is just assignment statements and control
statements ... all stuff we have done before. So it shouldn't
take us long to provide for them, as well.
The BNF definition given earlier for the main program included a
statement block, which we have so far ignored:
Let's start things off by adding a parser for the block. We'll
begin with a stub for the assignment statement:
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
begin
GetChar;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
while Look <> 'e' do
Assignment;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Parse and Translate a Main Program }
procedure Main;
begin
Match('b');
Prolog;
Block;
Match('e');
Epilog;
end;
{--------------------------------------------------------------}
This version still won't generate any code for the "assignment
statements" ... all it does is to eat characters until it sees
the 'e' for 'END.' But it sets the stage for what is to follow.
{---------------------------------------------------------------}
{ Clear the Primary Register }
procedure Clear;
begin
EmitLn('CLR D0');
end;
{---------------------------------------------------------------}
{ Negate the Primary Register }
procedure Negate;
begin
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
{ Load a Constant Value to Primary Register }
{---------------------------------------------------------------}
{ Load a Variable to Primary Register }
procedure Push;
begin
EmitLn('MOVE D0,-(SP)');
end;
{---------------------------------------------------------------}
{ Add Top of Stack to Primary }
procedure PopAdd;
begin
EmitLn('ADD (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Subtract Primary from Top of Stack }
procedure PopSub;
begin
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
{ Multiply Top of Stack by Primary }
procedure PopMul;
begin
EmitLn('MULS (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Divide Top of Stack by Primary }
procedure PopDiv;
begin
EmitLn('MOVE (SP)+,D7');
EmitLn('EXT.L D7');
EmitLn('DIVS D0,D7');
EmitLn('MOVE D7,D0');
end;
{---------------------------------------------------------------}
{ Store Primary to Variable }
Note that both LoadVar and Store check the symbol table to make
sure that the variable is defined. The error handler Undefined
simply calls Abort:
{--------------------------------------------------------------}
{ Report an Undefined Identifier }
We've been down this road many times before, so this should all
be familiar to you. In fact, except for the changes associated
with the code generation, we could just copy the procedures from
Part VII. Since we are making some changes, I won't just copy
them, but we will go a little faster than usual.
This version of the BNF is also a bit different than we've used
before ... yet another "variation on the theme of an expression."
This particular version has what I consider to be the best
treatment of the unary minus. As you'll see later, it lets us
handle negative constant values efficiently. It's worth
mentioning here that we have often seen the advantages of
"tweaking" the BNF as we go, to help make the language easy to
parse. What you're looking at here is a bit different: we've
tweaked the BNF to make the CODE GENERATION more efficient!
That's a first for this series.
procedure Factor;
begin
if Look = '(' then begin
Match('(');
Expression;
Match(')');
end
else if IsAlpha(Look) then
LoadVar(GetName)
else
LoadConst(GetNum);
end;
{--------------------------------------------------------------}
{ Parse and Translate a Negative Factor }
procedure NegFactor;
begin
Match('-');
if IsDigit(Look) then
LoadConst(-GetNum)
else begin
Factor;
Negate;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Leading Factor }
procedure FirstFactor;
begin
case Look of
'+': begin
Match('+');
Factor;
end;
'-': NegFactor;
else Factor;
end;
end;
{--------------------------------------------------------------}
{ Recognize and Translate a Multiply }
procedure Multiply;
begin
Match('*');
Factor;
PopMul;
end;
{-------------------------------------------------------------}
{ Recognize and Translate a Divide }
procedure Divide;
begin
Match('/');
Factor;
PopDiv;
end;
{---------------------------------------------------------------}
{ Common Code Used by Term and FirstTerm }
procedure Term1;
begin
while IsMulop(Look) do begin
Push;
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Math Term }
procedure Term;
begin
Factor;
Term1;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Leading Term }
procedure FirstTerm;
begin
FirstFactor;
Term1;
end;
{--------------------------------------------------------------}
{ Recognize and Translate an Add }
procedure Add;
begin
Match('+');
Term;
PopAdd;
end;
{-------------------------------------------------------------}
{ Recognize and Translate a Subtract }
procedure Subtract;
begin
Match('-');
Term;
PopSub;
end;
{---------------------------------------------------------------}
{ Parse and Translate an Expression }
procedure Expression;
begin
FirstTerm;
while IsAddop(Look) do begin
Push;
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
var Name: char;
begin
Name := GetName;
Match('=');
Expression;
Store(Name);
end;
{--------------------------------------------------------------}
OK, if you've got all this code inserted, then compile it and
check it out. You should be seeing reasonable-looking code,
representing a complete program that will assemble and execute.
We have a compiler!
BOOLEANS
{--------------------------------------------------------------}
{ Recognize a Boolean Orop }
{--------------------------------------------------------------}
{ Recognize a Relop }
{---------------------------------------------------------------}
{ Complement the Primary Register }
procedure NotIt;
begin
EmitLn('NOT D0');
end;
{---------------------------------------------------------------}
.
.
.
{---------------------------------------------------------------}
{ AND Top of Stack with Primary }
procedure PopAnd;
begin
EmitLn('AND (SP)+,D0');
end;
{---------------------------------------------------------------}
{ OR Top of Stack with Primary }
procedure PopOr;
begin
EmitLn('OR (SP)+,D0');
end;
{---------------------------------------------------------------}
{ XOR Top of Stack with Primary }
procedure PopXor;
begin
EmitLn('EOR (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Compare Top of Stack with Primary }
procedure PopCompare;
begin
EmitLn('CMP (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was = }
procedure SetEqual;
begin
EmitLn('SEQ D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was != }
procedure SetNEqual;
begin
EmitLn('SNE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was > }
procedure SetGreater;
begin
EmitLn('SLT D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was < }
procedure SetLess;
begin
EmitLn('SGT D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
All of this gives us the tools we need. The BNF for the Boolean
expressions is:
Sharp-eyed readers might note that this syntax does not include
the non-terminal "bool-factor" used in earlier versions. It was
needed then because I also allowed for the Boolean constants TRUE
and FALSE. But remember that in TINY there is no distinction
made between Boolean and arithmetic types ... they can be freely
intermixed. So there is really no need for these predefined
values ... we can just use -1 and 0, respectively.
#define TRUE -1
#define FALSE 0
The reason that I'm harping on this is that I've already tried
the alternative, which is to include TRUE and FALSE as keywords.
The problem with that approach is that it then requires lexical
scanning for EVERY variable name in every expression. If you'll
recall, I pointed out in Installment VII that this slows the
compiler down considerably. As long as keywords can't be in
expressions, we need to do the scanning only at the beginning of
every new statement ... quite an improvement. So using the
syntax above not only simplifies the parsing, but speeds up the
scanning as well.
OK, given that we're all satisfied with the syntax above, the
corresponding code is shown below:
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Equals" }
procedure Equals;
begin
Match('=');
Expression;
PopCompare;
SetEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Not Equals" }
procedure NotEquals;
begin
Match('#');
Expression;
PopCompare;
SetNEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than" }
procedure Less;
begin
Match('<');
Expression;
PopCompare;
SetLess;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Greater Than" }
procedure Greater;
begin
Match('>');
Expression;
PopCompare;
SetGreater;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Relation }
procedure Relation;
begin
Expression;
if IsRelop(Look) then begin
Push;
case Look of
'=': Equals;
'#': NotEquals;
'<': Less;
'>': Greater;
end;
end;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Factor with Leading NOT }
procedure NotFactor;
begin
if Look = '!' then begin
Match('!');
Relation;
NotIt;
end
else
Relation;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Term }
procedure BoolTerm;
begin
NotFactor;
while Look = '&' do begin
Push;
Match('&');
NotFactor;
PopAnd;
end;
end;
{--------------------------------------------------------------}
{ Recognize and Translate a Boolean OR }
procedure BoolOr;
begin
Match('|');
BoolTerm;
PopOr;
end;
{--------------------------------------------------------------}
{ Recognize and Translate an Exclusive Or }
procedure BoolXor;
begin
Match('~');
BoolTerm;
PopXor;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Expression }
procedure BoolExpression;
begin
BoolTerm;
while IsOrOp(Look) do begin
Push;
case Look of
'|': BoolOr;
'~': BoolXor;
end;
end;
end;
{--------------------------------------------------------------}
OK, if you've got all that typed in, compile it and give it a
whirl. First, make sure you can still parse an ordinary
arithmetic expression. Then, try a Boolean one. Finally, make
sure that you can assign the results of relations. Try, for
example:
pvx,y,zbx=z>ye.
PROGRAM
VAR X,Y,Z
BEGIN
X=Z>Y
END.
CONTROL STRUCTURES
In KISS, all the control structures will have explicit and unique
keywords bracketing the statement block, so there can be no
confusion as to where things begin and end. This is the modern
approach, used in such respected languages as Ada and Modula 2,
and it completely eliminates the problem of the "dangling else."
Note that I could have chosen to use the same keyword END to end
all the constructs, as is done in Pascal. (The closing '}' in C
serves the same purpose.) But this has always led to confusion,
which is why Pascal programmers tend to write things like
end { loop }
or end { if }
One last thought: The two constructs above each have the non-
terminals
<bool-expression> and <block>
I have no problem with leaving out these keywords, and the parser
has no trouble either, ON CONDITION that we make no errors in the
bool-expression part. On the other hand, if we were to include
these extra keywords we would get yet one more level of insurance
at very little cost, and I have no problem with that, either.
Use your best judgment as to which way to go.
{---------------------------------------------------------------}
{ Branch Unconditional }
{---------------------------------------------------------------}
{ Branch False }
{---------------------------------------------------------------}
{ Recognize and Translate an IF Construct }
procedure DoIf;
var L1, L2: string;
begin
Match('i');
BoolExpression;
L1 := NewLabel;
L2 := L1;
BranchFalse(L1);
Block;
if Look = 'l' then begin
Match('l');
L2 := NewLabel;
Branch(L2);
PostLabel(L1);
Block;
end;
PostLabel(L2);
Match('e');
end;
{--------------------------------------------------------------}
{ Parse and Translate a WHILE Statement }
procedure DoWhile;
var L1, L2: string;
begin
Match('w');
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
BoolExpression;
BranchFalse(L2);
Block;
Match('e');
Branch(L1);
PostLabel(L2);
end;
{--------------------------------------------------------------}
where
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
while not(Look in ['e', 'l']) do begin
case Look of
'i': DoIf;
'w': DoWhile;
else Assignment;
end;
end;
end;
{--------------------------------------------------------------}
OK, add the routines I've given, compile and test them. You
should be able to parse the single-character versions of any of
the control constructs. It's looking pretty good!
LEXICAL SCANNING
{--------------------------------------------------------------}
{ Skip Over an End-of-Line }
procedure NewLine;
begin
while Look = CR do begin
GetChar;
if Look = LF then GetChar;
SkipWhite;
end;
end;
{--------------------------------------------------------------}
If you've got all this done, try the program out and verify that
it will indeed handle white space and newlines.
{--------------------------------------------------------------}
{ Type Declarations }
TabPtr = ^SymTab;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
{ Definition of Keywords and Token Types }
const NKW = 9;
NKW1 = 10;
{--------------------------------------------------------------}
{ Table Lookup }
procedure Scan;
begin
GetName;
Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;
{--------------------------------------------------------------}
.
.
{--------------------------------------------------------------}
{ Match a Specific Input String }
{--------------------------------------------------------------}
{ Get an Identifier }
procedure GetName;
begin
NewLine;
if not IsAlpha(Look) then Expected('Name');
Value := '';
while IsAlNum(Look) do begin
Value := Value + UpCase(Look);
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
Note that this procedure leaves its result in the global string
Value.
{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }
procedure Factor;
begin
if Look = '(' then begin
Match('(');
BoolExpression;
Match(')');
end
else if IsAlpha(Look) then begin
GetName;
LoadVar(Value[1]);
end
else
LoadConst(GetNum);
end;
{--------------------------------------------------------------}
.
.
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
var Name: char;
begin
Name := Value[1];
Match('=');
BoolExpression;
Store(Name);
end;
{---------------------------------------------------------------}
.
.
{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }
procedure Decl;
begin
GetName;
Alloc(Value[1]);
while Look = ',' do begin
Match(',');
GetName;
Alloc(Value[1]);
end;
end;
{--------------------------------------------------------------}
{---------------------------------------------------------------}
{ Recognize and Translate an IF Construct }
procedure Block; Forward;
procedure DoIf;
var L1, L2: string;
begin
BoolExpression;
L1 := NewLabel;
L2 := L1;
BranchFalse(L1);
Block;
if Token = 'l' then begin
L2 := NewLabel;
Branch(L2);
PostLabel(L1);
Block;
end;
PostLabel(L2);
MatchString('ENDIF');
end;
{--------------------------------------------------------------}
{ Parse and Translate a WHILE Statement }
procedure DoWhile;
var L1, L2: string;
begin
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
BoolExpression;
BranchFalse(L2);
Block;
MatchString('ENDWHILE');
Branch(L1);
PostLabel(L2);
end;
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
Scan;
while not(Token in ['e', 'l']) do begin
case Token of
'i': DoIf;
'w': DoWhile;
else Assignment;
end;
Scan;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
Scan;
while Token <> 'b' do begin
case Token of
'v': Decl;
else Abort('Unrecognized Keyword ' + Value);
end;
Scan;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Main Program }
procedure Main;
begin
MatchString('BEGIN');
Prolog;
Block;
MatchString('END');
Epilog;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Program }
procedure Prog;
begin
MatchString('PROGRAM');
Header;
TopDecls;
Main;
Match('.');
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
for i := 'A' to 'Z' do
ST[i] := ' ';
GetChar;
Scan;
end;
{--------------------------------------------------------------}
Did it work? If so, then we're just about home. In fact, with a
few minor exceptions we've already got a compiler that's usable.
There are still a few areas that need improvement.
We've done this step before. This time, as usual, I'm doing it a
little differently. I think the approach used here keeps things
just about as simple as possible.
OK, here are the changes that need to be made. First, add the
new typed constant:
NEntry: integer = 0;
{--------------------------------------------------------------}
{ Look for Symbol in Table }
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
Finally, we must change all the routines that currently treat the
variable name as a single character. These include LoadVar and
Store (just change the type from char to string), and Factor,
Assignment, and Decl (just change Value[1] to Value).
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: integer;
begin
for i := 1 to MaxEntry do begin
ST[i] := '';
SType[i] := ' ';
end;
GetChar;
Scan;
end;
{--------------------------------------------------------------}
That should do it. Try it out and verify that you can, indeed,
use multi-character variable names.
MORE RELOPS
We still have one remaining single-character restriction: the one
on relops. Some of the relops are indeed single characters, but
others require two. These are '<=' and '>='. I also prefer the
Pascal '<>' for "not equals," instead of '#'.
I mentioned then that we can still get away with this, since the
multi-character relops are so few and so limited in their usage.
It's easy to just treat them as special cases and handle them in
an ad hoc manner.
The changes required affect only the code generation routines and
procedures Relation and friends. First, we're going to need two
more code generation routines:
{---------------------------------------------------------------}
{ Set D0 If Compare was <= }
procedure SetLessOrEqual;
begin
EmitLn('SGE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was >= }
procedure SetGreaterOrEqual;
begin
EmitLn('SLE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than or Equal" }
procedure LessOrEqual;
begin
Match('=');
Expression;
PopCompare;
SetLessOrEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Not Equals" }
procedure NotEqual;
begin
Match('>');
Expression;
PopCompare;
SetNEqual;
end;
Part - B
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than" }
procedure Less;
begin
Match('<');
case Look of
'=': LessOrEqual;
'>': NotEqual;
else begin
Expression;
PopCompare;
SetLess;
end;
end;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Greater Than" }
procedure Greater;
begin
Match('>');
if Look = '=' then begin
Match('=');
Expression;
PopCompare;
SetGreaterOrEqual;
end
else begin
Expression;
PopCompare;
SetGreater;
end;
end;
{---------------------------------------------------------------}
That's all it takes. Now you can process all the relops. Try
it.
INPUT/OUTPUT
{---------------------------------------------------------------}
{ Read Variable to Primary Register }
procedure ReadVar;
begin
EmitLn('BSR READ');
Store(Value);
end;
{---------------------------------------------------------------}
{ Write Variable from Primary Register }
procedure WriteVar;
begin
EmitLn('BSR WRITE');
end;
{--------------------------------------------------------------}
The idea is that READ loads the value from input to the D0, and
WRITE outputs it from there.
But that is really separate from compiler design, so for now I'll
just assume that a library call TINYLIB.LIB exists. Since we now
need it loaded, we need to add a statement to include it in
procedure Header:
{--------------------------------------------------------------}
{ Write Header Info }
procedure Header;
begin
That takes care of that part. Now, we also need to recognize the
read and write commands. We can do this by adding two more
keywords to our list:
{--------------------------------------------------------------}
{ Definition of Keywords and Token Types }
(Note how I'm using upper case codes here to avoid conflict with
the 'w' of WHILE.)
{--------------------------------------------------------------}
{ Process a Write Statement }
procedure DoWrite;
begin
Match('(');
Expression;
WriteVar;
while Look = ',' do begin
Match(',');
Expression;
WriteVar;
end;
Match(')');
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
Scan;
while not(Token in ['e', 'l']) do begin
case Token of
'i': DoIf;
'w': DoWhile;
'R': DoRead;
'W': DoWrite;
else Assignment;
end;
Scan;
end;
end;
{--------------------------------------------------------------}
At this point we have TINY completely defined. It's not much ...
actually a toy compiler. TINY has only one data type and no
subroutines ... but it's a complete, usable language. While
you're not likely to be able to write another compiler in it, or
do anything else very seriously, you could write programs to read
some input, perform calculations, and output the results. Not
too bad for a toy.
{--------------------------------------------------------------}
program Tiny10;
{--------------------------------------------------------------}
{ Constant Declarations }
LCount: integer = 0;
NEntry: integer = 0;
{--------------------------------------------------------------}
{ Type Declarations }
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
{ Definition of Keywords and Token Types }
{--------------------------------------------------------------}
{ Read New Character From Input Stream }
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
{--------------------------------------------------------------}
{ Report Error and Halt }
{--------------------------------------------------------------}
{ Report What Was Expected }
{--------------------------------------------------------------}
{ Report an Undefined Identifier }
{--------------------------------------------------------------}
{ Recognize an Alpha Character }
{--------------------------------------------------------------}
{ Recognize a Decimal Digit }
{--------------------------------------------------------------}
{ Recognize an AlphaNumeric Character }
{--------------------------------------------------------------}
{ Recognize an Addop }
{--------------------------------------------------------------}
{ Recognize a Mulop }
{--------------------------------------------------------------}
{ Recognize a Boolean Orop }
{--------------------------------------------------------------}
{ Recognize a Relop }
{--------------------------------------------------------------}
{ Recognize White Space }
{--------------------------------------------------------------}
{ Skip Over Leading White Space }
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
{ Skip Over an End-of-Line }
procedure NewLine;
begin
while Look = CR do begin
GetChar;
if Look = LF then GetChar;
SkipWhite;
end;
end;
{--------------------------------------------------------------}
{ Match a Specific Input Character }
{--------------------------------------------------------------}
{ Table Lookup }
{--------------------------------------------------------------}
{ Locate a Symbol in Table }
{ Returns the index of the entry. Zero if not present. }
{--------------------------------------------------------------}
{ Look for Symbol in Table }
{--------------------------------------------------------------}
{ Add a New Entry to Symbol Table }
{--------------------------------------------------------------}
{ Get an Identifier }
procedure GetName;
begin
NewLine;
if not IsAlpha(Look) then Expected('Name');
Value := '';
while IsAlNum(Look) do begin
Value := Value + UpCase(Look);
GetChar;
end;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
{--------------------------------------------------------------}
{ Get an Identifier and Scan it for Keywords }
procedure Scan;
begin
GetName;
Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;
{--------------------------------------------------------------}
{ Match a Specific Input String }
{--------------------------------------------------------------}
{ Output a String with Tab }
{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }
{--------------------------------------------------------------}
{ Generate a Unique Label }
{---------------------------------------------------------------}
{ Clear the Primary Register }
procedure Clear;
begin
EmitLn('CLR D0');
end;
{---------------------------------------------------------------}
{ Negate the Primary Register }
procedure Negate;
begin
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
{ Complement the Primary Register }
procedure NotIt;
begin
EmitLn('NOT D0');
end;
{---------------------------------------------------------------}
{ Load a Constant Value to Primary Register }
{---------------------------------------------------------------}
{ Load a Variable to Primary Register }
{---------------------------------------------------------------}
{ Push Primary onto Stack }
procedure Push;
begin
EmitLn('MOVE D0,-(SP)');
end;
{---------------------------------------------------------------}
{ Add Top of Stack to Primary }
procedure PopAdd;
begin
EmitLn('ADD (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Subtract Primary from Top of Stack }
procedure PopSub;
begin
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
{ Multiply Top of Stack by Primary }
procedure PopMul;
begin
EmitLn('MULS (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Divide Top of Stack by Primary }
procedure PopDiv;
begin
EmitLn('MOVE (SP)+,D7');
EmitLn('EXT.L D7');
EmitLn('DIVS D0,D7');
EmitLn('MOVE D7,D0');
end;
{---------------------------------------------------------------}
{ AND Top of Stack with Primary }
procedure PopAnd;
begin
EmitLn('AND (SP)+,D0');
end;
{---------------------------------------------------------------}
{ OR Top of Stack with Primary }
procedure PopOr;
begin
EmitLn('OR (SP)+,D0');
end;
{---------------------------------------------------------------}
{ XOR Top of Stack with Primary }
procedure PopXor;
begin
EmitLn('EOR (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Compare Top of Stack with Primary }
procedure PopCompare;
begin
EmitLn('CMP (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was = }
procedure SetEqual;
begin
EmitLn('SEQ D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was != }
procedure SetNEqual;
begin
EmitLn('SNE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was > }
procedure SetGreater;
begin
EmitLn('SLT D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was < }
procedure SetLess;
begin
EmitLn('SGT D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was <= }
procedure SetLessOrEqual;
begin
EmitLn('SGE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was >= }
procedure SetGreaterOrEqual;
begin
EmitLn('SLE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Store Primary to Variable }
{---------------------------------------------------------------}
{ Branch Unconditional }
{---------------------------------------------------------------}
{ Branch False }
{---------------------------------------------------------------}
{ Read Variable to Primary Register }
procedure ReadVar;
begin
EmitLn('BSR READ');
Store(Value[1]);
end;
procedure WriteVar;
begin
EmitLn('BSR WRITE');
end;
{--------------------------------------------------------------}
{ Write Header Info }
procedure Header;
begin
WriteLn('WARMST', TAB, 'EQU $A01E');
end;
{--------------------------------------------------------------}
{ Write the Prolog }
procedure Prolog;
begin
PostLabel('MAIN');
end;
{--------------------------------------------------------------}
{ Write the Epilog }
procedure Epilog;
begin
EmitLn('DC WARMST');
EmitLn('END MAIN');
end;
{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }
procedure Factor;
begin
if Look = '(' then begin
Match('(');
BoolExpression;
Match(')');
end
else if IsAlpha(Look) then begin
GetName;
LoadVar(Value);
end
else
LoadConst(GetNum);
end;
{--------------------------------------------------------------}
{ Parse and Translate a Negative Factor }
procedure NegFactor;
begin
Match('-');
if IsDigit(Look) then
LoadConst(-GetNum)
else begin
Factor;
Negate;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Leading Factor }
procedure FirstFactor;
begin
case Look of
'+': begin
Match('+');
Factor;
end;
'-': NegFactor;
else Factor;
end;
end;
{--------------------------------------------------------------}
{ Recognize and Translate a Multiply }
procedure Multiply;
begin
Match('*');
Factor;
PopMul;
end;
{-------------------------------------------------------------}
{ Recognize and Translate a Divide }
procedure Divide;
begin
Match('/');
Factor;
PopDiv;
end;
{---------------------------------------------------------------}
{ Common Code Used by Term and FirstTerm }
procedure Term1;
begin
while IsMulop(Look) do begin
Push;
case Look of
'*': Multiply;
'/': Divide;
end;
end;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Math Term }
procedure Term;
begin
Factor;
Term1;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Leading Term }
procedure FirstTerm;
begin
FirstFactor;
Term1;
end;
{--------------------------------------------------------------}
{ Recognize and Translate an Add }
procedure Add;
begin
Match('+');
Term;
PopAdd;
end;
{-------------------------------------------------------------}
{ Recognize and Translate a Subtract }
procedure Subtract;
begin
Match('-');
Term;
PopSub;
end;
{---------------------------------------------------------------}
{ Parse and Translate an Expression }
procedure Expression;
begin
FirstTerm;
while IsAddop(Look) do begin
Push;
case Look of
'+': Add;
'-': Subtract;
end;
end;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Equals" }
procedure Equal;
begin
Match('=');
Expression;
PopCompare;
SetEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than or Equal" }
procedure LessOrEqual;
begin
Match('=');
Expression;
PopCompare;
SetLessOrEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Not Equals" }
procedure NotEqual;
begin
Match('>');
Expression;
PopCompare;
SetNEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than" }
procedure Less;
begin
Match('<');
case Look of
'=': LessOrEqual;
'>': NotEqual;
else begin
Expression;
PopCompare;
SetLess;
end;
end;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Greater Than" }
procedure Greater;
begin
Match('>');
if Look = '=' then begin
Match('=');
Expression;
PopCompare;
SetGreaterOrEqual;
end
else begin
Expression;
PopCompare;
SetGreater;
end;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Relation }
procedure Relation;
begin
Expression;
if IsRelop(Look) then begin
Push;
case Look of
'=': Equal;
'<': Less;
'>': Greater;
end;
end;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Factor with Leading NOT }
procedure NotFactor;
begin
if Look = '!' then begin
Match('!');
Relation;
NotIt;
end
else
Relation;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Term }
procedure BoolTerm;
begin
NotFactor;
while Look = '&' do begin
Push;
Match('&');
NotFactor;
PopAnd;
end;
end;
{--------------------------------------------------------------}
{ Recognize and Translate a Boolean OR }
procedure BoolOr;
begin
Match('|');
BoolTerm;
PopOr;
end;
{--------------------------------------------------------------}
{ Recognize and Translate an Exclusive Or }
procedure BoolXor;
begin
Match('~');
BoolTerm;
PopXor;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Expression }
procedure BoolExpression;
begin
BoolTerm;
while IsOrOp(Look) do begin
Push;
case Look of
'|': BoolOr;
'~': BoolXor;
end;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
var Name: string;
begin
Name := Value;
Match('=');
BoolExpression;
Store(Name);
end;
{---------------------------------------------------------------}
{ Recognize and Translate an IF Construct }
procedure DoIf;
var L1, L2: string;
begin
BoolExpression;
L1 := NewLabel;
L2 := L1;
BranchFalse(L1);
Block;
if Token = 'l' then begin
L2 := NewLabel;
Branch(L2);
PostLabel(L1);
Block;
end;
PostLabel(L2);
MatchString('ENDIF');
end;
{--------------------------------------------------------------}
{ Parse and Translate a WHILE Statement }
procedure DoWhile;
var L1, L2: string;
begin
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
BoolExpression;
BranchFalse(L2);
Block;
MatchString('ENDWHILE');
Branch(L1);
PostLabel(L2);
end;
{--------------------------------------------------------------}
{ Process a Read Statement }
procedure DoRead;
begin
Match('(');
GetName;
ReadVar;
while Look = ',' do begin
Match(',');
GetName;
ReadVar;
end;
Match(')');
end;
{--------------------------------------------------------------}
{ Process a Write Statement }
procedure DoWrite;
begin
Match('(');
Expression;
WriteVar;
while Look = ',' do begin
Match(',');
Expression;
WriteVar;
end;
Match(')');
end;
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
Scan;
while not(Token in ['e', 'l']) do begin
case Token of
'i': DoIf;
'w': DoWhile;
'R': DoRead;
'W': DoWrite;
else Assignment;
end;
Scan;
end;
end;
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }
procedure Decl;
begin
GetName;
Alloc(Value);
while Look = ',' do begin
Match(',');
GetName;
Alloc(Value);
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
Scan;
while Token <> 'b' do begin
case Token of
'v': Decl;
else Abort('Unrecognized Keyword ' + Value);
end;
Scan;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Main Program }
procedure Main;
begin
MatchString('BEGIN');
Prolog;
Block;
MatchString('END');
Epilog;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Program }
procedure Prog;
begin
MatchString('PROGRAM');
Header;
TopDecls;
Main;
Match('.');
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: integer;
begin
for i := 1 to MaxEntry do begin
ST[i] := '';
SType[i] := ' ';
end;
GetChar;
Scan;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
Prog;
if Look <> CR then Abort('Unexpected data after ''.''');
end.
{--------------------------------------------------------------}
GET YOUR DOMAIN:
type here .com
Get It Now >>
INTRODUCTION
I've got some good news and some bad news. The bad news is that
this installment is not the one I promised last time. What's
more, the one after this one won't be, either.
The good news is the reason for this installment: I've found a
way to simplify and improve the lexical scanning part of the
compiler. Let me explain.
BACKGROUND
The whole thing came to a head when I tried to address the issue
of semicolons. Several people have asked me about them, and
whether or not KISS will have them separating the statements. My
intention has been NOT to use semicolons, simply because I don't
like them and, as you can see, they have not proved necessary.
But I know that many of you, like me, have gotten used to them,
and so I set out to write a short installment to show you how
they could easily be added, if you were so inclined.
THE PROBLEM
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
Scan;
while not(Token in ['e', 'l']) do begin
case Token of
'i': DoIf;
'w': DoWhile;
'R': DoRead;
'W': DoWrite;
else Assignment;
end;
Scan;
end;
end;
{--------------------------------------------------------------}
That simple and fixed convention served us very well when we had
single-character tokens, and it still does. It would make a lot
of sense to apply the same rule to multi-character tokens.
There's a much better way, and that's just to adopt that same
rule that's worked so well before, to apply to TOKENS as well as
single characters. In other words, we'll prefetch tokens just as
we've always done for characters. It seems so obvious once you
think about it that way.
THE SOLUTION
{--------------------------------------------------------------}
{ Get an Identifier }
procedure GetName;
begin
SkipWhite;
if Not IsAlpha(Look) then Expected('Identifier');
Token := 'x';
Value := '';
repeat
Value := Value + UpCase(Look);
GetChar;
until not IsAlNum(Look);
end;
{--------------------------------------------------------------}
{ Get a Number }
procedure GetNum;
begin
SkipWhite;
if not IsDigit(Look) then Expected('Number');
Token := '#';
Value := '';
repeat
Value := Value + Look;
GetChar;
until not IsDigit(Look);
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Get an Operator }
procedure GetOp;
begin
Token := Look;
Value := '';
repeat
Value := Value + Look;
GetChar;
until IsAlpha(Look) or IsDigit(Look) or IsWhite(Look);
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Get the Next Input Token }
procedure Next;
begin
SkipWhite;
if IsAlpha(Look) then GetName
else if IsDigit(Look) then GetNum
else GetOp;
end;
{--------------------------------------------------------------}
***NOTE that here I have put SkipWhite BEFORE the calls rather
than after. This means that, in general, the variable Look will
NOT have a meaningful value in it, and therefore we should NOT
use it as a test value for parsing, as we have been doing so far.
That's the big departure from our normal approach.
Now, remember that before I was careful not to treat the carriage
return (CR) and line feed (LF) characters as white space. This
was because, with SkipWhite called as the last thing in the
scanner, the encounter with LF would trigger a read statement.
If we were on the last line of the program, we couldn't get out
until we input another line with a non-white character. That's
why I needed the second procedure, NewLine, to handle the CRLF's.
But now, with the call to SkipWhite coming first, that's exactly
the behavior we want. The compiler must know there's another
token coming or it wouldn't be calling Next. In other words, it
hasn't found the terminating END yet. So we're going to insist
on more data until we find something.
All this means that we can greatly simplify both the program and
the concepts, by treating CR and LF as whitespace characters, and
eliminating NewLine. You can do that simply by modifying the
function IsWhite:
{--------------------------------------------------------------}
{ Recognize White Space }
We've already tried similar routines in Part VII, but you might
as well try these new ones out. Add them to a copy of the Cradle
and call Next with the following main program:
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
repeat
Next;
WriteLn(Token, ' ', Value);
until Token = '.';
end.
{--------------------------------------------------------------}
This ALMOST works, but not quite. There are two potential
problems: First, in KISS/TINY almost all of our operators are
single-character operators. The only exceptions are the relops
>=, <=, and <>. It seems a shame to treat all operators as
strings and do a string compare, when only a single character
compare will almost always suffice. Second, and much more
important, the thing doesn't WORK when two operators appear
together, as in (a+b)*(c+d). Here the string following 'b' would
be interpreted as a single operator ")*(."
{--------------------------------------------------------------}
{ Get an Operator }
procedure GetOp;
begin
SkipWhite;
Token := Look;
Value := Look;
GetChar;
end;
{--------------------------------------------------------------}
Note that I still give the string Value a value. If you're truly
concerned about efficiency, you could leave this out. When we're
expecting an operator, we will only be testing Token anyhow, so
the value of the string won't matter. But to me it seems to be
good practice to give the thing a value just in case.
Now, in Part VII the function of Next was combined with procedure
Scan, which also checked every identifier against a list of
keywords and encoded each one that was found. As I mentioned at
the time, the last thing we would want to do is to use such a
procedure in places where keywords should not appear, such as in
expressions. If we did that, the keyword list would be scanned
for every identifier appearing in the code. Not good.
{--------------------------------------------------------------}
{ Scan the Current Identifier for Keywords }
procedure Scan;
begin
if Token = 'x' then
Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;
{--------------------------------------------------------------}
There is one last detail. In the compiler there are a few places
that we must actually check the string value of the token.
Mainly, this is done to distinguish between the different END's,
but there are a couple of other places. (I should note in
passing that we could always eliminate the need for matching END
characters by encoding each one to a different character. Right
now we are definitely taking the lazy man's route.)
Armed with these new scanner procedures, we can now begin to fix
the compiler to use them properly. The changes are all quite
minor, but there are quite a few places where changes are
necessary. Rather than showing you each place, I will give you
the general idea and then just give the finished product.
First of all, the code for procedure Block doesn't change, though
its function does:
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
Scan;
while not(Token in ['e', 'l']) do begin
case Token of
'i': DoIf;
'w': DoWhile;
'R': DoRead;
'W': DoWrite;
else Assignment;
end;
Scan;
end;
end;
{--------------------------------------------------------------}
Remember that the new version of Scan doesn't advance the input
stream, it only scans for keywords. The input stream must be
advanced by each procedure that Block calls.
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Expression }
procedure BoolExpression;
begin
BoolTerm;
while IsOrOp(Token) do begin
Push;
case Token of
'|': BoolOr;
'~': BoolXor;
end;
end;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Recognize and Translate an Add }
procedure Add;
begin
Next;
Term;
PopAdd;
end;
{-------------------------------------------------------------}
{---------------------------------------------------------------}
{ Recognize and Translate an IF Construct }
procedure DoIf;
var L1, L2: string;
begin
Next;
BoolExpression;
L1 := NewLabel;
L2 := L1;
BranchFalse(L1);
Block;
if Token = 'l' then begin
Next;
L2 := NewLabel;
Branch(L2);
PostLabel(L1);
Block;
end;
PostLabel(L2);
MatchString('ENDIF');
end;
{--------------------------------------------------------------}
(1) I've deleted the two procedures Prog and Main, and combined
their functions into the main program. They didn't seem to
add to program clarity ... in fact they seemed to just
muddy things up a little.
(2) I've deleted the keywords PROGRAM and BEGIN from the
keyword list. Each one only occurs in one place, so it's
not necessary to search for it.
(6) There was an error in InTable and Locate that caused them
to search all locations instead of only those with valid
data in them. They now search only valid cells. This
allows us to eliminate the initialization of the symbol
table, which was done in Init.
(9) I fixed an error in the Read routine ... the earlier value
did not check for a valid variable name.
CONCLUSION
The resulting compiler for TINY is given below. Other than the
removal of the keyword PROGRAM, it parses the same language as
before. It's just a bit cleaner, and more importantly it's
considerably more robust. I feel good about it.
{--------------------------------------------------------------}
{ Constant Declarations }
LCount: integer = 0;
NEntry: integer = 0;
{--------------------------------------------------------------}
{ Type Declarations }
TabPtr = ^SymTab;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
{ Definition of Keywords and Token Types }
const NKW = 9;
NKW1 = 10;
{--------------------------------------------------------------}
{ Read New Character From Input Stream }
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
procedure Error(s: string);
begin
WriteLn;
WriteLn(^G, 'Error: ', s, '.');
end;
{--------------------------------------------------------------}
{ Report Error and Halt }
{--------------------------------------------------------------}
{ Report What Was Expected }
{--------------------------------------------------------------}
{ Report an Undefined Identifier }
{--------------------------------------------------------------}
{ Report a Duplicate Identifier }
{--------------------------------------------------------------}
{ Check to Make Sure the Current Token is an Identifier }
procedure CheckIdent;
begin
if Token <> 'x' then Expected('Identifier');
end;
{--------------------------------------------------------------}
{ Recognize an Alpha Character }
{--------------------------------------------------------------}
{ Recognize an AlphaNumeric Character }
{--------------------------------------------------------------}
{ Recognize an Addop }
{--------------------------------------------------------------}
{ Recognize a Mulop }
{--------------------------------------------------------------}
{ Recognize a Boolean Orop }
{--------------------------------------------------------------}
{ Recognize a Relop }
{--------------------------------------------------------------}
{ Recognize White Space }
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
{ Table Lookup }
{--------------------------------------------------------------}
{ Locate a Symbol in Table }
{ Returns the index of the entry. Zero if not present. }
{--------------------------------------------------------------}
{ Look for Symbol in Table }
{--------------------------------------------------------------}
{ Check to See if an Identifier is in the Symbol Table }
{ Report an error if it's not. }
{--------------------------------------------------------------}
{ Check the Symbol Table for a Duplicate Identifier }
{ Report an error if identifier is already in table. }
{--------------------------------------------------------------}
{ Add a New Entry to Symbol Table }
{--------------------------------------------------------------}
{ Get an Identifier }
procedure GetName;
begin
SkipWhite;
if Not IsAlpha(Look) then Expected('Identifier');
Token := 'x';
Value := '';
repeat
Value := Value + UpCase(Look);
GetChar;
until not IsAlNum(Look);
end;
{--------------------------------------------------------------}
{ Get a Number }
procedure GetNum;
begin
SkipWhite;
if not IsDigit(Look) then Expected('Number');
Token := '#';
Value := '';
repeat
Value := Value + Look;
GetChar;
until not IsDigit(Look);
end;
{--------------------------------------------------------------}
{ Get an Operator }
procedure GetOp;
begin
SkipWhite;
Token := Look;
Value := Look;
GetChar;
end;
{--------------------------------------------------------------}
{ Get the Next Input Token }
procedure Next;
begin
SkipWhite;
if IsAlpha(Look) then GetName
else if IsDigit(Look) then GetNum
else GetOp;
end;
{--------------------------------------------------------------}
{ Scan the Current Identifier for Keywords }
procedure Scan;
begin
if Token = 'x' then
Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;
{--------------------------------------------------------------}
{ Match a Specific Input String }
{--------------------------------------------------------------}
{ Output a String with Tab }
{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }
{--------------------------------------------------------------}
{ Generate a Unique Label }
{--------------------------------------------------------------}
{ Post a Label To Output }
{---------------------------------------------------------------}
{ Clear the Primary Register }
procedure Clear;
begin
EmitLn('CLR D0');
end;
{---------------------------------------------------------------}
{ Negate the Primary Register }
procedure Negate;
begin
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
{ Complement the Primary Register }
procedure NotIt;
begin
EmitLn('NOT D0');
end;
{---------------------------------------------------------------}
{ Load a Constant Value to Primary Register }
{---------------------------------------------------------------}
{ Load a Variable to Primary Register }
{---------------------------------------------------------------}
{ Push Primary onto Stack }
procedure Push;
begin
EmitLn('MOVE D0,-(SP)');
end;
{---------------------------------------------------------------}
{ Add Top of Stack to Primary }
procedure PopAdd;
begin
EmitLn('ADD (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Subtract Primary from Top of Stack }
procedure PopSub;
begin
EmitLn('SUB (SP)+,D0');
EmitLn('NEG D0');
end;
{---------------------------------------------------------------}
{ Multiply Top of Stack by Primary }
procedure PopMul;
begin
EmitLn('MULS (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Divide Top of Stack by Primary }
procedure PopDiv;
begin
EmitLn('MOVE (SP)+,D7');
EmitLn('EXT.L D7');
EmitLn('DIVS D0,D7');
EmitLn('MOVE D7,D0');
end;
{---------------------------------------------------------------}
{ AND Top of Stack with Primary }
procedure PopAnd;
begin
EmitLn('AND (SP)+,D0');
end;
{---------------------------------------------------------------}
{ OR Top of Stack with Primary }
procedure PopOr;
begin
EmitLn('OR (SP)+,D0');
end;
{---------------------------------------------------------------}
{ XOR Top of Stack with Primary }
procedure PopXor;
begin
EmitLn('EOR (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Compare Top of Stack with Primary }
procedure PopCompare;
begin
EmitLn('CMP (SP)+,D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was = }
procedure SetEqual;
begin
EmitLn('SEQ D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was != }
procedure SetNEqual;
begin
EmitLn('SNE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was > }
procedure SetGreater;
begin
EmitLn('SLT D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was < }
procedure SetLess;
begin
EmitLn('SGT D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was <= }
procedure SetLessOrEqual;
begin
EmitLn('SGE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Set D0 If Compare was >= }
procedure SetGreaterOrEqual;
begin
EmitLn('SLE D0');
EmitLn('EXT D0');
end;
{---------------------------------------------------------------}
{ Store Primary to Variable }
{---------------------------------------------------------------}
{ Branch Unconditional }
{---------------------------------------------------------------}
{ Branch False }
{---------------------------------------------------------------}
{ Read Variable to Primary Register }
procedure WriteIt;
begin
EmitLn('BSR WRITE');
end;
{--------------------------------------------------------------}
{ Write Header Info }
procedure Header;
begin
WriteLn('WARMST', TAB, 'EQU $A01E');
end;
{--------------------------------------------------------------}
{ Write the Prolog }
procedure Prolog;
begin
PostLabel('MAIN');
end;
{--------------------------------------------------------------}
{ Write the Epilog }
procedure Epilog;
begin
EmitLn('DC WARMST');
EmitLn('END MAIN');
end;
{---------------------------------------------------------------}
{ Allocate Storage for a Static Variable }
{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }
procedure Factor;
begin
if Token = '(' then begin
Next;
BoolExpression;
MatchString(')');
end
else begin
if Token = 'x' then
LoadVar(Value)
else if Token = '#' then
LoadConst(Value)
else Expected('Math Factor');
Next;
end;
end;
{--------------------------------------------------------------}
{ Recognize and Translate a Multiply }
procedure Multiply;
begin
Next;
Factor;
PopMul;
end;
{-------------------------------------------------------------}
{ Recognize and Translate a Divide }
procedure Divide;
begin
Next;
Factor;
PopDiv;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Math Term }
procedure Term;
begin
Factor;
while IsMulop(Token) do begin
Push;
case Token of
'*': Multiply;
'/': Divide;
end;
end;
end;
{--------------------------------------------------------------}
{ Recognize and Translate an Add }
procedure Add;
begin
Next;
Term;
PopAdd;
end;
{-------------------------------------------------------------}
{ Recognize and Translate a Subtract }
procedure Subtract;
begin
Next;
Term;
PopSub;
end;
{---------------------------------------------------------------}
{ Parse and Translate an Expression }
procedure Expression;
begin
if IsAddop(Token) then
Clear
else
Term;
while IsAddop(Token) do begin
Push;
case Token of
'+': Add;
'-': Subtract;
end;
end;
end;
{---------------------------------------------------------------}
{ Get Another Expression and Compare }
procedure CompareExpression;
begin
Expression;
PopCompare;
end;
{---------------------------------------------------------------}
{ Get The Next Expression and Compare }
procedure NextExpression;
begin
Next;
CompareExpression;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Equals" }
procedure Equal;
begin
NextExpression;
SetEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than or Equal" }
procedure LessOrEqual;
begin
NextExpression;
SetLessOrEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Not Equals" }
procedure NotEqual;
begin
NextExpression;
SetNEqual;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than" }
procedure Less;
begin
Next;
case Token of
'=': LessOrEqual;
'>': NotEqual;
else begin
CompareExpression;
SetLess;
end;
end;
end;
{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Greater Than" }
procedure Greater;
begin
Next;
if Token = '=' then begin
NextExpression;
SetGreaterOrEqual;
end
else begin
CompareExpression;
SetGreater;
end;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Relation }
procedure Relation;
begin
Expression;
if IsRelop(Token) then begin
Push;
case Token of
'=': Equal;
'<': Less;
'>': Greater;
end;
end;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Factor with Leading NOT }
procedure NotFactor;
begin
if Token = '!' then begin
Next;
Relation;
NotIt;
end
else
Relation;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Term }
procedure BoolTerm;
begin
NotFactor;
while Token = '&' do begin
Push;
Next;
NotFactor;
PopAnd;
end;
end;
{--------------------------------------------------------------}
{ Recognize and Translate a Boolean OR }
procedure BoolOr;
begin
Next;
BoolTerm;
PopOr;
end;
{--------------------------------------------------------------}
{ Recognize and Translate an Exclusive Or }
procedure BoolXor;
begin
Next;
BoolTerm;
PopXor;
end;
{---------------------------------------------------------------}
{ Parse and Translate a Boolean Expression }
procedure BoolExpression;
begin
BoolTerm;
while IsOrOp(Token) do begin
Push;
case Token of
'|': BoolOr;
'~': BoolXor;
end;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
var Name: string;
begin
CheckTable(Value);
Name := Value;
Next;
MatchString('=');
BoolExpression;
Store(Name);
end;
{---------------------------------------------------------------}
{ Recognize and Translate an IF Construct }
procedure DoIf;
var L1, L2: string;
begin
Next;
BoolExpression;
L1 := NewLabel;
L2 := L1;
BranchFalse(L1);
Block;
if Token = 'l' then begin
Next;
L2 := NewLabel;
Branch(L2);
PostLabel(L1);
Block;
end;
PostLabel(L2);
MatchString('ENDIF');
end;
{--------------------------------------------------------------}
{ Parse and Translate a WHILE Statement }
procedure DoWhile;
var L1, L2: string;
begin
Next;
L1 := NewLabel;
L2 := NewLabel;
PostLabel(L1);
BoolExpression;
BranchFalse(L2);
Block;
MatchString('ENDWHILE');
Branch(L1);
PostLabel(L2);
end;
{--------------------------------------------------------------}
{ Read a Single Variable }
procedure ReadVar;
begin
CheckIdent;
CheckTable(Value);
ReadIt(Value);
Next;
end;
{--------------------------------------------------------------}
{ Process a Read Statement }
procedure DoRead;
begin
Next;
MatchString('(');
ReadVar;
while Token = ',' do begin
Next;
ReadVar;
end;
MatchString(')');
end;
{--------------------------------------------------------------}
{ Process a Write Statement }
procedure DoWrite;
begin
Next;
MatchString('(');
Expression;
WriteIt;
while Token = ',' do begin
Next;
Expression;
WriteIt;
end;
MatchString(')');
end;
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
Scan;
while not(Token in ['e', 'l']) do begin
case Token of
'i': DoIf;
'w': DoWhile;
'R': DoRead;
'W': DoWrite;
else Assignment;
end;
Scan;
end;
end;
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
procedure Alloc;
begin
Next;
if Token <> 'x' then Expected('Variable Name');
CheckDup(Value);
AddEntry(Value, 'v');
Allocate(Value, '0');
Next;
end;
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
Scan;
while Token = 'v' do
Alloc;
while Token = ',' do
Alloc;
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
begin
GetChar;
Next;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
MatchString('PROGRAM');
Header;
TopDecls;
MatchString('BEGIN');
Prolog;
Block;
MatchString('END');
Epilog;
end.
{--------------------------------------------------------------}
GET YOUR DOMAIN:
type here .com
Get It Now >>
INTRODUCTION
Now that that's behind us, I can tell you what I set out to in
the first place. This shouldn't take long, and then we can get
back into the mainstream.
SEMICOLONS
a=b c= d e=e+1
I suspect that this is the major ... perhaps ONLY ... reason for
semicolons: to keep programs from looking funny.
Still, there are people who have used semicolons for so long,
they feel naked without them. I'm one of them. Once I had KISS
defined sufficiently well, I began to write a few sample programs
in the language. I discovered, somewhat to my horror, that I
kept putting semicolons in anyway. So now I'm facing the
prospect of a NEW rash of compiler errors, caused by UNWANTED
semicolons. Phooey!
Perhaps more to the point, there are readers out there who are
designing their own languages, which may include semicolons, or
who want to use the techniques of these tutorials to compile
conventional languages like C. In either case, we need to be
able to deal with semicolons.
SYNTACTIC SUGAR
There are two schools of thought on this subject, which are well
represented by two of our most popular languages, C and Pascal.
Those from the other school tend to like Pascal. They argue that
having to type a few extra characters is a small price to pay for
legibility. After all, humans have to read the programs, too.
Their best argument is that each such construct is an opportunity
to tell the compiler that you really mean for it to do what you
said to. The sugary tokens serve as useful landmarks to help you
find your way.
a=1+(2*b+c) b...
Since there is no operator connecting the token 'b' with the rest
of the statement, the compiler will conclude that the expression
ends with the ')', and the 'b' is the beginning of a new
statement. But suppose I have simply left out the intended
operator, and I really want to say:
a=1+(2*b+c)*b...
In this case the compiler will get an error, all right, but it
won't be very meaningful since it will be expecting an '=' sign
after the 'b' that really shouldn't be there.
If, on the other hand, I include a semicolon after the 'b', THEN
there can be no doubt where I intend the statement to end.
Syntactic sugar, then, can serve a very useful purpose by
providing some additional insurance that we remain on track.
Of the two syntaxes, the Pascal one seems on the face of it more
rational, but experience has shown that it leads to some strange
difficulties. People get so used to typing a semicolon after
every statement that they tend to type one after the last
statement in a block, also. That usually doesn't cause any harm
... it just gets treated as a null statement. Many Pascal
programmers, including yours truly, do just that. But there is
one place you absolutely CANNOT type a semicolon, and that's
right before an ELSE. This little gotcha has cost me many an
extra compilation, particularly when the ELSE is added to
existing code. So the C/Ada choice turns out to be better.
Apparently Nicklaus Wirth thinks so, too: In his Modula 2, he
abandoned the Pascal approach.
Given either of these two syntaxes, it's an easy matter (now that
we've reorganized the parser!) to add these features to our
parser. Let's take the last case first, since it's simpler.
{--------------------------------------------------------------}
{ Match a Semicolon }
procedure Semi;
begin
MatchString(';');
end;
{--------------------------------------------------------------}
This procedure works very much like our old Match. It insists on
finding a semicolon as the next token. Having found it, it skips
to the next one.
procedure Block;
begin
Scan;
while not(Token in ['e', 'l']) do begin
case Token of
'i': DoIf;
'w': DoWhile;
'R': DoRead;
'W': DoWrite;
'x': Assignment;
end;
Semi;
Scan;
end;
end;
{--------------------------------------------------------------}
Note carefully the subtle change in the case statement. The call
to Assignment is now guarded by a test on Token. This is to
avoid calling Assignment when the token is a semicolon (which
could happen if the statement is null).
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
Scan;
while Token = 'v' do begin
Alloc;
while Token = ',' do
Alloc;
Semi;
end;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
MatchString('PROGRAM');
Semi;
Header;
TopDecls;
MatchString('BEGIN');
Prolog;
Block;
MatchString('END');
Epilog;
end.
{--------------------------------------------------------------}
It's as easy as that. Try it with a copy of TINY and see how you
like it.
{--------------------------------------------------------------}
{ Parse and Translate a Single Statement }
procedure Statement;
begin
Scan;
case Token of
'i': DoIf;
'w': DoWhile;
'R': DoRead;
'W': DoWrite;
'x': Assignment;
end;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
Statement;
while Token = ';' do begin
Next;
Statement;
end;
end;
{--------------------------------------------------------------}
That sure didn't hurt, did it? We can now parse semicolons in
Pascal-like fashion.
A COMPROMISE
Now that we know how to deal with semicolons, does that mean that
I'm going to put them in KISS/TINY? Well, yes and no. I like
the extra sugar and the security that comes with knowing for sure
where the ends of statements are. But I haven't changed my
dislike for the compilation errors associated with semicolons.
So I have what I think is a nice compromise: Make them OPTIONAL!
{--------------------------------------------------------------}
{ Match a Semicolon }
procedure Semi;
begin
if Token = ';' then Next;
end;
{--------------------------------------------------------------}
COMMENTS
SINGLE-CHARACTER DELIMITERS
{--------------------------------------------------------------}
{ Skip A Comment Field }
procedure SkipComment;
begin
while Look <> '}' do
GetCharX;
GetCharX;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Get Character from Input Stream }
{ Skip Any Comments }
procedure GetChar;
begin
GetCharX;
if Look = '{' then SkipComment;
end;
{--------------------------------------------------------------}
Code this up and give it a try. You'll find that you can,
indeed, bury comments anywhere you like. The comments never even
get into the parser proper ... every call to GetChar just returns
any character that's NOT part of a comment.
As a matter of fact, while this approach gets the job done, and
may even be perfectly satisfactory for you, it does its job a
little TOO well. First of all, most programming languages
specify that a comment should be treated like a space, so that
comments aren't allowed to be embedded in, say, variable names.
This current version doesn't care WHERE you put comments.
Second, since the rest of the parser can't even receive a '{'
character, you will not be allowed to put one in a quoted string.
{--------------------------------------------------------------}
{ Recognize White Space }
{--------------------------------------------------------------}
{ Skip Over Leading White Space }
procedure SkipWhite;
begin
while IsWhite(Look) do begin
if Look = '{' then
SkipComment
else
GetChar;
end;
end;
{--------------------------------------------------------------}
OK, give this one a try, too. You'll find that it will let a
comment serve to delimit tokens. It's worth mentioning that this
approach also gives us the ability to handle curly braces within
quoted strings, since within such strings we will not be testing
for or skipping over whitespace.
{--------------------------------------------------------------}
{ Skip A Comment Field }
procedure SkipComment;
begin
while Look <> '}' do begin
GetChar;
if Look = '{' then SkipComment;
end;
GetChar;
end;
{--------------------------------------------------------------}
That does it. As sophisticated a comment-handler as you'll ever
need.
MULTI-CHARACTER DELIMITERS
That's all well and good for cases where a comment is delimited
by single characters, but what about the cases such as C or
standard Pascal, where two characters are required? Well, the
principles are still the same, but we have to change our approach
quite a bit. I'm sure it won't surprise you to learn that things
get harder in this case.
Let's assume we're using the C delimiters '/*' and '*/'. First,
we need to go back to the "GetCharX' approach. In yet another
copy of your compiler, rename GetChar to GetCharX and then enter
the following new procedure GetChar:
{--------------------------------------------------------------}
{ Read New Character. Intercept '/*' }
procedure GetChar;
begin
if TempChar <> ' ' then begin
Look := TempChar;
TempChar := ' ';
end
else begin
GetCharX;
if Look = '/' then begin
Read(TempChar);
if TempChar = '*' then begin
Look := '{';
TempChar := ' ';
end;
end;
end;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Skip A Comment Field }
procedure SkipComment;
begin
repeat
repeat
GetCharX;
until Look = '*';
GetCharX;
until Look = '/';
GetChar;
end;
{--------------------------------------------------------------}
ONE-SIDED COMMENTS
So far I've shown you how to deal with any kind of comment
delimited on the left and the right. That only leaves the one-
sided comments like those in assembler language or in Ada, that
are terminated by the end of the line. In a way, that case is
easier. The only procedure that would need to be changed is
SkipComment, which must now terminate at the newline characters:
{--------------------------------------------------------------}
{ Skip A Comment Field }
procedure SkipComment;
begin
repeat
GetCharX;
until Look = CR;
GetChar;
end;
{--------------------------------------------------------------}
CONCLUSION
At this point we now have the ability to deal with both comments
and semicolons, as well as other kinds of syntactic sugar. I've
shown you several ways to deal with each, depending upon the
convention desired. The only issue left is: which of these
conventions should we use in KISS/TINY?
For the reasons that I've given as we went along, I'm choosing
the following:
Put the code corresponding to these cases into your copy of TINY.
You now have TINY Version 1.2.
INTRODUCTION
language, TINY 1.3, that embodies all these features, and we have
adding some file I/O we could indeed have a working compiler that
could read integer data, perform calculations with it, and output
the results.
"Real" languages have more than one data type, and they support
procedure calls. More than any others, it's these two features
jobs.
been able to put all those issues to rest and can get on with the
procedures. Next time, we'll talk about the basic data types.
write. The reason has nothing to do with the subject itself ...
I've known what I wanted to say for some time, and in fact I
When I first began this series, I told you that we would use
had already done, and just add the new features to the existing
TINY compiler.
dumb to try. Having gotten this far using the idea of small
After all this time, you don't need more buildup than that, so
THE BASICS
All modern CPU's provide direct support for procedure calls, and
what we are doing. We don't need the full TINY compiler, but we
<ident> = <ident>
shown the whole thing here, just to make sure we're all starting
program Calls;
{--------------------------------------------------------------}
{ Constant Declarations }
CR = ^M;
LF = ^J;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
begin
WriteLn;
end;
{--------------------------------------------------------------}
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
end;
{--------------------------------------------------------------}
begin
TypeOf := ST[n];
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
ST[Name] := T;
end;
{--------------------------------------------------------------}
begin
variable');
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
begin
end;
{--------------------------------------------------------------}
{ Recognize a Mulop }
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{ Recognize a Relop }
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
procedure Fin;
begin
if Look = LF then
GetChar;
end;
end;
{--------------------------------------------------------------}
begin
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
begin
GetName := UpCase(Look);
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
begin
GetNum := Look;
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
begin
Write(TAB, s);
end;
{--------------------------------------------------------------}
begin
Emit(s);
WriteLn;
end;
{--------------------------------------------------------------}
begin
WriteLn(L, ':');
end;
{--------------------------------------------------------------}
begin
CheckVar(Name);
end;
{--------------------------------------------------------------}
begin
CheckVar(Name);
EmitLn('MOVE D0,(A0)')
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Vestigial Version }
procedure Expression;
begin
LoadVar(GetName);
end;
{--------------------------------------------------------------}
procedure Assignment;
begin
Name := GetName;
Match('=');
Expression;
StoreVar(Name);
end;
{--------------------------------------------------------------}
procedure DoBlock;
begin
Assignment;
Fin;
end;
end;
{--------------------------------------------------------------}
procedure BeginBlock;
begin
Match('b');
Fin;
DoBlock;
Match('e');
Fin;
end;
{--------------------------------------------------------------}
begin
ST[N] := 'v';
end;
{--------------------------------------------------------------}
procedure Decl;
begin
Match('v');
Alloc(GetName);
end;
{--------------------------------------------------------------}
procedure TopDecls;
begin
case Look of
'v': Decl;
Fin;
end;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
BeginBlock;
end.
{--------------------------------------------------------------}
a variable name to make sure it's a legal one. It's also worth
provide for white space and newlines. Finally, note that the
va (for VAR A)
vb (for VAR B)
vc (for VAR C)
b (for BEGIN)
a=b
b=c
e. (for END.)
As usual, you should also make some deliberate errors, and verify
DECLARING A PROCEDURE
If you're satisfied that our little program works, then it's time
think about the code we'd like to see generated for it:
PROGRAM FOO;
BEGIN .
..
..
END; RTS
..
..
..
..
and the desired assembler code on the right. The first thing to
here! For the great bulk of both the procedure and the main
generated.
{--------------------------------------------------------------}
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
Fin;
ST[N] := 'p';
PostLabel(N);
BeginBlock;
Return;
end;
{--------------------------------------------------------------}
Note that I've added a new code generation routine, Return, which
To finish this version, add the following line within the Case
statement in DoBlock:
'p': DoProc;
BNF that drives it, differs from standard Pascal. In the Jensen
DoVars;
DoProcs;
DoMain;
require that order and let you freely mix up the various
OK, try this new version out. Note that we can declare as many
character names!), and the labels and RTS's all come out in the
right places.
What's more, this penalty gets paid at RUN TIME, because extra
on the grounds that I have seen too many abuses of the feature.
Before going on to the next step, it's also worth noting that the
the label and END statement. Let's fix that little oversight:
{--------------------------------------------------------------}
procedure DoMain;
begin
Match('b');
Fin;
Prolog;
DoBlock;
Epilog;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
DoMain;
end.
{--------------------------------------------------------------}
Note that DoProc and DoMain are not quite symmetrical. DoProc
a 'p' here), while the main program gets no keyword other than
functions are treated just alike, except that the main program
happens to be identified by its name, "main." Since C functions
can appear in any order, the main program can also be anywhere in
point putting anything after the main program ... it could never
than being that part of the code that comes after the global
main program.
BEGIN { of MAIN }
the global declarations ... isn't the main program just one more
declaration, also?
The code also looks much better, at least in the sense that
{--------------------------------------------------------------}
procedure DoMain;
var N: char;
begin
Match('P');
N := GetName;
Fin;
Prolog;
BeginBlock;
end;
{--------------------------------------------------------------}
.
.
{--------------------------------------------------------------}
procedure TopDecls;
begin
case Look of
'v': Decl;
'p': DoProc;
'P': DoMain;
end;
Fin;
end;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
Epilog;
end.
{--------------------------------------------------------------}
Since the declaration of the main program is now within the loop
ensure that it's the last thing in the file? And how do we ever
exit from the loop? My answer for the second question, as you
can see, was to bring back our old friend the period. Once the
the code that I've shown, there's nothing to keep the programmer
from adding code after the main program ... even another main
like to use the area just after the program to declare large,
is.
help than that, it's pretty easy to add some logic to kick us out
mains.
like a case where our parser ceases being predictive, and indeed
{--------------------------------------------------------------}
begin
Match('=');
Expression;
StoreVar(Name);
end;
{--------------------------------------------------------------}
procedure AssignOrProc;
begin
Name := GetName;
case TypeOf(Name) of
'v': Assignment(Name);
'p': CallProc(Name);
end;
end;
{--------------------------------------------------------------}
procedure DoBlock;
begin
AssignOrProc;
Fin;
end;
end;
{--------------------------------------------------------------}
generation routine:
{--------------------------------------------------------------}
{ Call a Procedure }
begin
end;
{--------------------------------------------------------------}
equivalent of BASIC's GOSUB construct. Not too bad ... after all
PASSING PARAMETERS
Again, we all know the basic idea of passed parameters, but let's
PROCEDURE FOO(X, Y, Z)
counts. In the example above, the name 'X' simply means "the
basis.
view. But to tell the truth I prefer the latter. For procedures
The statement
Initialize; ,
identifiers, you can't tell the difference between the two. You
give him the benefit of the doubt and assume that he had a good
extra code ... just parse the syntax. The code for processing
the declaration has very much the same form we've seen before
{--------------------------------------------------------------}
procedure FormalList;
begin
Match('(');
FormalParam;
Match(',');
FormalParam;
end;
end;
Match(')');
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
FormalList;
Fin;
ST[N] := 'p';
PostLabel(N);
BeginBlock;
Return;
end;
{--------------------------------------------------------------}
For now, the code for FormalParam is just a dummy one that simply
{--------------------------------------------------------------}
procedure FormalParam;
begin
Name := GetName;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure Param;
begin
Name := GetName;
end;
{--------------------------------------------------------------}
procedure ParamList;
begin
Match('(');
Param;
Match(',');
Param;
end;
end;
Match(')');
end;
{--------------------------------------------------------------}
begin
ParamList;
Call(Name);
end;
{--------------------------------------------------------------}
this, I've renamed the code generation routine to just Call, and
OK, if you'll add all this code to your translator and try it
out, you'll find that you can indeed parse the syntax properly.
do this. We'll ignore the issue now if for no other reason than
a place for that data and we can deal with the issue then.
There is more than one way to pass a parameter, and the way we do
choose to.
o By value
o By reference (address)
since the same mechanism was used in all cases, with one
There were problems, though. Many people felt that this method
approach.
There was, however, an even more insidious problem, which was not
CALL FOO(A, B, J + 1)
K=J+1
CALL FOO(A, B, K)
Here again, there was copying required, and the burden was on the
CALL FOO(A, B, 4)
pool," so that we only have to store one value for each unique
the literal pool gets CHANGED, to a -7. From then on, every
This means that the value of the scalar is COPIED into a separate
value used only for the call. Since the value passed is a copy,
any way it likes. The value in the caller will not be changed.
the need to copy the parameter. But remember that we're going to
Except for one small little detail: if all parameters are passed
none other than our old friend the FORTRAN parameter, with a new
name and paint job for disguise. Wirth neatly gets around the
works even better yet, because even though you can change the
variable pointed to all you like, you still CAN'T change the
pointer.
yet!
PASS-BY-VALUE
Let's just try some simple-minded things and see where they lead
procedure call:
FOO(X, Y)
Almost the only reasonable way to pass the data is through the
CPU stack. So the code we'd like to see generated might look
When the BSR is executed, the CPU pushes the return address onto
the stack and jumps to FOO. At this point the stack will look
like this:
.
Value of X (2 bytes)
Value of Y (2 bytes)
are:
X: 6(SP)
Y: 4(SP)
PROCEDURE FOO(A, B)
BEGIN
A=B
END
MOVE D0,6(SP)
RTS
This means some changes to the symbol table stuff. In fact, for
procedure has:
And we need to initialize the new table. Now, remember that the
{--------------------------------------------------------------}
procedure ClearParams;
var i: char;
begin
Params[i] := 0;
NumParams := 0;
end;
{--------------------------------------------------------------}
We'll put a call to this procedure in Init, and also at the end
of DoProc:
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
GetChar;
SkipWhite;
ClearParams;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
FormalList;
Fin;
ST[N] := 'p';
PostLabel(N);
BeginBlock;
Return;
ClearParams;
end;
{--------------------------------------------------------------}
Note that the call within DoProc ensures that the table will be
OK, now we need a few procedures to work with the table. The
etc.:
{--------------------------------------------------------------}
begin
ParamNumber := Params[N];
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
Inc(NumParams);
Params[Name] := NumParams;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
begin
WriteLn(Offset, '(SP),D0');
end;
{--------------------------------------------------------------}
begin
Emit('MOVE D0,');
WriteLn(Offset, '(SP)');
end;
{--------------------------------------------------------------}
procedure Push;
begin
EmitLn('MOVE D0,-(SP)');
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure FormalParam;
begin
AddParam(GetName);
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
begin
if IsParam(n) then
TypeOf := 'f'
else
TypeOf := ST[n];
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure AssignOrProc;
begin
Name := GetName;
case TypeOf(Name) of
'p': CallProc(Name);
Here');
end;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
{ Parse and Translate an Expression }
{ Vestigial Version }
procedure Expression;
begin
Name := GetName;
if IsParam(Name) then
LoadParam(ParamNumber(Name))
else
LoadVar(Name);
end;
{--------------------------------------------------------------}
begin
Match('=');
Expression;
if IsParam(Name) then
StoreParam(ParamNumber(Name))
else
StoreVar(Name);
end;
{--------------------------------------------------------------}
As you can see, these procedures will treat every variable name
encountered as either a formal parameter or a global variable,
The rest is easy. We need only add the semantics to the actual
{--------------------------------------------------------------}
procedure Param;
begin
Expression;
Push;
end;
{--------------------------------------------------------------}
That's it. Add these changes to your program and give it a try.
another, but you cannot DECLARE a nested procedure. You can even
complicated expressions.
WHAT'S WRONG?
back at the code for a procedure call, you'll see that the caller
pushes each actual parameter onto the stack before it calls the
change the stack pointer. That means that the stuff is still
and since the procedure, after all, knows how many parameters
it's got. But that means that it must do something with the
I prefer letting the caller clean up, so that the callee need
the caller is the one who "messed up" the stack in the first
place. But THAT means that the caller must remember how many
items it pushed. To make things easy, I've modified the
{--------------------------------------------------------------}
var N: integer;
begin
N := 0;
Match('(');
Param;
inc(N);
Match(',');
Param;
inc(N);
end;
end;
Match(')');
ParamList := 2 * N;
end;
{--------------------------------------------------------------}
var N: integer;
begin
N := ParamList;
Call(Name);
CleanStack(N);
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
begin
Emit('ADD #');
WriteLn(N, ',SP');
end;
end;
{--------------------------------------------------------------}
OK, if you'll add this code to your compiler, I think you'll find
that the stack is now under control.
the stack pointer. That works fine in our simple examples, since
PROCEDURE FOO(A, B)
BEGIN
A=A+B
END
RTS
This would be wrong. When we push the first argument onto the
stack, the offsets for the two formal parameters are no longer 4
and 6, but are 6 and 8. So the second fetch would fetch A again,
not B.
This is not the end of the world. I think you can see that all
The 68000 instruction set LINK lets you declare such a frame
Since this register may have been in use for something else in
the calling procedure, LINK also pushes the current value of that
register onto the stack. It can also add a value to the stack
pointer and pops the old value back into the register.
Using these two instructions, the code for the previous example
becomes:
UNLK A6
RTS
more than one line, I've created new procedures to deal with it,
{--------------------------------------------------------------}
begin
PostLabel(N);
EmitLn('LINK A6,#0');
end;
{--------------------------------------------------------------}
procedure ProcEpilog;
begin
EmitLn('UNLK A6');
EmitLn('RTS');
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
FormalList;
Fin;
ST[N] := 'p';
ProcProlog(N);
BeginBlock;
ProcEpilog;
ClearParams;
end;
{--------------------------------------------------------------}
begin
Emit('MOVE ');
WriteLn(Offset, '(A6),D0');
end;
{--------------------------------------------------------------}
begin
Emit('MOVE D0,');
WriteLn(Offset, '(A6)');
end;
{--------------------------------------------------------------}
(Note that the Offset computation changes to allow for the extra
push of A6.)
That's all it takes. Try this out and see how you like it.
Notice that we CAN use formal parameters in any way inside the
loop counters (if we had loops, that is!), etc. So the code is
doing what it's supposed to. To get over this last problem, we
CALL-BY-REFERENCE
again later.
Let's begin by looking at the code we'd like to see generated for
the new case. Using the same example as before, we need the call
FOO(X, Y)
to be translated to:
{--------------------------------------------------------------}
procedure Param;
begin
end;
{--------------------------------------------------------------}
the calling list, so Param can just read the name directly.)
UNLK A6
RTS
StoreParam:
{--------------------------------------------------------------}
begin
Emit('MOVE.L ');
WriteLn(Offset, '(A6),A0');
EmitLn('MOVE (A0),D0');
end;
{--------------------------------------------------------------}
begin
Emit('MOVE.L ');
WriteLn(Offset, '(A6),A0');
EmitLn('MOVE D0,(A0)');
end;
{--------------------------------------------------------------}
ParamList:
ParamList := 4 * N;
We'll just make a little note here, that here's yet another
come first.
LOCAL VARIABLES
say, that's a big gap in our language, and one that needs to be
corrected.
storage just like global ones. That is, each local variable got
a name and allocated address, like any other variable, and was
do tricks like initialize a flag, so that you could tell when you
were entering a procedure for the first time and could do any
that deal with passed (by value) parameters on the stack can
those old FORTRAN programs were pretty darned efficient ... the
I've always supposed that the reason had to do with the two main
that don't need recursion, just to get that feature when you need
it. The idea is that, with static storage, you can use absolute
in faster code.
storage. With the 68000, for example, you shouldn't use absolute
MOVE 8(A6),D0
MOVE X(PC),D0.
dynamic storage.
Since this use of local variables fits so well into the scheme of
adjust the stack pointer downward to make room for them. Formal
work, the same procedures we've already created can take care of
the whole thing.
{--------------------------------------------------------------}
begin
Emit('MOVE ');
WriteLn(Offset, '(A6),D0');
end;
{--------------------------------------------------------------}
begin
Emit('MOVE D0,');
WriteLn(Offset, '(A6)');
end;
{--------------------------------------------------------------}
The idea is that the value of Base will be frozen after we have
the new, local variables, are inserted in the symbol table. This
{--------------------------------------------------------------}
procedure FormalList;
begin
Match('(');
FormalParam;
Match(',');
FormalParam;
end;
end;
Match(')');
Fin;
Base := NumParams;
NumParams := NumParams + 4;
end;
{--------------------------------------------------------------}
(We add four words to make allowances for the return address and
old frame pointer, which end up between the formal parameters and
the locals.)
declaring local variables into the parser. The routines are very
{--------------------------------------------------------------}
procedure LocDecl;
begin
Match('v');
AddParam(GetName);
Fin;
end;
{--------------------------------------------------------------}
var n: integer;
begin
n := 0;
inc(n);
end;
LocDecls := n;
end;
{--------------------------------------------------------------}
to DoProc.
{--------------------------------------------------------------}
procedure DoProc;
var N: char;
k: integer;
begin
Match('p');
N := GetName;
ST[N] := 'p';
FormalList;
k := LocDecls;
ProcProlog(N, k);
BeginBlock;
ProcEpilog;
ClearParams;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
begin
PostLabel(N);
Emit('LINK A6,#');
WriteLn(-2 * k)
end;
{--------------------------------------------------------------}
That should do it. Add these changes and see how they work.
CONCLUSION
At this point you know how to compile procedure declarations and
value. You can also handle local variables. As you can see, the
save that one until after we've dealt with ways to handle
INTRODUCTION
INTRODUCTION
in that part and this one, we would cover the two features that
Instead, I'll be using the same approach that has worked so well
A few words of warning: First, there are some types that I will
not come until much later, for the simple reason that I still
Even if you decide not to allow ANY type conversions (as in Ada,
for example) the problem is still there, and is built into the
simplest approach.
Before diving into the tutorial, I think you'd like to know where
we are going from here ... especially since it's been so long
I have not been idle in the meantime. What I've been doing is
this was causing trouble, and that's why I've gone back to using
only compiler fragments for the last installment and this one.
The obvious way to have our cake and eat it, too, is to break up
the Turbo Unit is an ideal vehicle for doing this. This allows
and boolean expression parsing) into a single unit, and just pull
it in whenever it's needed. In that way, the only code I'll have
I've also been toying with Turbo 5.5, which of course includes
First of all, many of you who have been following this series may
still not have 5.5, and I certainly don't want to force anyone to
series. Secondly, I'm not convinced that the O-O extensions have
all that much value for this application. We've been having some
from you readers. Anyone want to vote for Turbo 5.5 and O-O?
In any case, after the next few installments in the series, the
use for our experiments), one for TINY and one for KISS. I've
pretty much isolated the differences between TINY and KISS, which
are these:
o TINY will support only two data types: The character and the
o TINY will only have two control constructs, the IF and the
One caveat: Since I still don't know much about 80x86 assembler
exercise for the student." I'll make an offer right here and
now: For the person who provides us the first robust retarget to
tokens.
those types are. The obvious vehicle for that is the symbol
deal with it, we'll steal some procedures that we've used before.
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
Init:
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
ST[i] := '?';
GetChar;
end;
{--------------------------------------------------------------}
We don't really need the next procedure, but it will be helpful
table:
{--------------------------------------------------------------}
procedure DumpTable;
var i: char;
begin
end;
{--------------------------------------------------------------}
It really doesn't matter much where you put this procedure ... I
If you're the cautious type (as I am), you might want to begin
with a test program that does nothing but initializes, then dumps
the table. Just to be sure that we're all on the same wavelength
here, I'm reproducing the entire program below, complete with the
white space:
{--------------------------------------------------------------}
program Types;
{--------------------------------------------------------------}
{ Constant Declarations }
CR = ^M;
LF = ^J;
{--------------------------------------------------------------}
{ Variable Declarations }
{--------------------------------------------------------------}
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
begin
WriteLn;
end;
{--------------------------------------------------------------}
{ Report Error and Halt }
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
procedure DumpTable;
var i: char;
begin
end;
{--------------------------------------------------------------}
end;
{--------------------------------------------------------------}
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
begin
end;
{--------------------------------------------------------------}
{ Recognize a Mulop }
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{ Recognize a Relop }
begin
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
procedure Fin;
begin
GetChar;
if Look = LF then
GetChar;
end;
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
{ Get an Identifier }
begin
GetName := UpCase(Look);
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
begin
GetNum := Look;
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
Write(TAB, s);
end;
{--------------------------------------------------------------}
begin
Emit(s);
WriteLn;
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
ST[i] := '?';
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
DumpTable;
end.
{--------------------------------------------------------------}
OK, run this program. You should get a (very fast) printout of
start.
Well, that's even more boring than before! There was no output
at all, since at this point NONE of the names have been declared.
ST['A'] := 'a';
ST['P'] := 'b';
ST['X'] := 'c';
This time, when you run the program, you should get an output
ADDING ENTRIES
and not one that will help us much later. What we need is a
that we're going to need to test the table, to make sure that we
new procedures:
{--------------------------------------------------------------}
begin
TypeOf := ST[N];
end;
{--------------------------------------------------------------}
begin
InTable := TypeOf(N) <> '?';
end;
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
CheckDup(N);
ST[N] := T;
end;
{--------------------------------------------------------------}
AddEntry('A', 'a');
AddEntry('P', 'b');
AddEntry('X', 'c');
and run the program again. Did it work? Then we have the symbol
ALLOCATING STORAGE
declaration is:
Again, we can lift a lot of the code from previous programs. The
the new call to AddEntry will also take care of checking for
duplicate declarations:
{--------------------------------------------------------------}
begin
AddEntry(N, 'v');
end;
{--------------------------------------------------------------}
procedure Decl;
begin
Match('v');
Alloc(GetName);
end;
{--------------------------------------------------------------}
procedure TopDecls;
begin
case Look of
'v': Decl;
end;
Fin;
end;
end;
{--------------------------------------------------------------}
Now, in the main program, add a call to TopDecls and run the
look familiar. Note from the code for TopDecls that the program
While you're at it, try declaring two variables with the same
DECLARING TYPES
syntax should be, etc., but for now I'm going to duck all the
issues and simply declare by executive fiat that our syntax will
be:
where:
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
AddEntry(N, T);
AllocVar(N, T);
end;
{--------------------------------------------------------------}
procedure Decl;
begin
Typ := GetName;
Alloc(GetName, Typ);
end;
{--------------------------------------------------------------}
procedure TopDecls;
begin
case Look of
end;
Fin;
end;
end;
{--------------------------------------------------------------}
Make the changes shown to these procedures, and give the thing a
try. Use the single characters 'b', 'w', and 'l' for the
keywords (they must be lower case, for now). You will see that
from the dumped symbol table that the sizes are also recorded for
later use. What later use? Well, that's the subject of the rest
of this installment.
ASSIGNMENTS
register, D0. It makes sense to use the same idea we used for
Alloc; that is, make a load procedure that can load more than one
{---------------------------------------------------------------}
begin
end;
{---------------------------------------------------------------}
needed:
{---------------------------------------------------------------}
begin
end;
{---------------------------------------------------------------}
Note that these two routines are strictly code generators; they
First of all, we need to make sure that the type we are dealing
recognizer:
{--------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
Next, it would be nice to have a routine that will fetch the type
{--------------------------------------------------------------}
begin
Typ := TypeOf(Name);
if not IsVarType(Typ) then Abort('Identifier ' + Name +
VarType := Typ;
end;
{--------------------------------------------------------------}
{--------------------------------------------------------------}
begin
LoadVar(Name, VarType(Name));
end;
{--------------------------------------------------------------}
find that you're unhappy with the speed, feel free to come back
Load('A');
Load('B');
Load('C');
Load('X');
You can play around with this, and try different combinations of
next:
{---------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
begin
StoreVar(Name, VarType(Name));
end;
{--------------------------------------------------------------}
You can test this one the same way as the loads.
{---------------------------------------------------------------}
procedure Expression;
begin
Load(GetName);
end;
{--------------------------------------------------------------}
procedure Assignment;
begin
Name := GetName;
Match('=');
Expression;
Store(Name);
end;
{--------------------------------------------------------------}
procedure Block;
begin
Assignment;
Fin;
end;
end;
{--------------------------------------------------------------}
use an UPPER CASE 'B' to stand for the BEGIN. So change the
character in the WHILE loop within TopDecls, from '.' to 'B', and
read:
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
Match('B');
Fin;
Block;
DumpTable;
end.
{--------------------------------------------------------------}
(Note that I've had to sprinkle a few calls to Fin around to get
wb { word b }
lc { long c }
B { begin }
a=a
a=b
a=c
b=a
b=b
b=c
c=a
c=b
c=c
allocates storage. For each assignment, you should get code that
loads a variable of the correct size, and stores one, also of the
correct size.
WRONG!
MOVE.L C(PC),D0
LEA A(PC),A0
MOVE.B D0,(A0)
But now, look at the opposite case. For c=a, the code generated
is:
MOVE.B A(PC),D0
LEA C(PC),A0
MOVE.L D0,(A0)
stored into the lower eight bits of D0. According to the rules
for the 68000 processor, the upper 24 bits are unchanged. This
garbage that was in those high bits will also get stored. Not
good.
So what we have run into here, early on, is the issue of TYPE
CONVERSION, or COERCION.
not the most easy part of a compiler. Most of the bugs I have
path that keeps the compiler simple. I think you'll find that,
rather nicely.
THE COWARD'S WAY OUT
{---------------------------------------------------------------}
begin
EmitLn('CLR.L D0');
end;
{---------------------------------------------------------------}
If you run some tests with this new version, you will find that
For example, consider the case a=b (for the same declarations
MOVE.W B(PC),D0
LEA A(PC),A0
MOVE.B D0,(A0)
In this case, the CLR turns out not to be necessary, since the
minded compilers.
I should point out that, by setting the high bits to zero, we are
{---------------------------------------------------------------}
begin
EmitLn('CLR.L D0');
EmitLn('EXT.L D0');
end;
{---------------------------------------------------------------}
DOUBLE the run time for most operations, and make it even worse
other end ... that is, we convert on the way _OUT_, when the data
mods, like the method, are quite simple. First of all, since we
{---------------------------------------------------------------}
begin
end;
{--------------------------------------------------------------}
Next, let's add a new procedure that will convert from one type
to another:
{---------------------------------------------------------------}
begin
EmitLn('EXT.L D0');
end;
end;
{--------------------------------------------------------------}
{---------------------------------------------------------------}
begin
Typ := VarType(Name);
LoadVar(Name, Typ);
Load := Typ;
end;
{--------------------------------------------------------------}
begin
T2 := VarType(Name);
Convert(T1, T2);
StoreVar(Name, T2);
end;
{--------------------------------------------------------------}
Note that Load is a function, which not only emits the code for a
load, but also returns the variable type. In this way, we always
convert as necessary.
{---------------------------------------------------------------}
begin
Expression := Load(GetName);
end;
{--------------------------------------------------------------}
procedure Assignment;
var Name: char;
begin
Name := GetName;
Match('=');
Store(Name, Expression);
end;
{--------------------------------------------------------------}
Again, note how incredibly simple these two routines are. We've
encapsulated all the type logic into Load and Store, and the
trick of passing the type around makes the rest of the work
will have to get more complex. But you're looking now at the
All this seems like a very simple and clean solution, and it is
indeed. Compile this program and run the same test cases as
before. You will see that all types of data are converted
properly, and there are few if any wasted instructions. Only the
think you can see that we could easily fix up procedure Convert
LITERAL ARGUMENTS
Sharp-eyed readers might have noticed, though, that we don't even
now.
string, and some an integer. The one needed here will return a
{--------------------------------------------------------------}
{ Get a Number }
begin
Val := 0;
GetChar;
end;
GetNum := Val;
SkipWhite;
end;
{---------------------------------------------------------------}
Now, when dealing with literal data, we have one little small
well.
{--------------------------------------------------------------}
begin
Typ := 'B'
Typ := 'W'
LoadConst(N, Typ);
LoadNum := Typ;
end;
{---------------------------------------------------------------}
(I know, I know, the number base isn't really symmetric. You can
easily fixed, and not worth the time or the added complexity to
type:
{---------------------------------------------------------------}
var temp:string;
begin
Str(N, temp);
end;
{--------------------------------------------------------------}
{---------------------------------------------------------------}
begin
if IsAlpha(Look) then
Expression := Load(GetName)
else
Expression := LoadNum(GetNum);
end;
{--------------------------------------------------------------}
(Wow, that sure didn't hurt too bad! Just a few extra lines do
the job.)
OK, compile this code into your program and give it a try.
valid expressions.
ADDITIVE EXPRESSIONS
If you've been following this series from the beginning, I'm sure
you know what's coming next: We'll expand the form for an
The nice part is that we already have a pattern for dealing with
{---------------------------------------------------------------}
begin
if IsAddop(Look) then
Typ := Unop
else
Typ := Term;
Push(Typ);
case Look of
end;
end;
Expression := Typ;
end;
{--------------------------------------------------------------}
function call, and how the local variable Typ gets updated at
each pass.
Note also the new call to a function Unop, which lets us deal
could still use a form more like what we've done before. I've
issues.
For this version, though, we'll retain the same dumb old code,
{---------------------------------------------------------------}
begin
Clear;
Unop := 'W';
end;
{---------------------------------------------------------------}
argument:
{---------------------------------------------------------------}
begin
Move(Size, 'D0', '-(SP)');
end;
{---------------------------------------------------------------}
{---------------------------------------------------------------}
begin
Match('+');
end;
{-------------------------------------------------------------}
begin
Match('-');
end;
{---------------------------------------------------------------}
The simplicity is deceptive, though, because what we've done is
to defer all the logic to PopAdd and PopSub, which are no longer
just code generation routines. They must also now take care of
of the same size, and the result is also of that size. The
chosen to be R7. (Why not R1? Because I have later plans for
{---------------------------------------------------------------}
end;
{---------------------------------------------------------------}
The general idea is that all the "Pop-Op" routines can call this
name:
{---------------------------------------------------------------}
begin
end;
end;
{---------------------------------------------------------------}
The next function does a conversion, but only if the current type
to do:
{---------------------------------------------------------------}
begin
Typ := T1;
if T1 <> T2 then
Typ := T2;
end;
Promote := Typ;
end;
{---------------------------------------------------------------}
{---------------------------------------------------------------}
begin
end;
{---------------------------------------------------------------}
{---------------------------------------------------------------}
begin
Pop(T1);
T2 := SameType(T1, T2);
GenAdd(T2);
PopAdd := T2;
end;
{---------------------------------------------------------------}
begin
Pop(T1);
T2 := SameType(T1, T2);
GenSub(T2);
PopSub := T2;
end;
{---------------------------------------------------------------}
anticlimactic. Once again, you can see that the logic is quite
D7, force the two operands to be the same size, and then generate
the code.
Note the new code generator routines GenAdd and GenSub. These
are vestigial forms of the ORIGINAL PopAdd and PopSub. That is,
add or subtract:
{---------------------------------------------------------------}
begin
end;
{---------------------------------------------------------------}
begin
end;
{---------------------------------------------------------------}
last tested the code. But you have to admit that each new
like to test so many new routines at once, that's OK. You can
stub out routines like Convert, Promote, and SameType, since they
don't read any inputs. You won't get the correct code, of
course, but things should work. Then flesh them out one at a
time.
When testing the program, don't forget that you first have to
declare some variables, and then start the "body" of the program
with an upper-case 'B' (for BEGIN). You should find that the
conversion routines are in, you should see that the correct code
good idea to try some erroneous expressions and see how the
At this point, you may think I've pretty much gone off the deep
the case of UnOp, I'm looking ahead to the time when we're going
to want better code generation. The way the code is organized,
For example, in cases where the value pushed onto the stack does
_NOT_ have to be converted, it's still better to use the "pop and
embed the extra tests into PopAdd and PopSub without changing
MULTIPLICATIVE EXPRESSIONS
identical, so I'll just show them here without much fanfare. The
parenthetical subexpressions:
{---------------------------------------------------------------}
begin
Match('(');
Factor := Expression;
Match(')');
end
else
Factor := LoadNum(GetNum);
end;
{--------------------------------------------------------------}
begin
Match('*');
end;
{--------------------------------------------------------------}
begin
Match('/');
end;
{---------------------------------------------------------------}
begin
Typ := Factor;
Push(Typ);
case Look of
end;
end;
Term := Typ;
end;
{---------------------------------------------------------------}
If you'd like to test the program before we get into that, you
Again, the code won't be correct at this point, but the parser
MULTIPLICATION
longword result.
The actions that we have to take are best shown in the following
table:
T1 --> | | | |
||||
||B|W|L|
T2 V | | | |
-----------------------------------------------------------------
||||
| Convert D7 to W | | |
| MULS | MULS | JSR MUL32 |
||||
-----------------------------------------------------------------
||||
W | Convert D7 to W | | Convert D0 to L |
||||
-----------------------------------------------------------------
||||
L | Convert D7 to L | Convert D7 to L | |
||||
-----------------------------------------------------------------
Second, note that the table is symmetric ... the two operands
enter in the same way. Finally, note that the product is ALWAYS
{---------------------------------------------------------------}
procedure GenMult;
begin
EmitLn('MULS D7,D0')
end;
{---------------------------------------------------------------}
procedure GenLongMult;
begin
EmitLn('JSR MUL32');
end;
{---------------------------------------------------------------}
{---------------------------------------------------------------}
{ Generate Code to Multiply Primary by Stack }
var T: char;
begin
Pop(T1);
T := SameType(T1, T2);
if T = 'L' then
GenLongMult
else
GenMult;
if T = 'B' then
PopMul := 'W'
else
PopMul:= 'L';
end;
{---------------------------------------------------------------}
As you can see, the routine starts off just like PopAdd. The two
arguments are forced to the same type. The two calls to Convert
take care of the case where both operands are bytes. The data
call one of the two code generator routines, and then assign the
DIVISION
If you don't believe it, try dividing any large 32-bit number
long result.
required actions:
T1 --> | | | |
||||
||B|W|L|
T2 V | | | |
-----------------------------------------------------------------
||||
| Convert D7 to L | Convert D7 to L | |
||||
-----------------------------------------------------------------
||||
||||
-----------------------------------------------------------------
||||
L | Convert D7 to L | Convert D7 to L | |
||||
-----------------------------------------------------------------
the dividend is, say, only a byte in the first place. Since the
longword, and there are any high bits set in it, the result of
the division must be zero. We might not get that if we only use
{---------------------------------------------------------------}
begin
Pop(T1);
GenLongDiv;
PopDiv := 'L';
end
else begin
GenDiv;
PopDiv := T1;
end;
end;
{---------------------------------------------------------------}
{---------------------------------------------------------------}
procedure GenDiv;
begin
EmitLn('DIVS D0,D7');
end;
{---------------------------------------------------------------}
{ Divide Top of Stack by Primary (Long) }
procedure GenLongDiv;
begin
EmitLn('JSR DIV32');
end;
{---------------------------------------------------------------}
D0.
OK, install the new procedures for division. At this point you
hasn't been too tough. In fact, in some ways most of the code
The main concept that made things easy was that of converting
of the result. Once this was done, we were able to retain the
I've also ignored the logical operators And, Or, etc. It turns
out that these are pretty easy to handle. All the logical
chip.
types. In general this will not be true ... certainly you don't
case. We are already checking the types of the two operands ...
What we've done here is to collapse such a jump table into far
simplifying rules.
In case you haven't gotten this message yet, it sure appears that
The answer depends on what kind of language you want, and the way
things pretty easy ... much simpler than what we've had to do
here.
were all converted to reals and the expression itself was real.
convert from one type to the other, so that you could force an
This led to two things: code that was easier to write, and code
Thing.
pretty easy!
data types, and you can mix them all freely. If the implicit
have been Heaven, but it turned out to be more like Hell! The
problem was that with so many data types, there had to be a large
typed," which means that in general you can't mix types, even if
they differ only in _NAME_, and yet have the same base type!
programmer from himself, because the compiler kept him from doing
than the debug phase. The same restrictions can also cause
frustration when you really WANT to mix types, and they tend to
Even so, Pascal does permit some implicit conversions. You can
assign an integer to a real value. You can also mix integer and
You can't, however, convert the other way, from real to integer,
"secret."
In the spirit of strong typing, Pascal will not allow you to mix
Turbo Pascal also includes the types Byte, Word, and LongInt.
or otherwise getting the wrong answer. Note that you still can't
mix Byte and Char types, even though they are stored internally
_NO_ implicit type conversions at all, and also will not allow
the language in his compiler doesn't tell you exactly WHICH types
are not compatible ... it only tells you which LINE the error is
in.
errors of this type, and I've spent a lot of time with the
I've offended it. The only real way to fix the error is to keep
So what should we do in TINY and KISS? For the first one, I have
the answer: TINY will support only the types Char and Integer,
longwords will not be supported, we also won't need the MUL32 and
DIV32 run-time routines, nor the logic to figure out when to call
any case.
CONCLUSION
wait so long for it, but hope you feel that it was worth the
wait.
part of the series. After that, I'll give you the new versions
of the TINY and KISS compilers, and then we'll start to look at
optimization issues.