L2 - Structure of a Compiler
Compiler
By
Dr. Mubarak Sufyan
Review from previous classes
Lexical Analysis
Formal Language
Languages*
Regular Expressions*
Regular Definitions*
A Regular-Expression Grammar
Finite-state Machine
Scanner Generator Tools
NFA
Deterministic Finite Automata (DFA)
Converting an NFA to a DFA-Theory
From a Regular Expression to an NFA*
Grammars
Classes of Grammars
Context-Free Grammars
Pushdown Machines
Agenda
Language-Processing System
The Phases of a Compiler
Prototype compiler structure
Compiler – Internal Architecture
Phases of Compilation
How does a compiler work?
An example of compilation
How is the conversion done?
Lexical Analysis
Tokens
Syntax Analysis
Parse Tree (Syntax Tree)
Semantic Analysis
Intermediate Representation
Code Generation
Code Optimization
Symbol Table Management
Literal Table
Temporary Files
Error Detection and Reporting
A complete example
Implementation of Compiler
Summary
Qualities in a compiler
For next lectures
References
Language-Processing System
The Phases of a Compiler
Prototype compiler structure
Compiler - Internal Architecture
❑ Analysis
▪ Known as the front-end of the compiler,
▪ The analysis phase reads the source program,
▪ divides it into core parts, and then checks for lexical, grammar, and syntax errors.
▪ It collects information about the source code and stores it in a data structure called a symbol table.
▪ The analysis phase generates an intermediate representation of the source program and the symbol table, which are fed to the synthesis phase as input.
❑ Synthesis
▪ Known as the back-end of the compiler
▪ The synthesis phase generates the target program with the help of the intermediate source code representation and the symbol table.
The structure of a compiler
❑ Symbol table
▪ It is an important data structure created and
maintained by compilers in order to store information
about the occurrence of various entities such as
variable names, function names, objects, classes,
interfaces, etc.
▪ Symbol table is used by both the analysis and the
synthesis parts of a compiler.
❑ Error Handling
▪ The tasks of the error-handling process are to detect each error, report it to the user, and then devise a recovery strategy and implement it to handle the error.
▪ This whole process should not significantly slow down the compilation of the program.
Phases of Compilation
Back-End:
❑ Code Optimization (not in this course!)
❑ Target Code Generation
Phase, Output, and Sample (for the input A=B+C;)
Phase: Programmer (source code producer)
  Output: source string, e.g. A=B+C;
Phase: Scanner (performs lexical analysis)
  Output: token string, e.g. ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’, and a symbol table with the names
Phase: Parser (performs syntax analysis based on the grammar of the programming language)
  Output: parse tree or abstract syntax tree, e.g.
        =
       / \
      A   +
         / \
        B   C
Phase: Semantic analyzer (type checking, etc.)
  Output: annotated parse tree or abstract syntax tree
Phase: Intermediate code generator
  Output: three-address code, quads, or RTL, e.g.
      int2fp B t1
      + t1 C t2
      := t2 A
Phase: Optimizer
  Output: three-address code, quads, or RTL, e.g.
      int2fp B t1
      + t1 #2.3 A
Phase: Code generator
  Output: assembly code, e.g.
      MOVF #2.3,r1
      ADDF2 r1,r2
      MOVF r2,A
How does a compiler work?
From the diagram on the previous page, there are two main stages in the compiling process:
Analysis (front end):
Checks that program constructs are legal and meaningful.
Builds up information about the objects declared.
Synthesis (back end):
Takes the analyzed program and generates the code necessary for its execution.
The analysis stage breaks up the source program into pieces and creates a generic (language-independent) intermediate representation of the program.
The synthesis stage constructs the desired target program from the intermediate representation.
Example of compilation
• To convert this source code into target code, the execution goes through the following steps:
• Lexical analysis
• Syntactic analysis
• Semantic analysis
• Storage layout
• Intermediate code generation and optimization
• Code generation
How is the conversion done?
Input: a standard imperative language (C, C++, ...)
State: variables, structures, arrays
Computation: expressions (arithmetic, logical, etc.), assignment statements, control flow (conditionals, loops), procedures
Output: the target program
State: registers, memory with a flat address space
Machine code: load/store architecture, arithmetic and logical operations on registers, branch instructions
Lexical Analysis: Tokens
❑ A token may include other information, such as
▪ the spelling of an identifier
▪ the value of a constant
❑ Example: the input is turned into a sequence of tokens (a minimal lexer sketch follows below):
= is a lexeme mapped into the token (=); no attribute value is needed
initial is a lexeme mapped into the token (id, 2)
+ is a lexeme mapped into the token (+)
rate is a lexeme mapped into the token (id, 3)
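To make this concrete, here is a minimal lexer sketch in Python; the token names (ID, NUM, ...) and the (kind, lexeme) pair format are illustrative assumptions, not the exact scheme used in these slides.

import re

# Token patterns for a tiny expression language (illustrative only).
TOKEN_SPEC = [
    ("NUM",    r"\d+(?:\.\d+)?"),     # numeric constants
    ("ID",     r"[A-Za-z_]\w*"),      # identifiers
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("STAR",   r"\*"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),               # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return the input as a list of (token_kind, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

# tokenize("position = initial + rate * 60;")
# -> [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'), ('PLUS', '+'),
#     ('ID', 'rate'), ('STAR', '*'), ('NUM', '60'), ('SEMI', ';')]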
Syntax Analysis
❑ The job of the syntax analyzer, or parser, is to take the stream of tokens produced by the lexical analyzer and build a parse tree (or syntax tree).
❑ The parser is basically a program that determines if sentences in a language are constructed properly according to the rules of the language.
❑ Importance of Syntax Analysis
▪ It is used to check if the code is grammatically correct or not.
▪ It helps us to detect all types of syntax errors.
▪ It gives an exact description of the error.
▪ It rejects invalid code before actual compiling.
Parse Tree Steps (deriving id + id * id from E):
Step 1: Replace E with E * E. Result: E * E
Step 2: Replace the leftmost E with E + E. Result: E + E * E
Steps 3, 4, 5: Replace all E's with id. Result: id + id * id
Syntax Analysis (2)
There are two general categories of parsers:
❑ Top-down parsers, which include
▪ LL(1) table-driven parsers and recursive-descent parsers (a minimal recursive-descent sketch follows below)
❑ Bottom-up parsers (e.g., LR shift-reduce parsers)
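As a rough illustration of a top-down parser, here is a minimal recursive-descent sketch for an unambiguous version of the grammar behind the id + id * id derivation above; the function names and the tuple-shaped tree nodes are assumptions made for illustration.

# Recursive-descent (top-down) parser sketch for:
#   E -> T { + T }      T -> F { * F }      F -> id
# (an unambiguous version of E -> E + E | E * E | id)
def parse_expression(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def parse_E():                 # E -> T { + T }
        node = parse_T()
        while peek() == "+":
            eat("+")
            node = ("+", node, parse_T())
        return node

    def parse_T():                 # T -> F { * F }
        node = parse_F()
        while peek() == "*":
            eat("*")
            node = ("*", node, parse_F())
        return node

    def parse_F():                 # F -> id
        eat("id")
        return "id"

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree

# parse_expression(["id", "+", "id", "*", "id"]) -> ('+', 'id', ('*', 'id', 'id'))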
Semantic Analyzer
The semantic analyzer’s job is to attach some meaning to the
structure produced by the parser.
An important part of it is type checking, where the compiler checks that each operator has matching operands (a minimal type-checking sketch follows after the list below).
Activities include:
❑ Ensuring an identifier is defined before being used in a statement or
expression.
❑ Enforcing the scope rules of the language.
❑ Performing type checking
❑ Producing intermediate code
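Here is a minimal type-checking sketch over tuple-shaped syntax-tree nodes, under the assumption of a tiny language with only int and float and the symbol-table contents shown in the code; it is an illustration, not the analyzer described in the slides.

# Minimal type-checking sketch (illustrative only).
SYMBOL_TABLE = {"A": "float", "B": "int", "C": "float"}   # assumed declarations

def check(node):
    """Return the type of an expression node, or raise an error."""
    if isinstance(node, str):                      # identifier leaf
        if node not in SYMBOL_TABLE:
            raise NameError(f"{node} used before being declared")
        return SYMBOL_TABLE[node]
    op, left, right = node                         # e.g. ("+", "B", "C")
    lt, rt = check(left), check(right)
    if op in ("+", "*"):
        # int is implicitly converted to float when the operands are mixed
        return "float" if "float" in (lt, rt) else "int"
    if op == "=":
        if lt == "int" and rt == "float":
            raise TypeError("cannot assign float to int without a cast")
        return lt
    raise TypeError(f"unknown operator {op}")

# check(("=", "A", ("+", "B", "C"))) -> "float"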
Intermediate Representations (IR)
High-level IR (e.g. AST)
Closer to source code
Hides implementation details
Low-level IR (e.g. three-address code)
Closer to the machine
Exposes details (registers, instructions, etc)
Many tradeoffs in IR design
Most compilers have 1 or maybe 2 IRs:
Typically closer to low-level IR
Better for optimization and code generation
Intermediate Code
❑ Could be kept in an array, temporary file, or linked list.
❑ Representations include P-code and 3-address code (a minimal three-address-code sketch follows below)
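A minimal sketch of generating three-address code from the same tuple-shaped tree used above, assuming temporaries named t1, t2, ... and a textual := instruction format; both are illustrative choices, not the exact quads used in the slides.

import itertools

_temps = itertools.count(1)    # source of fresh temporary names

def to_three_address(node, code):
    """Emit three-address instructions into `code`; return the place holding the value."""
    if isinstance(node, str):                      # identifier leaf
        return node
    op, left, right = node
    l_place = to_three_address(left, code)
    r_place = to_three_address(right, code)
    if op == "=":                                  # assignment: no new temporary
        code.append(f"{l_place} := {r_place}")
        return l_place
    temp = f"t{next(_temps)}"                      # fresh temporary for the result
    code.append(f"{temp} := {l_place} {op} {r_place}")
    return temp

# code = []
# to_three_address(("=", "A", ("+", "B", "C")), code)
# code -> ["t1 := B + C", "A := t1"]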
Where are we?
• Optimization output
• Transformed program
• Typically, same level of abstraction
Back End
Include:
Object Code Generation
Object Code Optimization
Responsibilities:
Map abstract instructions to real machine architecture.
Allocate storage for variables in registers.
Schedule instructions (often to exploit parallelism)
Finally, output the target code.
Code Generation
• An explicit low-level or machine-like intermediate representation
• Intermediate representation should have two important properties:
• Easy to produce
• Easy to translate into the target machine
• Given annotated AST and symbol table, produce target code
• Often done as three steps
• Produce machine-independent low-level representation of the program
(intermediate representation or IR)
• Perform machine-independent optimizations (optional)
• Translate IR into machine-specific target instructions
• Instruction selection and register allocation (a minimal code-generation sketch follows below)
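As a rough sketch of instruction selection, the following code translates the simple three-address instructions above into MOVF/ADDF2-style assembly like the sample in the earlier table; the naive register-assignment policy is an assumption made for illustration.

# Naive instruction selection from three-address code to pseudo-assembly
# (illustrative only; real code generators also do register allocation
# and instruction scheduling).
def generate(three_address_code):
    asm = []
    registers = {}                            # value/temporary -> register
    next_reg = (f"r{i}" for i in range(1, 32))

    def load(place):
        if place not in registers:
            reg = next(next_reg)
            asm.append(f"MOVF {place},{reg}")
            registers[place] = reg
        return registers[place]

    for instr in three_address_code:
        dest, rhs = [s.strip() for s in instr.split(":=")]
        parts = rhs.split()
        if len(parts) == 3 and parts[1] == "+":       # dest := a + b
            r1, r2 = load(parts[0]), load(parts[2])
            asm.append(f"ADDF2 {r1},{r2}")            # result is left in r2
            registers[dest] = r2
        else:                                         # dest := a  (plain copy)
            asm.append(f"MOVF {load(rhs)},{dest}")
    return asm

# generate(["t1 := B + C", "A := t1"])
# -> ["MOVF B,r1", "MOVF C,r2", "ADDF2 r1,r2", "MOVF r2,A"]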
Code Generation
Code Optimization
The machine-independent code-optimization phase attempts to improve
the intermediate code
so that better target code will result.
Usually better means faster, but other objectives may be desired,
such as shorter code, or target code that consumes less power.
Series of passes - often repeated
Reduce cost
Run faster
Use less memory
Conserve some other resource, like power
Must preserve program semantics
26
Code Optimization (2)
Typical optimizations:
Dead-code elimination (a minimal sketch follows below),
Common sub-expression elimination,
Loop-invariant code motion,
Strength reduction
Often contain assumptions about performance tradeoffs of the
underlying machine:
Relative speed of arithmetic operations – plus versus times
Possible parallelism in CPU
Cost of memory versus computation
Sizes of various caches
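Here is a minimal sketch of one such pass, dead-code elimination over straight-line three-address code; it assumes temporaries are named t1, t2, ... and that instructions have no side effects, so it only illustrates the idea.

# Minimal dead-code elimination over instructions of the form "dest := ..."
# (illustrative only: straight-line code, no side effects, temporaries t1, t2, ...).
def eliminate_dead_code(code):
    changed = True
    while changed:                      # repeat until nothing more can be removed
        changed = False
        used = set()
        for instr in code:
            _, rhs = instr.split(":=")
            used.update(tok for tok in rhs.split() if tok not in ("+", "*"))
        kept = []
        for instr in code:
            dest = instr.split(":=")[0].strip()
            # keep assignments to program variables; drop unused temporaries
            if dest.startswith("t") and dest not in used:
                changed = True
            else:
                kept.append(instr)
        code = kept
    return code

# eliminate_dead_code(["t1 := B + C", "t2 := B * C", "A := t1"])
# -> ["t1 := B + C", "A := t1"]     (t2 is never read, so it is removed)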
Symbol Table
❑ Keeps information about identifiers declared in a program.
❑ Efficient insertion and lookup is required → a hash table or tree structure may be used.
❑ Several tables may be maintained in a list or stack.
Symbol-Table Management
A symbol table is a data structure containing a record for each identifier, with
fields for the attributes of the identifier.
It records the identifiers used in the source program and collects information about various attributes of each identifier, such as its type and its scope.
The table is shared by later phases (a minimal sketch follows below).
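As a sketch of such a structure, here is a tiny scoped symbol table built on Python dictionaries (hash tables); the method names and attribute fields are assumptions for illustration.

# Minimal scoped symbol table sketch: a stack of hash tables (illustrative only).
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                        # global scope at the bottom

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):        # e.g. type="float", kind="variable"
        self.scopes[-1][name] = attributes

    def lookup(self, name):
        # search from the innermost scope outward
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

# table = SymbolTable()
# table.declare("rate", type="float", kind="variable")
# table.lookup("rate")   -> {'type': 'float', 'kind': 'variable'}
# table.lookup("speed")  -> None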
Literal Table
❑ Stores constants and strings used in a program (a minimal sketch follows below).
❑ Data in the literal table applies globally to a program → deletions are not necessary.
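A minimal literal-table sketch under the same assumptions stated above: constants and strings are interned once, looked up by value, and never deleted; the class and method names are illustrative.

# Minimal literal table sketch: intern each constant/string once, never delete.
class LiteralTable:
    def __init__(self):
        self.entries = []        # literal values in insertion order
        self.index = {}          # value -> position in `entries`

    def intern(self, value):
        """Return the index of `value`, adding it on first use."""
        if value not in self.index:
            self.index[value] = len(self.entries)
            self.entries.append(value)
        return self.index[value]

# literals = LiteralTable()
# literals.intern(2.3)       -> 0
# literals.intern("hello")   -> 1
# literals.intern(2.3)       -> 0   (already present, reused)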
Compiler Data Structures
Temporary Files
❑ May not be used if memory constraints are not a problem.
❑ Backpatching of addresses is necessary during translation (a minimal sketch follows below).
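To illustrate backpatching, here is a minimal sketch over an in-memory instruction list: a forward jump is emitted with a placeholder target, its position is recorded, and it is patched once the target address is known; the instruction format is an assumption for illustration.

# Minimal backpatching sketch (illustrative only).
code = []                 # emitted instructions (in memory here; could be a temp file)
unresolved = []           # positions of jumps whose target is not yet known

def emit(instr):
    code.append(instr)
    return len(code) - 1  # address (index) of the emitted instruction

def emit_forward_jump():
    pos = emit("JUMP ???")          # target unknown at this point
    unresolved.append(pos)

def backpatch(target_address):
    while unresolved:               # fill in every pending forward jump
        pos = unresolved.pop()
        code[pos] = f"JUMP {target_address}"

# emit("LOAD A")
# emit_forward_jump()               # e.g. jump over an else-branch
# emit("STORE B")
# backpatch(len(code))              # now we know where the jump should land
# code -> ["LOAD A", "JUMP 3", "STORE B"]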
Compiler Phases with Symbol Table
Error Detection and Reporting
A complete example
Full view of compiler
Implementation of Compiler
Requirements
Correct
The actions requested by the program have to be faithfully executed
Efficient:
Intelligently and efficiently use the available resources to carry out the requests
(the word “optimization” is used loosely in the compiler community; optimizing compilers are never optimal)
Software development tools are available to implement one or more compiler
phases:
Scanner generators
Parser generators
Syntax-directed translation engines
Automatic code generators
Data-flow engines
Summary
Lexical analysis (Scanner):
Recognition of symbols, delimiters, and comments
By regular expressions and finite automata
Syntax analysis (Parser):
Determination of hierarchical program structure
By context-free grammars and pushdown automata
Semantic analysis:
Checking context dependencies, data types, ...
By attribute grammars
Generation of intermediate code:
Translation into (target-independent) intermediate code
By tree translations
Code optimization:
To improve runtime and/or memory behavior
Generation of target code:
Tailored to target system
Additionally:
optimization of target code, symbol table, error handling
Summary (2)
Analysis:
lexical/syntax/semantic analysis (determination of syntactic structure, error
handling)
Synthesis:
generation of (intermediate/machine) code + optimization
Front-end:
machine-independent parts (analysis + intermediate code + machine-independent
optimizations)
Back-end:
machine-dependent parts (generation + optimization of machine code)
Summary of Translation Steps
Lexical Analysis:
Convert the stream of characters representing input program into a sequence of tokens.
Tokens are the “words” of the programming language
Syntax Analysis Phase:
Recognizes “sentences” in the program using the syntax of the language
Semantic Analysis Phase:
Infers information about the program using the semantics of the language
Intermediate Code Generation Phase:
Generates “abstract” code based on the syntactic structure of the program and the semantic information from the previous phase.
Optimization Phase:
Refines the generated code using a series of optimizing transformations.
Final Code Generation Phase:
Translates the abstract intermediate code into specific machine instructions.
What qualities are important in a compiler?
Correct code
Output runs fast
Compiler runs fast
Compile time proportional to program size
Support for separate compilation
Good diagnostics for syntax errors
Works well with the debugger
Good diagnostics for flow anomalies
Cross language calls
Consistent, predictable optimization
End