0% found this document useful (0 votes)
27 views43 pages

L2 - Structure of a Compiler

Uploaded by

winersgoto
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views43 pages

L2 - Structure of a Compiler

Uploaded by

winersgoto
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

L2 - Structure of a

Compiler
By
Dr. Mubarak Sufyan

1
Review from pervious classes
 Lexical Analysis
 Formal Language
 Languages*
 Regular Expressions*
 Regular Definitions*
 A Regular-Expression Grammar
 Finite-state Machine
 Scanner Generator Tools
 NFA
 Deterministic finite automata (DNF)
 Converting an NFA to a DFA-Theory
 From a Regular Expression to an NFA*
 Grammars
 Classes of Grammars
 Context-Free Grammars
2

 Pushdown Machines
Agenda
 Language-Processing System  Code Optimization
 The Phases of a Compiler  Symbol Table Management
 Prototype compiler structure  Literal Table
 Compiler – Internal Architecture  Temporary Files
 Phases of Compilation  Error Detection and Reporting
 How dose a compiler work?  A complete example
 An example of compilation  Implementation of Compiler
 How is the converting done?  Summary
 Lexical Analysis  Qualities in compiler
 Tokens  For next lectures
 Syntax Analysis  References
 Parse Tree (Syntax Tree)
 Semantic Analysis
 Intermediate Representation
 Code Generation 3
Language-Processing System

4
The Phases
of a
Compiler

5
Prototype compiler structure

6
Compiler - Internal Architecture
❑ Analysis
▪ Known as the front-end of the compiler,
▪ The analysis phase of the compiler reads the source program,
▪ Divides it into core parts and then checks for lexical, grammar and syntax errors.
▪ Collected information about the source code and store it in a data structure called a symbol
table
▪ The analysis phase generates an intermediate representation of the source program and
symbol table, which should be fed to the Synthesis phase as input.
❑ Synthesis
▪ Known as the back-end of the compiler
▪ The synthesis phase generates the target
Program with the help of intermediate
source code representation and symbol
table. 7
The structure of compiler

❑ Symbol table
▪ It is an important data structure created and
maintained by compilers in order to store information
about the occurrence of various entities such as
variable names, function names, objects, classes,
interfaces, etc.
▪ Symbol table is used by both the analysis and the
synthesis parts of a compiler.
❑ Error Handlin
▪ The tasks of the Error Handling process are to detect
each error, report it to the user, and then make some
recovery strategy and implement them to handle the
error.
▪ During this whole process processing time of the
program should not be slow.

8
Phases of Compilation

 In the process of translation, a Lexical Analysis


(Scanning)
compiler goes through several
Syntax Analysis
phases: (Parsing) Front-
❑ Lexical analysis (also called scanning) End
Semantic Analysis
❑ Syntax analysis (also called parsing) Intermediate Code
❑ Semantic analysis Generation

Code Optimization
❑ Optimization (not in this course!)
Back-End
Target Code
❑ Code generation Generation

9
Phase Output Sample
Programmer (source code producer) Source string A=B+C;
Scanner (performs lexical analysis) Token string ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
And symbol table with names
Parser (performs syntax analysis based on the Parse tree or abstract syntax tree ;
|
grammar of the programming language) =
/ \
A +
/ \
B C

Semantic analyzer (type checking, etc) Annotated parse tree or abstract syntax
tree
Intermediate code generator Three-address code, quads, or RTL int2fp B t1
+ t1 C t2
:= t2 A
Optimizer Three-address code, quads, or RTL int2fp B t1
+ t1 #2.3 A
Code generator Assembly code MOVF #2.3,r1
ADDF2 r1,r2
MOVF r2,A

10
How does a compiler work?
 From the diagram on the pervious page.
 There are two main stages in the compiling process:
 Analysis (front end ),
 Checks that program constructs are legal and meaningful.
 Builds up information about objects declared
 Synthesis (back end).
 Takes analyzed program and generates code necessary for its execution
 The analysis stage
 Breaks up the source program into pieces,
 And creates a generic (language-independent) intermediate
representation of the program.
 The synthesis stage
 constructs the desired target program from the intermediate representat
ion. 11
12
Example of compilation
• To convert this source code
into target code
• The execution through the
following steps:
• Lexical analysis
• Syntactic analysis
• Semantic analysis
• Storage layout
• Intermediate code generation,
and Optimization
• Code generation

13
How is the converting done?
 Input: such as Standard  Output: target program:
imperative language (C, C++)
 State:
 State:
 Registers
 Variables,
 Structures,  Memory with Flat Address
Space
 Arrays
 Computation  Machine code:
 Expressions (arithmetic, logical,  load/store architecture
etc.)
 Arithmetic, logical operations
 Assignment statements on registers
 Control flow (conditionals,
loops)  Branch instructions
 Procedures

The converting is done by the compiler phases 14


Lexical Analysis
❑ The job of the lexical analyzer, or
scanner, is to read the source program
one character at a time and produce
as output a stream of tokens.
❑ The tokens produced by the scanner
serve as input to the next phase, the
parser.
❑ The lexical analyzer is responsible for
breaking these syntaxes into a series
of tokens, by removing whitespace in
the source code.
❑ If the lexical analyzer gets any invalid
token, it generates an error.

• Lexical Analysis used:


• Formal Language
• Finite State Machine 15
• Regular Expressions
Tokens
symbo Details
 Tokens l
❑ Often represented as a positio Lexeme, which mapped into a token (id,1), id=identifier, and 1
listing type n Point to the symbol-Table for position

❑ May include other = Lexeme, which mapped into a token (=), no need attribute-value
information such as Initial Lexeme, which mapped into a token (id,2)
▪ Spelling of identifier + Lexeme, which mapped into a token (+)
▪ Value of constant rate Lexeme, which mapped into a token (id,3)

❑ Scanner needs to * Lexeme, which mapped into a token (*)


generate only one token 60 Lexeme, which mapped into a token (60)
at a time (single symbol
lookahead)
After Lexical
Analysis as the 16

sequence of tokens
Syntax Analysis Steps Parse Tree
Step-1:
Replace E
❑ The job of the syntax analyzer, or parser, is to take a with E * E.
stream of tokens produced by the lexical analyzer and Result: E * E
build a parse tree (or syntax tree).
❑ The parser is basically a program that determines if Step-2:
sentences in a language are constructed properly Replace
according to the rules of the language. leftmost E
with E + E
❑ Importance of Syntax Analysis
Result: E + E *
▪ It is used to check if the code is grammatically E
correct or not.
▪ It helps us to detect all types of syntax errors. Step-3,4,5:
Replace all E’s
▪ It gives an exact description of the error. with id.
▪ It rejects invalid code before actual compiling. Result : id +
id * id

17
Syntax Analysis
 There are two general categories of parsers:
❑ Top down parsers, which include
▪ LL(1) table-driven parsers

▪ Recursive descent parsers (we will write one!)


❑ There are three principle techniques for constructing
the LR parsing tables, they are called:
▪ SLR (simple LR)
• The grammar is LR (the L indicates we are
▪ LR(1) parsers (Canonical LR (LR)).
reading input from the left, and the R
▪ LALR(1) parsers (Look Ahead LR (LALR) indicates we are finding a right-most
derivation).

Syntax Analysis used: 2. Pushdown Machines


1. Grammars Language 3. Corresponding between Machines and
1. Grammars Classes of Languages
2. Classes of Grammar 4. Ambiguities in PL
3. Context-Free Grammar 18
Parse tree (or syntax tree)

 Parse tree (or syntax tree)


❑ A linked structure built by the parser, with
information added by the semantic analyzer
❑ Each node is a record whose fields contain
information about the syntactic construct
which the node represents
❑ Nodes may be represented in various ways:
▪ A C structure
▪ A Pascal or Ada variant record
▪ A C++ or Java class

19
Semantic Analyzer
 The semantic analyzer’s job is to attach some meaning to the
structure produced by the parser.
 An important part of it type checking, where the compiler
check that each operator has matching operands
 Activities include:
❑ Ensuring an identifier is defined before being used in a statement or
expression.
❑ Enforcing the scope rules of the language.
❑ Performing type checking
❑ Producing intermediate code

20
Intermediate Representations (IR)
 High-level IR (e.g. AST)
 Closer to source code
 Hides implementation details
 Low-level IR (e.g. three-address code)
 Closer to the machine
 Exposes details (registers, instructions, etc)
 Many tradeoffs in IR design
 Most compilers have 1 or maybe 2 IRs:
 Typically closer to low-level IR
 Better for optimization and code generation

 Intermediate Code
❑ Could be kept in an array, temporary file, or linked list.
❑ Representations include P-code and 3-address code 21
Where are we ?
• Optimization output
• Transformed program
• Typically, same level of abstraction

22
Back End
 Include:
 Object Code Generation
 Object Code Optimization
 Responsibilities:
 Map abstract instructions to real machine architecture.
 Allocate storage for variables in registers.
 Schedule instructions (often to exploit parallelism)
 Finally, output the target code.

23
Code Generation
• An explicit low-level or machine like intermediate representation
• Intermediate representation should have two important properties:
• Easy to produce
• Easy to translate to into the target machine
• Given annotated AST and symbol table, produce target code
• Often done as three steps
• Produce machine-independent low-level representation of the program
(intermediate representation or IR)
• Perform machine-independent optimizations (optional)
• Translate IR into machine-specific target instructions
• Instruction selection n Register allocation

24
Code Generation

25
Code Optimization
 The machine-independent code-optimization phase attempts to improve
the intermediate code
 so that better target code will result.
 Usually better means faster, but other objectives may be desired,
 such as shorter code, or target code that consumes less power.
 Series of passes - often repeated
 Reduce cost
 Run faster
 Use less memory
 Conserve some other resource, like power
 Must preserve program semantic

26
Code Optimization (2)
 Typical optimizations:
 Dead-code elimination,
 common sub-expression elimination,
 Loop-invariant code motion,
 strength reduction
 Often contain assumptions about performance tradeoffs of the
underlying machine:
 Relative speed of arithmetic operations – plus versus times
 Possible parallelism in CPU
 Cost of memory versus computation
 Sizes of various caches
27
28
Symbol Table

 Symbol Table
❑ Keeps information about
identifiers declared in a
program.
❑ Efficient insertion and
lookup is required → hash
table or tree structure may
be used.
❑ Several tables may be
maintained in a list or stack.

29
Symbol-Table Management

 A symbol table is a data structure containing a record for each identifier, with
fields for the attributes of the identifier.
 Record the identifiers used in the source program and collect information about
various attributes of each identifier, such as its type, its scope
 Shared by later phases

30
Literal Table

 Literal Table
❑ Stores constants and strings used in a program.
❑ Data in literal table applies globally to a program → deletions are
not necessary

31
Compiler Data Structures

 Temporary Files
❑ May not be used if memory constraints are not a
problem
❑ Backpatching of addresses necessary during translation.

32
Compiler Phases
with Symbol Table

33
Error Detection and Reporting

 The syntax and semantic analysis phases usually


handle a large fraction of the errors detectable by
the compiler
 Exception handling

34
A complete
example

35
Full view of compiler

36
37
Implementation of Compiler
 Requirements
 Correct
 The actions requested by the program has to be faithfully executed
 Efficient:
 Intelligently and efficiently use the available resources to carry out the requests
 (the word optimization is used loosely in the compiler community – Optimizing compilers
are never optimal)
 Software development tools are available to implement one or more compiler
phases:
 Scanner generators
 Parser generators
 Syntax-directed translation engines
 Automatic code generators
 Data-flow engines
38
Summary
 Lexical analysis (Scanner):
 Recognition of symbols, delimiters, and comments
 By regular expressions and finite automata
 Syntax analysis (Parser):
 Determination of hierarchical program structure
 By context-free grammars and pushdown automata
 Semantic analysis:
 Checking context dependencies, data types, ...
 B y attribute grammars
 Generation of intermediate code:
 Translation into (target-independent) intermediate code
 By tree translations
 Code optimization:
 To improve runtime and/or memory behavior
 Generation of target code:
 Tailored to target system
 Additionally:
39
 optimization of target code, symbol table, error handling
Summary (2)

 Analysis:
 lexical/syntax/semantic analysis (determination of syntactic structure, error
handling)
 Synthesis:
 generation of (intermediate/machine) code + optimization

 Front-end:
 machine-independent parts (analysis + intermediate code + machine-independent
optimizations)
 Back-end:
 machine-dependent parts (generation + optimization of machine code)

40
Summary of Translation Steps
 Lexical Analysis:
 Convert the stream of characters representing input program into a sequence of tokens.
 Tokens are the “words” of the programming language
 Syntax Analysis Phase:
 Recognizes “sentences” in the program using the syntax of the language
 Semantic Analysis Phase:
 Infers information about the program using the semantics of the language
 Intermediate Code Generation Phase:
 Generates “abstract” code based on the syntactic structure of the program and the semantic
information from Phase 2.
 Optimization Phase:
 Refines the generated code using a series of optimizing transformations.
 Final Code Generation Phase:
 Translates the abstract intermediate code into specific machine instructions.

41
What qualities are important in a
compiler?
 Correct code
 Output runs fast
 Compiler runs fast
 Compile time proportional to program size
 Support for separate compilation
 Good diagnostics for syntax errors
 Works well with the debugger
 Good diagnostics for flow anomalies
 Cross language calls
 Consistent, predictable optimization

42
End

43

You might also like