L2 - Structure of a Compiler
Compiler
By
Dr. Mubarak Sufyan
Review from previous classes
Lexical Analysis
Formal Language
Languages*
Regular Expressions*
Regular Definitions*
A Regular-Expression Grammar
Finite-state Machine
Scanner Generator Tools
NFA
Deterministic Finite Automata (DFA)
Converting an NFA to a DFA-Theory
From a Regular Expression to an NFA*
Grammars
Classes of Grammars
Context-Free Grammars
Pushdown Machines
Agenda
Language-Processing System
The Phases of a Compiler
Prototype compiler structure
Compiler – Internal Architecture
Phases of Compilation
How does a compiler work?
An example of compilation
How is the conversion done?
Lexical Analysis
Tokens
Syntax Analysis
Parse Tree (Syntax Tree)
Semantic Analysis
Intermediate Representation
Code Generation
Code Optimization
Symbol Table Management
Literal Table
Temporary Files
Error Detection and Reporting
A complete example
Implementation of Compiler
Summary
Qualities in a compiler
For next lectures
References
Language-Processing System
The Phases of a Compiler
Prototype compiler structure
Compiler - Internal Architecture
❑ Analysis
▪ Known as the front-end of the compiler,
▪ The analysis phase reads the source program,
▪ divides it into core parts, and then checks for lexical, grammar, and syntax errors.
▪ It collects information about the source code and stores it in a data structure called a symbol table.
▪ The analysis phase generates an intermediate representation of the source program and the symbol table, which are fed to the synthesis phase as input.
❑ Synthesis
▪ Known as the back-end of the compiler
▪ The synthesis phase generates the target program with the help of the intermediate source code representation and the symbol table.
The structure of a compiler
❑ Symbol table
▪ It is an important data structure created and
maintained by compilers in order to store information
about the occurrence of various entities such as
variable names, function names, objects, classes,
interfaces, etc.
▪ Symbol table is used by both the analysis and the
synthesis parts of a compiler.
❑ Error Handling
▪ The tasks of the error-handling process are to detect each error, report it to the user, and then devise a recovery strategy and implement it to handle the error.
▪ This whole process should not significantly slow down the compilation of the program.
Phases of Compilation
Back-End:
❑ Code Optimization (not in this course!)
❑ Target Code Generation
Phase, Output, and Sample (for the input A=B+C;)
Phase: Programmer (source code producer)
  Output: source string, e.g. A=B+C;
Phase: Scanner (performs lexical analysis)
  Output: token string, e.g. ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’, and a symbol table with the names
Phase: Parser (performs syntax analysis based on the grammar of the programming language)
  Output: parse tree or abstract syntax tree, e.g.
        =
       / \
      A   +
         / \
        B   C
Phase: Semantic analyzer (type checking, etc.)
  Output: annotated parse tree or abstract syntax tree
Phase: Intermediate code generator
  Output: three-address code, quads, or RTL, e.g.
      int2fp B t1
      + t1 C t2
      := t2 A
Phase: Optimizer
  Output: three-address code, quads, or RTL, e.g.
      int2fp B t1
      + t1 #2.3 A
Phase: Code generator
  Output: assembly code, e.g.
      MOVF #2.3,r1
      ADDF2 r1,r2
      MOVF r2,A
How does a compiler work?
From the diagram on the previous page, there are two main stages in the compiling process:
Analysis (front end):
Checks that program constructs are legal and meaningful.
Builds up information about the objects declared.
Synthesis (back end):
Takes the analyzed program and generates the code necessary for its execution.
The analysis stage breaks up the source program into pieces and creates a generic (language-independent) intermediate representation of the program.
The synthesis stage constructs the desired target program from the intermediate representation.
Example of compilation
• To convert this source code into target code, the execution goes through the following steps:
• Lexical analysis
• Syntactic analysis
• Semantic analysis
• Storage layout
• Intermediate code generation and optimization
• Code generation
How is the conversion done?
Input: a standard imperative language (C, C++, ...)
State: variables, structures, arrays
Computation: expressions (arithmetic, logical, etc.), assignment statements, control flow (conditionals, loops), procedures
Output: the target program
State: registers, memory with a flat address space
Machine code: load/store architecture, arithmetic and logical operations on registers, branch instructions
Lexical Analysis: Tokens
❑ A token may include other information, such as
▪ the spelling of an identifier
▪ the value of a constant
❑ Example: the input is turned into a sequence of tokens (a minimal lexer sketch follows below):
= is a lexeme mapped into the token (=); no attribute value is needed
initial is a lexeme mapped into the token (id, 2)
+ is a lexeme mapped into the token (+)
rate is a lexeme mapped into the token (id, 3)
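To make this concrete, here is a minimal lexer sketch in Python; the token names (ID, NUM, ...) and the (kind, lexeme) pair format are illustrative assumptions, not the exact scheme used in these slides.

import re

# Token patterns for a tiny expression language (illustrative only).
TOKEN_SPEC = [
    ("NUM",    r"\d+(?:\.\d+)?"),     # numeric constants
    ("ID",     r"[A-Za-z_]\w*"),      # identifiers
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("STAR",   r"\*"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),               # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return the input as a list of (token_kind, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

# tokenize("position = initial + rate * 60;")
# -> [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'), ('PLUS', '+'),
#     ('ID', 'rate'), ('STAR', '*'), ('NUM', '60'), ('SEMI', ';')]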
Syntax Analysis
❑ The job of the syntax analyzer, or parser, is to take the stream of tokens produced by the lexical analyzer and build a parse tree (or syntax tree).
❑ The parser is basically a program that determines if sentences in a language are constructed properly according to the rules of the language.
❑ Importance of Syntax Analysis
▪ It is used to check if the code is grammatically correct or not.
▪ It helps us to detect all types of syntax errors.
▪ It gives an exact description of the error.
▪ It rejects invalid code before actual compiling.
Parse Tree Steps (deriving id + id * id from E):
Step 1: Replace E with E * E. Result: E * E
Step 2: Replace the leftmost E with E + E. Result: E + E * E
Steps 3, 4, 5: Replace all E's with id. Result: id + id * id
Syntax Analysis (2)
There are two general categories of parsers:
❑ Top-down parsers, which include
▪ LL(1) table-driven parsers and recursive-descent parsers (a minimal recursive-descent sketch follows below)
❑ Bottom-up parsers (e.g., LR shift-reduce parsers)
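As a rough illustration of a top-down parser, here is a minimal recursive-descent sketch for an unambiguous version of the grammar behind the id + id * id derivation above; the function names and the tuple-shaped tree nodes are assumptions made for illustration.

# Recursive-descent (top-down) parser sketch for:
#   E -> T { + T }      T -> F { * F }      F -> id
# (an unambiguous version of E -> E + E | E * E | id)
def parse_expression(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def parse_E():                 # E -> T { + T }
        node = parse_T()
        while peek() == "+":
            eat("+")
            node = ("+", node, parse_T())
        return node

    def parse_T():                 # T -> F { * F }
        node = parse_F()
        while peek() == "*":
            eat("*")
            node = ("*", node, parse_F())
        return node

    def parse_F():                 # F -> id
        eat("id")
        return "id"

    tree = parse_E()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree

# parse_expression(["id", "+", "id", "*", "id"]) -> ('+', 'id', ('*', 'id', 'id'))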
Semantic Analyzer
The semantic analyzer’s job is to attach some meaning to the
structure produced by the parser.
An important part of it is type checking, where the compiler checks that each operator has matching operands (a minimal type-checking sketch follows after the list below).
Activities include:
❑ Ensuring an identifier is defined before being used in a statement or
expression.
❑ Enforcing the scope rules of the language.
❑ Performing type checking
❑ Producing intermediate code
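Here is a minimal type-checking sketch over tuple-shaped syntax-tree nodes, under the assumption of a tiny language with only int and float and the symbol-table contents shown in the code; it is an illustration, not the analyzer described in the slides.

# Minimal type-checking sketch (illustrative only).
SYMBOL_TABLE = {"A": "float", "B": "int", "C": "float"}   # assumed declarations

def check(node):
    """Return the type of an expression node, or raise an error."""
    if isinstance(node, str):                      # identifier leaf
        if node not in SYMBOL_TABLE:
            raise NameError(f"{node} used before being declared")
        return SYMBOL_TABLE[node]
    op, left, right = node                         # e.g. ("+", "B", "C")
    lt, rt = check(left), check(right)
    if op in ("+", "*"):
        # int is implicitly converted to float when the operands are mixed
        return "float" if "float" in (lt, rt) else "int"
    if op == "=":
        if lt == "int" and rt == "float":
            raise TypeError("cannot assign float to int without a cast")
        return lt
    raise TypeError(f"unknown operator {op}")

# check(("=", "A", ("+", "B", "C"))) -> "float"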
Intermediate Representations (IR)
High-level IR (e.g. AST)
Closer to source code
Hides implementation details
Low-level IR (e.g. three-address code)
Closer to the machine
Exposes details (registers, instructions, etc)
Many tradeoffs in IR design
Most compilers have 1 or maybe 2 IRs:
Typically closer to low-level IR
Better for optimization and code generation
Intermediate Code
❑ Could be kept in an array, temporary file, or linked list.
❑ Representations include P-code and 3-address code (a minimal three-address-code sketch follows below)
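A minimal sketch of generating three-address code from the same tuple-shaped tree used above, assuming temporaries named t1, t2, ... and a textual := instruction format; both are illustrative choices, not the exact quads used in the slides.

import itertools

_temps = itertools.count(1)    # source of fresh temporary names

def to_three_address(node, code):
    """Emit three-address instructions into `code`; return the place holding the value."""
    if isinstance(node, str):                      # identifier leaf
        return node
    op, left, right = node
    l_place = to_three_address(left, code)
    r_place = to_three_address(right, code)
    if op == "=":                                  # assignment: no new temporary
        code.append(f"{l_place} := {r_place}")
        return l_place
    temp = f"t{next(_temps)}"                      # fresh temporary for the result
    code.append(f"{temp} := {l_place} {op} {r_place}")
    return temp

# code = []
# to_three_address(("=", "A", ("+", "B", "C")), code)
# code -> ["t1 := B + C", "A := t1"]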
Where are we?
• Optimization output
• Transformed program
• Typically, same level of abstraction
Back End
Include:
Object Code Generation
Object Code Optimization
Responsibilities:
Map abstract instructions to real machine architecture.
Allocate storage for variables in registers.
Schedule instructions (often to exploit parallelism)
Finally, output the target code.
Code Generation
• An explicit low-level or machine-like intermediate representation
• Intermediate representation should have two important properties:
• Easy to produce
• Easy to translate into the target machine
• Given annotated AST and symbol table, produce target code
• Often done as three steps
• Produce machine-independent low-level representation of the program
(intermediate representation or IR)
• Perform machine-independent optimizations (optional)
• Translate IR into machine-specific target instructions
• Instruction selection and register allocation (a minimal code-generation sketch follows below)
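As a rough sketch of instruction selection, the following code translates the simple three-address instructions above into MOVF/ADDF2-style assembly like the sample in the earlier table; the naive register-assignment policy is an assumption made for illustration.

# Naive instruction selection from three-address code to pseudo-assembly
# (illustrative only; real code generators also do register allocation
# and instruction scheduling).
def generate(three_address_code):
    asm = []
    registers = {}                            # value/temporary -> register
    next_reg = (f"r{i}" for i in range(1, 32))

    def load(place):
        if place not in registers:
            reg = next(next_reg)
            asm.append(f"MOVF {place},{reg}")
            registers[place] = reg
        return registers[place]

    for instr in three_address_code:
        dest, rhs = [s.strip() for s in instr.split(":=")]
        parts = rhs.split()
        if len(parts) == 3 and parts[1] == "+":       # dest := a + b
            r1, r2 = load(parts[0]), load(parts[2])
            asm.append(f"ADDF2 {r1},{r2}")            # result is left in r2
            registers[dest] = r2
        else:                                         # dest := a  (plain copy)
            asm.append(f"MOVF {load(rhs)},{dest}")
    return asm

# generate(["t1 := B + C", "A := t1"])
# -> ["MOVF B,r1", "MOVF C,r2", "ADDF2 r1,r2", "MOVF r2,A"]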
Code Generation
Code Optimization
The machine-independent code-optimization phase attempts to improve
the intermediate code
so that better target code will result.
Usually better means faster, but other objectives may be desired,
such as shorter code, or target code that consumes less power.
Series of passes - often repeated
Reduce cost
Run faster
Use less memory
Conserve some other resource, like power
Must preserve program semantics
26
Code Optimization (2)
Typical optimizations:
Dead-code elimination (a minimal sketch follows below),
Common sub-expression elimination,
Loop-invariant code motion,
Strength reduction
Often contain assumptions about performance tradeoffs of the
underlying machine:
Relative speed of arithmetic operations – plus versus times
Possible parallelism in CPU
Cost of memory versus computation
Sizes of various caches
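Here is a minimal sketch of one such pass, dead-code elimination over straight-line three-address code; it assumes temporaries are named t1, t2, ... and that instructions have no side effects, so it only illustrates the idea.

# Minimal dead-code elimination over instructions of the form "dest := ..."
# (illustrative only: straight-line code, no side effects, temporaries t1, t2, ...).
def eliminate_dead_code(code):
    changed = True
    while changed:                      # repeat until nothing more can be removed
        changed = False
        used = set()
        for instr in code:
            _, rhs = instr.split(":=")
            used.update(tok for tok in rhs.split() if tok not in ("+", "*"))
        kept = []
        for instr in code:
            dest = instr.split(":=")[0].strip()
            # keep assignments to program variables; drop unused temporaries
            if dest.startswith("t") and dest not in used:
                changed = True
            else:
                kept.append(instr)
        code = kept
    return code

# eliminate_dead_code(["t1 := B + C", "t2 := B * C", "A := t1"])
# -> ["t1 := B + C", "A := t1"]     (t2 is never read, so it is removed)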
Symbol Table
❑ Keeps information about identifiers declared in a program.
❑ Efficient insertion and lookup is required → a hash table or tree structure may be used.
❑ Several tables may be maintained in a list or stack.
Symbol-Table Management
A symbol table is a data structure containing a record for each identifier, with
fields for the attributes of the identifier.
It records the identifiers used in the source program and collects information about various attributes of each identifier, such as its type and its scope.
The table is shared by later phases (a minimal sketch follows below).
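As a sketch of such a structure, here is a tiny scoped symbol table built on Python dictionaries (hash tables); the method names and attribute fields are assumptions for illustration.

# Minimal scoped symbol table sketch: a stack of hash tables (illustrative only).
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                        # global scope at the bottom

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):        # e.g. type="float", kind="variable"
        self.scopes[-1][name] = attributes

    def lookup(self, name):
        # search from the innermost scope outward
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

# table = SymbolTable()
# table.declare("rate", type="float", kind="variable")
# table.lookup("rate")   -> {'type': 'float', 'kind': 'variable'}
# table.lookup("speed")  -> None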
Literal Table
❑ Stores constants and strings used in a program (a minimal sketch follows below).
❑ Data in the literal table applies globally to a program → deletions are not necessary.
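A minimal literal-table sketch under the same assumptions stated above: constants and strings are interned once, looked up by value, and never deleted; the class and method names are illustrative.

# Minimal literal table sketch: intern each constant/string once, never delete.
class LiteralTable:
    def __init__(self):
        self.entries = []        # literal values in insertion order
        self.index = {}          # value -> position in `entries`

    def intern(self, value):
        """Return the index of `value`, adding it on first use."""
        if value not in self.index:
            self.index[value] = len(self.entries)
            self.entries.append(value)
        return self.index[value]

# literals = LiteralTable()
# literals.intern(2.3)       -> 0
# literals.intern("hello")   -> 1
# literals.intern(2.3)       -> 0   (already present, reused)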
Compiler Data Structures
Temporary Files
❑ May not be used if memory constraints are not a problem.
❑ Backpatching of addresses is necessary during translation (a minimal sketch follows below).
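To illustrate backpatching, here is a minimal sketch over an in-memory instruction list: a forward jump is emitted with a placeholder target, its position is recorded, and it is patched once the target address is known; the instruction format is an assumption for illustration.

# Minimal backpatching sketch (illustrative only).
code = []                 # emitted instructions (in memory here; could be a temp file)
unresolved = []           # positions of jumps whose target is not yet known

def emit(instr):
    code.append(instr)
    return len(code) - 1  # address (index) of the emitted instruction

def emit_forward_jump():
    pos = emit("JUMP ???")          # target unknown at this point
    unresolved.append(pos)

def backpatch(target_address):
    while unresolved:               # fill in every pending forward jump
        pos = unresolved.pop()
        code[pos] = f"JUMP {target_address}"

# emit("LOAD A")
# emit_forward_jump()               # e.g. jump over an else-branch
# emit("STORE B")
# backpatch(len(code))              # now we know where the jump should land
# code -> ["LOAD A", "JUMP 3", "STORE B"]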
Compiler Phases with Symbol Table
Error Detection and Reporting
A complete example
Full view of compiler
Implementation of Compiler
Requirements
Correct
The actions requested by the program have to be faithfully executed
Efficient:
Intelligently and efficiently use the available resources to carry out the requests
(the word “optimization” is used loosely in the compiler community; optimizing compilers are never optimal)
Software development tools are available to implement one or more compiler
phases:
Scanner generators
Parser generators
Syntax-directed translation engines
Automatic code generators
Data-flow engines
Summary
Lexical analysis (Scanner):
Recognition of symbols, delimiters, and comments
By regular expressions and finite automata
Syntax analysis (Parser):
Determination of hierarchical program structure
By context-free grammars and pushdown automata
Semantic analysis:
Checking context dependencies, data types, ...
By attribute grammars
Generation of intermediate code:
Translation into (target-independent) intermediate code
By tree translations
Code optimization:
To improve runtime and/or memory behavior
Generation of target code:
Tailored to target system
Additionally:
optimization of target code, symbol table, error handling
Summary (2)
Analysis:
lexical/syntax/semantic analysis (determination of syntactic structure, error
handling)
Synthesis:
generation of (intermediate/machine) code + optimization
Front-end:
machine-independent parts (analysis + intermediate code + machine-independent
optimizations)
Back-end:
machine-dependent parts (generation + optimization of machine code)
Summary of Translation Steps
Lexical Analysis:
Convert the stream of characters representing input program into a sequence of tokens.
Tokens are the “words” of the programming language
Syntax Analysis Phase:
Recognizes “sentences” in the program using the syntax of the language
Semantic Analysis Phase:
Infers information about the program using the semantics of the language
Intermediate Code Generation Phase:
Generates “abstract” code based on the syntactic structure of the program and the semantic information from the previous phase.
Optimization Phase:
Refines the generated code using a series of optimizing transformations.
Final Code Generation Phase:
Translates the abstract intermediate code into specific machine instructions.
What qualities are important in a compiler?
Correct code
Output runs fast
Compiler runs fast
Compile time proportional to program size
Support for separate compilation
Good diagnostics for syntax errors
Works well with the debugger
Good diagnostics for flow anomalies
Cross language calls
Consistent, predictable optimization
End