System Software Notes

This document contains notes from a CS1203 System Software class. It defines system software as programs that support computer operation like editors, compilers, operating systems. It discusses the Simplified Instructional Computer (SIC) architecture in two versions - standard and extended (SIC/XE). The standard SIC has 5 registers, 2 addressing modes, and basic instructions. The SIC/XE adds 3 registers, floating point support, and 2 additional instruction formats and addressing modes. Both aim to model real computers while avoiding complexity.

Uploaded by

Theerthesh Gowda

CS1203-System Software notes
Monday, 8 April 2013

cs1203-SYSTEM SOFTWARE NOTES

UNIT-I
INTRODUCTION
Definition of System software:
System software consists of a variety of programs that support the
operation of a computer.
Examples:
Text editor, compiler, loader or linker, debugger, macro processor, operating
system, database management system, software engineering tools, etc.
I. SYSTEM SOFTWARE AND MACHINE ARCHITECTURE
One characteristic in which most system software differs from application
software is machine dependency.
System software supports the operation and use of the computer. Application
software provides the solution to a problem.
System programs are intended to support the operation and use of the
computer itself, rather than any particular application.
Example:
• Assemblers translate mnemonic instructions into machine code, so the instruction
formats, addressing modes, etc., are of direct concern in assembler design.
• Compilers generate machine code, taking into account such hardware
characteristics as the number and type of registers and the machine instructions
available.
• Operating systems are concerned with the management of nearly all resources of a
computing system.
There are some aspects of system software that do not directly depend
upon the type of computing system being supported, i.e., they are machine independent.
Example:
• The general design and logic of an assembler is basically the same on most
computers.
• Some code optimization techniques used by compilers, such as common subexpression
elimination and dead-code elimination, do not depend on the target machine.
• The process of linking is also machine independent.
Because most system software is machine dependent, we must include
real machines and real pieces of software in our study.
Simplified Instructional Computer (SIC)
SIC is a hypothetical computer that has been carefully designed to
include the hardware features most often found on real machines, while
avoiding unusual or irrelevant complexities.
II. THE SIMPLIFIED INSTRUCTIONAL COMPUTER (SIC)
SIC comes in two versions:
• SIC (standard model)
• SIC/XE (XE stands for “extra equipment” or “extra expensive”)
The two versions have been designed to be upward compatible, i.e., an
object program for the standard SIC machine will also execute properly on a
SIC/XE system.
1. SIC MACHINE ARCHITECTURE:
Memory:
• Memory consists of 8-bit bytes.
• Any 3 consecutive bytes form a word (24 bits).
• On SIC, all addresses are byte addresses.
• A word is addressed by the location of its lowest numbered byte.
• There are a total of 32768 (2^15) bytes in the computer memory.
Registers:
There are five registers, each with a special use.
Each register is 24 bits in length.
Mnemonic   Number   Special use
A          0        Accumulator (used for arithmetic operations)
X          1        Index register (used for addressing)
L          2        Linkage register (the jump to subroutine instruction
                    stores the return address in this register)
PC         8        Program counter (contains the address of the next
                    instruction to be fetched for execution)
SW         9        Status word (contains a variety of information,
                    including a condition code)
Data Formats:
• Integers are stored as 24-bit binary numbers.
• 2’s complement representation is used for negative values.
• Characters are stored using 8-bit ASCII codes.
• There is no floating-point hardware on the standard version of SIC.
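The integer format above can be illustrated with a minimal Python sketch. The helper names (to_word, from_word, read_word) are hypothetical and not part of SIC. The byte order used by read_word (lowest-numbered byte holds the most significant bits) is an assumption, chosen to match the way object code is written in these notes.

```python
WORD_BITS = 24
WORD_MASK = (1 << WORD_BITS) - 1      # 0xFFFFFF
SIGN_BIT = 1 << (WORD_BITS - 1)       # 0x800000


def to_word(value: int) -> int:
    """Encode a signed integer as a 24-bit two's complement word."""
    if not -SIGN_BIT <= value < SIGN_BIT:
        raise ValueError("value does not fit in a 24-bit word")
    return value & WORD_MASK


def from_word(word: int) -> int:
    """Decode a 24-bit two's complement word into a signed integer."""
    return word - (1 << WORD_BITS) if word & SIGN_BIT else word


def read_word(memory: bytes, address: int) -> int:
    """Read the 3-byte word whose lowest-numbered byte is at `address`.

    Assumption: the lowest-numbered byte holds the most significant bits.
    """
    return int.from_bytes(memory[address:address + 3], "big")
```

For instance, to_word(-1) gives the word FFFFFF, and from_word maps it back to -1.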
Instruction Formats:
All machine instructions on the standard version of SIC have the following 24-bit format:
  8        1   15
  Opcode   x   Address
The flag bit x is used to indicate indexed-addressing mode.


Addressing Modes:
There are two addressing modes, indicated by the setting of the x bit in the
instruction.
Mode      Indication   Target Address Calculation
Direct    x=0          TA = address
Indexed   x=1          TA = address + (X)
Parentheses are used to indicate the contents of a register or a memory location.
For example, (X) represents the contents of register X.
Example:
Direct addressing mode
LDA TEN
  Opcode      x   Address (TEN)
  0000 0000   0   001 0000 0000 0000
  Object code (hex): 001000
Effective address (EA) = 1000
The content of address 1000 is loaded into the accumulator.
Indexed addressing mode
STCH BUFFER,X
  Opcode      x   Address (BUFFER)
  0101 0100   1   001 0000 0000 0000
  Object code (hex): 549000
Effective address (EA) = 1000 + (X)
                       = 1000 + contents of the index register X
The character in the rightmost byte of the accumulator is stored at the effective
address.
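The two examples above can be checked with a short Python sketch of the 24-bit SIC instruction layout (8-bit opcode, 1-bit x flag, 15-bit address). The function names are hypothetical and used only for illustration.

```python
def encode_sic(opcode: int, address: int, indexed: bool = False) -> int:
    """Pack opcode (8 bits), x flag (1 bit) and address (15 bits) into 24 bits."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= address < (1 << 15):
        raise ValueError("address must fit in 15 bits")
    return (opcode << 16) | (int(indexed) << 15) | address


def target_address(instruction: int, x_register: int = 0) -> int:
    """Compute TA = address, or TA = address + (X) when the x bit is set."""
    address = instruction & 0x7FFF
    indexed = (instruction >> 15) & 1
    return address + x_register if indexed else address


lda_ten = encode_sic(0x00, 0x1000)                   # LDA TEN
stch_buffer = encode_sic(0x54, 0x1000, indexed=True) # STCH BUFFER,X
```

With TEN and BUFFER both assumed to be at address 1000 (hex), as in the examples, the two instructions encode to 001000 and 549000.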
Instruction Set:
SIC provides a basic set of instructions that are sufficient for most simple
tasks.
• Load and store registers (LDA, LDX, STA, STX, etc.).
• Integer arithmetic operations (ADD, SUB, MUL, DIV). All arithmetic
operations involve register A and a word in memory, with the result left in
register A.
• Comparison (COMP). This instruction compares the value in register A with a
word in memory and sets a condition code (CC) to indicate the result (<, =, or >).
• Conditional jump instructions (JLT, JEQ, JGT) test the setting of CC
and jump accordingly.
• Subroutine linkage (JSUB jumps to the subroutine, placing the return address in
register L; RSUB returns by jumping to the address contained in register L).
Input and Output:
• On SIC, input and output are performed by transferring 1 byte at a time
to or from the rightmost 8 bits of register A.
• Each device is assigned a unique 8-bit code.
• The Test Device (TD) instruction tests whether the addressed device is
ready to send or receive a byte of data.
• The condition code is set to indicate the result: < means the device is ready to
send or receive, and = means the device is not ready.
• If the device is ready, the program executes a Read Data (RD) or Write Data
(WD) instruction. This sequence is repeated for each byte of data to be read or
written.
2. SIC/XE MACHINE ARCHITECTURE
Memory:
The maximum memory available on a SIC/XE system is 1 megabyte (2^20
bytes).
Registers:
The following additional registers are provided by SIC/XE:
Mnemonic   Number   Special use
B          3        Base register (used for addressing)
S          4        General working register (no special use)
T          5        General working register (no special use)
F          6        Floating-point accumulator
Data Formats:
In addition to the SIC data formats, there is a 48-bit floating-point data type with
the following format:
  1   11         36
  s   exponent   fraction
• The fraction is interpreted as a value between 0 and 1.
• For normalized floating-point numbers, the high-order bit of the fraction must
be 1.
• The exponent is interpreted as an unsigned binary number between 0 and
2047.
• If the exponent has value e and the fraction has value f, the absolute value of the
number represented is
f * 2^(e-1024).
• The sign of the floating-point number is indicated by the value of s (0 = positive,
1 = negative).
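The rule f * 2^(e-1024) can be tried out with a small Python sketch that decodes a 48-bit pattern into a Python float. The function name decode_float48 is hypothetical, and the result is limited by the precision and range of Python's own float type, so this is an illustration rather than an exact model of the hardware.

```python
def decode_float48(bits: int) -> float:
    """Decode a 48-bit SIC/XE floating-point pattern: s(1) exponent(11) fraction(36)."""
    sign = (bits >> 47) & 0x1
    exponent = (bits >> 36) & 0x7FF          # unsigned, 0..2047
    fraction = bits & ((1 << 36) - 1)        # 36-bit fraction field
    f = fraction / (1 << 36)                 # interpreted as a value in [0, 1)
    magnitude = f * 2.0 ** (exponent - 1024)
    return -magnitude if sign else magnitude


# Normalized example: high-order fraction bit set (f = 0.5), exponent 1025.
one = decode_float48((1025 << 36) | (1 << 35))
```

Here f = 0.5 and e = 1025, so the value is 0.5 * 2^1 = 1.0.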
Instruction Formats:
There are four instruction formats. Instructions that do not reference
memory use Format 1 or Format 2.
Format 1 (1 byte)
  8
  op
Example: TIO (test I/O channel)
  opcode
  1111 1000
  Object code (hex): F8
Format 2 (2 bytes)
  8    4    4
  op   r1   r2
Example: COMPR A,S (compare the contents of registers A and S)
  opcode      r1 (A)   r2 (S)
  1010 0000   0000     0100
  Object code (hex): A004
Format 3 (3 bytes)
  6    1   1   1   1   1   1   12
  op   n   i   x   b   p   e   disp
Example: LDA #3 (load the value 3 into accumulator A)
  opcode    n   i   x   b   p   e   disp
  0000 00   0   1   0   0   0   0   0000 0000 0011
  Object code (hex): 010003
Format 4 (4 bytes)
  6    1   1   1   1   1   1   20
  op   n   i   x   b   p   e   address
Example: +JSUB RDREC (jump to the subroutine at address 1036)
  opcode    n   i   x   b   p   e   address
  0100 10   1   1   0   0   0   1   0000 0001 0000 0011 0110
  Object code (hex): 4B101036
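The bit layouts above can be reproduced with a minimal Python sketch. The function names (format2, format3, format4) are hypothetical; opcodes are passed as 8-bit values whose upper 6 bits are used in Formats 3 and 4, and the register numbers come from the register tables in these notes.

```python
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6, "PC": 8, "SW": 9}


def format2(opcode: int, r1: str, r2: str) -> str:
    """8-bit opcode followed by two 4-bit register numbers."""
    return f"{(opcode << 8) | (REGISTERS[r1] << 4) | REGISTERS[r2]:04X}"


def _flags(n: int, i: int, x: int, b: int, p: int, e: int) -> int:
    """Pack the six flag bits in the order n i x b p e."""
    return (n << 5) | (i << 4) | (x << 3) | (b << 2) | (p << 1) | e


def format3(opcode: int, n: int, i: int, x: int, b: int, p: int, disp: int) -> str:
    """6-bit opcode, flags with e=0, 12-bit displacement."""
    word = ((opcode & 0xFC) << 16) | (_flags(n, i, x, b, p, 0) << 12) | (disp & 0xFFF)
    return f"{word:06X}"


def format4(opcode: int, n: int, i: int, x: int, address: int) -> str:
    """6-bit opcode, flags with b=p=0 and e=1, 20-bit address."""
    word = ((opcode & 0xFC) << 24) | (_flags(n, i, x, 0, 0, 1) << 20) | (address & 0xFFFFF)
    return f"{word:08X}"
```

Calling format2(0xA0, "A", "S"), format3(0x00, 0, 1, 0, 0, 0, 3) and format4(0x48, 1, 1, 0, 0x1036) reproduces the object codes A004, 010003 and 4B101036 from the examples.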
Addressing Modes:
Mode                       Indication   Target Address Calculation
Base relative              b=1, p=0     TA = (B) + disp    (0 ≤ disp ≤ 4095)
Program-counter relative   b=0, p=1     TA = (PC) + disp   (-2048 ≤ disp ≤ 2047)
• For base relative addressing, the disp field in Format 3 is interpreted
as a 12-bit unsigned integer.
• For program-counter relative addressing, this field is interpreted as a 12-bit
signed integer, with negative values represented in 2's complement notation.
• If both b and p are set to 0 in Format 3, the disp field is taken to be the target address.
If both b and p are set to 0 in Format 4, the target address is taken from the
address field. This is direct addressing.
• Any of these addressing modes can be combined with indexed addressing: if bit x=1,
the term (X) is added to the target address.
• For Formats 3 and 4, if i=1 and n=0, the target address itself is used as the
operand value and no memory reference is performed. This is immediate
addressing.
• If i=0 and n=1, the word at the location given by the target address is fetched, and
the value contained in this word is then taken as the address of the
operand value. This is indirect addressing.
• If i=0 and n=0, or i=1 and n=1, the target address is taken as the location of the
operand. This is simple addressing.
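The rules above can be summarized in a short Python sketch that computes the target address from the flag bits and classifies the operand type from n and i. The function names and the register dictionary are hypothetical. The sketch assumes that when e=1 (Format 4) the address field is used directly and b and p are ignored.

```python
def sign_extend_12(disp: int) -> int:
    """Interpret a 12-bit field as a signed 2's complement integer."""
    return disp - 0x1000 if disp & 0x800 else disp


def target_address(disp: int, x: int, b: int, p: int, e: int, regs: dict) -> int:
    """Compute TA for a Format 3 (e=0) or Format 4 (e=1) instruction."""
    if e:                       # Format 4: 20-bit address field used directly
        ta = disp
    elif p:                     # program-counter relative, signed displacement
        ta = regs["PC"] + sign_extend_12(disp)
    elif b:                     # base relative, unsigned displacement
        ta = regs["B"] + disp
    else:                       # direct: disp is the target address
        ta = disp
    if x:                       # indexed addressing adds (X)
        ta += regs["X"]
    return ta


def operand_kind(n: int, i: int) -> str:
    """Classify the addressing type from the n and i bits."""
    if n == 0 and i == 1:
        return "immediate"
    if n == 1 and i == 0:
        return "indirect"
    return "simple"
```

For example, a PC-relative displacement of FEC with (PC) = 001A gives TA = 0006.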
Example:
SIC/XE instructions and addressing modes
Figure 1.1(a) shows the contents of registers B, PC, and X and of selected
memory locations.
Figure 1.1(b) gives the machine code for a series of LDA instructions.
Instruction Set:
Instructions to load and store the new registers
• LDB, STB, etc.
Floating-point arithmetic operations
• ADDF, SUBF, MULF, DIVF
Register move instruction
• RMO
Register-to-register arithmetic operations
• ADDR, SUBR, MULR, DIVR

Supervisor call instruction
• SVC: generates an interrupt that can be used for communication
with the operating system.
Input and Output:
In addition to the I/O instructions of SIC, there are I/O channels that can be used to perform
input and output while the CPU is executing other instructions. This allows overlap
of computing and I/O, resulting in more efficient system operation.
The instructions are:
• SIO: start an input/output channel operation.
• TIO: test an input/output channel operation.
• HIO: halt an input/output channel operation.
SIC PROGRAMMING EXAMPLES
SIC and SIC/XE assembler language programming.
Data movement Operations

Arithmetic Operations:
Looping and Indexing operations:
Input Output operations:
Subroutine Calls:
UNIT II
ASSEMBLERS
Definition:
An assembler is system software that converts an assembly
language program to its equivalent object code.
The input to the assembler is source code written in assembly
language (using mnemonics), and the output is the object code. In the process, the
assembler assigns machine addresses to symbolic labels.
The design of an assembler depends upon the machine architecture, because
the mnemonic language it translates is specific to the machine.

I. BASIC ASSEMBLER FUNCTIONS

Source Program → ASSEMBLER → Object Code
The basic assembler functions are:
• Translating mnemonic language code to its equivalent object code.
• Assigning machine addresses to symbolic labels.
In other words, the assembler must:
• Convert mnemonic operation codes to their machine language equivalents.
• Convert symbolic operands to their equivalent machine addresses.
• Decide the proper instruction format.
• Convert the data constants to internal machine representations.
• Write the object program and the assembly listing.
In addition to the mnemonic machine instructions, the assembly language
program contains the following assembler directives:
START: Specify the name and starting address of the program.
END: Indicate the end of the program and specify the first executable instruction.
BYTE: Generate a character or hexadecimal constant, occupying as many
bytes as needed to represent the constant.
WORD: Generate a one-word integer constant.
RESB: Reserve the indicated number of bytes for a data area.
RESW: Reserve the indicated number of words for a data area.
Figure 2.1 shows an assembler language program for the basic version of SIC.

1. A simple SIC Assembler

Fig 2.2 shows the same program as in Fig 2.1, with the generated object code
for each statement.
The column headed LOC gives the machine address for each part of the assembled program.
Line numbers are for reference only and are not part of the program.

The translation of a source program to object code requires us to accomplish
the following functions:
1) Convert mnemonic operation codes to their machine language equivalents, e.g.,
translate STL to 14 (line 10);
2) Convert symbolic operands to their equivalent machine addresses, e.g.,
translate RETADR to 1033 (line 10);
3) Build the machine instructions in the proper format;
4) Convert the data constants specified in the source program into their internal
machine representations, e.g., translate EOF to 454F46 (line 80);
5) Write the object program and the assembly listing.
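Function 4 in the list above (converting data constants) can be sketched in Python as follows. The helper names byte_constant and word_constant are hypothetical. The sketch assumes operands are written in the forms C'...' and X'...' with plain ASCII quote marks.

```python
def byte_constant(operand: str) -> str:
    """Convert a BYTE operand such as C'EOF' or X'F1' to hexadecimal object code."""
    if len(operand) < 3 or operand[1] != "'" or operand[-1] != "'":
        raise ValueError(f"malformed constant: {operand}")
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        # Each character becomes its 8-bit ASCII code.
        return body.encode("ascii").hex().upper()
    if kind == "X":
        int(body, 16)  # raises ValueError if body is not hexadecimal
        return body.upper()
    raise ValueError(f"unknown constant type: {kind}")


def word_constant(value: int) -> str:
    """Convert a WORD operand to a 24-bit (6 hex digit) 2's complement value."""
    return f"{value & 0xFFFFFF:06X}"
```

For example, byte_constant("C'EOF'") gives 454F46, matching the translation quoted for line 80.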

Consider the statement on line 10. This instruction contains
a forward reference, that is, a reference to a label (RETADR) that is defined
later in the program.
If we attempt to translate the program line by line, we will be unable to
process this statement because we do not know the address that will be assigned
to RETADR.
Because of this, most assemblers make two passes over the source program.
• The first pass scans the source program for label definitions and assigns
addresses.
• The second pass performs most of the actual translation.
In addition to translating the instructions of the source program, the
assembler must process statements called assembler directives (or
pseudo-instructions). These statements are not translated into machine instructions.
Instead, they provide instructions to the assembler itself.
Example: BYTE and WORD, which direct the assembler to generate
constants as part of the object program.
Finally, the assembler must write the generated object code onto some output device.
This object program will later be loaded into memory for execution.
The simple object program format contains three types of records:
Header: contains the program name, starting address, and length.
Text: contains the translated instructions and data of the program.
End: marks the end of the object program.
End Record
Col. 1     E
Col. 2-7   Address of first executable instruction in object program
           (hexadecimal)
Functions of the two passes of a simple assembler
Pass 1 (define symbols)
1. Assign addresses to all statements in the program.
2. Save the values (addresses) assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives.

Pass 2 (assemble instructions and generate object program)
1. Assemble the instructions (translating operation codes and looking up
addresses).
2. Generate data values defined by BYTE, WORD, etc.
3. Perform the processing of the assembler directives not done during Pass 1.
4. Write the object program and the assembly listing.

2. Assembler Algorithms and Data Structures

The simple assembler uses two major internal data structures:
• The Operation Code Table (OPTAB)
• The Symbol Table (SYMTAB)
OPTAB: It is used to look up mnemonic operation codes and translate them to
their machine language equivalents.
It must contain (at least) the mnemonic operation code and its machine
language equivalent.
In Pass 1, OPTAB is used to look up and validate the operation codes in
the source program. In Pass 2, it is used to translate the operation codes to
machine language.
In Pass 2 we also take information from OPTAB that tells us which
instruction format to use in assembling the instruction, and any peculiarities of
the object code instruction.
OPTAB is usually organized as a hash table, with the mnemonic operation
code as the key. The hash table organization is particularly appropriate, since it
provides fast retrieval with a minimum of searching.
In most cases OPTAB is a static table, that is, entries are not
normally added to or deleted from it. In such cases it is possible to design a
special hashing function or other data structure to give optimum performance
for the particular set of keys being stored.

SYMTAB: This table includes the name and value (address) for each label in the source
program, together with flags to indicate error conditions (e.g., a symbol
defined in two different places).
During Pass 1: Labels are entered into the symbol table, along with their assigned
address values, as they are encountered. All symbol address values should be
resolved by the end of Pass 1.
During Pass 2: Symbols used as operands are looked up in the symbol table to
obtain the address values to be inserted in the assembled instructions.
SYMTAB is usually organized as a hash table for efficiency of insertion
and retrieval. Since entries are rarely deleted, efficiency of deletion is not an
important criterion for optimization.
LOCCTR:
The location counter (LOCCTR) is a variable used to help in the assignment of
addresses. It is initialized to the beginning address specified in the
START statement of the program.
After each source statement is processed, the length of the assembled instruction
or data area is added to LOCCTR so that it points to the next statement. Whenever a
label is encountered in the source program, the current value of LOCCTR gives the
address to be associated with that label.
There is certain information (such as location counter values and error flags
for statements) that can or should be communicated between the two passes. For
this reason, Pass 1 usually writes an intermediate file that contains each source
statement together with its assigned address, error indicators, etc. This file is
used as the input to Pass 2.
Figures 2.4 (a) and (b) (pages 53-54) show the logic flow of the two passes of our
assembler.
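The roles of LOCCTR, SYMTAB and the intermediate file in Pass 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm of Fig 2.4: source lines are given as hypothetical (label, opcode, operand) tuples, OPTAB holds only a small subset of SIC opcodes, every SIC instruction is 3 bytes long, and the intermediate file is an in-memory list.

```python
OPTAB = {"LDA": 0x00, "STL": 0x14, "JSUB": 0x48, "RSUB": 0x4C}  # subset, for illustration


def directive_length(opcode, operand):
    """Return the storage taken by a data directive, or None if not a directive."""
    if opcode == "WORD":
        return 3
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        body = operand[2:-1]                      # text between the quotes
        return len(body) if operand[0] == "C" else (len(body) + 1) // 2
    return None


def pass1(lines):
    """Assign addresses, build SYMTAB, and produce the intermediate records."""
    symtab, intermediate, errors = {}, [], []
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)     # starting address is hexadecimal
            intermediate.append((locctr, label, opcode, operand))
            continue
        intermediate.append((locctr, label, opcode, operand))
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr            # label gets the current LOCCTR
        if opcode in OPTAB:
            locctr += 3                           # every SIC instruction is 3 bytes
        else:
            length = directive_length(opcode, operand)
            if length is None:
                errors.append(f"invalid operation code {opcode}")
            else:
                locctr += length
    return symtab, intermediate, locctr - start, errors


SOURCE = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "JSUB", "RDREC"),
    ("EOF", "BYTE", "C'EOF'"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]
symtab, intermediate, program_length, errors = pass1(SOURCE)
```

Note that the forward reference to RETADR causes no difficulty here, because Pass 1 only records label definitions and does not look up operands.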

In Pass 2, the first input line is read from the intermediate file.
If the opcode is START, this line is written directly to the listing file. A
header record is then written to the object program, giving the starting address
and the length of the program (which was calculated during Pass 1).
Then the first text record is initialized. Comment lines are ignored. For each
instruction, OPTAB is searched to find the machine code for the opcode.
If there is a symbol in the operand field, the symbol table is searched to
get its address value, which is combined with the opcode to form the object code.
If the symbol is not found, zero is stored as the operand address and an
error flag is set to indicate an undefined symbol. If there is no symbol in the
operand field, 0 is stored as the operand address. The object code instruction is
then assembled.
If the opcode is BYTE or WORD, the constant value is converted to
its equivalent object code (for example, for the character string EOF, the equivalent
hexadecimal value 454F46 is stored).
If the object code does not fit into the current text record, that text record is
written to the object program and a new text record is started for the remaining
object code.
Once the whole program has been assembled and the END directive is
encountered, the last text record and the End record are written.
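The instruction-assembly step of Pass 2 described above can be sketched in Python for the standard SIC format. The function name assemble_sic and the small OPTAB subset are hypothetical, and the symbol addresses in the usage lines are taken from the examples quoted in these notes where available.

```python
OPTAB = {"LDA": 0x00, "STL": 0x14, "JSUB": 0x48, "STCH": 0x54}  # subset, for illustration


def assemble_sic(opcode: str, operand: str, symtab: dict):
    """Assemble one standard SIC instruction; return (object code, error list)."""
    errors = []
    code = OPTAB[opcode]                 # translate the mnemonic via OPTAB
    address, indexed = 0, 0
    if operand:
        symbol = operand
        if operand.endswith(",X"):       # indexed addressing sets the x bit
            symbol, indexed = operand[:-2], 1
        if symbol in symtab:
            address = symtab[symbol]
        else:
            # Undefined symbol: store 0 as the address and flag the error.
            errors.append(f"undefined symbol {symbol}")
    word = (code << 16) | (indexed << 15) | address
    return f"{word:06X}", errors


obj, errs = assemble_sic("STL", "RETADR", {"RETADR": 0x1033})
```

With RETADR at 1033, STL RETADR assembles to 141033, which agrees with the translations quoted for line 10.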
II. MACHINE-DEPENDENT ASSEMBLER FEATURES:
• Instruction formats and addressing modes
• Program relocation

Prefixes: @ indicates indirect addressing; # indicates an immediate operand;
+ indicates the extended instruction format.
Instructions that refer to memory are normally assembled using either
the program-counter relative or the base relative mode. The assembler directive
BASE (Fig 2.5, line 13) is used in conjunction with base relative addressing.
The main differences between Fig 2.5 (SIC/XE) and Fig 2.1 (SIC) involve
the use of register-to-register instructions (lines 150, 165). In
addition, immediate addressing and indirect addressing have been used as
much as possible (lines 25, 55, and 70). These changes take advantage of the
more advanced SIC/XE architecture:
• Register-to-register instructions are faster than the corresponding
register-to-memory operations.
• With immediate addressing, the operand is already present as
part of the instruction and need not be fetched from anywhere.
• The use of indirect addressing often avoids the need for another
instruction.

1. Instruction Formats and Addressing Modes

Fig 2.6 shows the object code generated for each statement in the program of
Fig 2.5.
Key points of this subsection:
The translation of the source program, and the handling of
different instruction formats and different addressing modes.
The START statement specifies a beginning program address of 0.
Translation of register-to-register instructions (such as CLEAR on line 125 and
COMPR on line 150):
The assembler must simply convert the mnemonic operation code to
machine language (using OPTAB) and change each register mnemonic to its
numeric equivalent.
Register-to-memory instructions:
These are assembled using either program-counter relative or base relative addressing.
The assembler must, in either case, calculate a displacement to be assembled as
part of the object instruction.
a) When the displacement is added to the contents of the program counter (PC) or
the base register (B), the result must be the correct target address.
b) The resulting displacement must be small enough to fit in the 12-bit field in the
instruction. This means that the displacement must be between 0 and 4095 (for
base relative mode) or between -2048 and +2047 (for program-counter relative
mode).
If neither program-counter relative nor base relative addressing can be used
(because the displacements are too large), then the 4-byte extended instruction
format (20-bit address field) must be used.
Example: 15 0006 CLOOP +JSUB RDREC 4B101036
(bit e is set to 1 to indicate extended instruction format)
• The programmer specifies the extended format by using the prefix + (line 15).
If extended format is not specified, the assembler first attempts to translate
the instruction using program-counter relative addressing.
If this is not possible (because the displacement is out of range), the assembler
then attempts to use base relative addressing.
If neither form is applicable and the extended format is not specified, then
the instruction cannot be properly assembled and the assembler must generate
an error message.
Example: the displacement calculation for program-counter relative and base
relative addressing modes
A typical example of program-counter relative assembly:
10 0000 FIRST STL RETADR 17202D
1) Note that the program counter is advanced after each instruction is fetched
and before it is executed.
2) While STL is executed, PC will contain the address of the next instruction (0003).
RETADR (line 95) is assigned the address 0030.
3) The displacement we need in the instruction is 30 - 3 = 2D, so that target
address = (PC) + disp = 3 + 2D = 30.
4) Note that bit p = 1 to indicate PC relative addressing, making the last 2 bytes of
the instruction 202D.
Another example of PC relative addressing:
40 0017 J CLOOP 3F2FEC
The operand address is CLOOP = 0006; during instruction execution,
PC = 001A. Thus the displacement = 6 - 1A = -14 (hexadecimal), which is
represented in 2's complement in the 12-bit field as FEC.
The displacement calculation process for base relative addressing is much the
same as for PC relative addressing.
The main difference is that the assembler knows what the contents of the
PC will be at execution time, but it does not know what the base register will
contain, since that register is under the control of the programmer.
Therefore, the programmer must tell the assembler what the base register
will contain during execution of the program so that the assembler can compute
displacements. This is done in our example with the assembler
directive BASE (line 13).
In some cases, the programmer can use another assembler
directive, NOBASE, to inform the assembler that the contents of the base register
can no longer be relied upon for addressing.
Example of base relative assembly:
160 104E STCH BUFFER,X 57C003
1) According to the BASE statement, register B = 0033 (the address of LENGTH)
during execution.
2) The address of BUFFER is 0036.
3) Thus the displacement in the instruction must be 36 - 33 = 3.
4) Note that bits x and b are set to 1 to indicate indexed and base relative
addressing.
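The displacement calculations in the examples above can be checked with a minimal Python sketch of the selection order described earlier (PC relative first, then base relative, otherwise an error). The function name choose_displacement is hypothetical, and base=None stands for the case where no BASE directive is in effect.

```python
def choose_displacement(target: int, next_pc: int, base=None):
    """Return (mode, 12-bit displacement field) for a Format 3 instruction.

    next_pc is the address of the following instruction, i.e. the value of PC
    while the instruction is being executed.
    """
    disp = target - next_pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF          # negative values in 2's complement
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    raise ValueError("displacement out of range; extended format is required")


stl = choose_displacement(0x0030, 0x0003)                 # STL RETADR
jump = choose_displacement(0x0006, 0x001A)                # J CLOOP
stch = choose_displacement(0x0036, 0x1051, base=0x0033)   # STCH BUFFER,X
```

The three calls give the displacement fields 02D, FEC and 003 that appear in the object codes 17202D, 3F2FEC and 57C003.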
Immediate addressing mode: to assemble an instruction with immediate
addressing, the assembler converts the immediate operand to its internal
representation and inserts it into the instruction.
Example:
55 0020 LDA #3 010003
1) The operand stored in the instruction is 003.
2) Bit i = 1 to indicate immediate addressing.
Another example:
133 103C +LDT #4096 75101000
In this case, the operand (4096) is too large to fit into the 12-bit
displacement field, so the extended instruction format is called for. (If the
operand were too large even for this 20-bit address field, immediate addressing
could not be used.)
A different way of using immediate addressing is shown in the instruction
12 0003 LDB #LENGTH 69202D
1) The immediate operand is the symbol LENGTH.
2) Since the value of this symbol is the address assigned to it, this immediate
instruction has the effect of loading register B with the address of LENGTH.
3) Note that we have combined PC relative addressing with immediate addressing.
(PC = 0006, LENGTH = 0033, disp = 0033 - 0006 = 002D)
2. Program Relocation
Sometimes the actual starting address of the program is not known until
load time. A relocatable program can be loaded into
memory wherever there is room for it.
Example:
55 101B LDA THREE 00102D (Fig 2.1)
In the object program (Fig 2.3), this statement is translated as 00102D,
specifying that register A is to be loaded from memory address 102D. If the
program is loaded at any other starting address, address 102D will not contain
the value we expect.
In reality, the assembler does not know the actual location where the
program will be loaded. However, the assembler can identify for the loader those
parts of the object program that need modification. An object program that
contains the information necessary to perform this kind of modification is called
a relocatable program.
Fig 2.7 shows a program loaded at different starting addresses (0000, 5000, 7420).
For example, in the instruction “+JSUB RDREC”, the address of RDREC is
1036 when the program is loaded at 0000, 6036 when it is loaded at 5000, and
8456 when it is loaded at 7420.
How should the address of RDREC be modified for each load address?

The solution to the relocation problem:
1) When the assembler generates the object code for the JSUB
instruction, it inserts the address of RDREC relative to the start of the
program. (This is the reason we initialized the location counter to 0 for
the assembly.)
2) The assembler also produces a command for the loader, instructing it
to add the beginning address of the program to the address field in the JSUB
instruction at load time.
The command for the loader must also be a part of the object program. This
is accomplished with a modification record.
Modification record:
Col. 1     M
Col. 2-7   Starting location of the address field to be modified, relative to
           the beginning of the program
Col. 8-9   Length of the address field to be modified, in half-bytes
           (hexadecimal)
• The length field of a modification record is stored in half-bytes.
• The starting location field of a modification record is the location of the byte
containing the leftmost bits of the address field to be modified.

Example:
The modification record for the +JSUB instruction would be
“M00000705”.
This record specifies that the beginning address of the program is to be
added to a field that begins at address 000007 (relative to the start of the
program) and is 5 half-bytes in length.
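The modification record and the load-time adjustment can be sketched in Python as follows. The function names are hypothetical. The sketch assumes a Format 4 instruction, whose 20-bit address field starts in the byte after the first byte of the instruction and is 5 half-bytes long.

```python
def modification_record(instruction_address: int, half_bytes: int = 5) -> str:
    """Build an M record for a Format 4 instruction at the given relative address.

    The address field begins one byte after the start of the instruction.
    """
    return f"M{instruction_address + 1:06X}{half_bytes:02X}"


def relocate(field_value: int, load_address: int, half_bytes: int = 5) -> int:
    """Add the program's load address to an address field, as the loader would."""
    mask = (1 << (4 * half_bytes)) - 1       # keep the result within the field
    return (field_value + load_address) & mask


record = modification_record(0x0006)          # +JSUB RDREC is at address 0006
```

For the +JSUB instruction at 0006 this produces M00000705, and relocating the field value 01036 by 5000 or 7420 gives 6036 or 8456, as in the example of Fig 2.7.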
Many instructions do not need to be modified, because:
• The operand of the instruction is not a memory address.
• The operand is specified using PC-relative or base relative addressing.
The only parts of the program that require modification at load
time are those that specify direct (as opposed to relative) addresses.
Fig 2.8 shows the complete object program corresponding to the source
program of Fig 2.5.

III. MACHINE-INDEPENDENT ASSEMBLER FEATURES

Key points of this section: the implementation of literals within an
assembler, two assembler directives (EQU and ORG), the use of expressions in
assembler language, program blocks and control sections.
1. Literals
A literal is identified by the prefix =, which is followed by a specification
of the literal value.
Example: 45 001A ENDFIL LDA =C’EOF’ 032010
specifies a 3-byte operand with value ‘EOF’.
Difference between a literal and an immediate operand:
1. With immediate addressing, the operand value is assembled as part of the
machine instruction.
2. With a literal, the assembler generates the specified value as a constant at some
other memory location. The address of this generated constant is used as the target
address for the machine instruction.

All of the literal operands used in a program are gathered together into
one or more literal pools. Normally, literals are placed into a pool at
the end of the program.
In some cases, it is desirable to place literals into a pool at some other location in
the object program. To allow this, we introduce the assembler directive
LTORG. It is better to keep the literals close to the instructions that use them.
A literal table is created for the literals which are used in the
program. The literal table contains the literal name, operand value and
length. The literal table is usually organized as a hash table on the literal
name.
1. When the assembler encounters an LTORG statement, it creates a literal pool that
contains all of the literal operands used since the previous LTORG (or the
beginning of the program).
2. This literal pool is placed in the object program at the location where the LTORG
directive was encountered (Fig 2.10).
3. Literals placed in a pool by LTORG will not be repeated in the pool at
the end of the program.

If we had not used the LTORG statement on line 93, the literal =C’EOF’
would be placed in the pool at the end of the program.
• Most assemblers recognize duplicate literals, that is, the same literal used in
more than one place in the program, and store only one copy of the specified
data value. For example, the literal =X’05’ is used in our program on lines 215
and 230.
• How are duplicate literals found? The easiest way to recognize duplicate
literals is by comparison of the character strings defining them (in this case, the
string =X’05’).
• The basic data structure needed to handle literal operands is the literal
table LITTAB. For each literal used, this table contains the literal name,
the operand value and length, and the address assigned to the operand when it is
placed in a literal pool.
• LITTAB is often organized as a hash table, using the literal name or value as
the key. During Pass 1, the assembler searches LITTAB for the specified literal
name (or value). If the literal is already present in the table, no action is needed.
If it is not present, the literal is added to LITTAB (leaving the address
unassigned). When Pass 1 encounters an LTORG statement or the end of the
program, it assigns an address to each literal that has not yet been placed in a pool.
• During Pass 2, the operand address for use in generating object code is obtained
by searching LITTAB for each literal operand encountered.
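The handling of LITTAB described above can be sketched in Python. The class LiteralTable and its methods are hypothetical names. The sketch assumes literals are written with plain ASCII quote marks (for example =C'EOF'), that duplicates are detected by comparing the defining strings, and that place_pool is called with the current LOCCTR when LTORG or the end of the program is reached.

```python
def literal_bytes(name: str) -> bytes:
    """Convert a literal such as =C'EOF' or =X'05' to its operand value."""
    kind, body = name[1], name[3:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)


class LiteralTable:
    def __init__(self):
        # Keyed by literal name; a dict keeps entries in order of first use.
        self.entries = {}

    def add(self, name: str) -> None:
        """Pass 1: enter a literal unless it is already present (duplicate)."""
        if name not in self.entries:
            value = literal_bytes(name)
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr: int) -> int:
        """Assign addresses to literals not yet in a pool; return the new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr


littab = LiteralTable()
for literal in ["=C'EOF'", "=X'05'", "=X'05'"]:
    littab.add(literal)
next_locctr = littab.place_pool(0x002D)
```

Because literals that already have an address are skipped by place_pool, a literal placed in a pool by LTORG is not repeated in a later pool.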

2. Symbol-Defining Statements
EQU Statement:

Most assemblers provide an


assembler directive that allows the programmer to define symbols and specify
their values. The directive used for this EQU (Equate). The general form of the
statement is
Symbol EQU value
This statement defines the given symbol (i.e., entering in the SYMTAB)
and assigning to it the value specified. The value can be aconstant or an
expression involving constants and any other symbol which is already defined.
 One common usage is to define symbolic names that can be used to improve
readability in place of numeric values. For example
+LDT #4096
This loads register T with the immediate value 4096, but it does not make
clear what this value represents. If instead we include the statement
MAXLEN EQU 4096 and then write
+LDT #MAXLEN (LINE 133)
 then it is clear that the value loaded is some maximum length. When the
assembler encounters the EQU statement, it enters the symbol MAXLEN along with
its value in the symbol table.
 When it assembles the LDT instruction, the assembler searches SYMTAB for
MAXLEN and uses its value as the operand in the instruction.
 Another common usage of EQU statement is for defining values for the
general-purpose registers.
A EQU 0
X EQU 1 and so on
These statements cause the symbols A, X, L… to be entered into the
symbol table with their respective values.
 As another usage, consider a machine that has many general-purpose registers
named R1, R2,…. Some may be used as base registers and some as
accumulators, and their usage may change from one program to another. In this
case we can record these choices using EQU statements.
BASE EQU R1
INDEX EQU R2
COUNT EQU R3
One restriction on EQU is that every symbol appearing on the right-hand
side must already be defined. For example, the following sequence is not
valid:
BETA EQU ALPHA
ALPHA RESW 1
Here ALPHA is used to define BETA before ALPHA itself has been defined, so
the value of ALPHA is not yet known when the EQU statement is processed.
ORG Statement:
The ORG directive (for origin) can be used to assign values to symbols
indirectly. Its general format is:
ORG value
where value is a constant or an expression involving constants and
previously defined symbols.
When this statement is encountered during assembly of a program, the
assembler resets its location counter (LOCCTR) to the specified value. Since
the values of symbols used as labels are taken from LOCCTR, the ORG
statement affects the values of all labels defined until the next ORG is
encountered.
ORG thus controls the assignment of storage in the object program.
Altering LOCCTR carelessly may result in incorrect assembly.
ORG can be useful in label definition. Suppose we need to define a
symbol table with the following structure:
SYMBOL 6 Bytes
VALUE 3 Bytes
FLAGS 2 Bytes
The table looks like the one given below.

The SYMBOL field contains a 6-byte user-defined symbol; VALUE is a one-word
representation of the value assigned to the symbol; FLAGS is a 2-byte field
that specifies the symbol type and other information. Each entry therefore
occupies 11 bytes.
The space for the table can be reserved by the statement:
STAB RESB 1100
which provides room for 100 entries.
If we want to refer to the entries of the table using indexed addressing, we
place the offset of the desired entry from the beginning of the table in the
index register.
To refer to the fields SYMBOL, VALUE, and FLAGS individually, we
need to assign the values first as shown below:
SYMBOL EQU STAB
VALUE EQU STAB+6
FLAGS EQU STAB+9
To fetch the VALUE field from the table entry whose offset is in register X,
we can write the statement:
LDA VALUE,X
The same symbol definitions can also be made using the ORG statement, in the
following way:
STAB RESB 1100
ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAGS RESB 2
ORG STAB+1100
 The first statement reserves 1100 bytes of memory and assigns their starting
address to the label STAB.
 The first ORG statement resets the location counter to the value of STAB, so
LOCCTR now points to the beginning of the table.
 The next three lines define SYMBOL, VALUE and FLAGS at offsets 0, 6 and 9
from STAB, inside the storage already reserved for the table.
 The last ORG statement sets LOCCTR back to the first location after the
table (i.e., STAB+1100).
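The effect of ORG on the location counter can be traced with a small Python sketch. It is a simplified model, not an assembler: the function name assign_labels and the tuple layout of a statement are assumptions made for illustration, and only RESB, RESW and ORG are handled.

```python
def assign_labels(statements, start=0):
    """Return (symtab, final LOCCTR) for a list of RESB/RESW/ORG statements.

    Each statement is (label, opcode, operand). The operand is an int for
    RESB/RESW and a (symbol, offset) pair for ORG; the symbol must already
    be defined, as the text requires.
    """
    symtab, locctr = {}, start
    for label, opcode, operand in statements:
        if opcode == "ORG":
            symbol, offset = operand
            locctr = symtab[symbol] + offset   # KeyError models an undefined symbol
            continue
        if label:
            symtab[label] = locctr             # label value is taken from LOCCTR
        if opcode == "RESB":
            locctr += operand
        elif opcode == "RESW":
            locctr += 3 * operand              # one SIC word is 3 bytes
    return symtab, locctr


table = [
    ("STAB", "RESB", 1100),
    (None, "ORG", ("STAB", 0)),
    ("SYMBOL", "RESB", 6),
    ("VALUE", "RESW", 1),
    ("FLAGS", "RESB", 2),
    (None, "ORG", ("STAB", 1100)),
]
symtab, locctr = assign_labels(table, start=0x1000)   # start address is arbitrary
```

With this input SYMBOL, VALUE and FLAGS receive the values STAB, STAB+6 and STAB+9, the same values the three EQU statements would assign.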
When ORG is used, every symbol appearing in its operand must already be
defined, just as is required for EQU.
For example, consider the sequence of statements below:
ORG ALPHA
BYTE1 RESB 1
BYTE2 RESB 1
BYTE3 RESB 1
ORG
ALPHA RESB 1
During the first pass the assembler would not know what value to give the
location counter in response to the first ORG, because ALPHA is not yet
defined. As a result, the symbols BYTE1, BYTE2 and BYTE3 could not be entered
into the symbol table either. This is another form of the forward-reference
problem.
3.Expressions:
Assemblers also allow use of expressions in place of operands in the
instruction. Each such expression must be evaluated to generate a single
operand value or address.
Assemblers generally allow arithmetic expressions formed according to the
normal rules using the operators +, -, *, and /. Division is usually defined
to produce an integer result.
Individual terms may be constants, user-defined symbols, or special
terms. The only special term used is * (the current value of the location
counter), which indicates the value of the next unassigned memory location.
Thus the statement
106 BUFEND EQU *
assigns to BUFEND a value that is the address of the next byte
following the buffer area.
Expressions are classified as either absolute expressions or relative
expressions, depending on the type of value they produce.
Absolute Expressions:
An expression that contains only absolute terms is an absolute expression.
An absolute expression may also contain relative terms, provided the relative
terms occur in pairs and the terms in each pair have opposite signs.
Example:
MAXLEN EQU BUFEND-BUFFER
Relative Expressions:
In a relative expression, all the relative terms except one can be paired as
described above for absolute expressions. The remaining unpaired relative
term must have a positive sign.
Example:
STAB EQU OPTAB + (BUFEND - BUFFER)
Handling the type of expressions:
To determine the type of an expression, we must keep track of the types of
the symbols used in it. This can be done by recording the type against each
symbol in the symbol table, as shown in the table below:

With this information the assembler can easily determine the type of each
expression used as an operand and generate Modification records in the object
program for relative values.
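The pairing rule can be checked mechanically by counting signed relative terms. The following Python sketch assumes an expression that is a simple sum of signed terms; the function name expression_type and the 'A'/'R' type flags are hypothetical, and expressions involving * or / on relative terms are outside its scope.

```python
def expression_type(terms, symbol_types):
    """Classify a sum of signed terms as 'absolute' or 'relative'.

    terms: list of (sign, name) with sign +1 or -1; an int name is a constant.
    symbol_types: {symbol: 'A' or 'R'}, the type flag kept in the symbol table.
    """
    net_relative = 0
    for sign, name in terms:
        if isinstance(name, int):
            continue                    # constants are always absolute
        if symbol_types[name] == "R":
            net_relative += sign        # +R and -R cancel as a pair
    if net_relative == 0:
        return "absolute"
    if net_relative == 1:
        return "relative"               # exactly one unpaired, positive term
    raise ValueError("illegal expression: relative terms do not pair up")


types = {"BUFFER": "R", "BUFEND": "R", "OPTAB": "R"}
maxlen_type = expression_type([(+1, "BUFEND"), (-1, "BUFFER")], types)
stab_type = expression_type([(+1, "OPTAB"), (+1, "BUFEND"), (-1, "BUFFER")], types)
```

Here maxlen_type is 'absolute' and stab_type is 'relative', matching the two examples above; an expression such as BUFEND+BUFFER is rejected.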
4.Program Blocks:
The term program blocks refers to segments of code that are rearranged
within a single object program unit, and the term control sections (described
in the next subsection) refers to segments that are translated into independent
object program units.
Fig 2.11 shows our example program as it might be written using program
blocks.
Three blocks are used. The first (unnamed) program block contains the
executable instructions of the program. The second (named CDATA) contains
all data areas that are a few words or less in length. The third (named CBLKS)
contains all data areas that consist of larger blocks of memory.

The assembler directive USE indicates which portions of the source
program belong to the various blocks.
The beginning of the program starts the default (unnamed) block
Line 92 signals the beginning of CDATA
Line 103 begins the CBLKS block
Line 123 resumes the default block
Line 183 resumes CDATA
Line 208 resumes the default block
Line 252 resumes CDATA
At the end of Pass 1, the latest value of the location counter for each
block indicates the length of that block. The assembler can then assign to each
block a starting address in the object program (beginning with relative location
0).
For code generation during Pass 2, the assembler needs the address
for each symbol relative to the start of the object program (not the start
of an individual program block). This is easily found from the information
in SYMTAB: the assembler simply adds the location of the symbol, relative
to the start of its block, to the assigned block starting address.
Fig 2.12 shows this process applied to our sample program. Notice that
the symbol MAXLEN (line 107) is shown without a block number. It is an
absolute symbol.

Example: 0006 0 LDA LENGTH 032060

SYMTAB shows the value of the operand (LENGTH) as relative location
0003 within program block 1 (CDATA). The starting address for CDATA is
0066. Thus the desired target address for this instruction is 0003+0066=0069.
The instruction is assembled using program-counter relative addressing. When
it is executed, the program counter contains the address of the next
instruction (0009), so the required displacement is 0069-0009=060, which
gives the object code 032060.
We can see that the separation of the program into blocks has considerably
reduced our addressing problems. Because the large buffer area is moved to the
end of the object program, we no longer need to use extended format
instructions on lines 15, 35, and 65.
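The address calculation for program blocks can be sketched as follows. The function names are hypothetical. The default block length 0x66 follows from the CDATA starting address 0066 quoted above; the lengths used for CDATA and CBLKS are assumed values for illustration only.

```python
def assign_block_starts(block_lengths):
    """Give each block a starting address, in order, beginning at relative 0."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def symbol_address(symtab, starts, symbol):
    """Address of a symbol relative to the start of the object program."""
    block, offset = symtab[symbol]      # SYMTAB stores (block, offset in block)
    return starts[block] + offset


# Block lengths are the final LOCCTR values at the end of Pass 1.
lengths = {"(default)": 0x66, "CDATA": 0x0B, "CBLKS": 0x1000}   # last two assumed
starts = assign_block_starts(lengths)
symtab = {"LENGTH": ("CDATA", 0x0003)}

target = symbol_address(symtab, starts, "LENGTH")   # 0x0069
displacement = target - 0x0009                       # PC-relative, PC = 0009
```

The displacement 0x060 is the value that appears in the low-order digits of the object code 032060.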
Fig 2.13 shows the object program corresponding to Fig 2.11. It does not
matter that the Text records of the object program are not in sequence by
address; the loader will simply load the object code from each record at the
indicated address.

Fig 2.14 traces the blocks of the example program through this process of assembly
and loading.

5.Control sections and program linking:

A control section is a part of the program that maintains its identity after
assembly; each such control section can be loaded and relocated independently
of the others. Different control sections are most often used for subroutines or
other logical subdivisions of a program.
Control sections differ from program blocks in that they are handled
separately by the assembler.
Fig 2.15 shows three control sections. The first section (COPY) continues
until the CSECT statement on line 109.

Two assembler directives identify symbols that are shared between control
sections: EXTDEF (external definition) and EXTREF (external reference).
The EXTDEF statement in a control section names symbols, called external
symbols, that are defined in this control section and may be used by other
sections.
Control section names do not need to be named in an EXTDEF statement
because they are automatically considered to be external symbols.
The EXTREF statement names symbols that are used in this control section
and are defined elsewhere.
Fig 2.16 shows the generated object code for each statement in the program.
Example:
15 0003 CLOOP +JSUB RDREC 4B100000
The operand RDREC is named in the EXTREF statement for the control
section, so this is an external reference.
160 0017 +STCH BUFFER,X 57900000
This instruction makes an external reference to BUFFER. The instruction is
assembled using extended format with an address of zero.

The assembler must include information in the object program that will
cause the loader to insert the proper values where they are required.
We need two new record types (Define and Refer) in the object program.

A Define record gives information about external symbols that are defined in
this control section, that is, symbols named by EXTDEF (see page 89 for the
record format).
A Refer record lists symbols that are used as external references by the control
section, that is, symbols named by EXTREF.

Fig 2.17 shows the object program corresponding to the source in Fig 2.16.
Notice that there is a separate set of object program records for each
control section.

Example: The address field for the JSUB on line 15 begins at relative address
0004. Its initial value in the object program is zero. The Modification record
‘M00000405+RDREC’ in control section COPY specifies that the address of
RDREC is to be added to this field, thus producing the correct machine
instruction for execution.
Example: Consider the handling of line 190. The value of this word is to be
BUFEND-BUFFER, where both BUFEND and BUFFER are defined in
another control section. The assembler generates an initial value of zero for this
word. The last two Modification records in RDREC direct that the address of
BUFEND be added to this field, and the address of BUFFER be subtracted from
it. This computation, performed at load time, results in the desired value for the
data word.
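The way a loader applies such Modification records can be sketched in Python. The record layout assumed here (M, 6-digit start address, 2-digit length in half-bytes, sign, symbol name) matches the example 'M00000405+RDREC'. The function name and the symbol addresses in the usage lines are hypothetical.

```python
def apply_modification(memory, record, symbols, csaddr=0):
    """Apply one Modification record such as 'M00000405+RDREC' to memory."""
    start = int(record[1:7], 16) + csaddr
    half_bytes = int(record[7:9], 16)
    sign = 1 if record[9] == "+" else -1
    symbol = record[10:].strip()
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1        # low-order bits that form the field
    word = int.from_bytes(memory[start:start + nbytes], "big")
    field = (word & mask) + sign * symbols[symbol]
    word = (word & ~mask) | (field & mask)    # bits outside the field are kept
    memory[start:start + nbytes] = word.to_bytes(nbytes, "big")


# Line 15: +JSUB RDREC assembled as 4B100000 at relative address 0003.
memory = bytearray(0x40)
memory[3:7] = bytes.fromhex("4B100000")
apply_modification(memory, "M00000405+RDREC", {"RDREC": 0x4063})

# A data word computed as BUFEND-BUFFER at an assumed relative address 0028.
externals = {"BUFEND": 0x5033, "BUFFER": 0x4033}
apply_modification(memory, "M00002806+BUFEND", externals)
apply_modification(memory, "M00002806-BUFFER", externals)
```

With an odd length such as 05, the field begins in the low-order half of the first byte, which is why the flag digit 1 of 4B100000 is left untouched.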
5.ASSEMBLER DESIGN

 One-Pass Assembler
 Multi-Pass Assembler

1. One-Pass Assembler
The main problem in designing a single-pass assembler is resolving forward
references.
Forward references can be avoided to some extent:
 Forward references to data items can be eliminated by requiring that all
storage reservation statements be defined at the beginning of the program
rather than at the end.
 Unfortunately, forward references to labels on instructions (forward jumps)
cannot be avoided in this way.
 A one-pass assembler must therefore make some special provision for handling
forward references, even if forward references to data items are prohibited.

There are two types of one-pass assemblers:
 One that produces object code directly in memory for immediate execution
(Load-and-go assemblers).
 The other type produces the usual kind of object code for later execution.
Load-and-Go Assembler
 A load-and-go assembler generates its object code in memory for immediate
execution.
 No object program is written out, and no loader is needed.
 It is useful in a system oriented toward program development and testing.
 Programs are re-assembled nearly every time they are run, so the efficiency of
the assembly process is an important consideration.
 A load-and-go assembler avoids the overhead of writing the object program
out and reading it back in.
Forward Reference in One-Pass Assemblers:
In a load-and-go assembler, a forward reference is handled as follows:
 The assembler omits the operand address if the symbol has not yet been defined.
 It enters the undefined symbol into SYMTAB and marks it as undefined.
 It adds the address of the operand field to a list of forward references
associated with the SYMTAB entry.
 When the definition of the symbol is encountered, it scans the reference list
and inserts the address into each listed operand field.
 At the end of the program, it reports an error if any SYMTAB entries are
still marked as undefined.
 If there are no errors, a load-and-go assembler searches SYMTAB for the
symbol named in the END statement and jumps to this location to begin
execution.
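The forward-reference list mechanism can be sketched in Python as below. This is a simplified model under stated assumptions: the class name OnePassSymtab is hypothetical, memory is a bytearray indexed by absolute address, and every operand field is treated as a 2-byte address of a non-indexed standard SIC instruction.

```python
class OnePassSymtab:
    def __init__(self, memory):
        self.memory = memory      # bytearray holding the object code
        self.defined = {}         # symbol -> address
        self.pending = {}         # undefined symbol -> list of operand-field addresses

    def reference(self, symbol, operand_addr):
        """Return the symbol's address, or record a forward reference."""
        if symbol in self.defined:
            return self.defined[symbol]
        self.pending.setdefault(symbol, []).append(operand_addr)
        return None               # operand address is omitted for now

    def define(self, symbol, address):
        """Record the definition and patch every pending operand field."""
        self.defined[symbol] = address
        for operand_addr in self.pending.pop(symbol, []):
            self.memory[operand_addr:operand_addr + 2] = address.to_bytes(2, "big")

    def undefined_symbols(self):
        """Symbols still undefined; at END these are reported as errors."""
        return sorted(self.pending)


mem = bytearray(0x3000)
st = OnePassSymtab(mem)
st.reference("RDREC", 0x2013)     # line 15
st.reference("ENDFIL", 0x201C)    # line 30
st.reference("WRREC", 0x201F)     # line 35
st.define("ENDFIL", 0x2024)       # definition reached on line 45
```

After the last call the two bytes at 201C hold 2024, and any later reference to ENDFIL returns the address directly instead of being added to a list.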
After scanning line 40 of the program:
40 2021 J CLOOP 3C2012
Up to this point the symbol RDREC has been referred to once, at location
2013, ENDFIL at location 201C and WRREC at location 201F. None of these
symbols has been defined yet.
The figure below shows the object code and symbol table entries as they
would be after scanning line 40 of the program, including the pending
forward references and the addresses at which they occur.
The first forward reference occurred on line 15. Since the operand
(RDREC) was not yet defined, the instruction was assembled with no value
assigned as the operand address (denoted by ----).
RDREC was then entered into SYMTAB as an undefined symbol
(indicated by *); the address of the operand field (2013) of the instruction was
inserted in a list associated with RDREC.
A similar process was followed with the instructions on lines 30 and 35.

The status after scanning line 160, by which point the definitions of
RDREC and ENDFIL have been encountered, is as given below:
By this time, some of the forward references (ENDFIL, line 45 and
RDREC, line 125) have been resolved, while others (EXIT, line 175 and
WRREC, line 210) have been added.
When the symbol ENDFIL was defined (known), the assembler placed its
value in the SYMTAB entry; it then inserted this value into the instruction
operand field (at address 201C) as directed by the forward reference list.
From this point on, any references to ENDFIL would not be forward
references, and would not be entered into a list.
 At the end of the program, any SYMTAB entries that are still marked with *
indicate undefined symbols. These should be flagged by the assembler as errors.
 One-pass assemblers that produce object programs follow a slightly different
procedure from that previously described.
1) Forward references are entered into lists as before.
2) When the definition of a symbol is encountered, instructions that made
forward references to that symbol may no longer be available in memory for
modification. In general, they will already have been written out as part of a
Text record in the object program. In this case, the assembler must
generate another Text record with the correct operand address.
3) When the program is loaded, this address will be inserted into the instruction
by the action of the loader.
Fig 2.20 illustrates the above process.
The second Text record contains the object code generated from lines 10
through 40 in Fig 2.18. The operand addresses for the instructions on lines 15,
30, and 35 have been generated as 0000.
When the definition of ENDFIL on line 45 is encountered, the assembler
generates the third Text record. This record specifies that the value 2024 (the
address of ENDFIL) is to be loaded at location 201C (the operand address field
of JEQ on line 30).
When the program is loaded, the value 2024 will replace the 0000
previously loaded.
2. Multi-Pass Assembler:
 For a two pass assembler, forward references in symbol definition are not
allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
o Symbol definition must be completed in pass 1.
 Prohibiting forward references in symbol definition is not a serious
inconvenience.
o Forward references tend to create difficulty for a person reading the program.
The symbol BETA cannot be assigned a value when it is
encountered during the first pass because DELTA has not yet been defined. As
a result, ALPHA cannot be evaluated during the second pass.
This means that any assembler that makes only two sequential
passes over the source program cannot resolve such a sequence of definitions.
Multi-Pass Assembler Example Program

Fig 2.21(b) displays symbol table entries resulting from Pass 1 processing of the
statement. The entry &1 indicates that one symbol in the defining expression is
undefined.
Fig 2.21(c) shows two undefined symbols involved in the definition: BUFEND
and BUFFER.

Fig 2.21(d) shows that a new undefined symbol, PREVBT (dependent on BUFFER),
is added.
Fig 2.21(e) shows that when BUFFER is encountered, PREVBT can be
determined accordingly.

Fig 2.21(f) shows that when BUFEND is defined, MAXLEN and HALFSZ can
be determined accordingly.
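The dependency-list idea traced in Fig 2.21 can be sketched in Python. The sketch assumes a source sequence in which HALFSZ depends on MAXLEN, MAXLEN on BUFEND and BUFFER, and PREVBT on BUFFER; the defining expressions and the two addresses are assumptions for illustration, and the function name resolve is hypothetical.

```python
def resolve(definitions):
    """definitions: list of (symbol, expr) in source order, where expr is an
    int (value known when the statement is read) or (function, [operands])."""
    values, waiting = {}, {}      # waiting: symbol -> (function, operands)
    dependents = {}               # undefined symbol -> symbols that depend on it

    def try_define(symbol):
        func, operands = waiting[symbol]
        if all(op in values for op in operands):
            del waiting[symbol]
            define(symbol, func(*(values[op] for op in operands)))

    def define(symbol, value):
        values[symbol] = value
        # A newly defined symbol may allow its dependents to be evaluated.
        for dep in dependents.pop(symbol, []):
            if dep in waiting:
                try_define(dep)

    for symbol, expr in definitions:
        if isinstance(expr, int):
            define(symbol, expr)
        else:
            waiting[symbol] = expr
            for op in expr[1]:
                if op not in values:
                    dependents.setdefault(op, []).append(symbol)
            try_define(symbol)
    return values, sorted(waiting)


program = [
    ("HALFSZ", (lambda maxlen: maxlen // 2, ["MAXLEN"])),
    ("MAXLEN", (lambda bufend, buffer: bufend - buffer, ["BUFEND", "BUFFER"])),
    ("PREVBT", (lambda buffer: buffer - 1, ["BUFFER"])),
    ("BUFFER", 0x1034),           # assumed address
    ("BUFEND", 0x2034),           # assumed address
]
values, unresolved = resolve(program)
```

PREVBT is evaluated as soon as BUFFER is defined, and MAXLEN and HALFSZ follow once BUFEND is defined, which is the order of events described for Fig 2.21(e) and (f).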
5. Implementation Examples:
MASM Assembler:

UNIT III
LOADERS AND LINKERS

To execute an object program, we need:
Relocation, which modifies the object program so that it can be loaded at an
address different from the location originally specified.
Linking, which combines two or more separate object programs and supplies
the information needed to allow references between them.
Loading and Allocation, which allocates memory locations and brings the
object program into memory for execution.
I.Basic Loader Functions:
1. Design of an absolute loader:
The operation of absolute loader is very simple. The object code is loaded
to specified locations in the memory. At the end the loader jumps to the
specified address to begin execution of the loaded program.
An example object program is shown in Fig 3.1(a).

For a simple absolute loader, all functions are accomplished in a single pass as
follows:
1) The Header record of object programs is checked to verify that the correct
program has been presented for loading.
2) As each Text record is read, the object code it contains is moved to the indicated
address in memory.
3) When the End record is encountered, the loader jumps to the specified address to
begin execution of the loaded program.

Fig 3.1(b) shows a representation of the program from Fig 3.1(a) after loading.
Fig 3.2 shows an algorithm for the absolute loader
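The single-pass procedure above can be sketched in Python. The sketch assumes SIC-style records in fixed columns with no separators (Header: name, start, length; Text: start address, length in bytes, object code; End: first executable address). The function name and the sample records are hypothetical.

```python
def absolute_load(records, memory, expected_name=None):
    """Load Text records into memory and return the address named in the End record."""
    entry = None
    for rec in records:
        kind = rec[0]
        if kind == "H":
            name = rec[1:7].strip()
            if expected_name is not None and name != expected_name:
                raise ValueError("wrong program presented for loading: " + name)
        elif kind == "T":
            addr = int(rec[1:7], 16)
            nbytes = int(rec[7:9], 16)
            code = bytes.fromhex(rec[9:9 + 2 * nbytes])   # two characters per byte
            memory[addr:addr + nbytes] = code
        elif kind == "E":
            entry = int(rec[1:7], 16)                     # where execution begins
            break
    return entry


memory = bytearray(0x2000)
records = [
    "HCOPY  001000000006",
    "T00100006141033482039",
    "E001000",
]
entry = absolute_load(records, memory, expected_name="COPY")
```

A real loader would jump to the returned address; here it is simply returned to the caller.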

It is very important to realize that in Fig 3.1(a), each printed
character represents one byte of the object program record. In Fig 3.1(b), on the
other hand, each printed character represents one hexadecimal digit in memory
(a half-byte). A loader reading the character form must therefore convert each
pair of characters into a single byte, which is inefficient in both space and
time.
Therefore, to save space and loader execution time, most machines store
object programs in a binary form, with each byte of object code stored as a
single byte in the object program.
3.1.2 A Simple Bootstrap Loader
When a computer is first turned on or restarted, a special type of absolute
loader, called a bootstrap loader, is executed. This bootstrap loads the first
program to be run by the computer – usually an operating system.
Fig 3.3 shows the source code for our bootstrap loader. The bootstrap itself
begins at address 0 in the memory.
Note: each byte of object code to be loaded is represented on device F1 as two
hexadecimal digits, just as it is in a Text record of a SIC object program.
1) The object code from device F1 is always loaded into consecutive bytes of
memory, starting at address 80.
2) The main loop of the bootstrap keeps the address of the next memory location
to be loaded in register X.
3) After all of the object code from device F1 has been loaded, the bootstrap jumps
to address 80, which begins the execution of the program that was loaded.
Much of the work of the bootstrap loader is performed by the subroutine GETC.
GETC reads one character from device F1 and converts it from its ASCII code to
the value of the hexadecimal digit that the character represents. The main loop
calls GETC twice for each byte of object code and combines the two digits. For
example, the two characters C’D8’ (the bytes ‘4438’H) are converted to the
single byte ‘D8’H.
The resulting byte is stored at the address currently in register X, using a
STCH instruction that refers to location 0 using indexed addressing.
The TIXR instruction is then used to add 1 to the value in X.
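The character-to-byte conversion performed by the bootstrap can be illustrated with a short Python sketch. The function names are hypothetical, and end-of-file handling and the skipping of control characters are left out.

```python
def hex_digit_value(ch):
    """Convert one ASCII character '0'-'9' or 'A'-'F' to its value 0-15."""
    code = ord(ch)
    if 0x30 <= code <= 0x39:          # '0'..'9' are ASCII hex 30..39
        return code - 0x30
    if 0x41 <= code <= 0x46:          # 'A'..'F' are ASCII hex 41..46
        return code - 0x37
    raise ValueError("not a hexadecimal digit: %r" % ch)


def load_hex_text(text, memory, start=0x80):
    """Pack each pair of hex characters into one byte, starting at `start`."""
    x = start                         # plays the role of register X
    for i in range(0, len(text) - 1, 2):
        high = hex_digit_value(text[i])
        low = hex_digit_value(text[i + 1])
        memory[x] = (high << 4) | low # shift the first digit left by 4 bits
        x += 1                        # the bootstrap uses TIXR for this step
    return x


memory = bytearray(0x100)
next_free = load_hex_text("D8", memory)
```

The characters 'D' (hex 44) and '8' (hex 38) become the values 13 and 8, which combine into the single byte D8 stored at address 80.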
II. Machine-Dependent Loader Features
The absolute loader has several potential disadvantages. One of the most
obvious is the need for the programmer to specify the actual address at which
the program will be loaded into memory.
Writing absolute programs also makes it difficult to use subroutine libraries
efficiently. A library could not be used effectively if all of its subroutines
had pre-assigned absolute addresses.
The need for program relocation is an indirect consequence of the change to
larger and more powerful computers. The way relocation is implemented in a
loader is also dependent upon machine characteristics.
1.Relocation:
Loaders that allow for program relocation are called relocating loaders or
relative loaders.
There are two methods for specifying relocation as part of the object program.
The first method: A Modification record is used to describe each part of the
object code that must be changed when the program is relocated.
Fig 3.4 shows a SIC/XE program we use to illustrate this first method of
specifying relocation.
Most of the instructions in this program use relative or immediate addressing.
The only portions of the assembled program that contain actual
addresses are the extended format instructions on lines 15, 35, and 65.
Thus these are the only items whose values are affected by relocation.
Fig 3.5 displays the object program corresponding to the source in Fig 3.4.

Each Modification record specifies the starting address and length of
the field whose value is to be altered.
It then describes the modification to be performed.
In this example, all modifications add the value of the symbol COPY, which
represents the starting address of the program.
The Modification record is not well suited for use with all machine
architectures. Consider, for example, the program in Fig 3.6. This is a
relocatable program written for the standard version of SIC.
The important difference between this example and the one in Fig 3.4 is that the
standard SIC machine does not use relative addressing.
In this program the addresses in all the instructions except RSUB must be
modified when the program is relocated. This would require 31 Modification
records, which results in an object program more than twice as large as the one
in Fig 3.5.
The second method: Fig 3.7 shows this method applied to our SIC program
example.
There are no Modification records.
The Text records are the same as before except that there is a relocation
bit associated with each word of object code.
Since all SIC instructions occupy one word, this means that there is one
relocation bit for each possible instruction.
The relocation bits are gathered together into a bit mask following the
length indicator in each Text record. In Fig 3.7 this mask is represented (in
character form) as three hexadecimal digits.
If the relocation bit corresponding to a word of object code is set to 1, the
program’s starting address is to be added to this word when the program is
relocated.
A bit value of 0 indicates that no modification is necessary.
If a Text record contains fewer than 12 words of object code, the bits
corresponding to unused words are set to 0.
For example, the bit mask FFC (representing the bit string
111111111100) in the first Text record specifies that all 10 words of object code
are to be modified during relocation.
Example: note that the LDX instruction on line 210 (Fig 3.6) begins a new Text
record.
If it were placed in the preceding Text record, it would not be properly
aligned to correspond to a relocation bit because of the 1-byte data value
generated from line 185.
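The use of the relocation bit mask can be sketched in Python. The function name and the sample words are hypothetical; the sketch assumes a 12-bit mask written as three hexadecimal digits, with the leftmost bit belonging to the first word of the Text record, and at most 12 words per record.

```python
def relocate_words(words, mask_hex, program_start):
    """Add program_start to each 3-byte word whose relocation bit is 1."""
    mask = int(mask_hex, 16)                  # 3 hex digits -> 12 bits
    relocated = []
    for i, word in enumerate(words):
        bit = (mask >> (11 - i)) & 1          # leftmost bit belongs to word 0
        if bit:
            word = (word + program_start) & 0xFFFFFF
        relocated.append(word)
    return relocated


# Mask C00 = 1100 0000 0000: relocate the first two words only.
words = [0x140033, 0x481039, 0x000003]
loaded = relocate_words(words, "C00", 0x5000)
```

The first two words become 145033 and 486039, while the third word, whose bit is 0, is loaded unchanged.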
2. Program Linking
Consider the three (separately assembled) programs in Fig 3.8, each of
which consists of a single control section.
Consider first the reference marked REF1. For the first program
(PROGA), (1) REF1 is simply a reference to a label within the program. (2) It is
assembled in the usual way as a PC relative instruction. (3) No modification for
relocation or linking is necessary.
In PROGB, the same operand refers to an external symbol. (1) The
assembler uses an extended-format instruction with address field set to 00000.
(2) The object program for PROGB (Fig 3.9) contains a Modification record
instructing the loader to add the value of the symbol LISTA to this address
field when the program is linked.
For PROGC, REF1 is handled in exactly the same way.
The reference marked REF2 is processed in a similar manner.
REF3 is an immediate operand whose value is to be the difference between
ENDA and LISTA (that is, the length of the list in bytes).
In PROGA, the assembler has all of the information necessary to compute this
value.
During the assembly of PROGB (and PROGC), the values of the labels are
unknown. In these programs, the expression must be assembled as an external
reference (with two Modification records) even though the final result will be an
absolute value independent of the locations at which the programs are loaded.
 Consider REF4.
The assembler for PROGA can evaluate all of the expression in REF4
except for the value of LISTC. This results in an initial value of ‘000014’H and
one Modification record.
The same expression in PROGB contains no terms that can be evaluated by
the assembler. The object code therefore contains an initial value of 000000
and three Modification records.
For PROGC, the assembler can supply the value of LISTC relative to the
beginning of the program (but not the actual address, which is not known until
the program is loaded). The initial value of this data word contains the relative
address of LISTC (‘000030’H). Modification records instruct the loader to add
the beginning address of the program (i.e., the value of PROGC), to add the
value of ENDA, and to subtract the value of LISTA.
 Fig 3.10(a) shows these three programs as they might appear in memory
after loading and linking. PROGA has been loaded starting at address 4000,
with PROGB and PROGC immediately following.

For example, the value for reference REF4 in PROGA is located at address
4054 (the beginning address of PROGA plus 0054). Fig 3.10(b) shows the
details of how this value is computed.
The initial value (from the Text record) is 000014. To this is added the address
assigned to LISTC, which is 4112 (the beginning address of PROGC plus 30),
giving the value 004126.
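As a check, the REF4 calculation can be carried out for all three programs in a few lines of Python. The load address of PROGC follows from the text (LISTC = PROGC + 30 = 4112). The absolute values of LISTA and ENDA are assumed for illustration; only their difference, 000014, is taken from the text.

```python
PROGA = 0x4000
PROGC = 0x4112 - 0x30               # 0x40E2, since LISTC = PROGC + 0x30
LISTC = PROGC + 0x30
LISTA, ENDA = 0x4040, 0x4054        # assumed; only ENDA - LISTA = 0x14 matters

# Initial value from the Text record, then the Modification records applied.
ref4_in_proga = 0x000014 + LISTC                  # one Modification record
ref4_in_progb = 0x000000 + ENDA - LISTA + LISTC   # three Modification records
ref4_in_progc = 0x000030 + PROGC + ENDA - LISTA   # three Modification records
```

All three expressions give 004126, so the reference has the same value in each program after loading even though each assembler produced a different initial value.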
3. Algorithm and Data Structures for a Linking Loader
The algorithm for a linking loader is considerably more complicated than
the absolute loader algorithm.
A linking loader usually makes two passes over its input, just as an
assembler does. In terms of general function, the two passes of a linking loader
are quite similar to the two passes of an assembler: Pass 1 assigns addresses to
all external symbols.
Pass 2 performs the actual loading, relocation, and linking.
The main data structure needed for our linking loader is an external symbol
table ESTAB. This table, which is analogous to SYMTAB in our assembler
algorithm, is used to store the name and address of each external symbol in the
set of control sections being loaded.
A hashed organization is typically used for this table.
Two other important variables are PROGADDR (program load address)
and CSADDR (control section address).
PROGADDR is the beginning address in memory where the linked
program is to be loaded. Its value is supplied to the loader by the OS.
CSADDR contains the starting address assigned to the control section
currently being scanned by the loader. This value is added to all relative
addresses within the control section to convert them to actual addresses.
The algorithm is presented in Fig 3.11.
During Pass 1 (Fig 3.11(a)), the loader is concerned only with Header and
Define record types in the control sections.
1) The beginning load address for the linked program (PROGADDR) is
obtained from the OS. This becomes the starting address (CSADDR) for the
first control section in the input sequence.
2) The control section name from the Header record is entered into ESTAB, with
the value given by CSADDR. All external symbols appearing in the Define record
for the control section are also entered into ESTAB. Their addresses are
obtained by adding the value specified in the Define record to CSADDR.
3) When the End record is read, the control section length CSLTH (which was
saved from the Header record) is added to CSADDR. This calculation gives the
starting address for the next control section in sequence.
At the end of Pass 1, ESTAB contains all external symbols defined in the
set of control sections together with the address assigned to each.
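Pass 1 can be sketched in Python as follows. The sketch assumes fixed-column records without separators (Header: name, start, length; Define: repeated name and relative address pairs). The function name pass1 is hypothetical, and the record contents are assumed values chosen to agree with the addresses quoted in the text (PROGA at 4000, LISTC at 4112).

```python
def pass1(control_sections, progaddr):
    """Build ESTAB from the Header and Define records of each control section."""
    estab, csaddr = {}, progaddr
    for records in control_sections:
        cslth = 0
        for rec in records:
            if rec[0] == "H":
                name = rec[1:7].strip()
                cslth = int(rec[13:19], 16)       # length field of the Header record
                if name in estab:
                    raise ValueError("duplicate external symbol " + name)
                estab[name] = csaddr
            elif rec[0] == "D":
                for i in range(1, len(rec), 12):  # 6-char name + 6-digit address
                    name = rec[i:i + 6].strip()
                    if name in estab:
                        raise ValueError("duplicate external symbol " + name)
                    estab[name] = csaddr + int(rec[i + 6:i + 12], 16)
            elif rec[0] == "E":
                csaddr += cslth                   # start of the next control section
    return estab


sections = [
    ["HPROGA 000000000063", "DLISTA 000040ENDA  000054", "E000020"],
    ["HPROGB 00000000007F", "DLISTB 000060ENDB  000070", "E"],
    ["HPROGC 000000000051", "DLISTC 000030ENDC  000042", "E"],
]
estab = pass1(sections, progaddr=0x4000)
```

Printing the contents of estab in address order would give a simple load map.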
Many loaders include as an option the ability to print a load map that
shows these symbols and their addresses. For the example of Figs 3.9 and 3.10,
such a load map might look like the one shown at the top of page 143.
Pass 2 (Fig 3.11(b)) of our loader performs the actual loading, relocation,
and linking of the program.
1) As each Text record is read, the object code is moved to the specified address
(plus the current value of CSADDR).
2) When a Modification record is encountered, the symbol whose value is to be
used for modification is looked up in ESTAB.
3) This value is then added to or subtracted from the indicated location in
memory.
4) The last step performed by the loader is usually the transferring of control to
the loaded program to begin execution.
 The End record for each control section may contain the address of the first
instruction in that control section to be executed. Our loader takes this as the
transfer point to begin execution.
If more than one control section specifies a transfer address, the loader
arbitrarily uses the last one encountered.
If no control section contains a transfer address, the loader uses the beginning of
the linked program (i.e., PROGADDR) as the transfer point.
Normally, a transfer address would be placed in the End record for a main
program, but not for a subroutine.
 This algorithm can be made more efficient. Assign a reference number, which
is used (instead of the symbol name) in Modification records, to each external
symbol referred to in a control section.
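Suppose we always assign the reference number 01 to the control section name.
Fig 3.12 shows the object programs from Fig 3.9 with this change. The main
advantage of reference numbers is that the loader avoids repeated searches of
ESTAB for the same symbol while loading a control section.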
III.MACHINE-INDEPENDENT LOADER FEATURES
Loading and linking are often thought of as OS service functions.
Therefore, most loaders include fewer different features than are found in a
typical assembler. They include the use of an automatic library search process
for handling external reference and some common options that can be selected
at the time of loading and linking.
1. Automatic Library Search
 Many linking loaders can automatically incorporate routines from a
subprogram library into the program being loaded.
 Linking loaders that support automatic library search must keep track of
external symbols that are referred to, but not defined, in the primary input to the
loader.
 At the end of Pass 1, the symbols in ESTAB that remain undefined represent
unresolved external references.
The loader searches the library or libraries specified for routines that contain
the definitions of these symbols, and processes the subroutines found by this
search exactly as if they had been part of the primary input stream.
Note that the subroutines fetched from a library in this way may themselves
contain external references. It is therefore necessary to repeat the library search
process until all references are resolved.
If unresolved external references remain after the library search is
completed, these must be treated as errors.
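The repeated search described above can be sketched as a loop that runs until no undefined symbol can be satisfied from the library. This is an illustrative model only: the table layouts, the address assignment and all names are assumptions, not the loader's actual data structures.

```python
def library_search(estab, library):
    """Resolve undefined external symbols from a library.

    estab maps symbol -> address, or None while the symbol is undefined.
    library maps symbol -> (length, [symbols that routine refers to]).
    Returns the sorted names that are still unresolved after the search.
    """
    defined = [a for a in estab.values() if a is not None]
    next_addr = max(defined, default=0) + 0x100      # assumed free space
    while True:
        pending = [s for s, a in estab.items() if a is None and s in library]
        if not pending:
            break
        for symbol in pending:
            length, references = library[symbol]
            estab[symbol] = next_addr                # treat as primary input
            next_addr += length
            for ref in references:                   # may add new references
                estab.setdefault(ref, None)
    return sorted(s for s, a in estab.items() if a is None)


estab = {"MAIN": 0x1000, "SQRT": None, "PLOT": None}
library = {"SQRT": (0x40, ["ERRMSG"]), "ERRMSG": (0x20, [])}
unresolved = library_search(estab, library)
```

Here SQRT brings in a further reference to ERRMSG, which is resolved on the second round; PLOT is not in the library and would be reported as an error.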
2.Loader Options
Many loaders allow the user to specify options that modify the standard
processing.
Typical loader option 1: allows the selection of alternative sources of input.
Ex., INCLUDE program-name (library-name) might direct the loader to read
the designated object program from a library and treat it as if it were part of the
primary loader input.
Loader option 2: allows the user to delete external symbols or entire control
sections. Ex.,
DELETE csect-name
might instruct the loader to delete the named control section(s) from the set of
programs being loaded.

CHANGE name1, name2
might cause the external symbol name1 to be changed to name2 wherever it
appears in the object programs.
Loader option 3: involves the automatic inclusion of library routines to satisfy
external references. Ex.,
LIBRARY MYLIB
Such user-specified libraries are normally searched before the standard
system libraries. This allows the user to use special versions of the standard
routines.
The command
NOCALL STDDEV, PLOT, CORREL
instructs the loader that these external references are to remain
unresolved. This avoids the overhead of loading and linking the unneeded
routines, and saves the memory space that would otherwise be required.
IV. LOADER DESIGN OPTIONS
Linking loaders perform all linking and relocation at load time.
There are two alternatives: Linkage editors, which perform linking prior to
load time, and dynamic linking, in which the linking function is performed
at execution time.
Precondition: The source program is first assembled or compiled, producing an
object program. A linking loader performs all linking and relocation operations,
including automatic library search if specified, and loads the linked program
directly into memory for execution.
A linkage editor produces a linked version of the program (load
module or executable image), which is written to a file or library for later
execution.
 The essential difference between a linkage editor and a linking loader is
illustrated in Fig 3.13.
1. Linkage Editors
The linkage editor performs relocation of all control sections relative
to the start of the linked program. Thus, all items that need to be modified at
load time have values that are relative to the start of the linked program.
This means that the loading can be accomplished in one pass with no
external symbol table required.
If a program is to be executed many times without being reassembled, the
use of a linkage editor substantially reduces the overhead required.
Linkage editors can perform many useful functions besides simply
preparing an object program for execution. Ex., a typical sequence of linkage
editor commands used:
INCLUDE PLANNER (PROGLIB)
DELETE PROJECT {delete from existing PLANNER}
INCLUDE PROJECT (NEWLIB) {include new version}
REPLACE PLANNER (PROGLIB)
 Linkage editors can also be used to build packages of subroutines or other
control sections that are generally used together. This can be useful when
dealing with subroutine libraries that support high-level programming
languages.
 Linkage editors often include a variety of other options and commands like
those discussed for linking loaders. Compared to linking loaders, linkage editors
in general tend to offer more flexibility and control.

2. Dynamic Linking
Linkage editors perform linking operations before the program is loaded for
execution.
Linking loaders perform these same operations at load time.
Dynamic linking, dynamic loading, or load on call postpones the linking
function until execution time: a subroutine is loaded and linked to the rest of the
program when it is first called.
 Dynamic linking is often used to allow several executing programs to share one
copy of a subroutine or library, for example the run-time support routines for a
high-level language like C.
 With a program that allows its user to interactively call any of the subroutines
of a large mathematical and statistical library, all of the library subroutines
could potentially be needed, but only a few will actually be used in any one
execution.
Dynamic linking avoids the necessity of loading the entire library for each
execution: only the subroutines that are actually called are loaded.
Fig 3.14 illustrates a method in which routines that are to be dynamically
loaded must be called via an OS service request.
 Fig 3.14(a): Instead of executing a JSUB instruction referring to an external
symbol, the program makes a load-and-call service request to OS. The
parameter of this request is the symbolic name of the routine to be called.
 Fig 3.14(b): OS examines its internal tables to determine whether or not the
routine is already loaded. If necessary, the routine is loaded from the specified
user or system libraries.
 Fig 3.14(c): Control is then passed from OS to the routine being called
 Fig 3.14(d): When the called subroutine completes its processing, it returns to its
caller (i.e., OS). OS then returns control to the program that issued the request.
 Fig 3.14(e): If a subroutine is still in memory, a second call to it may not
require another load operation. Control may simply be passed from the dynamic
loader to the called routine.
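The load-and-call sequence of Fig 3.14 can be modelled roughly as below. This is a toy sketch: the class, its tables and the routine names are hypothetical, and Python callables stand in for loaded object code.

```python
class DynamicLoader:
    """Toy model of an OS load-and-call service."""

    def __init__(self, library):
        self.library = library      # routine name -> callable "object code"
        self.loaded = {}            # internal table of routines in memory
        self.load_count = 0

    def load_and_call(self, name, *args):
        if name not in self.loaded:                 # step (b): load if needed
            self.loaded[name] = self.library[name]
            self.load_count += 1
        routine = self.loaded[name]
        return routine(*args)                       # steps (c) and (d)


os_service = DynamicLoader({"SQUARE": lambda x: x * x, "NEGATE": lambda x: -x})
first = os_service.load_and_call("SQUARE", 7)       # loads, then calls
second = os_service.load_and_call("SQUARE", 3)      # step (e): already loaded
```

NEGATE is never requested in this run, so it is never loaded; that is the saving dynamic linking offers for large libraries.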
3. Bootstrap Loaders
On some computers, an absolute loader program is permanently resident
in a read-only memory (ROM). When some hardware signal occurs, the
machine begins to execute this ROM program. This is referred to as a bootstrap
loader.
The bootstrap loader reads a fixed-length record from some device into memory
at a fixed location.
After the read operation is complete, control is automatically transferred
to the address in memory where the record was stored.
If the loading process requires more instructions than can be read in a
single record, this first record causes the reading of others, and these in turn can
cause the reading of more records.

UNIT IV
MACRO PROCESSORS
 A macro instruction (abbreviated to macro) is simply a notational convenience
for the programmer.
• A macro represents a commonly used group of statements in the source
programming language
• Expanding a macro:
Replace each macro instruction with the corresponding group of source
language statements.
Example:
 On SIC/XE, saving the contents of all registers requires a sequence of seven
instructions.
• With a macro, the programmer can write one statement such as SAVERGS instead.
• A macro processor is not directly related to the architecture of the computer on
which it is to run
• Macro processors can also be used with high-level programming languages,
OS command languages, etc.
I.Basic Processor Functions:
Macro Definition and Expansion
Macro Processor Algorithms and Data structures

1.Macro Definition and Expansion:
Fig 4.1 shows an example of a SIC/XE program using macro instructions.
It defines and uses two macro instructions,
 RDBUFF
 WRBUFF.
Two new assembler directives used in Macro Definition
• MACRO
• MEND
The MACRO statement gives a pattern or prototype for the macro instruction:
• Macro name (RDBUFF) and parameters (entries in the operand
field)
Each parameter starts with the character &, which facilitates the
substitution of parameters during macro expansion.
The MEND assembler directive (line 95) marks the end of the macro
definition.
Macro invocation
 Often referred to as a macro call
 Need the name of the macro instruction being invoked and the
arguments to be used in expanding the macro
Expanded program:
Fig 4.2 shows the expanded program.
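The expanded program contains no macro instruction definitions.
Each macro invocation statement has been expanded into the statements
that form the body of the macro, with the arguments from the macro invocation
substituted for the parameters in the macro prototype.
Macro invocations and subroutine calls are different: the statements of a
macro body are generated each time the macro is invoked, whereas the statements
of a subroutine appear only once, however many times it is called.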
Note also that the macro instructions have been written so that the body
of the macro contains no label.
2.Macro Processor Algorithm and Data Structures
It is easy to design a two-pass macro processor in which all macro
definitions are processed during the first pass, and all macro invocation
statements are expanded during the second pass.
However, a two-pass macro processor would not allow the body of one
macro instruction to contain definitions of other macros, because all macros
would have to be defined during the first pass, before any macro invocations
were expanded. Consider the example in Figure 4.3.
A one-pass macro processor that can alternate between macro definition and
macro expansion handles such definitions. Because of the one-pass structure,
the definition of a macro must appear in the source program before any
statements that invoke that macro.

Three main data structures are involved in a one-pass macro processor:
DEFTAB
NAMTAB
ARGTAB
DEFTAB
It contains macro definitions (i.e., the macro prototype and macro body).
It excludes comment lines.
References to the macro instruction parameters are converted to a
positional notation for efficiency in substituting arguments.
NAMTAB
It serves as an index for DEFTAB.
It contains all macro names.
It has pointers to the starting and ending of the definitions in DEFTAB.
ARGTAB
It contains arguments based on the position in the argument list.
When a macro invocation statement is encountered, the arguments are
entered into ARGTAB according to their position in the argument list.
These arguments are substituted for the corresponding parameters in the
macro body when the macro is expanded.
For the macro RDBUFF, the data structure contents used by the macro
processor is shown in fig 4.4.
NAMTAB contains two pointers marking the start and end of the
definition.
For RDBUFF, DEFTAB contains the macro body with parameters replaced by
positional notation.
Example:
Since &INDEV is the first parameter in the parameter list, TD =X'&INDEV'
is stored as TD =X'?1'.
ARGTAB contains the values of the arguments, namely F1, BUFFER and
LENGTH for &INDEV, &BUFADR and &RECLTH respectively.
Fig 4.5 represents macro processor algorithm.
The procedure DEFINE which is called when the beginning of a macro
definition is recognized, makes the appropriate entries in DEFTAB and
NAMTAB.
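The interplay of the three tables can be sketched in a few lines of Python. This is a simplified illustration rather than the algorithm of Fig 4.5: statements are (label, opcode, operand) tuples, the body shown is an abbreviated stand-in for RDBUFF, and the function names are assumptions.

```python
import re

DEFTAB = []     # prototype and body lines, parameters stored as ?n
NAMTAB = {}     # macro name -> (start index, end index) into DEFTAB


def define(name, params, body):
    """Enter a macro into DEFTAB and NAMTAB, converting &PARAM to ?n."""
    start = len(DEFTAB)
    DEFTAB.append((name, "MACRO", ",".join(params)))
    for label, opcode, operand in body:
        # Longest names first, so a short name never clobbers a longer one.
        for p in sorted(params, key=len, reverse=True):
            operand = operand.replace(p, "?%d" % (params.index(p) + 1))
        DEFTAB.append((label, opcode, operand))
    NAMTAB[name] = (start, len(DEFTAB) - 1)


def expand(name, args):
    """Expand one invocation by substituting ARGTAB entries for ?n."""
    argtab = dict(enumerate(args, start=1))     # ARGTAB: position -> argument
    start, end = NAMTAB[name]
    expanded = []
    for label, opcode, operand in DEFTAB[start + 1:end + 1]:
        operand = re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1))], operand)
        expanded.append((label, opcode, operand))
    return expanded


define("RDBUFF", ["&INDEV", "&BUFADR", "&RECLTH"],
       [("", "TD", "=X'&INDEV'"), ("", "STCH", "&BUFADR,X"), ("", "STX", "&RECLTH")])
lines = expand("RDBUFF", ["F1", "BUFFER", "LENGTH"])
```

Storing ?n in DEFTAB means expansion needs only an index into ARGTAB, with no search for parameter names.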
Comparison of Macro Processor Design
 One-pass algorithm
 Every macro must be defined before it is called
 One-pass processor can alternate between macro definition and macro expansion
 Nested macro definitions are allowed but nested calls are not allowed.
 Two-pass algorithm
 Pass1: Recognize macro definitions
 Pass2: Recognize macro calls
 Nested macro definitions are not allowed
II.MACHINE-INDEPENDENT MACRO-PROCESSOR FEATURES:
The design of a macro processor does not depend on the architecture of the
machine.
The features are:
 Concatenation of Macro Parameters
 Generation of unique labels
 Conditional Macro Expansion
 Keyword Macro Parameters
1. Concatenation of Macro Parameters
Most macro processors allow parameters to be concatenated with other
character strings. Suppose that a program contains one series of variables
named by the symbols XA1, XA2, XA3, …, another series named by XB1, XB2,
XB3, …, etc.
If similar processing is to be performed on each series of variables, the
programmer might want to incorporate this processing into a macro instruction.
The parameter to such a macro instruction could specify the series of
variables to be operated on (A, B, etc.). The macro processor would use this
parameter to construct the symbols required in the macro expansion (XA1,
XB1, etc.).
Suppose that the parameter to such a macro instruction is named &ID. The
body of the macro definition might contain a statement like LDA X&ID1.
The beginning of the parameter is identified by the character &, but its end
is not: the operand could be read as &ID followed by 1, or as X followed by
the parameter &ID1.
 Most macro processors deal with this problem by providing a
special concatenation operator.
In the SIC macro language, this operator is the character →.
For example, the statement would be written as LDA X&ID→1
so that the end of the parameter &ID is clearly identified.
The macro processor deletes all occurrences of the concatenation operator
immediately after performing parameter substitution, so → will not appear in the
macro expansion.
 Fig 4.6(a) shows a macro definition that uses the concatenation operator as
previously described. Fig 4.6(b) and (c) shows macro invocation statements and
the corresponding macro expansions.
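A small sketch of substitution followed by removal of the concatenation operator, assuming the operator is written → and that parameter names consist of letters and digits. The function name and the argument table are hypothetical.

```python
import re


def substitute(line, args):
    """Replace &NAME parameters, then delete the concatenation operator."""
    def lookup(match):
        name = match.group(0)
        return args.get(name, name)         # unknown names are left as-is

    line = re.sub(r"&[A-Z0-9]+", lookup, line)
    return line.replace("→", "")


args = {"&ID": "A"}
with_operator = substitute("LDA X&ID→1", args)      # end of &ID is marked
without_operator = substitute("LDA X&ID1", args)    # scanned as &ID1
```

Without the operator the scanner reads the parameter name as &ID1, which is not a parameter of the macro, so no substitution takes place.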
2. Generation of Unique Labels
Consider the definition of WRBUFF in Fig 4.1. If a label were placed on
the TD instruction on line 135, this label would be defined twice – once for each
invocation of WRBUFF. This duplicate definition would prevent correct
assembly of the resulting expanded program.
Many macro processors avoid these problems by allowing the creation
of special types of labels within macro instructions. Fig 4.7 illustrates one
technique for generating unique labels within a macro expansion.
Fig 4.7(a) shows a definition of the RDBUFF macro. Labels used within the
macro body begin with the special character $. Fig 4.7(b) shows a macro
invocation statement and the resulting macro expansion. Each symbol beginning
with $ has been modified by replacing $ with $AA.
More generally, the character $ will be replaced by $xx, where xx is a two-
character alphanumeric counter of the number of macro instructions expanded.
For the first macro expansion in a program, xx will have the value AA. For
succeeding macro expansions, xx will be set to AB, AC, etc.
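The $xx scheme can be sketched as below. The text fixes only the first values (AA, AB, AC); the ordering of the counter beyond the letters is an assumption here, as are the function names and the sample body.

```python
import string

# Assumed counter alphabet: letters first, then digits.
ALPHABET = string.ascii_uppercase + string.digits


def counter_suffix(n):
    """Return the two-character counter for expansion number n (0 gives AA)."""
    return ALPHABET[n // len(ALPHABET)] + ALPHABET[n % len(ALPHABET)]


def expand_labels(body, n):
    """Replace $ in every macro-local symbol with $xx for this expansion."""
    prefix = "$" + counter_suffix(n)
    return [line.replace("$", prefix) for line in body]


body = ["$LOOP TD =X'F1'", "      JEQ $LOOP"]
first = expand_labels(body, 0)
second = expand_labels(body, 1)
```

Because each expansion gets its own prefix, the two copies of $LOOP no longer collide when the expanded program is assembled.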
3. Conditional Macro Expansion
Most macro processors can also modify the sequence of statements
generated for a macro expansion, depending on the arguments supplied in the
macro invocation. This is called conditional macro expansion.
Fig 4.8 shows the use of one type of conditional macro expansion
statement.
Fig 4.8(a) shows a definition of a macro RDBUFF, the logic and functions
of which are similar to those previously discussed.
Two additional parameters are defined in RDBUFF: &EOR, which
specifies a hexadecimal character code that marks the end of a record,
and &MAXLTH, which specifies the maximum length record that can be read.
1st illustration: The statements on lines 44 through 48 of this definition
illustrate a simple macro-time conditional structure.
The IF statement evaluates a Boolean expression that is its operand (in this
case, (&MAXLTH EQ ' ')). If TRUE, the statements following the IF are
generated until an ELSE is encountered (line 45 is generated).
If FALSE, these statements are skipped, and the statements following the ELSE
are generated (line 47 is generated).
The ENDIF statement terminates the conditional expression that was begun by
the IF statement.
2nd illustration: On lines 26 through 28, line 27 is another macro processor
directive (SET). This SET statement assigns the value 1 to &EORCK.
The symbol &EORCK is a macro-time variable, which can be used to store
working values during the macro expansion. Note that any symbol that begins with
the character & and that is not a macro instruction parameter is assumed to be
a macro-time variable. All such variables are initialized to a value of 0.
 Other illustrations: On line 38 through 43 and line 63 through 73.
Fig 4.8 (b-d) shows the expansion of 3 different macro invocation statements
that illustrate the operation of the IF statements in Fig 4.8(a).
Note that the macro processor must maintain a symbol table that contains the
values of all macro-time variables used.
Entries in this table are made or modified when SET statements are
processed. The table is used to look up the current value of a macro-time
variable whenever it is required.
Syntax 1 – IF (Boolean Exp.) (statements) ELSE (statements) ENDIF: When an
IF statement is encountered during the expansion of a macro, the specified
Boolean expression is evaluated.
If TRUE, the macro processor continues to process lines from DEFTAB
until it encounters the next ELSE or ENDIF statement. If an ELSE is found, the
macro processor then skips lines in DEFTAB until the next ENDIF. Upon
reaching the ENDIF, it resumes expanding the macro in the usual way.
If FALSE, the macro processor skips ahead in DEFTAB until it finds the
next ELSE or ENDIF statement. The macro processor then resumes normal
macro expansion.
The implementation outlined above does not allow for nested IF structures.
It is extremely important to understand that the testing of Boolean
expressions in IF statements occurs at the time macros are expanded.
By the time the program is assembled, all such decisions have been made
and the conditional macro expansion directives have been removed. The
same applies to the assignment of values to macro-time variables, and to the
other conditional macro expansion directives.
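The non-nested IF/ELSE/ENDIF processing can be sketched as follows. The body lines are only modelled on the &MAXLTH test discussed above, and the tiny expression evaluator handles a single EQ comparison; both are assumptions made for illustration.

```python
def evaluate(expr):
    """Evaluate an already-substituted expression of the form (left EQ 'right')."""
    left, right = expr.strip().strip("()").split("EQ")
    return left.strip() == right.strip().strip("'")


def expand_conditional(body, args):
    """Generate lines from a body with one non-nested IF/ELSE/ENDIF."""
    generated, emitting = [], True
    for line in body:
        for name, value in args.items():       # parameter substitution
            line = line.replace(name, value)
        words = line.split(None, 1)
        opcode = words[0] if words else ""
        if opcode == "IF":
            emitting = evaluate(words[1])       # decided at expansion time
        elif opcode == "ELSE":
            emitting = not emitting
        elif opcode == "ENDIF":
            emitting = True
        elif emitting:
            generated.append(line)
    return generated


body = ["IF (&MAXLTH EQ '')", "+LDT #4096", "ELSE", "+LDT #&MAXLTH", "ENDIF",
        "TD =X'F1'"]
default_case = expand_conditional(body, {"&MAXLTH": ""})
explicit_case = expand_conditional(body, {"&MAXLTH": "80"})
```

None of the IF, ELSE or ENDIF lines reach the output, which is the sense in which all decisions are made before assembly.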
 Fig 4.9 shows the use of macro-time loop statements. The definition in
Fig 4.9(a) uses a macro-time loop statement WHILE.

The WHILE statement specifies that the following lines, until the next
ENDW statement, are to be generated repeatedly as long as a particular
condition is true. Note that all the generation is done at the macro expansion
time. The conditions to be tested involve macro-time variables and arguments,
not run-time data values.

The use of the WHILE-ENDW structure is illustrated on lines 63 through
73 of Fig 4.9(a). The macro-time variable &EORCT has previously been set
(line 27) to the value %NITEMS(&EOR). %NITEMS is a macro processor
function that returns as its value the number of members in an argument list. For
example, if the argument corresponding to &EOR is (00, 03, 04), then
%NITEMS(&EOR) has the value 3.
The macro-time variable &CTR is used to count the number of times the
lines following the WHILE statement have been generated. The value of &CTR
is initialized to 1 (line 63), and incremented by 1 each time through the loop
(line 71).
Fig 4.9(b) shows the expansion of a macro invocation statement using the
definition in Fig 4.9(a).
Syntax 2 – WHILE (Boolean Exp.) (statements) ENDW: When a WHILE
statement is encountered during macro expansion, the specified Boolean
expression is evaluated.
If the value of this expression is FALSE, the macro processor skips ahead
in DEFTAB until it finds the next ENDW statement, and then resumes normal
macro expansion.
If TRUE, the macro processor continues to process lines from DEFTAB in
the usual way until the next ENDW statement. When ENDW is encountered,
the macro processor returns to the preceding WHILE, re-evaluates the Boolean
expression, and takes action based on the new value of this expression as
previously described.
Note that no nested WHILE structures are allowed.
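A rough sketch of WHILE/ENDW processing with a macro-time variable table. The body below is a hypothetical stand-in, not the body of Fig 4.9(a); only LE comparisons and expressions of the form variable+constant are handled, and &EORCT is preset to 3 in place of %NITEMS.

```python
def value_of(expr, variables):
    """Evaluate a constant, a macro-time variable, or a sum of such terms."""
    return sum(variables[t] if t in variables else int(t) for t in expr.split("+"))


def expand_while(body, variables):
    """Expand (label, opcode, operand) lines containing one WHILE/ENDW loop."""
    generated, i, loop_start = [], 0, None
    while i < len(body):
        label, opcode, operand = body[i]
        if opcode == "SET":
            variables[label] = value_of(operand, variables)
        elif opcode == "WHILE":
            left, right = operand.strip("()").split(" LE ")
            if value_of(left, variables) <= value_of(right, variables):
                loop_start = i
            else:
                while body[i][1] != "ENDW":     # skip ahead to ENDW
                    i += 1
        elif opcode == "ENDW":
            i = loop_start                      # go back and re-test
            continue
        else:
            for name, value in variables.items():
                operand = operand.replace(name, str(value))
            generated.append((label, opcode, operand))
        i += 1
    return generated


body = [("&CTR", "SET", "1"),
        ("", "WHILE", "(&CTR LE &EORCT)"),
        ("", "WORD", "&CTR"),
        ("&CTR", "SET", "&CTR+1"),
        ("", "ENDW", ""),
        ("", "RSUB", "")]
result = expand_while(body, {"&EORCT": 3})
```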

4. Keyword Macro Parameters


All the macro instruction definitions we have seen thus far used positional
parameters. That is, parameters and arguments were associated with each other
according to their positions in the macro prototype and the macro invocation
statement.
 With positional parameters, the programmer must be careful to specify the
arguments in the proper order. If an argument is to be omitted, the macro
invocation statement must contain a null argument (two consecutive commas) to
maintain the correct argument positions. For example, suppose a certain macro
instruction GENER has 10 possible parameters, but in a particular invocation of
the macro only the 3rd and 9th parameters are to be specified. The macro
invocation might then look like GENER , , DIRECT, , , , , , 3.
Using a different form of parameter specification, called keyword
parameters, each argument value is written with a keyword that names the
corresponding parameter.
Arguments may appear in any order.
For example, if 3rd parameter in the previous example is named &TYPE and
9th parameter is named &CHANNEL, the macro invocation statement would be
GENER TYPE=DIRECT, CHANNEL=3.
Fig 4.10(a) shows a version of the RDBUFF macro definition using keyword
parameters.

In the macro prototype, each parameter name is followed by an equal
sign (=), which identifies a keyword parameter.
After = sign, a default value is specified for some of the parameters. The
parameter is assumed to have this default value if its name does not appear in
the macro invocation statement.
Default values can simplify the macro definition in many cases.
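Binding keyword arguments amounts to filling a table with the defaults from the prototype and then overwriting the entries named in the invocation. A minimal sketch; the prototype string is only modelled on the keyword form described above, and its parameter names and defaults are assumptions.

```python
def bind_keyword_args(prototype, invocation):
    """Build the argument table from prototype defaults and keyword arguments."""
    table = {}
    for item in prototype.split(","):
        name, _, default = item.partition("=")
        table[name.strip().lstrip("&")] = default.strip()
    for item in invocation.split(","):
        name, _, value = item.partition("=")
        name = name.strip()
        if name not in table:
            raise KeyError("unknown keyword parameter: " + name)
        table[name] = value.strip()             # order of arguments is free
    return table


prototype = "&INDEV=F1,&BUFADR=,&RECLTH=,&EOR=04,&MAXLTH=4096"
table = bind_keyword_args(prototype, "RECLTH=LENGTH,BUFADR=BUFFER")
```

Parameters that are not mentioned in the invocation (INDEV, EOR and MAXLTH here) keep their defaults.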

III. MACRO PROCESSOR DESIGN OPTIONS


1. Recursive Macro Expansion
Fig 4.11 shows an example of macro invocations within macro definitions.
Fig 4.11(a) shows the definition of RDBUFF. In this case, the body of
RDBUFF contains a macro invocation statement (RDCHAR); it is assumed that a
related macro instruction (RDCHAR) already exists.
The definition of RDCHAR appears in Fig 4.11(b).
Unfortunately, the macro processor design we have discussed previously
cannot handle such invocations of macros within macros.
Fig 4.11(c) shows a macro invocation statement of RDBUFF. According
to the algorithm in Fig 4.5, the procedure EXPAND would be called when the
macro was recognized. The arguments from the macro invocation would be
entered into ARGTAB.
The processing would proceed normally until line 50, which contains a
statement invoking RDCHAR. At that point, PROCESSLINE would call
EXPAND again. This time, ARGTAB would hold only the arguments from the
RDCHAR invocation.
The expansion of RDCHAR would also proceed normally. At the end of
this expansion, however, a problem would appear. When the end of the
definition of RDCHAR was recognized, EXPANDING would be set to FALSE.
Thus, the macro processor would “forget” that it had been in the middle of
expanding a macro when it encountered the RDCHAR statement.
In addition, the arguments from the original macro invocation (RDBUFF)
would be lost because the values in ARGTAB were overwritten with the
arguments from the invocation of RDCHAR.
The cause of these difficulties is the recursive call of the procedure EXPAND.
When the RDBUFF macro invocation is encountered, EXPAND is called.
Later, it calls PROCESSLINE for line 50, which results in another call to
EXPAND before a return is made from the original call.
A similar problem would occur with PROCESSLINE since this procedure
too would be called recursively.
 These problems are not difficult to solve if the macro processor is being written
in a programming language that allows recursive calls.
 If a programming language that supports recursion is not available, the
programmer must take care of handling such items as return
addresses and values of local variables (that is, handling by looping structure
and data values being saved on a stack).
The arguments from the macro invocation would be entered into ARGTAB as
follows:
Parameter   Value
1           BUFFER
2           LENGTH
3           F1
4           (unused)
The Boolean variable EXPANDING would be set to TRUE, and
expansion of the macro invocation statement would begin. The processing
would proceed normally until the statement invoking RDCHAR is processed. This
time, ARGTAB would look like
Parameter   Value
1           F1
2           (unused)
--          --
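In a language that supports recursion, giving each call of EXPAND its own argument table avoids the overwriting problem. The sketch below assumes that approach; the macro bodies are abbreviated, hypothetical versions of RDBUFF and RDCHAR, not the definitions of Fig 4.11.

```python
MACROS = {
    "RDBUFF": (["&BUFADR", "&RECLTH", "&INDEV"],
               ["CLEAR X", "RDCHAR &INDEV", "STCH &BUFADR,X", "STX &RECLTH"]),
    "RDCHAR": (["&IN"], ["TD =X'&IN'", "RD =X'&IN'"]),
}


def expand(name, args):
    """Expand a macro; each call keeps a private ARGTAB on the call stack."""
    params, body = MACROS[name]
    argtab = dict(zip(params, args))        # local, so never overwritten
    generated = []
    for line in body:
        for param, arg in argtab.items():
            line = line.replace(param, arg)
        opcode, _, operand = line.partition(" ")
        if opcode in MACROS:                # nested invocation: recurse
            generated.extend(expand(opcode, operand.split(",")))
        else:
            generated.append(line)
    return generated


lines = expand("RDBUFF", ["BUFFER", "LENGTH", "F1"])
```

After the inner call returns, the outer call still has BUFFER and LENGTH available, so the remaining lines of RDBUFF expand correctly.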

2. General-Purpose Macro Processors


The most common use of macro processors is as an aid to assembler
language programming. Macro processors have also been developed for some
high-level programming languages. These special-purpose macro
processors are similar in general function and approach. However, the details
differ from language to language.
 The general-purpose macro processors are not dependent on any particular
programming language, but can be used with a variety of different languages.
 There are relatively few general-purpose macro processors. The major reason
is the large number of details that must be dealt with in a real programming
language. A general-purpose facility must provide some way for a
user to define the specific set of rules to be followed. The following cases
illustrate some of the difficulties.
 Case 1: Comments are usually ignored by a macro processor (at least in
scanning for parameters). However, each programming language has its own
methods for identifying comments.
 Case 2: Another difference between programming languages is related to their
facilities for grouping together terms, expressions, or statements. A general-
purpose macro processor may need to take these groupings into account in
scanning the source statements.
 Case 3: Languages differ substantially in their restrictions on the length
of identifiers and the rules for the formation of constants (i.e. the tokens of the
programming language – for example, identifiers, constants, operators, and
keywords).
 Case 4: Another potential problem with general-purpose macro processors
involves the syntax used for macro definitions and macro invocation statements.
With most special-purpose macro processors, macro invocations are very
similar in form to statements in the source programming language.
3. Macro Processing within Language Translators
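The macro processors discussed so far might be called preprocessors: they
expand macro invocations and produce an expanded source program, which is
then used as input to an assembler or compiler.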
Consider an alternative: combining the macro processing functions with the
language translator itself.
 The simplest method of achieving this sort of combination is a line-by-
line macro processor. Using this approach, the macro processor reads the source
program statements and performs all of its functions as previously described.
The output lines are then passed to the language translator as they are generated
(one at a time), instead of being written to an expanded source file.
Thus, the macro processor operates as a sort of input routine for the assembler
or compiler.
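The line-by-line arrangement maps naturally onto a generator that the translator pulls statements from. A minimal sketch, assuming parameterless macros and a stand-in "translator" that only numbers the statements; all names are hypothetical.

```python
def macro_lines(source, macros):
    """Yield expanded statements one at a time, with no intermediate file."""
    for line in source:
        fields = line.split()
        opcode = fields[0] if fields else ""
        if opcode in macros:
            yield from macros[opcode]       # expanded body, line by line
        else:
            yield line


def translate(lines):
    """Stand-in for the assembler: it just numbers each statement it reads."""
    return ["%04d %s" % (n, text) for n, text in enumerate(lines, start=1)]


source = ["LDA ALPHA", "SAVERGS", "RSUB"]
macros = {"SAVERGS": ["STA SAVEA", "STX SAVEX"]}
listing = translate(macro_lines(source, macros))
```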
 Although a line-by-line macro processor may use some of the same utility
routines as the language translator, the functions of macro processing and
program translation are still relatively independent.
 There exists even closer cooperation between the macro processor and the
assembler or compiler. Such a scheme can be thought of as a language translator
with an integrated macro processor.
An integrated macro processor can potentially make use of any information
about the source program that is extracted by the language translator.
For example, at a relatively simple level of cooperation, the macro processor
may use the results of such translator operations as scanning for symbols,
constants, etc. The macro processor can simply use the results without being
involved in such details as multiple-character operators, continuation lines, and
the rules for token formation.
There are disadvantages to integrated and line-by-line macro processors.
 They must be specially designed and written to work with a particular
implementation of an assembler or compiler.
 The costs of macro processor development must be added to the cost of the
language translator, resulting in a more expensive piece of software.
 The size may be a problem if the translator is to run on a computer with limited
memory.
4.4 Implementation Examples
2. ANSI C Macro Language
In the ANSI C language, definitions and invocations of macros are
handled by a preprocessor. This preprocessor is generally not integrated with
the rest of compiler. Its operation is similar to the macro processor we discussed
before.
Two simple (and commonly used) examples of ANSI C macro definitions:
#define NULL 0
#define EOF (-1)
After these definitions, every occurrence of NULL will be replaced by 0, and
every occurrence of EOF will be replaced by (-1).
It is also possible to use macros like this to make limited changes in the
syntax of the language. For example, after defining the macro
#define EQ ==
a programmer could write while (I EQ 0) …
The macro processor would convert this into while (I == 0) …
ANSI C macros can also be defined with parameters. Consider, for
example, the macro definition
#define ABSDIFF(X,Y) ((X) > (Y) ? (X) - (Y) : (Y) - (X))
For example, ABSDIFF(I+1, J-5) would be converted by the macro processor
into
((I+1) > (J-5) ? (I+1) - (J-5) : (J-5) - (I+1)).
The macro version can also be used with different types of data. For example,
we could invoke the macro as ABSDIFF(I, 3.14159) or ABSDIFF('D', 'A').
It is necessary to be very careful in writing macro definitions with
parameters. The macro processor simply makes string substitutions, without
considering the syntax of the C language.
For example, if we had written the definition of ABSDIFF as
#define ABSDIFF(X, Y) X>Y ? X-Y : Y-X
the macro invocation ABSDIFF(3+1, 10-8) would be expanded into
3+1 > 10-8 ? 3+1-10-8 : 10-8-3+1
which would not produce the intended result.
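The effect of pure string substitution can be reproduced with a short Python model of the preprocessor step. This is a sketch of the substitution only, not of a real C preprocessor; the function name and templates are assumptions.

```python
def expand_absdiff(template, x, y):
    """Substitute argument text for X and Y, ignoring C syntax entirely."""
    return template.replace("X", x).replace("Y", y)


unparenthesized = expand_absdiff("X>Y ? X-Y : Y-X", "3+1", "10-8")
parenthesized = expand_absdiff("((X) > (Y) ? (X) - (Y) : (Y) - (X))", "3+1", "10-8")

# 3+1 > 10-8 holds, so the first branch is selected in both expansions.
# The usual operator precedence then gives very different values:
unparenthesized_value = 3 + 1 - 10 - 8          # what the bare expansion computes
parenthesized_value = (3 + 1) - (10 - 8)        # the intended absolute difference
```

The parenthesized definition yields 2, the intended result, while the bare substitution yields a negative number.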


The ANSI C preprocessor also provides conditional compilation statements.
For example, in the sequence
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 1024
#endif
the #define will be processed only if BUFFER_SIZE has not already been
defined.
Conditionals are also often used to control the inclusion of debugging
statements in a program.

UNIT V
SYSTEM SOFTWARE TOOLS
I.TEXT EDITORS
1.Overview of the editing process
An interactive editor is a computer program that allows a user to create
and revise a target document. The term document includes objects such as
computer programs, texts, equations, tables, diagrams, line art and photographs:
anything that one might find on a printed page. A text editor is one in which the
primary elements being edited are character strings of the target text.
The document editing process is an interactive user-computer dialogue
designed to accomplish four tasks:
1) Select the part of the target document to be viewed and manipulated
2) Determine how to format this view on-line and how to display it.
3) Specify and execute operations that modify the target document.
4) Update the view appropriately.
Traveling: Selection of the part of the document to be viewed and edited
involves first traveling through the document to locate the area of interest,
with commands such as “next screenful”, “bottom”, and “find pattern”. Traveling
specifies where the area of interest is.
Filtering: The selection of what is to be viewed and manipulated is controlled
by filtering. Filtering extracts the relevant subset of the target document at the
point of interest, such as the next screenful of text or the next statement.
Formatting: Formatting determines how the result of filtering will be seen as a
visible representation (the view) on a display screen or other device.
Editing: In the actual editing phase, the target document is created or altered
with a set of operations such as insert, delete, replace, move or copy.
Manuscript-oriented editors operate on elements such as single
characters, words, lines, sentences and paragraphs.
Program-oriented editors operate on elements such as identifiers,
keywords and statements.
2. User-interface
The user of an interactive editor is presented with a conceptual model of
the editing system. The model is an abstract framework on which the editor and
the world on which it operates are based.
Line editors simulated the world of the keypunch: they allowed
operations on numbered sequences of 80-character card-image lines.
Screen editors define a world in which a document is represented as
a quarter-plane of text lines, unbounded both down and to the right. The user
sees, through a cutout, only a rectangular subset of this plane on a multiline
display terminal. The cutout can be moved left or right, and up or down, to
display other portions of the document.
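The cutout of a screen editor can be pictured as a rectangular slice of the list of text lines. A minimal sketch; the function name, the sample document and the padding with blanks are assumptions.

```python
def visible_text(lines, top, left, rows, cols):
    """Return the rows x cols cutout whose upper-left corner is (top, left)."""
    window = []
    for r in range(top, top + rows):
        text = lines[r] if r < len(lines) else ""   # plane is unbounded downward
        window.append(text[left:left + cols].ljust(cols))   # and to the right
    return window


document = ["LDA ALPHA", "ADD BETA", "STA GAMMA"]
view = visible_text(document, top=1, left=4, rows=3, cols=5)
```

Moving the cutout is then only a matter of changing top and left; the document itself is not touched.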
The user interface is also concerned with the input devices, the output
devices, and the interaction language of the system.
Input devices:
The input devices are used to enter elements of text being edited, to enter
commands, and to designate editable elements.
Input devices are categorized as:
1) Text devices
2) Button devices
3) Locator devices
1) Text or string devices are typically typewriter-like keyboards on which the user
presses and releases keys, sending a unique code for each key. Virtually all
computer keyboards are of the QWERTY type.
2) Button or choice devices generate an interrupt or set a system flag, usually
causing an invocation of an associated application program. Special
function keys are also available on the keyboard. Alternatively, buttons can be
simulated in software by displaying text strings or symbols on the screen. The
user chooses a string or symbol instead of pressing a button.
3) Locator devices are two-dimensional analog-to-digital converters that
position a cursor symbol on the screen by observing the user's movement of the
device. The most common such devices are the mouse and the tablet.
The data tablet is a flat, rectangular, electromagnetically sensitive
panel. Either a ballpoint-pen-like stylus or a puck (a small device similar to a
mouse) is moved over the surface. The tablet returns to a system program the
coordinates of the position on the data tablet at which the stylus or puck is
currently located. The program can then map these data-tablet coordinates to
screen coordinates and move the cursor to the corresponding screen position.
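A minimal sketch of the tablet-to-screen mapping, assuming simple proportional scaling. The function name and the tablet and screen sizes are made up for illustration; a real tablet driver may use a different mapping.

```python
def tablet_to_screen(tx, ty, tablet_size, screen_size):
    """Map a data-tablet position to a screen (column, row) position."""
    tablet_w, tablet_h = tablet_size
    screen_w, screen_h = screen_size
    # Scale each coordinate by the ratio of screen size to tablet size.
    sx = int(tx * screen_w / tablet_w)
    sy = int(ty * screen_h / tablet_h)
    # Clamp so a stylus on the tablet edge still maps onto the screen.
    return min(max(sx, 0), screen_w - 1), min(max(sy, 0), screen_h - 1)


# Hypothetical 1000 x 1000 tablet mapped to an 80-column, 24-row screen.
cursor = tablet_to_screen(500, 250, (1000, 1000), (80, 24))
```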
Text devices with arrow (cursor) keys can be used to simulate locator
devices. Each of these keys shows an arrow that points up, down, left or right.
Pressing an arrow key typically generates an appropriate character sequence; the
program interprets this sequence and moves the cursor in the direction of the
arrow on the key pressed.
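The interpretation of arrow-key sequences can be sketched as follows. The sketch assumes a terminal that sends ANSI/VT100-style sequences (ESC [ A for up, and so on); other terminals may send different sequences. The screen size and function name are also assumptions.

```python
# Character sequences assumed to be sent by the four arrow keys,
# mapped to (row change, column change).
ARROW_SEQUENCES = {
    "\x1b[A": (-1, 0),   # up
    "\x1b[B": (1, 0),    # down
    "\x1b[C": (0, 1),    # right
    "\x1b[D": (0, -1),   # left
}


def move_cursor(position, sequence, rows=24, cols=80):
    """Return the new (row, col) after interpreting one key sequence."""
    row, col = position
    # Sequences that are not arrow keys leave the cursor where it is.
    d_row, d_col = ARROW_SEQUENCES.get(sequence, (0, 0))
    row = min(max(row + d_row, 0), rows - 1)
    col = min(max(col + d_col, 0), cols - 1)
    return row, col
```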
Voice-input devices: These translate spoken words to their textual equivalents
and may prove to be the text input devices of the future. Voice recognizers are
currently available for command input on some systems.
Output devices:
The output devices let the user view the elements being edited
and the results of the editing operations.
- The first output devices were teletypewriters and other character-printing
terminals that generated output on paper.
- Next came “glass teletypes” based on Cathode Ray Tube (CRT) technology, which
use the CRT screen essentially to simulate the hard-copy teletypewriter.
- Today's advanced CRT terminals use hardware assistance for such features
as moving the cursor, inserting and deleting characters and lines, and scrolling
lines and pages.
- The modern professional workstations are based on personal computers with
high-resolution displays; they support multiple proportionally spaced character
fonts to produce realistic facsimiles of hard-copy documents.
Interaction language:
The interaction language of the text editor is generally one of several
common types.
The typing-oriented or text command-oriented method
It is the oldest of the major editing interfaces. The user communicates
with the editor by typing text strings both for command names and for operands.
These strings are sent to the editor and are usually echoed to the output device.
Typed specification often requires the user to remember the exact form
of all commands, or at least their abbreviations. If the command language is
complex, the user must continually refer to a manual or an on-line Help
function. The typing required can be time-consuming for inexperienced users.
Function key interfaces:
Each command is associated with a marked key on the keyboard. This
eliminates much typing. E.g.: Insert key, Shift key, Control key
Disadvantages: An editor with many commands needs either too many unique keys
or multiple-keystroke commands.
Menu-oriented interface
A menu is a multiple-choice set of text strings or icons, which are
graphical symbols that represent objects or operations. The user can perform
actions by selecting items from the menus.
The editor prompts the user with a menu. One problem with menu-oriented
systems can arise when there are many possible actions and several choices are
required to complete an action. The display area of the menu is rather limited.
3. Editor Structure
The Command Language Processor
It accepts input from the user's input devices, and analyzes the tokens
and syntactic structure of the commands. It functions much like the lexical and
syntactic phases of a compiler. The command language processor may invoke
the semantic routines directly. In a text editor, these semantic routines perform
functions such as editing and viewing.
The semantic routines involve traveling, editing, viewing and display
functions. Editing operations are always specified by the user and display
operations are specified implicitly by the other three categories of operations.
Traveling and viewing operations may be invoked either explicitly by the user
or implicitly by the editing operations.
Editing Component
In editing a document, the start of the area to be edited is determined by
the current editing pointer maintained by the editing component, which is the
collection of modules dealing with editing tasks. The current editing pointer can
be set or reset explicitly by the user using traveling commands, such as next
paragraph and next screen, or implicitly as a side effect of the previous editing
operation, such as delete paragraph.
Traveling Component
The traveling component of the editor actually performs the setting of the
current editing and viewing pointers, and thus determines the point at which the
viewing and/or editing filtering begins.
Viewing Component
The start of the area to be viewed is determined by the current viewing
pointer. This pointer is maintained by the viewing component of the editor,
which is a collection of modules responsible for determining the next view. The
current viewing pointer can be set or reset explicitly by the user or implicitly by
the system as a result of the previous editing operation.
The viewing component formulates an ideal view, often expressed in a
device-independent intermediate representation. This view may be a very simple
one consisting of a window's worth of text arranged so that lines are not broken
in the middle of words.
Display Component
It takes the idealized view from the viewing component and maps it to a
physical output device in the most efficient manner. The display component
produces a display by mapping the buffer to a rectangular subset of the screen,
usually called a window.
Editing Filter
Filtering consists of the selection of contiguous characters beginning at
the current point. The editing filter filters the document to generate a new
editing buffer based on the current editing pointer as well as on the editing filter
parameters.
Editing Buffer
It contains the subset of the document filtered by the editing filter based
on the editing pointer and editing filter parameters.
Viewing Filter
When the display needs to be updated, the viewing component invokes
the viewing filter. This component filters the document to generate a new
viewing buffer based on the current viewing pointer as well as on the viewing
filter parameters.
Viewing Buffer
It contains the subset of the document filtered by the viewing filter
based on the viewing pointer and viewing filter parameters.
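The relationship between pointers, filters and buffers can be sketched as below. For simplicity the sketch filters whole lines rather than characters, and the only filter parameter is a line count; both choices, and the name filter_document, are assumptions made for illustration.

```python
def filter_document(lines, pointer, count):
    """Select `count` contiguous lines beginning at the current pointer."""
    return lines[pointer:pointer + count]


document = [f"line {n}" for n in range(1, 101)]

# Editing pointer at the first line, filter parameter: 50 lines.
editing_buffer = filter_document(document, pointer=0, count=50)

# Viewing pointer at line 75 (index 74), filter parameter: 1 line,
# as a line editor might show only the current line.
viewing_buffer = filter_document(document, pointer=74, count=1)
```

Here the two buffers are disjoint: the editing buffer holds lines 1 through 50 while the viewing buffer holds line 75.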
E.g. The user of a certain editor might travel to line 75 and, after viewing it,
decide to change all occurrences of “ugly duckling” to “swan” in lines 1
through 50 of the file by using a change command such as
[1,50] c/ugly duckling/swan/
As a part of the editing command there is implicit travel to the first line
of the file. Lines 1 through 50 are then filtered from the document to become
the editing buffer. Successive substitutions take place in this editing buffer
without corresponding updates of the view.
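A rough sketch of how a command of this form could be analyzed and carried out. The regular expression accepts only the exact form shown in the example; the real syntax of the editor's command language is not specified in these notes, so the pattern and the function name apply_change are assumptions.

```python
import re

# Matches commands of the form [first,last] c/old/new/
COMMAND = re.compile(r"\[(\d+),(\d+)\]\s*c/([^/]+)/([^/]*)/")


def apply_change(lines, command):
    """Apply a change command to a document held as a list of lines."""
    match = COMMAND.fullmatch(command.strip())
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    first, last = int(match.group(1)), int(match.group(2))
    old, new = match.group(3), match.group(4)
    # Implicit travel: lines first..last (numbered from 1, inclusive)
    # are filtered from the document to become the editing buffer.
    editing_buffer = lines[first - 1:last]
    # The substitutions take place in the editing buffer only.
    edited = [line.replace(old, new) for line in editing_buffer]
    return lines[:first - 1] + edited + lines[last:]


document = ["an ugly duckling", "ugly duckling, ugly duckling", "the ugly duckling"]
result = apply_change(document, "[1,2] c/ugly duckling/swan/")
```

The third line is outside the range given in the command, so it is left unchanged.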
In line editors, the viewing buffer may contain the current line; in
screen editors, this buffer may contain a rectangular cutout of the quarter-plane
of text. This viewing buffer is then passed to the display component of the
editor, which produces a display by mapping the buffer to a rectangular subset
of the screen, usually called a window.
The editing and viewing buffers, while independent, can be related in
many ways. In the simplest case, they are identical: the user edits the material
directly on the screen. On the other hand, the editing and viewing buffers may
be completely disjoint.
Windows typically cover the entire screen or a rectangular portion of it. Mapping
viewing buffers to windows that cover only part of the screen is especially
useful for editors on modern graphics-based workstations. Such systems can
support multiple windows, simultaneously showing different portions of the
same file or portions of different files. This approach allows the user to perform
inter-file editing operations much more effectively than with a system that has
only a single window.
The mapping of the viewing buffer to a window is accomplished by two
components of the system.
(i) First, the viewing component formulates an ideal view, often
expressed in a device-independent intermediate representation. This view
may be a very simple one consisting of a window's worth of text arranged so
that lines are not broken in the middle of words. At the other extreme, the
idealized view may be a facsimile of a page of fully formatted and typeset text
with equations, tables and figures.
(ii) Second, the display component takes this idealized view from
the viewing component and maps it to a physical output device in the most
efficient manner possible.
The components of the editor deal with a user document on two levels:
(i) In main memory and
(ii) In the disk file system.
Loading an entire document into main memory may be infeasible.
However if only part of a document is loaded and if many user specified
operations require a disk read by the editor to locate the affected portions,
editing might be unacceptably slow. In some systems this problem is solved by
mapping the entire file into virtual memory and letting the operating
system perform efficient demand paging.
An alternative is to provide editor paging routines, which read
one or more logical portions of a document into memory as needed. Such
portions are often termed pages, although there is usually no relationship
between these pages and the hard copy document pages or virtual memory
pages. These pages remain resident in main memory until a user operation
requires that another portion of the document be loaded.
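Editor paging can be sketched with a small class that keeps one logical page resident at a time. The class name, the page size in lines, and the choice to rescan the file from the start on each load are simplifying assumptions; a real editor would likely keep an index of page positions.

```python
import io


class PagedDocument:
    """Keeps one logical page of a document resident in memory."""

    def __init__(self, stream, lines_per_page=4):
        self.stream = stream
        self.lines_per_page = lines_per_page
        self.resident_page = None    # number of the page now in memory
        self.resident_lines = []
        self.loads = 0               # how many times a page was read

    def _load(self, page_number):
        # Read only the lines of the requested page from the file.
        self.stream.seek(0)
        start = page_number * self.lines_per_page
        lines = []
        for index, line in enumerate(self.stream):
            if index >= start + self.lines_per_page:
                break
            if index >= start:
                lines.append(line.rstrip("\n"))
        self.resident_page = page_number
        self.resident_lines = lines
        self.loads += 1

    def line(self, number):
        """Return line `number` (counted from 0), loading its page if needed."""
        page_number, offset = divmod(number, self.lines_per_page)
        if page_number != self.resident_page:
            self._load(page_number)
        return self.resident_lines[offset]


# A StringIO object stands in for the disk file in this sketch.
disk_file = io.StringIO("\n".join(f"line {n}" for n in range(10)))
doc = PagedDocument(disk_file, lines_per_page=4)
first = doc.line(0)
same_page = doc.line(3)      # still on the resident page, no disk read
other_page = doc.line(9)     # requires another page to be loaded
```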
Editors function in three basic types of computing environment:
(i) Time-sharing environment
(ii) Stand-alone environment and
(iii) Distributed environment.
Each type of environment imposes some constraints on the design of an editor.
The Time-Sharing Environment
The time-sharing editor must function swiftly within the context of the
load on the computer's processor, central memory and I/O devices.
The Stand-Alone Environment
The editor on a stand-alone system must have access to the functions
that the time-sharing editor obtains from its host operating system. These may be
provided in part by a small local operating system, or they may be built into the
editor itself if the stand-alone system is dedicated to editing.
The Distributed Environment
The editor operating in a distributed resource-sharing local network must,
like a stand-alone editor, run independently on each user's machine and must,
like a time-sharing editor, contend for shared resources such as files.
II. INTERACTIVE DEBUGGING SYSTEMS
An interactive debugging system provides programmers with facilities
that aid in testing and debugging of programs interactively.
1. Debugging functions and capabilities
Execution sequencing:
It is the observation and control of the flow of program execution. For
example, the program may be halted after a fixed number of instructions are
executed.
Breakpoints – The programmer may define breakpoints which cause execution
to be suspended when a specified point in the program is reached. After
execution is suspended, debugging commands are used to analyze the progress
of the program and to diagnose errors detected. Execution of the program can
then be resumed.
Conditional Expressions – Programmers can define conditional
expressions that are evaluated during the debugging session. Program execution
is suspended when the conditions are met, analysis is carried out, and execution
is later resumed.
Gaits – Given a good graphical representation of program progress, it may even
be useful to run the program at various speeds, called gaits.
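The execution-sequencing functions above (halting after a fixed number of instructions, breakpoints, and conditional expressions) can be sketched with a toy program whose statements are Python functions acting on a dictionary of variables. This is an illustration under those assumptions, not how a real debugger controls a compiled program; the name run and its parameters are hypothetical.

```python
def run(program, env, start=0, breakpoints=(), condition=None, max_steps=None):
    """Execute statements from `start` and return the index at which
    execution is suspended (len(program) if the program finished)."""
    pc = start
    steps = 0
    while pc < len(program):
        # A breakpoint suspends execution before the statement runs,
        # except for the statement execution starts or resumes from.
        if pc in breakpoints and steps > 0:
            return pc
        program[pc](env)
        pc += 1
        steps += 1
        # A conditional expression is evaluated after every statement.
        if condition is not None and condition(env):
            return pc
        # Halt after a fixed number of statements.
        if max_steps is not None and steps >= max_steps:
            return pc
    return pc


program = [
    lambda env: env.update(x=1),
    lambda env: env.update(x=env["x"] + 1),
    lambda env: env.update(y=env["x"] * 10),
    lambda env: env.update(x=env["x"] + 1),
]

env = {}
stopped_at = run(program, env, breakpoints={2})    # suspend before statement 2
state_at_break = dict(env)                         # inspect the variables
finished_at = run(program, env, start=stopped_at, breakpoints={2})  # resume
```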
A Debugging system should also provide functions such as tracing and
traceback.
Tracing can be used to track the flow of execution logic and data
modifications. The control flow can be traced at different levels of detail:
procedure, branch, individual instruction, and so on.
Traceback can show the path by which the current statement in the
program was reached. It can also show which statements have modified a given
variable or parameter. This information should be displayed symbolically, for
example as statement numbers rather than as hexadecimal displacements.
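Tracing and traceback can be sketched on a toy program made of labelled statements. The statement labels, the dictionary of variables and the name run_with_trace are assumptions for illustration; a modification is detected here by comparing values before and after each statement, so a statement that stores an unchanged value is not reported.

```python
def run_with_trace(program, env):
    """Run (label, action) statements; return the path taken and, for each
    variable, the labels of the statements that modified it."""
    path = []
    modified_by = {}
    for label, action in program:
        before = dict(env)
        action(env)
        path.append(label)
        # Record the statement for every variable whose value changed.
        for name, value in env.items():
            if name not in before or before[name] != value:
                modified_by.setdefault(name, []).append(label)
    return path, modified_by


program = [
    ("10", lambda env: env.update(total=0)),
    ("20", lambda env: env.update(count=3)),
    ("30", lambda env: env.update(total=env["total"] + env["count"])),
]
path, modified_by = run_with_trace(program, {})
```

The results are reported as statement labels, not as addresses or displacements.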
Program-display Capabilities
It is also important for a debugging system to have good program display
capabilities. It must be possible to display the program being debugged,
complete with statement numbers.
Multilingual Capability
A debugging system should consider the language in which the program
being debugged is written. Most user environments and many application
systems involve the use of different programming languages. A single
debugging tool should be usable in such multilingual situations.
Context Effects
The context being used has many different effects on the debugging interaction.
For example, assignment statements differ depending on the language:
COBOL - MOVE 6.5 TO X
FORTRAN - X = 6.5
Likewise, conditional statements should use the notation of the source language:
COBOL - IF A NOT EQUAL TO B
FORTRAN - IF (A .NE. B)
Similar differences exist with respect to the form of statement labels,
keywords and so on.
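One way a debugger could present the same information in the notation of the source language is a table of formats per language. The sketch below covers only the two statement forms given in the examples above; the table and function names are hypothetical.

```python
# Formats taken from the COBOL and FORTRAN examples in the notes.
ASSIGNMENT_FORMATS = {
    "COBOL": "MOVE {value} TO {name}",
    "FORTRAN": "{name} = {value}",
}
NOT_EQUAL_FORMATS = {
    "COBOL": "IF {left} NOT EQUAL TO {right}",
    "FORTRAN": "IF ({left} .NE. {right})",
}


def show_assignment(language, name, value):
    """Display an assignment in the notation of the given language."""
    return ASSIGNMENT_FORMATS[language].format(name=name, value=value)


def show_not_equal(language, left, right):
    """Display a not-equal condition in the notation of the given language."""
    return NOT_EQUAL_FORMATS[language].format(left=left, right=right)
```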
Display of source code
The language translator may provide the source code or source listing
tagged in some standard way so that the debugger has a uniform method of
navigating about it.
Optimization:
It is also important that a debugging system be able to deal with
optimized code. Many optimizations involve the rearrangement of segments of
code in the program.
For example:
- invariant expressions can be removed from loops
- separate loops can be combined into a single loop
- redundant expressions may be eliminated
- unnecessary branch instructions may be eliminated
The debugging of optimized code requires a substantial amount of
cooperation from the optimizing compiler.
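The first optimization in the list can be illustrated by writing the code before and after the rearrangement by hand. The two functions below are hypothetical; they only show why a debugger needs help from the compiler, since the statement the programmer wrote inside the loop no longer exists inside the loop after optimization.

```python
def as_written(values, a, b):
    out = []
    for v in values:
        scale = a * b        # invariant: same result on every iteration
        out.append(v * scale)
    return out


def as_optimized(values, a, b):
    scale = a * b            # the invariant expression moved out of the loop
    out = []
    for v in values:
        out.append(v * scale)
    return out
```

Both versions produce the same result, but a breakpoint requested on the assignment to scale would be reached once instead of once per iteration in the optimized version.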
2. Relationship with Other Parts of the System
An interactive debugger must be related to other parts of the system in
many different ways.
Availability
An interactive debugger must appear to be a part of the run-time environment
and an integral part of the system. When an error is discovered, immediate
debugging must be possible because it may be difficult or impossible to
reproduce the program failure in some other environment or at some other
time.

Consistency with security and integrity components


Users need to be able to debug in a production environment. When an
application fails during a production run, work dependent on that application
stops. Since the production environment is often quite different from the test
environment, many program failures cannot be repeated outside the production
environment.
The debugger must also exist in a way that is consistent with the security and
integrity components of the system. Use of the debugger must be subject to the
normal authorization mechanisms and must leave the usual audit trails. An
unauthorized user must not be able to access any data or code. It must not be
possible to use the debugger to interfere with any aspect of system integrity.
Coordination with existing and future systems
The debugger must co-ordinate its activities with those of existing and
future language compilers and interpreters.
It is assumed that debugging facilities in existing languages will continue
to exist and be maintained. The requirement for a cross-language debugger
assumes that such a facility would be installed as an alternative to the individual
language debuggers.
3. User-interface criteria
The interactive debugging system should be user friendly. The facilities of the
debugging system should be organized into a few basic categories of functions
which should closely reflect common user tasks.
Full-screen displays and windowing systems
The user interaction should make use of full-screen displays and
windowing systems. The advantage of such an interface is that the information can
be displayed and changed easily and quickly.
Menus:
- With menus and full-screen editors, the user has far less information to enter
and remember.
- It should be possible to go directly to the menus without having to retrace an
entire hierarchy.
- When a full-screen terminal device is not available, the user should have an
equivalent action in a linear debugging language by providing commands.
Command language:
- The command language should have a clear, logical, simple syntax. Parameter
names should be consistent across the set of commands.
- Parameters should automatically be checked for errors in type and range
of values.
- Defaults should be provided for parameters.
- The command language should minimize punctuation such as parentheses, slashes,
and special characters.
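Automatic checking of parameters and the use of defaults can be sketched as below. The parameter names line and count, their ranges and their default values are invented for this example and do not come from any particular debugger.

```python
# Hypothetical parameter table: type, default value and allowed range.
PARAMETERS = {
    "line": {"type": int, "default": 1, "min": 1, "max": 9999},
    "count": {"type": int, "default": 1, "min": 1, "max": 100},
}


def check_parameters(supplied):
    """Check supplied parameter strings for type and range; fill in defaults."""
    for name in supplied:
        if name not in PARAMETERS:
            raise ValueError(f"unknown parameter: {name}")
    checked = {}
    for name, spec in PARAMETERS.items():
        raw = supplied.get(name)
        if raw is None:
            # Parameter not given by the user: use its default.
            checked[name] = spec["default"]
            continue
        try:
            value = spec["type"](raw)
        except ValueError:
            raise ValueError(f"{name}: {raw!r} has the wrong type") from None
        if not spec["min"] <= value <= spec["max"]:
            raise ValueError(f"{name}: {value} is out of range")
        checked[name] = value
    return checked
```

For example, check_parameters({"line": "75"}) accepts the line number and supplies the default for count, while a count of "500" or a line of "abc" is rejected with an error message.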
On-line HELP facility
- A good interactive system should have an on-line HELP facility that
provides help for all menu options.
- Help should be available from any state of the debugging system.
***************************
Posted by thenkumarisamayal at 08:09