Chapter 3b - MP DSB

The document discusses assembly language programming for the 8086/8088 microprocessor. It explains that assembly language provides a simpler way to program than machine language, but assembled programs still need to be converted to machine language by an assembler program in order to run. Common assemblers include MASM and TASM. The document then outlines some key features of assembly language programming, including comments, reserved words, identifiers, statements, and directives used to control assembly of the program.

Uploaded by

sheham ihjam
Copyright
© All Rights Reserved

Chapter 3(B)

Assembly Language Programming


Introduction
The previous section covered the instruction set of the 8086/8088 microprocessor. Instructions
vary in size from one byte to six bytes. To keep the instruction set simple to follow, the object
code was not shown, but every instruction has an equivalent machine code (object code).
Programming the 8086/8088 directly in machine language is difficult, so we write programs in
assembly language instead, using the mnemonics of the instructions rather than their machine
codes. The computer cannot execute an assembly language program directly; the program must first
be converted to machine language, which is what the computer executes. A special type of program
called an assembler performs this conversion. Assemblers and the format of assembly language are
discussed in the following sections.

Assemblers
There are two main classes of programming languages: high-level and low-level. Programmers
writing in a high-level language, such as C or BASIC, use powerful statements, each of which
generates many machine language instructions. Programmers writing in a low-level language, on
the other hand, code symbolic instructions, each of which generates one machine instruction.
Assembly language is a low-level language. Programming in a high-level language such as C or
C++ is easier than programming in assembly language, but assembly language has some advantages:
(a) Greater control over the hardware
(b) Smaller, more optimized code
(c) Faster program execution

An assembler translates the symbolic instructions of an assembly language program into an object
file (.OBJ). A linker program then converts this object file into an executable machine language
file (.EXE or .COM).

Commonly available assemblers are MASM (Microsoft Macro Assembler) and TASM (Turbo Assembler).
There are many versions of each of these assemblers. Each assembler is supplied with its own
linker program.

Assembly Language Features


Now, we shall discuss the syntax of the commonly available assemblers for 8086.

Comments
Comments are introduced in a program to increase its readability and to help the reader
understand it. They are particularly useful when a program is read or used by someone other than
its author. A comment begins with a semicolon (;). Whenever the assembler encounters a semicolon,
it treats all characters to the right of it on that line as a comment.

A comment may occupy a line of its own or may follow an instruction on the same line. The
following are examples:
1. ; this is a comment
2. ADD AX,CX ;Add instruction comment
A comment appears only in the source program listing and generates no machine code. Any
number of comments may be introduced in a program without affecting the assembled program
size or execution.

Reserved Words
Readers who know C or C++ will already be familiar with reserved words. Reserved words are
names that the assembler reserves for its own purposes. They may be used only for those
purposes; using one in any other way causes an error when the program is assembled.

The reserved words are listed by category as follows:

1. Instructions: These are operations whose machine code is executed by the
microprocessor (here, the 8086/8088), such as MOV and ADD.
2. Directives: Directives are special purpose statements which direct the assembler to
perform a certain operation. Examples are END and SEGMENT.
3. Operators: Operators such as FAR and SIZE supply the assembler with information about
an operand.
4. Predefined Symbols: Predefined symbols such as @Data and @Model return information
to the program during assembly.
Using a reserved word for the wrong purpose causes the assembler to generate an error message,
and the error must be removed before the program will assemble. To save time, a programmer must
take care not to misuse reserved words.

Identifiers
An identifier is a name used to reference an item in your program. There are two types of
identifiers in assembly language. The first is the name, which references the address of a data
item; the second is the label, which references the address of an instruction. An identifier
can use the following characters:
• Alphabetic letters : A through Z and a through z
• Digits : 0 through 9 [not as the first character]
• Special characters : (?) question mark
                       (_) underscore
                       ($) dollar
                       (@) at
                       (.) period [not as the first character]
The first character of an identifier can be any of the above characters except a digit or a
period (.). The assembler treats uppercase and lowercase letters as the same. Examples of valid
identifiers are HELLO, $A24 and PAGE125.

Statements
An assembly language program consists of a set of statements. Statements are of two types:
1. Instructions: these are converted to object code by the assembler, e.g. MOV, ADD.
2. Directives: these direct the assembler to perform a specific action, such as defining a
data item.
The general format of a statement is given below.
[Identifier] operation [operand(s)] [;comment]
The entries enclosed in square brackets are optional. The following are examples of statements:
Statement    Identifier  Operation  Operand  Comment
Directive    COUNT       DB         2        ;Declare variable
Instruction  L20:        ADD        AX,BX    ;Perform ADD
Now let us discuss the elements of a statement in detail.
(i) Identifier: As discussed earlier, this is either a name, which refers to a data item
defined by a directive, or a label, which refers to an instruction.
(ii) Operation: For a data item, the operation is a directive such as DB, DW or DQ, which
defines a field or area for data. For an instruction, it is a mnemonic from the 8086/8088
instruction set (e.g. ADD, SUB, MOV).
(iii) Operand: This field provides information for the operation field. For a data item, the
operand defines the initial value. For an instruction, the operand indicates where the
operation is to take place.
NAME   OPERATION  OPERAND  COMMENT
A_NO   DB         5H       ;Define byte with value 5H
       INC        AX       ;Increment AX
       ADD        AX,DX    ;Add DX to AX
       JMP        L20      ;Jump to the instruction
                           ;labeled L20

Directives
Assembly language supports a number of statements that allow the programmer to control the way
a program is assembled and listed. Such statements are called directives. Directives act only
during assembly of the program and generate no executable machine code. The common directives
are discussed below with examples.

LIST Directives
The list directives control the listing of an assembled program. They affect only the listing
and have no effect on program execution. PAGE and TITLE are the list directives.

1) PAGE directive
The PAGE directive is placed at the start of an assembly program. Its function is to tell the
assembler the maximum number of lines to list on a page and the maximum number of characters
on a line. Its general format is
PAGE [lines],[characters]
Consider the example
PAGE 60,132
This PAGE directive tells the assembler to list 60 lines per page and 132 characters per line.
If the PAGE directive is omitted, the assembler uses the default PAGE 50,80. The PAGE directive
only sets the page size of the printed program listing.

2) TITLE Directive
The TITLE directive causes the title of the program to be printed on line 2 of each page of
the program listing. The TITLE directive is coded once, at the start of the program. Its syntax
is shown below.
TITLE text [comment]
Here, text is the title that you want printed.

SEGMENT Directive
An executable assembly program, such as an .EXE file, consists of one or more segments. These
are the stack, data, code and extra segments. The format for defining a segment is given below.
NAME     OPERATION  OPERAND                      COMMENT
segname  SEGMENT    [align] [combine] ['class']
...
segname  ENDS

The SEGMENT directive marks the start of a segment named segname. The segment name must be
unique and must follow the assembler's rules for identifiers. The ENDS statement marks the end
of the segment and must carry the same name as the SEGMENT statement. The maximum size of a
segment is 64K. The operands of the SEGMENT directive are described as follows:
align type field: The align type field indicates where the segment must begin in memory. For
example, PARA aligns the segment at a memory address divisible by 10H (16 decimal). Thus in the
example
segname SEGMENT PARA
the segment named segname will start at a memory address exactly divisible by 10H.
combine type field: The combine type field tells the assembler whether to combine the segment
with other segments of the program when they are linked after assembly. Combine types include
STACK, COMMON, PUBLIC, NONE and AT. The commonly encountered definition of a stack segment is
given below.
STACKSEG SEGMENT PARA STACK
class type field: The class field, enclosed in apostrophes, denotes the group type of the
segment. It is used to group related segments when linking. The classes recommended by
Microsoft are listed below.
CLASS NAME  SEGMENT LINKED
'code'      Code segment
'data'      Data segment
'stack'     Stack segment

PROC Directive
The code segment contains the instructions (the executable code) of a program. In a modular
program, the code segment may be divided into many modules called procedures. A procedure is
defined by the PROC directive, whose format is given below.
NAME      OPERATION  OPERAND   COMMENT
procname  PROC       FAR/NEAR  ;Begin procedure
...
procname  ENDP                 ;End procedure

A common example of a code segment with a single procedure is as follows.
NAME      OPERATION  OPERAND  COMMENT
code_seg  SEGMENT    PARA     ;Code segment starts
proc_one  PROC       FAR      ;Procedure starts
proc_one  ENDP                ;Procedure ends
code_seg  ENDS                ;Code segment ends
In the above example we have defined a procedure with the PROC directive. The procedure is
named proc_one, but any name that follows the assembler's rules for identifiers may be used.
The operand FAR is related to program control and shall be discussed later. The operand of
PROC is either NEAR or FAR. The ENDP directive indicates the end of a procedure and must carry
the same name as the PROC statement. The code segment can contain any number of procedures.
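
As an illustration of a code segment that holds more than one procedure, the following sketch
defines a FAR entry procedure and a NEAR procedure that it calls. The names stack_seg,
code_seg, proc_main and proc_add are hypothetical and chosen only for this example; it is a
minimal sketch, not a recommended program layout.

```asm
stack_seg SEGMENT PARA STACK 'Stack'
          DW      32 DUP(0)         ;Stack space used by CALL and RET
stack_seg ENDS

code_seg  SEGMENT PARA 'Code'
          ASSUME  SS:stack_seg,CS:code_seg
proc_main PROC    FAR               ;Entry procedure
          MOV     AX,5
          MOV     BX,7
          CALL    proc_add          ;Near call within code_seg; AX becomes 12
          MOV     AX,4C00H          ;Return to DOS
          INT     21H
proc_main ENDP

proc_add  PROC    NEAR              ;Subroutine: AX = AX + BX
          ADD     AX,BX
          RET                       ;Near return to the caller
proc_add  ENDP
code_seg  ENDS
          END     proc_main
```

Each procedure has its own PROC and ENDP pair, and both lie between the SEGMENT and ENDS
statements of the one code segment.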

ASSUME Directive
The 8086/8088 uses the SS register to address the stack segment, the DS register to address
the data segment and the CS register to address the code segment. The assembler, however, must
be told which segment each register addresses. The ASSUME directive is used for this purpose.
It is placed at the start of the code segment as follows.
OPERATION  OPERAND
ASSUME     SS:STACKSEG,DS:DATASEG,CS:CODESEG
This ASSUME directive tells the assembler to associate the SS register with the stack segment
named STACKSEG. Similarly, the other operands associate DS with DATASEG and CS with CODESEG.
The ES register may be associated with a program segment in the same way. ASSUME generates no
machine code and does not itself load any segment register.

END Directive
We have already discussed the ENDS and ENDP directives. The ENDS directive informs the
assembler of the end of a segment, while the ENDP directive marks the end of a procedure. The
END directive similarly marks the end of the whole program. Its format is
OPERATION  OPERAND
END        [procname]
The operand may be blank if the program is not to be executed; for example, you may want to
assemble only data definitions, or you may want to link the program with another module. In
most programs, the operand contains the name of the first or only PROC designated as FAR, where
program execution is to begin.

Processor Directive
Most assemblers assume that the source program is to run on a basic 8086-level computer. As a
result, when you use instructions or features introduced by later processors, you have to notify
the assembler by means of a processor directive, such as .286, .386, .486. The directive may
appear immediately before the instructions, before the code segment, or even at the start of the
source program for protected mode.
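
A minimal sketch of a processor directive in use is given below. It assumes that the fragment
sits inside the code segment of a program assembled with MASM or TASM. The instructions shown
were introduced after the 8086 and would normally be rejected by the assembler without the
directive.

```asm
          .286                      ;Accept 80286-level instructions from here on
          SHL     AX,4              ;Shift by an immediate count greater than 1
          PUSHA                     ;Push all general-purpose registers
          POPA                      ;Restore all general-purpose registers
```

A program assembled this way requires a processor that supports these instructions, so it will
not run correctly on an 8086/8088.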

Basic Conventional Directive Assembly Language Program


There are two types of executable programs that we shall discuss. The first is the .EXE program
which we shall discuss here. The second is the .COM program which will be discussed later.
Here the basic .EXE program format is shown.
 1         page 60,132
 2 TITLE   SKELETON(EXE) Skeleton of An .EXE Program
 3 ; ------------------------------------------------
 4 STACK   SEGMENT PARA STACK 'Stack'
 5         ...
 6 STACK   ENDS
 7 ; ------------------------------------------------
 8 DATASEG SEGMENT PARA 'Data'
 9         ...
10 DATASEG ENDS
11 ; ------------------------------------------------
12 CODESEG SEGMENT PARA 'Code'
13 MAIN    PROC    FAR
14         ASSUME  SS:STACK,DS:DATASEG,CS:CODESEG
15         MOV     AX,DATASEG      ;Set address of data segment
16         MOV     DS,AX           ; in DS
17         ...
18         MOV     AX,4C00H        ;End processing
19         INT     21H
20 MAIN    ENDP                    ;End of procedure
21 CODESEG ENDS                    ;End of segment
22         END     MAIN            ;End of program

Skeleton of an .EXE program

Now let us discuss the above program in detail. We shall examine the program line by line.
Line    Explanation
1       The PAGE directive sets the listing to 60 lines per page and 132 characters
        per line.
2       The TITLE directive identifies the assembly program as SKELETON.
3,7,11  Comments that separate the program areas (segments) for better
        readability.
4-6     Define the stack segment, named STACK.
8-10    Define the data segment, named DATASEG.
12-21   Define the code segment, named CODESEG.
13-20   Define the only procedure of the code segment, named MAIN.
14      The ASSUME directive informs the assembler to associate the STACK segment
        with the SS register, DATASEG with the DS register and CODESEG with the CS
        register. By associating segments with segment registers, the assembler can
        determine the offset addresses of items in the stack segment, data items in
        the data segment and instructions in the code segment.
15,16   These two instructions initialize the DS register with the address of the
        data segment. Two MOV instructions are needed because an immediate value
        cannot be copied directly into a segment register. The first MOV loads AX
        with the address of the data segment; the second copies this address from
        AX to DS.
18,19   These two instructions request an end to program execution and a return
        to DOS.
22      The END directive informs the assembler of the end of the program. The
        operand MAIN means that the procedure named MAIN is the entry point for
        program execution.
A point to keep in mind is that the program is written in symbolic language. To execute it, we
first use an assembler to translate it into object code. A linker program then converts the
object code into an executable machine code file (.EXE file). When DOS loads this .EXE file into
memory, it initializes the CS:IP and SS:SP registers. DOS does not set DS and ES to the program's
data segment (on entry they address the program segment prefix), so the program itself must load
them with the required segment addresses. The two MOV instructions in lines 15-16 of the above
program do this for DS. The ES register can be loaded similarly.

Ending a program
Software interrupts are services handled by the operating system. DOS provides a number of
software interrupts which are used for a variety of purposes. Of main interest here is INT 21H,
a commonly used DOS interrupt that serves many functions such as keyboard input, screen
handling, disk I/O and printer output. In particular, function 4CH of INT 21H is recognized by
DOS as a request to terminate program execution.
MOV AX,4C00H
INT 21H
Observe the above two lines of code, taken from the program above. The first line loads 4C00H
into the AX register, which places 4CH in AH and 00H in AL. The next line, INT 21H, issues
software interrupt 21H. DOS then checks the AH register for the function number. Since AH
contains 4CH, the INT 21H function for program termination, these two lines terminate program
execution.
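
The value placed in AL is passed back to DOS as the return code of the program. As a small
sketch, AH and AL can be loaded separately to end a program with a return code other than zero;
the value 01H here is an arbitrary choice for illustration.

```asm
          MOV     AH,4CH            ;INT 21H function 4CH: terminate program
          MOV     AL,01H            ;Return code 1, passed back to DOS
          INT     21H
```

A batch file that runs the program can test this return code with IF ERRORLEVEL.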

Example of a Source Program


The following is a simple but complete assembly source program that adds two data items in the
AX register. The segments are defined in this way.
• STACK contains one entry, a DW (Define Word) that defines 32 words initialized to zero, an
adequate size for small programs.
• DATASEG defines three words named VAL1 (initialized with 5471), VAL2 (initialized
with 372), and SUM (uninitialized).
• CODESEG contains the executable instructions for the program, although the first two
statements, PROC and ASSUME, generate no executable code.
        page 60,132
TITLE   SUM(EXE) Program to add two nos
; -------------------------------------
STACK   SEGMENT PARA STACK 'Stack'
        DW      32 DUP(0)
STACK   ENDS
; -------------------------------------
DATASEG SEGMENT PARA 'Data'
VAL1    DW      5471
VAL2    DW      372
SUM     DW      ?
DATASEG ENDS
; -------------------------------------
CODESEG SEGMENT PARA 'Code'
MAIN    PROC    FAR
        ASSUME  SS:STACK,DS:DATASEG,CS:CODESEG
        MOV     AX,DATASEG      ;Set address of data
        MOV     DS,AX           ; segment in DS

        MOV     AX,VAL1         ;Move 155FH to AX
        ADD     AX,VAL2         ;Add 0174H to AX
        MOV     SUM,AX          ;Store sum (16D3H) in SUM

        MOV     AX,4C00H        ;End processing
        INT     21H
MAIN    ENDP                    ;End of procedure
CODESEG ENDS                    ;End of segment
        END     MAIN            ;End of program

The ASSUME directive tells the assembler to perform these tasks:


• Assign STACK to the SS register so that the processor uses the address in SS for
addressing STACK.
• Assign DATASEG to the DS register so that the processor uses the address in DS for
addressing DATASEG.
• Assign CODESEG to the CS register so that the processor uses the address in CS for
addressing CODESEG.
When loading a program from disk into memory for execution, the program loader sets the
correct segment addresses in SS and CS, but, as shown by the first two MOV instructions,
the program has to initialize DS (and usually ES).

Simplified Segment Directives


MASM and TASM provide a shortcut method of defining segments. To use it, a program must first
declare the memory model that it will use. The general format of the declaration is
.MODEL memory_model
The memory models and their requirements are listed below.
Model     No. of Code Segments  No. of Data Segments
TINY      Everything is within one 64K segment
SMALL     1                     1
MEDIUM    >1                    1
COMPACT   1                     >1
LARGE     >1                    >1

The TINY model keeps its code, data and stack in one segment and is used only for .COM files.
The .MODEL directive automatically generates the required ASSUME statement.

The shortcut directives that establish the respective segments are
.STACK [size]
.DATA
.CODE [name]
Each of these directives causes the assembler to generate the required SEGMENT statement and
its corresponding ENDS statement automatically. The default segment names are STACK, _DATA
and _TEXT (the code segment). The instructions that initialize DS with the address of the data
segment then take the form given below:
MOV AX,@Data
MOV DS,AX
The default stack size is 1024 bytes (1 Kbyte), which can be overridden by specifying a size in
the .STACK directive. An assembly program using the simplified segment directives is shown
below.
        page 60,132
TITLE   SUM(EXE) Program to add two nos
; ----------------------------------------------
        .MODEL  SMALL
        .STACK  64              ;Define stack segment
; ----------------------------------------------
        .DATA                   ;Define data segment
VAL1    DW      5471
VAL2    DW      372
SUM     DW      ?
; ----------------------------------------------
        .CODE                   ;Define code segment
MAIN    PROC    FAR
        MOV     AX,@Data        ;Set address of data
        MOV     DS,AX           ; segment in DS

        MOV     AX,VAL1         ;Move 155FH to AX
        ADD     AX,VAL2         ;Add 0174H to AX
        MOV     SUM,AX          ;Store sum in SUM

        MOV     AX,4C00H        ;End processing
        INT     21H
MAIN    ENDP                    ;End of procedure
        END     MAIN            ;End of program

Here we have overridden the default stack size of 1024 bytes with 64 bytes by use of the
.STACK 64 directive. Also, the ASSUME directive is generated automatically, so it does not need
to be coded in the program.

The .STARTUP and .EXIT Directives


MASM version 6.0 and later permit the use of simplified directives for program initialization
and termination. The .STARTUP directive generates the instructions that initialize the segment
registers, while the .EXIT directive generates the INT 21H function 4CH instructions for program
termination.
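
As a sketch of how these two directives might be used, the addition program can be written as
follows. This assumes MASM 6.0 or later and the SMALL memory model; the data names are those
used elsewhere in this chapter.

```asm
        .MODEL  SMALL
        .STACK  64              ;Define stack segment
        .DATA                   ;Define data segment
VAL1    DW      5471
VAL2    DW      372
SUM     DW      ?
        .CODE                   ;Define code segment
        .STARTUP                ;Generates the segment register initialization
        MOV     AX,VAL1         ;Move 155FH to AX
        ADD     AX,VAL2         ;Add 0174H to AX
        MOV     SUM,AX          ;Store sum in SUM
        .EXIT   0               ;Generates INT 21H function 4CH, return code 0
        END                     ;No operand: .STARTUP marks the entry point
```

The two MOV instructions that load DS and the two instructions that end the program no longer
appear in the source, because the assembler generates them.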

Defining Data Types


The purpose of the data segment is to define data areas and input/output areas in memory. The
MASM and TASM assemblers permit the definition of items of various lengths by means of a set of
data-definition directives. A data item may contain an undefined value or a constant. The
general format for defining data is
[name] Dn expression
The format contains three fields, which are discussed briefly below.

Name: An assembly program references the address of a data item by its name. The name field is
optional, as indicated by the square brackets.

Directive (Dn): The directive is the assembler command that defines a data item. The directives
for defining data items are DB (define byte), DW (define word, 2 bytes), DD (define doubleword,
4 bytes), DF (define farword, 6 bytes), DQ (define quadword, 8 bytes) and DT (define ten bytes).
Each of these defines a data item of the size indicated by its name.

Value (expression): The value field defines the initial value of the data item. A question mark
(?) defines an uninitialized item, as shown below.
DATAX DB ?   ;Uninitialized item of
             ;size one byte
The directive can also define a data item initialized with a value, as shown below.
DATAY DW 25  ;This data item reserves two bytes
             ;containing the value 0019H
The 8086/8088 stores 16-bit (two-byte) data in two memory locations in reverse order, that is,
the lower byte at the lower address and the higher byte at the higher address, so 0019H is
stored as 19 00.
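
The byte order can be observed from within a program by reading the two bytes of a word
separately. The fragment below is a sketch that assumes DS already addresses the segment holding
DATAY; it uses the BYTE PTR operator, which tells the assembler to treat the operand as a single
byte.

```asm
DATAY   DW      25                      ;Stored in memory as 19 00
        ;In the code segment:
        MOV     AX,DATAY                ;AX = 0019H (the whole word)
        MOV     BL,BYTE PTR DATAY       ;BL = 19H, the low byte at the lower address
        MOV     BH,BYTE PTR DATAY+1     ;BH = 00H, the high byte at the higher address
```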

Multiple data items may be declared in one statement, as in

DATAZ DB 11, 12, 14, 15 ...
The assembler stores these values in adjacent bytes of memory. A reference to DATAZ is to the
first byte, 11. A reference to DATAZ+1 is to the second byte, 12. For example, the instruction
MOV BL,DATAZ+2
loads the value 14 into the BL register.
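
The number added to a name is always a count of bytes, not of items. For word items, therefore,
successive items lie two bytes apart. The following sketch uses a hypothetical table named
TABLEW and assumes that DS addresses the segment in which it is defined.

```asm
TABLEW  DW      100, 200, 300   ;Three words occupying six bytes
        ;In the code segment:
        MOV     AX,TABLEW       ;AX = 100, the first word
        MOV     AX,TABLEW+2     ;AX = 200, the second word
        MOV     AX,TABLEW+4     ;AX = 300, the third word
```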

Duplication of data items is permitted by use of the DUP operator. The format for defining
duplicated data is given below.
[name] Dn count DUP(expression)
For example
DB 8 DUP(10)        ;Eight bytes containing
                    ;0A 0A 0A 0A 0A 0A 0A 0A
DW 10 DUP(?)        ;Ten words, uninitialized
DB 3 DUP(5 DUP(4))  ;Fifteen 4s
The first example stores the value 10 decimal (0AH) eight times in adjacent bytes of memory
(bytes, because of the directive DB).

Character Strings
Character strings are used for descriptive data such as names and messages. The DB directive is
the conventional directive for defining a character string of any length. The assembler stores
each character of the string in ASCII format. A string is enclosed in single or double quotes,
as shown below:
DB "Sam's Copy"   ;Double quotes for string, single quote for apostrophe
DB 'Sam''s Copy'  ;Single quotes for string, two single quotes for
                  ;apostrophe
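
Because each character is stored as its ASCII code, a string and the equivalent list of numeric
bytes generate identical data. The definitions below are a small sketch; the names STR1, STR2
and STR3 are chosen only for illustration.

```asm
STR1    DB      'ABC'               ;Stored as bytes 41 42 43 (hex)
STR2    DB      41H, 42H, 43H       ;The same three bytes, given as numbers
STR3    DB      'Total: ', 0DH, 0AH ;String followed by carriage return and line feed
```

As STR3 shows, strings and numeric values may be mixed in one DB statement.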

Numeric Constants Directives


Numeric constants are used for arithmetic values and memory addresses. A numeric constant is
not enclosed in quotes and is followed by an optional radix specifier. The assembler converts
the constant into binary and stores the generated bytes in reverse order. The numeric formats
supported by most assemblers are as follows.

Decimal: Decimal format uses the decimal digits 0 through 9, optionally followed by the radix
specifier D. Decimal is the default format, so the specifier may be omitted. Examples are
listed below.
A DB 40     ;Define decimal value 40 in the
            ;memory location referenced by A
B DB 10D    ;Define decimal value 10 in B
C DW 4002D  ;Define decimal value 4002 in C
A is stored as the hex equivalent of 40 decimal, so A contains 28H. Similarly, B contains 0AH.
C is defined as a word (two bytes) and contains the hex equivalent of 4002 decimal, 0FA2H,
stored in reverse byte order as A2 0F (A2 is the first byte and 0F the last).

Hexadecimal: Hex format uses the hex digits 0 through 9 and A through F, followed by the radix
specifier H. A hex constant must begin with a digit 0 through 9, so a leading zero is added
when the first digit is a letter (for example, 0B5H).
Example:
AB DW 2524H
Binary: Binary format uses the binary digits 0 and 1, followed by the radix specifier B. A
common use for binary format is to distinguish bit values for the bit-handling instructions
AND, OR, XOR, and TEST.

Real: The assembler converts a given real value (a decimal or hex constant followed by the radix
specifier R) into floating-point format for use with a numeric coprocessor.
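
The following fragment is a brief sketch of binary and hex constants used as immediate operands
of the bit-handling instructions. The starting value 5CH is arbitrary, and the register contents
noted in the comments follow from it.

```asm
        MOV     AL,5CH          ;AL = 01011100B
        AND     AL,00001111B    ;Keep bits 0-3: AL = 00001100B (0CH)
        OR      AL,0F0H         ;Set bits 4-7: AL = 11111100B (0FCH)
        TEST    AL,00000001B    ;Bit 0 of AL is 0, so the zero flag is set
```

Writing the masks in binary makes the affected bit positions visible at a glance, while 0F0H
shows the leading zero required when a hex constant begins with a letter.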

The EQU Directive


The EQU directive defines a name for a value, which can then be used throughout the program.
Consider the example
NO EQU 10 ;Define NO = 10
FIELDMUL DB NO DUP(?)
The statement NO EQU 10 defines NO as equal to ten. Whenever the name NO is used in the program,
the assembler replaces it with its value. Thus the assembler treats the FIELDMUL statement as
FIELDMUL DB 10 DUP(?)
which defines ten consecutive bytes of memory as uninitialized variables.
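
A name defined by EQU can be used in instruction operands as well as in data definitions. The
fragment below is a sketch that reuses NO and FIELDMUL and assumes that DS addresses the segment
containing FIELDMUL.

```asm
NO       EQU     10                 ;Symbolic constant; occupies no memory
FIELDMUL DB      NO DUP(?)          ;Ten uninitialized bytes
         ;In the code segment:
         MOV     CX,NO              ;Assembled as MOV CX,10
         MOV     AL,FIELDMUL+NO-1   ;Last byte of FIELDMUL (offset 9 from its start)
```

If the size of the field has to change, only the EQU statement needs editing; every statement
that uses NO follows automatically when the program is reassembled.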

The .COM Program


Now let us discuss an important DOS executable file, the .COM file. Till now we have discussed
only the .EXE file. However the .COM file is a very important program file.

Differences between .COM and .EXE Programs


An .EXE program can contain two, three or more segments. A .COM program, however, is restricted
to one segment and a maximum of 64 Kbytes of memory. This 64 Kbytes includes the PSP (program
segment prefix), a 256-byte block that DOS creates immediately before a .COM or .EXE program
when loading it into memory. A .COM file does not contain the 512-byte header block that is
stored at the start of an .EXE file, so a .COM file is smaller than the equivalent .EXE file.
Another difference is that a .COM file contains no relocatable address information.

A stack is generated automatically for a .COM program. Thus, when writing a .COM assembly
language program, the programmer can omit the stack definition. If the program is so large
that the stack cannot be accommodated within the 64 Kbyte boundary, the stack is established in
higher memory. A .COM program also has no separate data segment; its data is defined within the
code segment. Care must be taken to code the directive ORG 100H immediately after the .CODE or
code SEGMENT statement, which tells the assembler to begin assembling the program at an offset
of 100H. This leaves room for the 256-byte (100H) PSP that precedes the .COM program in memory.
When a .COM program is executed, DOS automatically initializes all the segment registers with
the address of the PSP. Thus you are not required to load the CS and DS registers with the
addresses of the memory segments.

Example of a .COM Program


        page 60,132
TITLE   SUM(COM) Program to add two nos
; -------------------------------------
CODESEG SEGMENT PARA 'Code'
        ASSUME  SS:CODESEG,CS:CODESEG,DS:CODESEG,ES:CODESEG
        ORG     100H            ;Start at end of PSP
BEGIN:  JMP     MAIN            ;Jump past data
; -------------------------------------
VAL1    DW      5471
VAL2    DW      372
SUM     DW      ?
; -------------------------------------
MAIN    PROC    NEAR
        MOV     AX,VAL1         ;Move 155FH to AX
        ADD     AX,VAL2         ;Add 0174H to AX
        MOV     SUM,AX          ;Store sum in SUM

        MOV     AX,4C00H        ;End processing
        INT     21H
MAIN    ENDP                    ;End of procedure
CODESEG ENDS                    ;End of segment
        END     BEGIN           ;End of program; entry point at 100H

Now let us briefly describe the above .COM program. Also observe the differences between the
.COM and .EXE programs described in the previous section.
• No data or stack segment is defined.
• The ASSUME statement tells the assembler that all the segment registers address the code
segment.
• The ORG 100H directive makes the assembler begin the program at offset 256 (100H) from the
start of the code segment, leaving room for the 256-byte PSP that DOS places before the
.COM program. This offset is the address at which execution starts and is loaded into the
IP register.
• A JMP (jump) statement bypasses the data area, which is coded within the code segment.
Another way to code the data items is to write them after the instructions, which removes
the need for this JMP statement.
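
The alternative mentioned in the last point, with the data items written after the
instructions, might look like the following sketch. It is a rearrangement of the same program
rather than a different method; the first instruction of MAIN now sits at offset 100H, so
MAIN itself serves as the entry point.

```asm
CODESEG SEGMENT PARA 'Code'
        ASSUME  SS:CODESEG,CS:CODESEG,DS:CODESEG,ES:CODESEG
        ORG     100H            ;Start at end of PSP
MAIN    PROC    NEAR            ;First instruction is at offset 100H
        MOV     AX,VAL1         ;Move 155FH to AX
        ADD     AX,VAL2         ;Add 0174H to AX
        MOV     SUM,AX          ;Store sum in SUM
        MOV     AX,4C00H        ;End processing
        INT     21H
MAIN    ENDP                    ;End of procedure
; -------------------------------------
VAL1    DW      5471            ;Data follows the last instruction
VAL2    DW      372
SUM     DW      ?
CODESEG ENDS                    ;End of segment
        END     MAIN            ;End of program
```

The data is never executed, because the program ends at INT 21H before control can reach it.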

A .COM Program Using Simplified Segment Directives


        page 60,132
TITLE   SUM(COM) Program to add two nos
; -------------------------------------
        .MODEL  TINY
        .CODE
        ORG     100H            ;Start at end of PSP
BEGIN:  JMP     MAIN            ;Jump past data
; -------------------------------------
VAL1    DW      5471
VAL2    DW      372
SUM     DW      ?
; -------------------------------------
MAIN    PROC    NEAR
        MOV     AX,VAL1         ;Move 155FH to AX
        ADD     AX,VAL2         ;Add 0174H to AX
        MOV     SUM,AX          ;Store sum in SUM

        MOV     AX,4C00H        ;End processing
        INT     21H
MAIN    ENDP                    ;End of procedure
        END     BEGIN           ;End of program; entry point at 100H

Assembling, Linking and Executing Programs


So far we have discussed the basic assembly language programming skills that you will need to
write assembly programs for the MASM or TASM assembler. We shall now discuss how to assemble,
link and execute these programs.

Here we explain the procedure for an IBM PC compatible running a DOS-based operating system
with the MASM or TASM assembler. The symbolic instructions that you code in assembly language
are called a source program. An assembler program translates, or assembles, this source program
into machine code (also called object code). Finally, a linker program completes the machine
addresses of the object program and produces a linked program called an executable. The
following figure shows the steps required to assemble, link and execute a program.
1. The assembly step involves translating the source code into object code and
generating an intermediate .OBJ (object) file, or module. One of the assembler's
tasks is to calculate the offsets for every data item in the data segment and for every
instruction in the code segment. The assembler also creates a header immediately in
front of the generated .OBJ module; part of the header contains information about
incomplete addresses. The .OBJ module is not quite in executable form.
2. The link step involves converting the .OBJ module to an .EXE (executable) machine code
module. The linker's tasks include completing any addresses left open by the assembler and
combining separately assembled programs into one executable module.
3. The last step is to load the program for execution. Because the loader knows where the
program is going to load in memory, it is now able to resolve any remaining addresses
still left incomplete in the header. The loader drops the header and creates a program
segment prefix (PSP) immediately before the program loaded in memory.

Source Program
A source program is typed in using symbolic language. The program is keyed in by using a text
editor such as the DOS text editor. After typing in the program we have to save the file on disk.
However the source program file instead of being saved as a .TXT (text format) file is saved
with the .ASM (assembly source code format) extension. This is the source program.

Spacing is not compulsory, but it aids readability. The .ASM file is just a text file; that
is, it cannot be executed by the 8086/8088 processor. The next step is to assemble this .ASM
file into an intermediate .OBJ (object) file.

Assembling a Source (.ASM) Program


The next task, as discussed in the last section, is to assemble the source file (.ASM) into an
object (.OBJ) file. The Microsoft assembler program is MASM.EXE (up to MASM version 5.x),
whereas the Borland Turbo Assembler is TASM.EXE. From MASM version 6.0 onward, the
Microsoft assembler is invoked through ML.EXE.

The assembler converts your source statements into machine code and displays any error
messages on the screen. Typical errors include a name that violates naming conventions, an
operation that is spelled incorrectly (such as MOVE instead of MOV), and an operand
containing a name that is not defined. Because there are many possible errors (100 or more)
and many different assembler versions, refer to your assembler manual for a list. The
assembler attempts to correct some errors but, in any event, reload your editor, correct the
.ASM source program, and reassemble it.

Optional output files from the assembly step are object (.OBJ), listing (.LST), and cross
reference (.CRF or .SBR). You usually request an .OBJ file, which is required for linking a
program into executable form. You'll probably often request an .LST file, especially when it
contains error diagnostics or you want to examine the generated machine code. A .CRF file is
useful for a large program where you want to see which instructions reference which data items.
Also, requesting a .CRF file causes the assembler to generate statement numbers for items in the
.LST file to which the .CRF file refers.

Using Conventional Segment Definitions


The listing of the program SUM.ASM (which we have already discussed), produced by the
assembler under the name SUM.LST, is given below. The line width is 132 positions, as
specified by the PAGE entry.

Note at the top of the listing how the assembler has acted on the PAGE and TITLE directives.
None of the directives, including SEGMENT, PROC, ASSUME, and END, generates machine
code, since they are just messages to the assembler. The listing is arranged horizontally
according to these sections:
1. At the extreme left is the number for each line listed.
2. The second section shows the hex offset addresses of data items and instructions.
3. The third section shows the generated machine code in hexadecimal format.
4. The section to the right is the original source code.
The program itself is organized vertically into three segments, each with its own offset values
for data or instructions. Each segment begins with a SEGMENT directive that notifies the
assembler to align the segment on an address that is evenly divisible by 10H (16).

SUM(EXE) Program to add two nos                                   Page 1-1

 1                                    page 60,132
 2                            TITLE   SUM(EXE) Program to add two nos
 3                            ; ------------------------------------
 4 0000                       STACK   SEGMENT PARA STACK 'Stack'
 5 0000  0020[                        DW 32 DUP(0)
 6            0000
 7                ]
 8
 9 0040                       STACK   ENDS
10                            ; ----------------------------------------
11 0000                       DATASEG SEGMENT PARA 'Data'
12 0000  155F                 VAL1    DW 5471
13 0002  0174                 VAL2    DW 372
14 0004  0000                 SUM     DW ?
15 0006                       DATASEG ENDS
16                            ; ----------------------------------------
17 0000                       CODESEG SEGMENT PARA 'Code'
18 0000                       MAIN    PROC FAR
19                                    ASSUME SS:STACK,DS:DATASEG,CS:CODESEG
20 0000  B8 ---- R                    MOV AX,DATASEG  ;Set address of data
21 0003  8E D8                        MOV DS,AX       ; segment in DS
22
23 0005  A1 0000 R                    MOV AX,VAL1     ;Move 155FH to AX
24 0008  03 06 0002 R                 ADD AX,VAL2     ;Add 0174H to AX
25 000C  A3 0004 R                    MOV SUM,AX      ;Store sum in SUM
26 000F  B8 4C00                      MOV AX,4C00H    ;End processing
27 0012  CD 21                        INT 21H
28 0014                       MAIN    ENDP            ;End of procedure
29 0014                       CODESEG ENDS            ;End of segment
30                                    END MAIN        ;End of program

Segments and Groups:
Name                    Length  Align  Combine  Class
CODESEG .............. 0014    PARA   NONE     'CODE'
DATASEG .............. 0006    PARA   NONE     'DATA'
STACK ................ 0040    PARA   STACK    'STACK'

Symbols:
Name                    Type     Value  Attr
MAIN ................. F PROC   0000   CODESEG  Length = 0014
VAL1 ................. L WORD   0000   DATASEG
VAL2 ................. L WORD   0002   DATASEG
SUM .................. L WORD   0004   DATASEG

0 Warning Errors
0 Severe Errors

Assembled Program with Conventional Segments

The SEGMENT statement itself generates no machine code. The program loader stores the
contents of each segment in memory and initializes its address in a segment register, that is,
STACK in SS, DATASEG in DS, and CODESEG in CS. The beginning of the segment is offset
zero bytes from that address.

Stack Segment: The stack segment contains a DW (Define Word) directive that defines 32
words, each generating a zero value designated by (0). This definition of 32 words is a realistic
size for a stack because a large program may require many interrupts for input/output and calls to
subprograms, all involving use of the stack. The stack segment ends at offset 0040H, which is
equivalent to decimal value 64 (32 words × 2 bytes). The assembler shows the generated
constant to the left as 0020[0000]; that is, 20H (32) zero words.

If the stack is too small to contain all the items pushed onto it, neither the assembler nor the
linker warns you, and the executing program may crash in an unpredictable way.

Data Segment: The program's data segment, DATASEG, contains three defined values, all in
DW (Define Word) format:
1. VAL1 defines a word (two bytes) initialized with decimal value 5471, which the assembler
has translated to 155FH (shown on the left).
2. VAL2 defines a word initialized with decimal value 372, assembled as 0174H. The bytes of
these two constants are actually stored in memory in reverse order, as 5F 15 and 74 01, which
you can check with DEBUG.
3. SUM is coded as a DW with ? in the operand to define a word with an uninitialized constant.
The listing shows its contents as 0000.

The offset addresses of VAL1, VAL2, and SUM are, respectively, 0000, 0002, and 0004, which
relate to their field sizes.
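
The reversed byte order can also be observed from within a program. The following is a
minimal sketch, assuming the simplified segment directives and the BYTE PTR operator (both
described later in this chapter); it reads the two bytes of VAL1 separately.

```asm
        .MODEL SMALL
        .STACK 64
        .DATA
VAL1    DW    5471                 ;155FH, stored in memory as 5F 15
        .CODE
MAIN    PROC  FAR
        MOV   AX,@data             ;Initialize DS
        MOV   DS,AX
        MOV   AL,BYTE PTR VAL1     ;AL = 5FH (low byte, at offset 0000)
        MOV   AH,BYTE PTR VAL1+1   ;AH = 15H (high byte, at offset 0001)
                                   ;AX now holds 155FH again
        MOV   AX,4C00H             ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

Tracing this under DEBUG shows the low-order byte at the lower address, which is why a dump
of the data segment displays each word with its bytes reversed.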

Code Segment. The program's code segment, CODESEG, contains the program's executable
code, all in one procedure (PROC). Three statements establish the addressability of the data
segment:
ASSUME SS:STACK,DS:DATASEG,CS:CODESEG
0000 B8 ---- R MOV AX,DATASEG ;Set address of data
0003 8E D8 MOV DS,AX ; segment in DS

• The ASSUME directive relates each segment to its corresponding segment register. ASSUME
simply provides information to the assembler, which generates no machine code for it.
• The first MOV instruction "stores" DATASEG in the AX register. Now, an instruction cannot
actually store a segment in a register—the assembler recognizes the reference to a segment and
assumes its address. Note the machine code to the left: B8----R. The four hyphens mean that at
this point the assembler cannot determine the address of DATASEG; the system determines
this address only when the object program is linked and loaded for execution. Because the
loader may locate a program anywhere in memory, the assembler leaves the address open and
indicates the fact with an R (for relocatable); the loader is to replace the incomplete address
with the actual one.
• The second MOV moves the contents of the AX register to DS. Because no instruction can
move an immediate value, such as a segment address, directly into a segment register, two
instructions are needed to initialize DS.

Note that the program does not require the ES register, although many programmers initialize it
as a standard practice.

Although the loader automatically initializes SS and CS when it loads a program for execution,
it is your responsibility to initialize DS, and ES if required.
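
As an illustration of this responsibility, the following minimal sketch initializes both DS
and ES in a program with conventional segment definitions. It is a cut-down variant of the
SUM program, and the ES:DATASEG entry on the ASSUME line is an addition made for this
example.

```asm
STACK   SEGMENT PARA STACK 'Stack'
        DW    32 DUP(0)
STACK   ENDS
DATASEG SEGMENT PARA 'Data'
VAL1    DW    5471
DATASEG ENDS
CODESEG SEGMENT PARA 'Code'
MAIN    PROC  FAR
        ASSUME SS:STACK,DS:DATASEG,ES:DATASEG,CS:CODESEG
        MOV   AX,DATASEG      ;AX = segment address of DATASEG
        MOV   DS,AX           ;Initialize DS
        MOV   ES,AX           ;Initialize ES too (not needed by this program)
        MOV   AX,VAL1         ;DS-relative reference now works
        MOV   AX,4C00H        ;End processing
        INT   21H
MAIN    ENDP
CODESEG ENDS
        END   MAIN
```

SS and CS need no such code because the loader sets them from information in the .EXE header.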

The first instruction after initializing the DS register is MOV AX,VAL1, which begins at offset
location 0005 and generates machine code A1 0000. The space in the listing between A1 (the
operation) and 0000 (the operand) is only for readability. The next instruction, ADD AX,VAL2,
begins at offset location 0008 and generates four bytes of machine code. The instruction MOV
SUM,AX copies the sum in AX to SUM at offset 0004 in the data segment. In this example,
machine instructions are two, three, or four bytes in length.

The last statement in the program, END, contains the operand MAIN, which relates to the name
of the PROC at offset 0000. This is the location in the code segment where the program loader is
to transfer control for starting execution.

Following this program listing are a Segments and Groups table and a Symbols table.

Segments and Groups Table: This table shows any defined segments and groups. Note that
segments are not listed in the same sequence as they are coded; the assembler used for this
example lists them in alphabetic sequence by name. The table provides the length in bytes of
each segment, the alignment (all are paragraphs), the combine type, and the class.

Symbols Table: This table provides the names of data fields in the data segment (VAL1, VAL2,
and SUM) and the labels applied to instructions in the code segment. For MAIN (the only entry
in the example), Type F PROC means far procedure (far because MAIN, as the entry-point for
execution, must be known outside this program). The Value column gives the offset from the
beginning of the segment for names, labels, and procedures. The column headed "Attr" (for
attribute) provides the segment in which each item is defined.

Using Simplified Segment Directives


It is already shown how to code a program using the simplified segment directives. The
assembly listing of that program is given below. For the simplified segment directives, initialize
DS like this:
MOV AX,@Data
MOV DS,AX
The first part of the symbol table under "Segments and Groups" shows the three segments
renamed by the assembler and listed alphabetically:

SUM(EXE) Program to add two nos                                   Page 1-1

 1                                    page 60,132
 2                            TITLE   SUM(EXE) Program to add two nos
 3                            ; --------------------------------------------
 4                                    .MODEL SMALL
 5                                    .STACK 64       ;Define stack
 6                                    .DATA           ;Define data
 7 0000  155F                 VAL1    DW 5471
 8 0002  0174                 VAL2    DW 372
 9 0004  0000                 SUM     DW ?
10                            ; --------------------------------------------
11                                    .CODE           ;Define code segment
12 0000                       MAIN    PROC FAR
13 0000  B8 ---- R                    MOV AX,@data    ;Set address of data
14 0003  8E D8                        MOV DS,AX       ;segment in DS
15
16 0005  A1 0000 R                    MOV AX,VAL1     ;Move 155FH to AX
17 0008  03 06 0002 R                 ADD AX,VAL2     ;Add 0174H to AX
18 000C  A3 0004 R                    MOV SUM,AX      ;Store sum in SUM
19
20 000F  B8 4C00                      MOV AX,4C00H    ;End processing
21 0012  CD 21                        INT 21H
22 0014                       MAIN    ENDP            ;End of procedure
23                                    END MAIN        ;End of program

Segments and Groups:
Name                    Length  Align  Combine  Class
DGROUP ............... GROUP
_DATA ................ 0006    WORD   PUBLIC   'DATA'
STACK ................ 0040    PARA   STACK    'STACK'
_TEXT ................ 0014    WORD   PUBLIC   'CODE'

Symbols:
Name                    Type     Value  Attr
MAIN ................. F PROC   0000   _TEXT  Length = 0014
VAL1 ................. L WORD   0000   _DATA
VAL2 ................. L WORD   0002   _DATA
SUM .................. L WORD   0004   _DATA
@CODE ................ TEXT     _TEXT
@FILENAME ............ TEXT     sum

0 Warning Errors
0 Severe Errors

Assembled Program with Simplified Segment Directives

• _DATA, with a length of 6 bytes
• STACK, with a length of 40H (64 bytes)
• _TEXT, for the code segment, with a length of 14H (20 bytes)

Listed under the heading "Symbols" are names defined in the program or default names. The
simplified segment directives provide a number of predefined equates, which begin with an @
symbol and which you are free to reference in a program. As well as @data, they are:
@CODE      Equated to the name of the code segment, _TEXT
@FILENAME  Name of the program
You may use @code and @data in ASSUME and executable statements, such as
MOV AX,@data.

Two-Pass Assembler
Assemblers typically make two or more passes through a source program in order to resolve
forward references to addresses not yet encountered in the program. During pass 1, the assembler
reads the entire source program and constructs a symbol table of names and labels used in the
program, that is, names of data fields and program labels and their relative locations (offsets)
within the segment. You can see such a symbol table immediately following the assembled
program, where the offsets for VAL1, VAL2, and SUM are 0000, 0002, and 0004 bytes,
respectively. Although the program defines no instruction labels, they would appear in the code
segment with their own offsets. Pass 1 determines the amount of code to be generated for each
instruction.

During pass 2, the assembler uses the symbol table that it constructed in pass 1. Now that it
knows the length and relative position of each data field and instruction, it can complete the
object code for each instruction. It then produces, on request, the various object (.OBJ), list
(.LST), and cross-reference (.CRF) files.

A potential problem in pass 1 is a forward reference. Certain types of instructions in the code
segment may reference the label of an instruction, but the assembler has not yet encountered its
definition. MASM constructs object code based on what it supposes is the length of each
generated machine language instruction. If there are any differences between pass 1 and pass 2
concerning instruction lengths, MASM issues an error message "Phase error between passes."
Such errors are relatively rare, but if one appears, you'll have to trace its cause and correct it.

Since version 6.0, MASM handles instruction lengths more effectively, taking as many passes
through the file as necessary. TASM can assemble a program in one pass, but you may request
that it take more than one if it is having difficulty with forward references.
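
A forward reference of the kind discussed here can be sketched as follows. The program and
its label DONE are hypothetical and serve only as an illustration. Coding the SHORT operator
tells the assembler in advance that the jump target lies within -128 to +127 bytes, so it can
settle on a 2-byte instruction in pass 1 instead of guessing a longer form.

```asm
        .MODEL SMALL
        .STACK 64
        .CODE
MAIN    PROC  FAR
        MOV   AX,1
        JMP   SHORT DONE      ;Forward reference: DONE is not yet defined.
                              ;SHORT fixes the instruction length at 2 bytes
        MOV   AX,2            ;Skipped by the jump
DONE:   MOV   AX,4C00H        ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

Without SHORT, older two-pass assemblers typically reserve three bytes for such a jump and
fill any unused byte with a no-operation instruction.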

Linking an Object Program

When your program is free of error messages, the next step is to link the object module,
SUM.OBJ, that was produced by the assembler and that contains only machine code. (MASM
6.1 performs assemble and link with the ML command.) The linker performs the following
functions:

• Combines, if requested, more than one separately assembled module into one executable
program, such as two or more assembly programs or an assembly program with a C program.
• Generates an .EXE module and initializes it with special instructions to facilitate its subsequent
loading for execution.

Once you have linked one or more .OBJ modules into an .EXE module, you may execute the
.EXE module any number of times. But whenever you need to make a change in the program,
you must correct the source program, assemble again into an .OBJ module, and link the .OBJ
module into an .EXE module. Even if initially these steps are not entirely clear, you will find that
with only a little experience, they become automatic.

The output files from the link step are executable (.EXE), map (.MAP), and library (.LIB).

Link Map for the First Program: For the program SUM, the linker produced this map:

START    STOP     LENGTH  NAME     CLASS
00000H   0003FH   0040H   STACK    STACK
00040H   00045H   0006H   DATASEG  DATA
00050H   00063H   0014H   CODESEG  CODE

Program entry point at 0005:0000

• The stack is the first segment and begins at offset 0 bytes from the start of the program.
Because it is defined as 32 words, it is 64 bytes long, as its length (40H) indicates.
• The data segment begins at the next available paragraph boundary, offset 40H.
• The code segment begins at the next paragraph boundary, offset 50H. (Some assemblers
rearrange the segments into alphabetical order.)
• Program entry point 0005:0000, which is in the form segment:offset, refers to the relative
address of the first executable instruction. In effect, the relative starting address is at
segment location 5[0], offset 0 bytes, which corresponds to the code segment boundary at
50H. The program loader uses this value when it loads the program into memory for
execution.

At this stage, the only error that you are likely to encounter is entering a wrong filename. The
solution is to restart with the link command.

Link Map for the Second Program: The link map for the second program, which uses
simplified segment directives, shows a somewhat different setup from that of the previous
program. First, the segments have been physically rearranged so that the code segment comes
first, followed by the data and stack segments; second, the code and data segments are aligned
on word (not paragraph) boundaries, as shown by the link map:

START    STOP     LENGTH  NAME     CLASS
00000H   00013H   0014H   _TEXT    CODE
00014H   00019H   0006H   _DATA    DATA
00020H   0005FH   0040H   STACK    STACK

Program entry point at 0000:0000

• The code segment is now the first segment and begins at offset 0 bytes from the start of the
program.
• The data segment begins at the next word boundary, offset 14H.
• The stack, which keeps its paragraph (PARA) alignment, begins at the next paragraph
boundary, offset 20H.
• The program entry point is now 0000:0000, which means that the relative location of the code
segment begins at segment 0, offset 0.

Executing a Program
Having assembled and linked a program, you can now execute it. If the .EXE file is in the
default drive, you could ask the loader to read it into memory for execution by typing
SUM.EXE or SUM (without the .EXE extension)
If you omit typing the file extension, the loader assumes it is an executable .EXE or .COM
program. However, since this program produces no visible output, it is suggested that you run it
under DEBUG and use Trace commands to step through its execution. Key in the following,
including the .EXE extension:
DEBUG SUM.EXE
DEBUG loads the .EXE program module and displays its hyphen prompt.

To view the stack segment, key in D SS:0. The stack contains all zeros because it was initialized
that way.

To view the code segment, key in D CS:0. Compare the displayed machine code with that of the
code segment in the assembled listing:
B8----8ED8A10000 . . .
In this case, the assembled listing does not accurately show the machine code, since the
assembler did not know the address for the operand of the first instruction. You can now
determine this address by examining the displayed code.

To view the contents of the registers, press R followed by <Enter>. SP (Stack Pointer) should
contain 0040H, which is the size of the stack (32 words = 64 bytes = 40H). IP (Instruction
Pointer) should be 0000H. SS and CS are properly initialized for execution; their values depend
on where in memory your program is loaded.

The first instruction MOV AX,xxxx is ready to execute—it and the following MOV instruction
are about to initialize the DS register. To execute the first MOV, press T (for Trace) followed by
<Enter> and note the effect on IP. To execute the second MOV, again press T followed by
<Enter>. Check DS, which is now initialized with the segment address.

The third MOV loads the contents of VAL1 into AX. Press T again and note that AX now
contains 155F. Now press T to execute the ADD instruction and note that AX contains 16D3.
Press T to cause MOV to store AX in offset 0004 of the data segment.

To check the contents of the data segment, key in D DS:0. The operation displays the three data
items as 5F 15 74 01 D3 16, with the bytes for each word in reverse sequence.

At this point, you can use L to reload and rerun the program or press Q to quit the DEBUG
session.

The Cross-Reference Listing


The assembler generates an optional file that you can use to produce a cross-reference listing of a
program's identifiers, or symbols. The file extension is .SBR for MASM 6.1, .CRF for MASM
5.1, and .XRF for TASM. However, you still have to convert the file to a properly sorted cross-
reference file.

The following figure shows the cross-reference listing produced for the program that uses
conventional segment definitions. The symbols in the first column are in alphabetic order. The
numbers in the second column, shown as n#, indicate the line in the .LST file where each
symbol is defined. Numbers to the right of this column are line numbers showing where the
symbol is referenced by other statements. For example, CODESEG is defined in line 17 and is
referenced in lines 19 and 29. SUM is defined in line 14 and referenced in line 25+, where the
"+" means its value is modified during program execution (by MOV SUM,AX).

Symbol Cross-Reference (# definition, + modification)

CODE........................ 17
CODESEG..................... 17# 19 29

DATA........................ 11
DATASEG..................... 11# 15 19 20

MAIN........................ 18# 28 30

STACK....................... 4
STACK....................... 4# 9 19

SUM......................... 14# 25+
VAL1........................ 12# 23
VAL2........................ 13# 24

Cross-Reference Table

Assembling programs generates a lot of redundant files. You can safely delete .OBJ, .CRF, and
.LST files. Keep .ASM source programs in case of further changes and .EXE files for executing
the programs.

Error Diagnostics
The assembler provides diagnostics for any programming errors that violate its rules. The
following program is similar to the one with simplified segment directive, except that it has a
number of intentional errors inserted for illustrative purposes. The diagnostics will vary by
assembler version.

 1                                    page 60,132
 2                            TITLE   SUM (EXE) Coding errors
 3                            ; -----------------------------------
 4                                    .MODEL SMALL
 5                                    .STACK 64
 6                                    .DATA
 7 0000  155F                 VAL1    DW 5471
 8 0002  0174                 VAL2    DW 372
 9 0004                       SUM     DW
sum.ASM(9): error A2027: Operand expected
10                            ; ----------------------------------
11                                    .CODE
12 0000                       MAIN    PROC FAR
13 0000  B8 ---- R                    MOV AX,@data    ;Address of data
14 0003  8B D0                        MOV DX,AX       ; segment in DS
15
16                                    MOV AS,VAL1     ;Move 155FH to AX
sum.ASM(16): error A2009: Symbol not defined: AS
17 0005  03 06 0002 R                 ADD AX,VAL2     ;Add 0174H to AX
18 0009  A3 0000 U                    MOV SOM,AX      ;Store sum in SUM
sum.ASM(18): error A2009: Symbol not defined: SOM
19 000C  A2 0000 R                    MOV VAL1,AL     ;Store byte value
sum.ASM(19): warning A4031: Operand types must match
20 000F  B8 4C00                      MOV AX,4C00H    ;End processing
21 0012  CD 21                        INT 21H
22 0014                       MAIN    ENDP
sum.ASM(22): error A2006: Phase error between passes
23                                    END MIAN
sum.ASM(23): error A2009: Symbol not defined: MIAN

1 Warning Errors
5 Severe Errors

Assembler Error Diagnostics

LINE EXPLANATION
9 The definition of SUM requires an operand.
14 DX should be coded as DS, although the assembler does not know that this is an
error.
16 AS should be coded as AX.
18 SOM should be coded as SUM.
19 Field sizes (byte and word) must agree (warning).
22 Correcting the other errors will cause this diagnostic to disappear.
23 MIAN should be coded as MAIN.

The error message for line 22, "Phase error between passes," occurs when addresses generated
in pass 1 of a two-pass assembler differ from those of pass 2. To isolate an obscure error
under MASM 5.1, use the /D option to list both the pass 1 and the pass 2 files, and compare
the offset addresses.

Operators Used in Assembly Language


Operators provide a facility for changing or analyzing operands during an assembly. They do not
generate machine code for their operation. Following are different types of operators.

Arithmetic Operators:
These operators include the familiar arithmetic signs and perform arithmetic during
assembly. They are +, -, *, /, and MOD.
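
The following minimal sketch shows arithmetic operators at work. The names ROWS and COLS are
hypothetical constants chosen for this illustration. Each expression is evaluated by the
assembler, so the generated instruction contains only the resulting constant.

```asm
        .MODEL SMALL
        .STACK 64
ROWS    EQU   25              ;Hypothetical constants for the example
COLS    EQU   80
        .CODE
MAIN    PROC  FAR
        MOV   CX,ROWS*COLS    ;Assembled as MOV CX,2000 (07D0H)
        MOV   DX,(ROWS-1)/2   ;Assembled as MOV DX,12
        MOV   AL,COLS MOD 7   ;Assembled as MOV AL,3
        MOV   BX,COLS+2       ;Assembled as MOV BX,82
        MOV   AX,4C00H        ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

No multiplication or division takes place at run time; changing ROWS or COLS and reassembling
is enough to update every dependent constant.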

Logical Operators:
The logical operators process the bits in an expression. The operators include AND, OR, XOR,
NOT.
e.g., MOV CL,00111100B AND 10101110B ;Load 00101100B

OFFSET Operator:
It returns the offset address of a variable or label.
e.g., MOV DX,OFFSET TBL1 ;equivalent to LEA DX,TBL1

SEG Operator:
It returns segment address of a variable or label.
e.g., MOV DX,SEG TBL1 ;Get address of data segment
MOV AX,SEG L1 ;Get address of code segment

Segment Override:
This operator is coded as a colon (:). It calculates the address of a label or variable relative to a
particular segment.
e.g., MOV BH,ES:10H
MOV CX,CS:[BX]

SHL & SHR Operators:


These operators shift the expression during assembly.
e.g., MOV BH,01011101B SHR 3 ; Load 00001011B

HIGH & LOW Operators:


These operators return the high or low byte of an expression.
e.g.,
VAL EQU 1234H
...
MOV CL,LOW VAL ;Load 34H in CL
MOV CH,HIGH VAL ;Load 12H in CH

Index Operator:
The index operator is coded as brackets ([]) and acts like a plus sign. A typical use of
indexing is to reference data items in tables.
e.g.,
MOV CL,TBL[4] ;Load the byte at offset TBL+4 into CL
MOV CH,TBL[BX] ;Load the byte at offset TBL+BX into CH

PTR Operator:
The PTR operator can be used on data variables and instruction labels. It uses the type specifiers
BYTE, WORD, FWORD, DWORD, QWORD and TBYTE to specify a size in an ambiguous
operand or to override the defined type (DB, DW, etc) for variables.
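
A minimal sketch of both uses of PTR follows. The data names BYTE_TBL and WORD_VAL are
hypothetical and chosen only for this illustration.

```asm
        .MODEL SMALL
        .STACK 64
        .DATA
BYTE_TBL DB   12H,34H               ;Two bytes
WORD_VAL DW   5678H                 ;One word
        .CODE
MAIN    PROC  FAR
        MOV   AX,@data              ;Initialize DS
        MOV   DS,AX
        MOV   AX,WORD PTR BYTE_TBL  ;Override DB type: AX = 3412H
        MOV   CL,BYTE PTR WORD_VAL  ;Override DW type: CL = 78H
        LEA   BX,BYTE_TBL           ;BX = offset of BYTE_TBL
        MOV   BYTE PTR [BX],0       ;[BX] alone has no size; PTR supplies it
        INC   WORD PTR [BX]         ;Same address, now treated as a word
        MOV   AX,4C00H              ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

Without the type specifier, the assembler would reject MOV [BX],0 and INC [BX] because it
cannot tell whether a byte or a word is intended.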

Length Operator:
It returns the number of entries defined by a DUP operator.

Type Operator:
It returns the number of bytes, according to the definition of the referenced variable.

Size Operator:
It returns the product of LENGTH times TYPE and is useful only if the referenced variable
contains the DUP entry.
e.g.,
BYTE_VAL DB ?
WORD_TBL DW 10 DUP (?)
...
MOV AX,TYPE BYTE_VAL ;AX=0001H
MOV AX,TYPE WORD_TBL ;AX=0002H
MOV CX,LENGTH WORD_TBL ;CX=000AH
MOV DX,SIZE WORD_TBL ;DX=0014H
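
A typical use of these operators is to drive a loop over a table without hard-coding its
dimensions. The following is a minimal sketch that clears WORD_TBL; the label CLR_LOOP is a
hypothetical name introduced for this example.

```asm
        .MODEL SMALL
        .STACK 64
        .DATA
WORD_TBL DW   10 DUP (?)
        .CODE
MAIN    PROC  FAR
        MOV   AX,@data             ;Initialize DS
        MOV   DS,AX
        MOV   CX,LENGTH WORD_TBL   ;CX = 10 entries
        LEA   BX,WORD_TBL          ;BX = offset of the first entry
CLR_LOOP:
        MOV   WORD PTR [BX],0      ;Clear one entry
        ADD   BX,TYPE WORD_TBL     ;Advance by 2 bytes to the next entry
        LOOP  CLR_LOOP             ;Repeat until CX reaches zero
        MOV   AX,4C00H             ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

If the table definition later changes to a different DUP count or to DD entries, the loop
count and step size follow automatically when the program is reassembled, although the store
instruction would still need a matching operand size.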

Interrupt Services
In assembly language, DOS and BIOS functions and interrupts are used for input/output
services. An interrupt suspends the currently executing program so that the processor can run
a service routine, after which the program resumes. Interrupts are generated for a variety of
reasons, usually related to peripheral devices such as the keyboard, disk drive or printer.
The Intel 8086 microprocessor recognizes two types of interrupts: hardware and software. A
hardware interrupt is generated when a peripheral device needs attention from the
microprocessor. A software interrupt is a call to a subroutine built into the operating
system (DOS) or the BIOS, usually an input/output routine.

INT 21H is a DOS service that provides many functions, a few of which are given below. The
function number of the service is loaded in register AH, and the other registers are loaded
with the data the function requires, before the interrupt call.

INT 10H is one of many BIOS interrupt services. We deal with it here because it is an
important and frequently used service: BIOS INT 10H controls the video display. Like INT 21H,
INT 10H has many functions, and some of them are given below. The function number of the
service is loaded in register AH, and the other registers are loaded with the data the
function requires, before the interrupt call.
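
The calling convention for both services can be sketched with one small program. It uses
INT 10H function 02H (set cursor position, with the page in BH, the row in DH and the column
in DL) and INT 21H function 02H (display character). The row and column values are arbitrary
choices for this illustration.

```asm
        .MODEL SMALL
        .STACK 64
        .CODE
MAIN    PROC  FAR
        MOV   AH,02H          ;INT 10H function 02H: set cursor position
        MOV   BH,00           ;Video page 0
        MOV   DH,12           ;Row 12
        MOV   DL,40           ;Column 40
        INT   10H             ;Call BIOS video service
        MOV   AH,02H          ;INT 21H function 02H: display character
        MOV   DL,'*'          ;Character to display
        INT   21H             ;Call DOS service
        MOV   AX,4C00H        ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

The same value in AH selects a different operation in each service, because the function
number is interpreted by whichever interrupt handler is called.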

BIOS handles INT 00H – 1FH and DOS handles INT 20H – 3FH. The remaining interrupts are
available to the programmer and can be used for other purposes.

When the computer powers up, the system establishes an Interrupt Vector Table in locations
000H – 3FFH of conventional memory. The table provides for 256 (100H) interrupts, each with
a related 4-byte address stored as offset:segment (IP:CS). The operand of an interrupt
instruction such as INT 05H identifies the type of request. Since there are 256 entries, each
four bytes long, the table occupies the first 1,024 bytes of memory. Each address in the table
relates to a BIOS or DOS routine for a specific interrupt type. Thus bytes 0 – 3 contain the
address for interrupt 0, bytes 4 – 7 the address for interrupt 1, and so forth.
Following are some relevant interrupts:

INTERRUPT  OPERATION                      INTERRUPT  OPERATION
(hex)                                     (hex)
00         Divide by 0                    14         Serial port interrupt
01         Single step processing         16         Keyboard interrupt
02         Non maskable interrupt (NMI)   17         Printer interrupt
03         Breakpoint address             19         Bootstrap loader
04         Overflow                       1A         Time of the day
05         Print screen                   1B         Control on keyboard break
08         Interval timer                 1C         Control on timer interrupt
09         BIOS keyboard interrupt        1D         Video table address
0E         Disk interrupt                 1E         Disk table address
10         Video interrupt                1F         ASCII character address
11         Equipment check                21         DOS interrupt
12         Memory check                   33         Mouse interrupt
13         Disk I/O
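
Because each entry of the Interrupt Vector Table is four bytes long, the entry for interrupt
n starts at address n × 4 in segment 0000H. The following minimal sketch reads the entry for
INT 21H directly from the table; it is meant for tracing under DEBUG, since it produces no
visible output.

```asm
        .MODEL SMALL
        .STACK 64
        .CODE
MAIN    PROC  FAR
        XOR   AX,AX           ;Segment 0000H holds the vector table
        MOV   ES,AX
        MOV   BX,21H*4        ;Entry for INT 21H starts at offset 0084H
        MOV   CX,ES:[BX]      ;CX = offset (IP) of the handler
        MOV   DX,ES:[BX+2]    ;DX = segment (CS) of the handler
        MOV   AX,4C00H        ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

DOS also offers INT 21H function 35H, which returns the same vector in ES:BX and is the
usual way for a program to obtain it.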

DOS Service INT 21H


Following are the DOS INT 21H services, each of which requires its function number (code) in
AH. The contents of the other registers depend on the function number. If the function is an
input function, the returned values are stored in registers, and which registers are used
also depends on the function number.

00H: Terminate the current program: INT 21H, function 4CH is used instead
01H: Console input with echo: Wait for a character from the standard input device. The
character is returned in AL and echoed. The function responds to CTRL+BREAK.
MOV AH,01H ;Request keyboard input
INT 21H
02H: Character output: Send the character in DL to the standard output device. The Tab,
Carriage Return, and Line Feed characters act normally, and the operation automatically
advances the cursor.
MOV AH,02H ;Request display character
MOV DL,char ;Character to display
INT 21H
03H: Communication input: Reads a character from the serial port into AL; this is a
primitive service, and BIOS INT 14H is preferred.
04H: Communication output: DL contains the character to transmit; BIOS INT 14H is
preferred.
05H: Printer output: Send the character in DL to the parallel printer port.
06H: Direct console input-output: If DL=0FFH, read a character into AL; otherwise display
the character in DL on the standard output device.
For the read operation, if there is no character in the keyboard buffer, the operation
sets the zero flag and does not wait for input. If a character is waiting in the buffer, the
operation loads the character in AL and clears the zero flag. The operation does not echo
the character on the screen and does not check for CTRL+BREAK or
CTRL+PRINTSCREEN. A zero in AL means the user has pressed an extended function
key such as HOME, F1 etc.
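
The read form of function 06H can be sketched as a polling loop. This is a minimal
illustration; the label POLL is a hypothetical name, and the program simply echoes the first
key pressed by using function 02H.

```asm
        .MODEL SMALL
        .STACK 64
        .CODE
MAIN    PROC  FAR
POLL:   MOV   AH,06H          ;Direct console I/O
        MOV   DL,0FFH         ;DL = 0FFH selects input
        INT   21H
        JZ    POLL            ;Zero flag set: no key waiting, try again
        MOV   DL,AL           ;A key was read into AL (not echoed)
        MOV   AH,02H          ;Display it with function 02H
        INT   21H
        MOV   AX,4C00H        ;End processing
        INT   21H
MAIN    ENDP
        END   MAIN
```

A real program would do other work inside the loop instead of polling continuously, and would
test AL for zero to detect an extended function key.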
07H: Console input: Wait for a character from the standard input device. The character is
returned in AL but not echoed, and the operation does not respond to CTRL+BREAK.
Otherwise it is equivalent to function 01H.
08H: Console input without echo: Wait for a character from the standard input device. The
character is returned in AL but not echoed. The function responds to CTRL+BREAK. The
operation is the same as function 01H except that the character read is not echoed on the screen.
09H: String output: Send a string of characters to the standard output device until ‘$’
character is reached. DX contains the offset address of the string.
.DATA
PROMPT DB 'String to display','$' ;Display string
.CODE
….
MOV AH,09H
LEA DX,PROMPT
INT 21H
0AH: Read string: Read characters from the standard input device. DX points to a parameter
list: the first byte gives the maximum number of characters that may be entered, the
second byte is reserved for the actual number of characters entered, and the remaining
bytes store the entered characters. The following example defines the parameter list:
PARA_LIST LABEL BYTE ;Start the parameter list
MAX_LEN DB 20 ;Maximum number of input characters
ACT_LEN DB ? ;Actual number of input characters
KB_DATA DB 20 DUP (' ') ;Characters entered from the keyboard
In the parameter list, the LABEL directive tells the assembler to align on a byte boundary
and gives the location the name PARA_LIST. Because LABEL takes no space,
PARA_LIST and MAX_LEN refer to the same memory location. MAX_LEN defines the
maximum number of keyboard characters, ACT_LEN provides a space for the operation
to insert the actual number of characters entered, and KB_DATA reserves 20 bytes for
the characters. You may use any valid names for these fields.
e.g., MOV AH,0AH ;Request keyboard input
LEA DX,PARA_LIST ;Load address of parameter list
INT 21H ;Call interrupt service
The operation reads characters until the ENTER key is pressed, stores the entered characters
in the KB_DATA field and the actual number of characters entered in the ACT_LEN field. It
also stores the ENTER character (0DH) in the input field KB_DATA, but does not
count it in the actual length.
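As a sketch of how ACT_LEN can be used, the following program reads a line with function 0AH, overwrites the stored ENTER character with '$', and redisplays the text with function 09H. It reuses the parameter list defined above. It is an illustration under simple assumptions, not a robust routine: for instance, a '$' typed by the user would cut the display short.

```asm
.MODEL SMALL
.STACK 64
.DATA
PARA_LIST LABEL BYTE            ;Start of parameter list
MAX_LEN   DB   20               ;Maximum number of input characters
ACT_LEN   DB   ?                ;Actual number of input characters
KB_DATA   DB   20 DUP (' ')     ;Characters entered from the keyboard
.CODE
MAIN    PROC FAR
        MOV  AX,@data           ;Initialize DS so that DX offsets
        MOV  DS,AX              ; refer to the data segment
        MOV  AH,0AH             ;Request keyboard input
        LEA  DX,PARA_LIST
        INT  21H
        MOV  BL,ACT_LEN         ;BX = number of characters entered
        MOV  BH,00
        MOV  KB_DATA[BX],'$'    ;Replace the stored ENTER (0DH) with '$'
        MOV  AH,02H             ;Send a line feed so the output
        MOV  DL,0AH             ; appears on the next line
        INT  21H
        MOV  AH,09H             ;Display the entered string
        LEA  DX,KB_DATA
        INT  21H
        MOV  AX,4C00H           ;End processing
        INT  21H
MAIN    ENDP
        END  MAIN
```

The ENTER character always sits at offset ACT_LEN inside KB_DATA, which is why indexing with BX lands exactly on it.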
0BH: Check keyboard Status: Returns FFH in AL if an input character is available in the
keyboard buffer else returns 00H in AL.
0CH: Clear keyboard buffer and invoke input function: The number of the input function
to invoke is loaded in AL, and the other registers hold the values that function requires.
You may use this function with functions 01H, 06H, 07H, 08H, or 0AH.
MOV AH,0CH ;Request keyboard function
MOV AL,function ;Required input function
INT 21H

BIOS Service INT 10H for Video Control


BIOS provides interrupt service 10H for video display control. Like INT 21H, INT 10H has
many functions, and some of them are given below. Load the function number in register AH
and load the other registers with the data the function requires before calling the
interrupt.

Before proceeding with the INT 10H functions, we need to discuss the video modes and the
attributes.

Video Modes:
The video mode determines factors such as text or graphics, color or monochrome, screen
resolution, and the number of colors. BIOS INT 10H function 00H is used to initialize the mode
for the currently executing program or to switch between text and graphics. Setting the mode
also clears the screen.

Following are the familiar text modes:


Mode  Rows × Cols  Type        Pages  Resolution  Colors
00    25 × 40      Color       0-7    360 × 400   16
01    25 × 40      Color       0-7    360 × 400   16
02    25 × 80      Color       0-3    720 × 400   16
03    25 × 80      Color       0-3    720 × 400   16
07    25 × 80      Monochrome  0      720 × 400   1

Following are some graphics modes:


Mode  Type        Pages  Resolution  Colors
04H   Color       8      320 × 200   4
05H   Color       8      320 × 200   4
06H   Color       8      640 × 200   2
0DH   Color       8      320 × 200   16
0EH   Color       4      640 × 200   16
0FH   Monochrome  2      640 × 350   1
10H   Color       2      640 × 350   16
11H   Color       1      640 × 480   2
12H   Color       1      640 × 480   16
13H   Color       1      320 × 200   256

Attributes
The attribute byte in text mode determines the characteristics of each displayed character. When
a program sets an attribute, it remains set; that is all subsequent displayed characters have the
same attribute until another operation changes it. We can use the INT 10H functions to generate
a screen attribute and perform such actions as scroll up or down, read attribute or character, or
display attribute or character. The attribute has the following format.
                 Background    Foreground
Attribute:    BL  R  G  B    I  R  G  B
Bit number:    7  6  5  4    3  2  1  0
The letters R, G, and B indicate the bit positions for red, green, and blue, the three
primary additive colors.
• Bit 7 (BL) sets blinking (may be disabled)
• Bits 6-4 determine the character's background color
• Bit 3 (I) sets normal (if 0) or high intensity (if 1)
• Bits 2-0 determine the character's foreground color
The background and foreground colors should not be the same, or the characters will not be
visible. Using the above layout, we can construct the number for an attribute.
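As a worked illustration of the bit layout, an attribute can be built from named color values with the assembler's SHL and OR operators. The EQU names below are hypothetical conveniences, not part of BIOS; each color value is simply the three-bit R G B pattern.

```asm
;Color values for bits 2-0 (foreground) or bits 6-4 (background)
BLACK   EQU  0             ;R=0 G=0 B=0
BLUE    EQU  1             ;B
GREEN   EQU  2             ;G
CYAN    EQU  3             ;G+B
RED     EQU  4             ;R
MAGENTA EQU  5             ;R+B
BROWN   EQU  6             ;R+G
WHITE   EQU  7             ;R+G+B
INTENSE EQU  08H           ;Bit 3: high intensity
BLINK   EQU  80H           ;Bit 7: blinking

.MODEL SMALL
.STACK 64
.CODE
MAIN    PROC FAR
        MOV  BL,(BLUE SHL 4) OR BROWN    ;0001 0110B = 16H
        MOV  BH,(CYAN SHL 4) OR BLACK    ;0011 0000B = 30H
        MOV  AL,BLINK OR (RED SHL 4) OR INTENSE OR WHITE
                                         ;1100 1111B = 0CFH
        MOV  AX,4C00H                    ;End processing
        INT  21H
MAIN    ENDP
        END  MAIN
```

The program only loads the computed attributes into registers; the assembler evaluates each expression at assembly time, so no run-time shifting is involved.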

00H Set Video Mode: Load the required mode in AL. This operation also clears the screen.
MOV AH,00H ;Request set mode
MOV AL,03H ;Standard color text
INT 10H ;Call interrupt service
01H Set Cursor Size: The cursor exists only in text mode. To set the vertical size of the
cursor, load register CX as follows:
CH (bits 4-0): starting scan line
CL (bits 4-0): ending scan line
MOV AH,01H ;Request set cursor size
MOV CH,00 ;Start scan line
MOV CL,14 ;End scan line
INT 10H ;Call interrupt service
In VGA mode the scan lines run from 0 to 14 (0 to 7 for monochrome). The default size of
the cursor for VGA mode is 13:14 (6:7 for monochrome). This code enlarges the cursor
to its maximum size (0:14).
02H Set Cursor Position: The function sets the cursor anywhere on a screen according to
row:column coordinates. Set the registers as follows:
BH: page number (0 is the default), DH: row, and DL: column.
MOV AH,02H ;Request set cursor
MOV BH,00 ;Page number 0 (normal)
MOV DH,12 ;Row 12
MOV DL,30 ;Column 30
INT 10H ;Call interrupt service
The cursor location on each page is independent of its location on other pages.
03H Return Cursor Status: The function determines the present row, column, and size of the
cursor. Load the page number in BH.
The operation leaves AX and BX unchanged and returns these values:
CH: Starting scan line CL: Ending scan line
DH: Row DL: Column
MOV AH,03H ;Request cursor locations
MOV BH,00 ;Page number 0 (normal)
INT 10H ;Call interrupt service
05H Select Active Page: Select the page that is to be displayed. We can create different pages
and request alternating between pages. The operation is simply a request that returns no
values.
MOV AH,05 ;Request active page
MOV AL,PAGE# ;Page number
INT 10H ;Call interrupt service

06H Scroll Up Screen: Scroll lines upward in a specified area of the screen. Displayed
lines scroll off at the top and blank lines appear at the bottom. Setting AL to 0 causes the
entire area to scroll up, effectively clearing it. Setting a nonzero value in AL causes that
number of lines to scroll up. Set the registers as follows:
AL: Number of rows (00 for full screen) CX: Starting row, Column
BH: Attribute or pixel value DX: Ending row, Column
MOV AH,06 ;Request scroll
MOV AL,01 ;Scroll one line
MOV BH,30H ;Cyan background, black foreground
MOV CX,0C19H ;From row 12, column 25 through
MOV DX,1236H ; row 18, column 54 (window)
INT 10H ;Call interrupt service
Scrolling can be used to create a window at some location on the screen: only the lines
inside that window scroll, and the given attribute applies to the selected area.
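A common use of function 06H is clearing the whole screen. The sketch below assumes a 25 × 80 text mode (rows 0-24, columns 0-79) and picks attribute 17H purely as an example; it then uses function 02H to return the cursor to the upper left corner.

```asm
.MODEL SMALL
.STACK 64
.CODE
MAIN    PROC FAR
        MOV  AX,0600H      ;AH = 06H (scroll up), AL = 00H (full screen)
        MOV  BH,17H        ;Blue background (001), white foreground (111)
        MOV  CX,0000H      ;Upper left: row 0, column 0
        MOV  DX,184FH      ;Lower right: row 24 (18H), column 79 (4FH)
        INT  10H           ;Call interrupt service
        MOV  AH,02H        ;Request set cursor
        MOV  BH,00         ;Page number 0
        MOV  DX,0000H      ;Row 0, column 0
        INT  10H
        MOV  AX,4C00H      ;End processing
        INT  21H
MAIN    ENDP
        END  MAIN
```

Loading CX and DX with a smaller rectangle clears only that window and leaves the rest of the screen untouched.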

07H Scroll Down Screen: Scrolling down the screen causes the bottom lines to scroll off and
blank lines to appear at the top. It works the same as function 06H, except the fact that
this operation scrolls down. Set the following registers as:
AL: Number of rows (00 for full screen) CX: Starting row, Column
BH: Attribute or pixel value DX: Ending row, Column
08H Read Character and Attribute at Cursor: Read the character and its attribute at the
cursor from the video display area. Before calling the interrupt, load the page number in BH.
The function delivers the character to AL and its attribute to AH.
09H Display Character And Attribute At Cursor: Display a specified number of characters
at cursor according to given attribute. Set the registers as:
AL: ASCII character BL: Attribute or pixel value
BH: Page number CX: Count
The count in CX specifies the number of times the operation is to repetitively display the
character in AL.
MOV AH,09H ;Request display character
MOV AL,01H ;Happy face for display
MOV BH,0 ;Page number 0 (normal)
MOV BL,16H ;Blue background, brown foreground
MOV CX,60 ;No. of repeated characters
INT 10H ;Call interrupt service
The operation does not advance the cursor or respond to the Bell, Carriage Return, Line
Feed, or Tab characters; instead it attempts to display them as ASCII characters.
In graphics mode BL defines the foreground color. If bit 7 of BL is 0, the color
replaces the present pixel color; otherwise the color is XORed with the present pixel color.
0AH Display Character at Cursor: The difference with 09H is that function 09H sets the
attribute whereas function 0AH uses the current value.
AL: ASCII character BL: Pixel value (graphics mode only)
BH: Page number CX: Count
0BH Set the Color Palette: The value in BH (00 or 01) determines the purpose of BL
BH = 00: Select the background color, where BL contains the color value in bits 0-3 (any
of 16 colors).
MOV AH,0BH ;Request color
MOV BH,00 ;Background
MOV BL,04 ;Color red
INT 10H
BH = 01: Select the palette for graphics, where BL contains the palette (0 or 1).
MOV AH,0BH ;Request color
MOV BH,01 ;Select palette
MOV BL,0 ;number 0
INT 10H
0CH Write Pixel Dot: Display a selected color (background and palette) in graphics mode. Set
the registers as:
AL: Color of the pixel CX: Column
BH: Page number DX: Row
The minimum value for the column or row is 0 and the maximum value depends on the
video mode. The following example sets a pixel at column 200 and row 50.
MOV AH,0CH ;Request write dot
MOV AL,03 ;color of pixel
MOV BH,0 ;page number 0
MOV CX,200 ;Horizontal X coordinate (column)
MOV DX,50 ;Vertical Y coordinate (row)
INT 10H ;request interrupt service

In all graphics modes except the 256-color mode 13H, setting bit 7 of AL to 1 causes the
new color to be XORed with the present pixel color.
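Repeating function 0CH in a loop draws a line. The sketch below draws a 100-pixel horizontal line starting at the column and row used in the example above. It rests on several assumptions: a VGA adapter (mode 12H), a BIOS that preserves CX and DX across INT 10H, and the hypothetical label NEXT_DOT. It waits for a key before restoring text mode so that the line stays visible.

```asm
.MODEL SMALL
.STACK 64
.CODE
MAIN    PROC FAR
        MOV  AH,00H        ;Request set mode
        MOV  AL,12H        ;640 x 480 graphics, 16 colors
        INT  10H
        MOV  CX,200        ;Starting column
        MOV  DX,50         ;Row
NEXT_DOT:
        MOV  AH,0CH        ;Request write dot
        MOV  AL,03         ;Color of pixel
        MOV  BH,0          ;Page number 0
        INT  10H
        INC  CX            ;Move one column to the right
        CMP  CX,300        ;Past the last column (299)?
        JNE  NEXT_DOT      ; no, write the next dot
        MOV  AH,08H        ;Wait for a key without echo
        INT  21H
        MOV  AH,00H        ;Request set mode
        MOV  AL,03H        ;Back to standard color text
        INT  10H
        MOV  AX,4C00H      ;End processing
        INT  21H
MAIN    ENDP
        END  MAIN
```

AH and AL are reloaded on every pass because the interrupt is not guaranteed to leave AX unchanged.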
0DH Read Pixel Dot: Read pixel dot to determine its color value. For this set page number in
BH, column in CX, and row in DX. The operation returns the pixel color in AL.
0EH Display in Teletype Mode: The monitor is used as a terminal for simple displays in text
and graphics modes. Load AL with the character to display and BL with the foreground
color.
MOV AH,0EH ;Request display
MOV AL,char ;Character to display
MOV BL,color ;Foreground color (graphics mode)
INT 10H ;Call interrupt service
The operation responds to backspace, bell, carriage return, and line feed, but does not
respond to tab.
0FH Get Current Video Mode: The operation returns the values as:
AL: Current video mode AH: Number of screen columns
BH: Active video page
MOV AH,0FH ;Request video mode
INT 10H ;Call interrupt service
CMP AL,03 ;If mode 3
JNE ... ;Jump on not equal

BIOS Service INT 16H for Keyboard Operation


INT 16H is the basic BIOS keyboard operation used extensively by software developers and
provides the following services according to a function code that you load in AH.
03H: Set Typematic Repeat Rate
When you hold down a key for more than one-half second, the keyboard enters typematic
mode and automatically repeats the character. To change the rate, you can use the
function like this:
MOV AH,03H ;Set typematic repeat rate
MOV AL,05H ;Required subfunction
MOV BH,repeat-delay ;Delay before start
MOV BL,repeat-rate ;Speed of repetition
INT 16H
The values for repeat-delay in BH are 0 = 1/4 sec, 1 = 1/2 sec. (default), 2 = 3/4 sec, and
3 = 1 sec. The values for repeat-rate in BL range from 0 (fastest) through 31 (slowest).
05H: Keyboard Write
This operation allows a program to insert characters in the keyboard buffer as if a user had
pressed a key. Load the scan code into CH and the ASCII character into CL. The operation
allows you to enter characters into the buffer until it is full. If the buffer is full, the
operation sets the Carry Flag and AL to 1.
10H: Read Keyboard Character
This standard keyboard operation checks the keyboard buffer for an entered character. If
none is present, it waits for the user to press a key. If a character is present, the operation
delivers it to AL and its scan code to AH. If the pressed key is an extended function such as
Home or F1, the character in AL is 00H. On the enhanced keyboard, F11 and F12 also
return 00H to AL, but the other newer (duplicate) control keys, such as Home and PageUp,
return E0H. Here are the three possibilities:
Key Pressed                      AH         AL
Regular ASCII character          Scan code  ASCII character
Extended function key            Scan code  00H
Extended duplicate control key   Scan code  E0H

The program can test AL for 00H or E0H to determine whether an extended function key
was pressed:
MOV AH,10H ;Request BIOS keyboard input
INT 16H ;Call interrupt service
CMP AL,00H ;Extended function key?
JE exit ; yes, exit
CMP AL,0E0H ;Extended control key?
JE exit ; yes, exit
Because the operation does not echo the character on the screen, the program has to
request a screen display operation for that purpose.
11H: Determine Whether Character is Present
If an entered character is present in the keyboard buffer, the operation clears the Zero
Flag and delivers the character to AL and its scan code to AH; the entered character
remains in the buffer. If no character is present, the operation sets the Zero Flag and does
not wait. Note that the operation provides a look-ahead feature because the character
remains in the keyboard buffer until function 10H reads it.
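The look-ahead behavior can be sketched by polling with function 11H and only then reading with function 10H. The labels WAITKEY and DONE are hypothetical, and DOS function 02H is used for the display because the BIOS read does not echo. A real program would do useful work inside the polling loop.

```asm
.MODEL SMALL
.STACK 64
.CODE
MAIN    PROC FAR
WAITKEY:
        MOV  AH,11H        ;Is a character present?
        INT  16H
        JZ   WAITKEY       ; no (Zero Flag set), keep polling
        MOV  AH,10H        ; yes, read it and remove it from the buffer
        INT  16H           ;AL = character, AH = scan code
        CMP  AL,00H        ;Extended function key?
        JE   DONE          ; yes, nothing to display
        CMP  AL,0E0H       ;Extended control key?
        JE   DONE          ; yes, nothing to display
        MOV  DL,AL         ;Display the character
        MOV  AH,02H
        INT  21H
DONE:   MOV  AX,4C00H      ;End processing
        INT  21H
MAIN    ENDP
        END  MAIN
```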
12H: Return Keyboard Shift Status
This operation delivers the keyboard status byte from BIOS Data Area 1 at location
40:17H to AL and the byte from 40:18H to AH. The following example tests AL to
determine whether the Left Shift (bit 1) or Right Shift (bit 0) keys are pressed:
MOV AH,12H ;Request shift status
INT 16H ;Call interrupt service
AND AL,00000011B ;Left or right shift pressed?
JNZ exit ; yes ...
For the status byte in AH, 1-bits mean the following:
Bit Key Bit Key
7 SysReq pressed 3 Right Alt pressed
6 Caps Lock pressed 2 Right Ctrl pressed
5 Num Lock pressed 1 Left Alt pressed
4 Scroll Lock pressed 0 Left Ctrl pressed
