Lecture 6_merged
William Stallings
Computer Organization
and Architecture
10th Edition
© 2016 Pearson Education, Inc., Hoboken,
NJ. All rights reserved.
Chapter 12
Instruction Sets:
Characteristics and Functions
Machine Instruction
Characteristics
[Figure residue: instruction cycle diagram labels "Multiple operands" and "Multiple results"; instruction format width of 16 bits]
Examples include:
ADD Add
SUB Subtract
MUL Multiply
DIV Divide
LOAD Load data from memory
STOR Store data to memory
Data processing
Data storage
Data movement
Control
• Test instructions are used to test the value of a data word or the status of a computation
• Branch instructions are used to branch to a different set of instructions depending on the decision made
• I/O instructions are needed to transfer programs and data into memory and the results of computations back out to the user
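(a) Three-address instructions
Instruction      Comment
SUB Y, A, B      Y ← A – B
MPY T, D, E      T ← D × E
ADD T, T, C      T ← T + C
DIV Y, Y, T      Y ← Y ÷ T

(b) Two-address instructions
Instruction      Comment
MOVE Y, A        Y ← A
SUB Y, B         Y ← Y – B
MOVE T, D        T ← D
MPY T, E         T ← T × E
ADD T, C         T ← T + C
DIV Y, T         Y ← Y ÷ T

(c) One-address instructions
Instruction      Comment
LOAD D           AC ← D
MPY E            AC ← AC × E
ADD C            AC ← AC + C
STOR Y           Y ← AC
LOAD A           AC ← A
SUB B            AC ← AC – B
DIV Y            AC ← AC ÷ Y
STOR Y           Y ← AC

Figure 12.3 Programs to Execute Y = (A – B) / [C + (D × E)]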
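The one-address program of Figure 12.3 can be traced with a small accumulator-machine sketch. The function name, the dictionary used as memory, and the sample operand values are assumptions made for illustration; they are not part of the textbook figure.

```python
# Minimal one-address (accumulator) machine; an illustrative sketch only.
def run_one_address(program, memory):
    """Execute (opcode, operand) pairs against a dict used as memory."""
    ac = 0  # accumulator, the implicit operand of every instruction
    for op, addr in program:
        if op == "LOAD":
            ac = memory[addr]
        elif op == "STOR":
            memory[addr] = ac
        elif op == "ADD":
            ac = ac + memory[addr]
        elif op == "SUB":
            ac = ac - memory[addr]
        elif op == "MPY":
            ac = ac * memory[addr]
        elif op == "DIV":
            ac = ac / memory[addr]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory

# The one-address program for Y = (A - B) / [C + (D x E)].
ONE_ADDRESS_PROGRAM = [
    ("LOAD", "D"), ("MPY", "E"), ("ADD", "C"), ("STOR", "Y"),
    ("LOAD", "A"), ("SUB", "B"), ("DIV", "Y"), ("STOR", "Y"),
]

# Sample values chosen for this example.
memory = {"A": 20, "B": 4, "C": 2, "D": 3, "E": 2, "Y": 0}
run_one_address(ONE_ADDRESS_PROGRAM, memory)
print(memory["Y"])  # (20 - 4) / (2 + 3 * 2) = 2.0
```

Note that Y doubles as temporary storage for the denominator, which is why the program needs two STOR instructions.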
Table 12.1
Utilization of Instruction Addresses
(Nonbranching Instructions)
AC = accumulator
T = top of stack
(T – 1) = second element of stack
A, B, C = memory or register locations
Packed decimal
Each decimal digit is represented by a 4-bit code with two digits
stored per byte
To form numbers, 4-bit codes are strung together, usually in multiples
of 8 bits
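As a rough sketch of the packing rule above, the following converts an unsigned integer to packed decimal and back. The function names are hypothetical, and sign handling (which real packed-decimal formats provide) is deliberately left out.

```python
def to_packed_decimal(n):
    """Pack a non-negative integer as BCD, two decimal digits per byte."""
    if n < 0:
        raise ValueError("this sketch handles unsigned values only")
    digits = str(n)
    if len(digits) % 2:              # pad to a whole number of bytes
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def from_packed_decimal(data):
    """Recover the integer from packed BCD bytes."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F   # one 4-bit code per digit
        value = value * 100 + high * 10 + low
    return value

packed = to_packed_decimal(246)
print(packed.hex())                  # 0246, i.e. 0000 0010 0100 0110
print(from_packed_decimal(packed))   # 246
```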
x86 Data Types

Data Type        Description
Near pointer     A 16-bit, 32-bit, or 64-bit effective address that represents the offset within a segment. Used for all pointers in a nonsegmented memory and for references within a segment in a segmented memory.
Far pointer      A logical address consisting of a 16-bit segment selector and an offset of 16, 32, or 64 bits. Far pointers are used for memory references in a segmented memory model where the identity of a segment being accessed must be specified explicitly.
Bit field        A contiguous sequence of bits in which the position of each bit is considered as an independent unit. A bit string can begin at any bit position of any byte and can contain up to 32 bits.
Bit string       A contiguous sequence of bits, containing from zero to 2^32 – 1 bits.
Byte string      A contiguous sequence of bytes, words, or doublewords, containing from zero to 2^32 – 1 bytes.
Floating point   See Figure 12.4.
Packed SIMD (single instruction, multiple data)   Packed 64-bit and 128-bit data types
Data types:
Packed byte and packed byte integer
Packed word and packed word integer
Packed doubleword and packed doubleword integer
Packed quadword and packed quadword integer
Packed single-precision floating-point and packed double-precision
floating-point
[Figure: byte ordering of a 32-bit word (bits 31 to 0). One ordering places Byte 3, Byte 2, Byte 1, Byte 0 from bit 31 down to bit 0; the other places Byte 0, Byte 1, Byte 2, Byte 3.]
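The two byte orders shown above can be reproduced with Python's built-in integer conversions. This is a sketch that assumes Byte 0 is the byte at the lowest memory address; the sample byte values are made up.

```python
# Four bytes as they sit in memory, Byte 0 at the lowest address (assumed).
memory_bytes = bytes([0x11, 0x22, 0x33, 0x44])   # Byte 0 .. Byte 3

# Little-endian load: Byte 3 lands in bits 31-24, Byte 0 in bits 7-0.
little = int.from_bytes(memory_bytes, "little")

# Big-endian load: Byte 0 lands in bits 31-24, Byte 3 in bits 7-0.
big = int.from_bytes(memory_bytes, "big")

print(f"{little:08x}")  # 44332211
print(f"{big:08x}")     # 11223344
```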
Table 12.3 Common Instruction Set Operations (page 1 of 2)
(Table can be found on page 426 in textbook.)

Type           Operation Name          Description
Data Transfer  Set                     Transfer word of 1s to destination
               Push                    Transfer word from source to top of stack
               Pop                     Transfer word from top of stack to destination
Arithmetic     Add                     Compute sum of two operands
               Subtract                Compute difference of two operands
               Multiply                Compute product of two operands
               Divide                  Compute quotient of two operands
               Absolute                Replace operand by its absolute value
               Negate                  Change sign of operand
               Increment               Add 1 to operand
               Decrement               Subtract 1 from operand
Logical        AND                     Perform logical AND
               OR                      Perform logical OR
               NOT (complement)        Perform logical NOT
               Exclusive-OR            Perform logical XOR
               Test                    Test specified condition; set flag(s) based on outcome
               Compare                 Make logical or arithmetic comparison of two or more operands; set flag(s) based on outcome
               Set Control Variables   Class of instructions to set controls for protection purposes, interrupt handling, timer control, etc.
               Shift                   Left (right) shift operand, introducing constants at end
               Rotate                  Left (right) shift operand, with wraparound end
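The difference between the Shift and Rotate rows can be seen in a short sketch. The 8-bit width and the function names are assumptions for illustration; real instruction sets define these operations on their own word sizes.

```python
WIDTH = 8                      # assumed operand width in bits
MASK = (1 << WIDTH) - 1

def shift_left(x, n):
    """Logical left shift: zeros enter on the right, high bits are lost."""
    return (x << n) & MASK

def rotate_left(x, n):
    """Left rotate: bits leaving the high end wrap around to the low end."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK

value = 0b10100110
print(f"{shift_left(value, 3):08b}")   # 00110000
print(f"{rotate_left(value, 3):08b}")  # 00110101
```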
Table 12.3 Common Instruction Set Operations (page 2 of 2)
(Table can be found on page 426 in textbook.)

Type                 Operation Name       Description
Transfer of Control  Jump (branch)        Unconditional transfer; load PC with specified address
                     Jump Conditional     Test specified condition; either load PC with specified address or do nothing, based on condition
                     Jump to Subroutine   Place current program control information in known location; jump to specified address
                     Return               Replace contents of PC and other register from known location
                     Execute              Fetch operand from specified location and execute as instruction; do not modify PC
                     Skip                 Increment PC to skip next instruction
                     Skip Conditional     Test specified condition; either skip or do nothing based on condition
                     Halt                 Stop program execution
                     Wait (hold)          Stop program execution; test specified condition repeatedly; resume execution when condition is satisfied
                     No operation         No operation is performed, but program execution is continued
Input/Output         Input (read)         Transfer data from specified I/O port or device to destination (e.g., main memory or processor register)
                     Output (write)       Transfer data from specified source to I/O port or device
                     Start I/O            Transfer instructions to I/O processor to initiate I/O operation
                     Test I/O             Transfer status information from I/O system to specified destination
Conversion           Translate            Translate values in a section of memory based on a table of correspondences
                     Convert              Convert the contents of a word from one form to another (e.g., packed decimal to binary)
Most fundamental type of machine instruction
Must specify:
• Location of the source and destination operands
• The length of data to be transferred
• The mode of addressing for each operand
An example is converting from decimal to binary
An example of a more complex editing instruction is the EAS/390 Translate (TR) instruction
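A table-driven translate operation of this kind replaces each byte of the operand with the table entry it indexes. Python's built-in `bytes.translate` behaves the same way, so it can serve as a rough model; the uppercase-mapping table below is an invented example, not the table any particular machine uses.

```python
# Build a 256-entry translation table: entry i is the replacement for byte value i.
table = bytearray(range(256))                 # start with the identity mapping
for lower, upper in zip(b"abcdefghijklmnopqrstuvwxyz",
                        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    table[lower] = upper                      # map lowercase codes to uppercase

data = b"load r1, 0x20"
translated = data.translate(bytes(table))     # each byte indexes the table
print(translated)                             # b'LOAD R1, 0X20'
```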
[Figure: branch instructions]
Address  Instruction   Branch type
200
201
202      SUB X, Y
203      BRZ 211       Conditional branch
...
210      BR 202        Unconditional branch
211
...
235                    (target of a further conditional branch)
[Figure: nested procedures]
Address  Contents
4100     CALL Proc1    (main program)
4101
...
4500                   (procedure body, ending in RETURN)
...
4800                   (procedure Proc2, ending in RETURN)
[Figure: stack contents during nested procedure calls]
(a) Initial stack contents
(b) After CALL Proc1
(c) Initial CALL Proc2
(d) After RETURN
(e) After CALL Proc2
(f) After RETURN
(g) After RETURN
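The sequence of stack states listed above can be sketched as a stack of return addresses. The function name is hypothetical, and the call addresses 4600 and 4650 are assumed for illustration; only 4100 and 4101 appear in the recovered figure text.

```python
def trace_nested_calls(events):
    """Return the stack contents after each CALL or RETURN event."""
    stack = []                      # return addresses; top of stack is the last item
    snapshots = []
    for event in events:
        if event[0] == "CALL":
            call_address = event[1]
            stack.append(call_address + 1)   # resume at the next instruction
        else:                                # "RETURN"
            stack.pop()                      # popped address would be loaded into the PC
        snapshots.append(list(stack))
    return snapshots

events = [
    ("CALL", 4100),   # main program calls Proc1
    ("CALL", 4600),   # Proc1 calls Proc2 (assumed address)
    ("RETURN",),
    ("CALL", 4650),   # Proc1 calls Proc2 again (assumed address)
    ("RETURN",),
    ("RETURN",),      # back in the main program at 4101
]
for snapshot in trace_nested_calls(events):
    print(snapshot)
```

Because the most recent return address is always on top, the procedures can nest to any depth the stack allows.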
The intent was to provide tools for the compiler writer to produce
optimized machine language translation of high-level language
programs
Table 12.9 x86 Condition Codes for Conditional Jump and SETcc Instructions
(Table can be found on page 440 in the textbook.)

Symbol       Condition Tested                                          Comment
AE, NB, NC   CF=0                                                      Above or equal; Not below (greater than or equal, unsigned); Not carry
B, NAE, C    CF=1                                                      Below; Not above or equal (less than, unsigned); Carry set
BE, NA       CF=1 OR ZF=1                                              Below or equal; Not above (less than or equal, unsigned)
E, Z         ZF=1                                                      Equal; Zero (signed or unsigned)
G, NLE       [(SF=1 AND OF=1) OR (SF=0 AND OF=0)] AND [ZF=0]           Greater than; Not less than or equal (signed)
GE, NL       (SF=1 AND OF=1) OR (SF=0 AND OF=0)                        Greater than or equal; Not less than (signed)
L, NGE       (SF=1 AND OF=0) OR (SF=0 AND OF=1)                        Less than; Not greater than or equal (signed)
LE, NG       (SF=1 AND OF=0) OR (SF=0 AND OF=1) OR (ZF=1)              Less than or equal; Not greater than (signed)
NE, NZ       ZF=0                                                      Not equal; Not zero (signed or unsigned)
NO           OF=0                                                      No overflow
NS           SF=0                                                      Not sign (not negative)
NP, PO       PF=0                                                      Not parity; Parity odd
O            OF=1                                                      Overflow
P            PF=1                                                      Parity; Parity even
S            SF=1                                                      Sign (negative)
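To show how the same flags support both unsigned and signed comparisons, the sketch below computes the flags left by an 8-bit subtraction and evaluates a few of the conditions from the table. The function names and the 8-bit operand width are assumptions for illustration.

```python
def flags_after_sub8(a, b):
    """Flags left by the 8-bit subtraction a - b, with a and b in 0..255."""
    result = (a - b) & 0xFF
    return {
        "CF": int(a < b),                                  # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 7,                                 # sign bit of the result
        "OF": int(((a ^ b) & (a ^ result) & 0x80) != 0),   # signed overflow
        "PF": int(bin(result).count("1") % 2 == 0),        # even parity
    }

def condition_holds(cc, f):
    """Evaluate a condition code against a flags dict, per the table above."""
    tests = {
        "AE": f["CF"] == 0,
        "B":  f["CF"] == 1,
        "BE": f["CF"] == 1 or f["ZF"] == 1,
        "E":  f["ZF"] == 1,
        "G":  f["SF"] == f["OF"] and f["ZF"] == 0,
        "GE": f["SF"] == f["OF"],
        "L":  f["SF"] != f["OF"],
        "LE": f["SF"] != f["OF"] or f["ZF"] == 1,
        "NE": f["ZF"] == 0,
    }
    return tests[cc]

# 0x01 versus 0xFF: 1 < 255 unsigned, but 1 > -1 signed.
flags = flags_after_sub8(0x01, 0xFF)
print(condition_holds("B", flags))   # True  (unsigned below)
print(condition_holds("G", flags))   # True  (signed greater)
```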
Table 12.10 MMX Instruction Set
(Table can be found on page 442 in the textbook.)

Category       Instruction             Description
Arithmetic     PADD [B, W, D]          Parallel add of packed eight bytes, four 16-bit words, or two 32-bit doublewords, with wraparound.
               PADDS [B, W]            Add with saturation.
               PADDUS [B, W]           Add unsigned with saturation.
               PSUB [B, W, D]          Subtract with wraparound.
               PSUBS [B, W]            Subtract with saturation.
               PSUBUS [B, W]           Subtract unsigned with saturation.
               PMULHW                  Parallel multiply of four signed 16-bit words, with high-order 16 bits of 32-bit result chosen.
               PMULLW                  Parallel multiply of four signed 16-bit words, with low-order 16 bits of 32-bit result chosen.
               PMADDWD                 Parallel multiply of four signed 16-bit words; add together adjacent pairs of 32-bit results.
Comparison     PCMPEQ [B, W, D]        Parallel compare for equality; result is mask of 1s if true or 0s if false.
               PCMPGT [B, W, D]        Parallel compare for greater than; result is mask of 1s if true or 0s if false.
Conversion     PACKUSWB                Pack words into bytes with unsigned saturation.
               PACKSS [WB, DW]         Pack words into bytes, or doublewords into words, with signed saturation.
               PUNPCKH [BW, WD, DQ]    Parallel unpack (interleaved merge) high-order bytes, words, or doublewords from MMX register.
               PUNPCKL [BW, WD, DQ]    Parallel unpack (interleaved merge) low-order bytes, words, or doublewords from MMX register.
Logical        PAND                    64-bit bitwise logical AND
               PANDN                   64-bit bitwise logical AND NOT
               POR                     64-bit bitwise logical OR
               PXOR                    64-bit bitwise logical XOR
Shift          PSLL [W, D, Q]          Parallel logical left shift of packed words, doublewords, or quadword by amount specified in MMX register or immediate value.
               PSRL [W, D, Q]          Parallel logical right shift of packed words, doublewords, or quadword.
               PSRA [W, D]             Parallel arithmetic right shift of packed words or doublewords.
Data Transfer  MOV [D, Q]              Move doubleword or quadword to/from MMX register.
State Mgt      EMMS                    Empty MMX state (empty FP registers tag bits).

Note: If an instruction supports multiple data types [byte (B), word (W), doubleword (D), quadword
(Q)], the data types are indicated in brackets.
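The distinction between wraparound and saturation in the arithmetic rows can be modeled lane by lane. This is a plain-Python sketch of the byte variants, with hypothetical function names; it models the arithmetic only and does not call any SIMD hardware.

```python
def add_wraparound(a, b):
    """Model of a packed byte add with wraparound: overflow bits are discarded."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]

def add_unsigned_saturate(a, b):
    """Model of a packed unsigned byte add with saturation: results clamp at 255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]

def add_signed_saturate(a, b):
    """Model of a packed signed byte add with saturation: clamp to -128..127."""
    return [max(-128, min(127, x + y)) for x, y in zip(a, b)]

# Eight byte lanes, as in one 64-bit packed operand.
a = [200, 100, 250, 0, 1, 2, 3, 255]
b = [100, 100, 10, 0, 1, 2, 3, 1]
print(add_wraparound(a, b))          # [44, 200, 4, 0, 2, 4, 6, 0]
print(add_unsigned_saturate(a, b))   # [255, 200, 255, 0, 2, 4, 6, 255]
```

Saturation suits pixel and audio data, where clamping to the brightest or loudest value is preferable to wrapping around to a small one.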
x86 Single-Instruction, Multiple-
Data (SIMD) Instructions
In 1996, Intel introduced MMX technology into its Pentium
product line
MMX is a set of highly optimized instructions for multimedia tasks
Parallel addition and subtraction instructions
Multiply instructions
Extend instructions
Status register access instructions
Table 12.11 ARM Conditions for Conditional Instruction Execution
(Table can be found on page 445 in the textbook.)

Code  Symbol  Condition Tested                                       Comment
0000  EQ      Z = 1                                                  Equal
0001  NE      Z = 0                                                  Not equal
0010  CS/HS   C = 1                                                  Carry set/unsigned higher or same
0011  CC/LO   C = 0                                                  Carry clear/unsigned lower
0100  MI      N = 1                                                  Minus/negative
0101  PL      N = 0                                                  Plus/positive or zero
0110  VS      V = 1                                                  Overflow
0111  VC      V = 0                                                  No overflow
1000  HI      C = 1 AND Z = 0                                        Unsigned higher
1001  LS      C = 0 OR Z = 1                                         Unsigned lower or same
1010  GE      N = V [(N = 1 AND V = 1) OR (N = 0 AND V = 0)]         Signed greater than or equal
1011  LT      N ≠ V [(N = 1 AND V = 0) OR (N = 0 AND V = 1)]         Signed less than
1100  GT      (Z = 0) AND (N = V)                                    Signed greater than
1101  LE      (Z = 1) OR (N ≠ V)                                     Signed less than or equal
1110  AL      —                                                      Always (unconditional)
1111  —       —                                                      This instruction can only be executed unconditionally
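The condition table translates directly into a lookup from the 4-bit code to a predicate on the N, Z, C, and V flags. The sketch below covers codes 0000 through 1110; the names are hypothetical and the flag values are chosen only for illustration.

```python
# Condition code -> (symbol, predicate over the N, Z, C, V flags).
ARM_CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z == 1),
    0b0001: ("NE", lambda n, z, c, v: z == 0),
    0b0010: ("CS", lambda n, z, c, v: c == 1),
    0b0011: ("CC", lambda n, z, c, v: c == 0),
    0b0100: ("MI", lambda n, z, c, v: n == 1),
    0b0101: ("PL", lambda n, z, c, v: n == 0),
    0b0110: ("VS", lambda n, z, c, v: v == 1),
    0b0111: ("VC", lambda n, z, c, v: v == 0),
    0b1000: ("HI", lambda n, z, c, v: c == 1 and z == 0),
    0b1001: ("LS", lambda n, z, c, v: c == 0 or z == 1),
    0b1010: ("GE", lambda n, z, c, v: n == v),
    0b1011: ("LT", lambda n, z, c, v: n != v),
    0b1100: ("GT", lambda n, z, c, v: z == 0 and n == v),
    0b1101: ("LE", lambda n, z, c, v: z == 1 or n != v),
    0b1110: ("AL", lambda n, z, c, v: True),
}

def instruction_executes(code, n, z, c, v):
    """True if an instruction carrying this condition code would execute."""
    _, predicate = ARM_CONDITIONS[code]
    return predicate(n, z, c, v)

# Illustrative flag values: N=0, Z=0, C=1, V=0.
print(instruction_executes(0b1000, 0, 0, 1, 0))  # True  (HI)
print(instruction_executes(0b1011, 0, 0, 1, 0))  # False (LT)
```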
William Stallings
Computer Organization
and Architecture
10th Edition
Chapter 13
Instruction Sets: Addressing
Modes and Formats
Immediate
Direct
Indirect
Register
Register indirect
Displacement
Stack
[Figure: addressing modes. Panels recovered: (d) Register, (e) Register Indirect, (f) Displacement, (g) Stack, in which the instruction implicitly references the top-of-stack register]
Operand = A
This mode can be used to define and use constants or set initial
values of variables
Typically the number will be stored in twos complement form
The leftmost bit of the operand field is used as a sign bit
Advantage:
No memory reference other than the instruction fetch is required to
obtain the operand, thus saving one memory or cache cycle in the
instruction cycle
Disadvantage:
The size of the number is restricted to the size of the address field, which,
in most instruction sets, is small compared with the word length
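As a rough illustration of the sign-bit rule and the size restriction above, the sketch below interprets an operand field as a twos complement immediate. The 12-bit field width and the function name are assumptions, not values taken from any particular instruction set.

```python
def decode_immediate(field, bits):
    """Interpret a `bits`-wide operand field as a twos complement value."""
    sign_bit = 1 << (bits - 1)            # leftmost bit of the operand field
    return (field ^ sign_bit) - sign_bit  # sign-extend to a Python int

FIELD_BITS = 12                           # assumed width of the operand field
print(decode_immediate(0b000000000101, FIELD_BITS))  # 5
print(decode_immediate(0b111111111100, FIELD_BITS))  # -4

# The field width bounds the constants that can be encoded.
print(-(1 << (FIELD_BITS - 1)), (1 << (FIELD_BITS - 1)) - 1)  # -2048 2047
```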
Address field contains the effective address of the operand
Effective address (EA) = address field (A)
Was common in earlier generations of computers
Limitation is that it provides only a limited address space
EA = (A)
Parentheses are to be interpreted as meaning contents of
Advantage:
For a word length of N, an address space of 2^N is now available
Disadvantage:
Instruction execution requires two memory references to fetch the operand
One to get its address and a second to get its value
Address field refers to a register rather than a main memory address
EA = R
Advantages:
• Only a small address field is needed in the instruction
• No time-consuming memory references are required
Disadvantage:
• The address space is very limited
EA = (R)
EA = A + (R)
Requires that the instruction have two address fields, at least one
of which is explicit
The value contained in one address field (value = A) is used directly
The other address field refers to a register whose contents are added
to A to produce the effective address
Autoindexing
Automatically increment or decrement the index register after each reference to it
EA = A + (R)
(R) ← (R) + 1
Postindexing
Indexing is performed after the indirection
EA = (A) + (R)
Preindexing
Indexing is performed before the indirection
EA = (A + (R))
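The effective-address formulas above can be collected into one small function. This is a sketch in which memory and the register file are plain dictionaries; the mode names, addresses, and contents are invented for the example.

```python
def effective_address(mode, a, r, memory, registers):
    """Compute EA for one addressing mode; (X) means 'contents of X'."""
    if mode == "direct":
        return a                                  # EA = A
    if mode == "indirect":
        return memory[a]                          # EA = (A)
    if mode == "register_indirect":
        return registers[r]                       # EA = (R)
    if mode == "displacement":
        return a + registers[r]                   # EA = A + (R)
    if mode == "postindex":
        return memory[a] + registers[r]           # EA = (A) + (R)
    if mode == "preindex":
        return memory[a + registers[r]]           # EA = (A + (R))
    raise ValueError(f"unknown mode: {mode}")

memory = {100: 500, 110: 700}      # address -> contents (sample values)
registers = {"R1": 10}

for mode in ("direct", "indirect", "register_indirect",
             "displacement", "postindex", "preindex"):
    print(mode, effective_address(mode, 100, "R1", memory, registers))
```

With these sample values the six modes yield 100, 500, 10, 110, 510, and 700 respectively. Immediate and register modes are omitted because the operand is not fetched from a memory address in either case.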
Associated with the stack is a pointer whose value is the address of the top of
the stack
The stack pointer is maintained in a register
Thus references to stack locations in memory are in fact register indirect addresses
[Figure: ARM indexing methods, for a store (STR) of r0 = 0x5 with original base register r1 = 0x200 and offset 0xC]
(a) Offset: 0x5 is stored at 0x20C; r1 remains 0x200
(b) Preindex: 0x5 is stored at 0x20C; r1 is updated to 0x20C
(c) Postindex: 0x5 is stored at 0x200; r1 is updated to 0x20C
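The three indexing methods differ only in which address is used for the store and whether the base register is written back. The sketch below models that with dictionaries, using the values from the figure (r0 = 0x5, r1 = 0x200, offset 0xC); the function name and mode strings are hypothetical.

```python
def store(memory, regs, src, base, offset, mode):
    """Model a store of regs[src] using one of three indexing methods."""
    if mode == "offset":
        memory[regs[base] + offset] = regs[src]   # base register unchanged
    elif mode == "preindex":
        regs[base] += offset                      # update base first
        memory[regs[base]] = regs[src]            # then store at the new address
    elif mode == "postindex":
        memory[regs[base]] = regs[src]            # store at the original base
        regs[base] += offset                      # then update base
    else:
        raise ValueError(f"unknown mode: {mode}")

for mode in ("offset", "preindex", "postindex"):
    memory, regs = {}, {"r0": 0x5, "r1": 0x200}
    store(memory, regs, "r0", "r1", 0xC, mode)
    stored_at = next(iter(memory))
    print(f"{mode}: stored at {stored_at:#x}, r1 = {regs['r1']:#x}")
```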
Branch instructions
The only form of addressing for branch instructions is immediate
Instruction contains a 24-bit value
Shifted 2 bits left so that the address is on a word boundary
Effective range is ±32 MB from the program counter
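The ±32 MB figure follows from the arithmetic: a signed 24-bit value shifted left by 2 bits spans 2^26 bytes. The sketch below works this through; the function name is hypothetical, and `pc` stands for whatever base value the processor adds the offset to, which is a simplification.

```python
def branch_target(pc, imm24):
    """Sign-extend a 24-bit field, shift it left 2 bits, and add it to pc."""
    offset = (imm24 ^ 0x800000) - 0x800000   # twos complement, 24 bits
    return pc + (offset << 2)                # word-aligned byte offset

max_forward = ((1 << 23) - 1) << 2           # largest positive byte offset
max_backward = -(1 << 23) << 2               # most negative byte offset
print(max_forward, max_backward)             # 33554428 -33554432
print(max_backward == -32 * 1024 * 1024)     # True: 32 MB backward

print(hex(branch_target(0x8000, 0x000004)))  # 0x8010 (4 words forward)
print(hex(branch_target(0x8000, 0xFFFFFE)))  # 0x7ff8 (2 words back)
```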
Define the layout of the bits of an instruction, in terms of its constituent fields
Must include an opcode and, implicitly or explicitly, indicate the addressing mode for each operand
For most instruction sets more than one instruction format is used
Number of addressing modes
Number of operands
Register versus memory
[Figure: instruction formats for a 12-bit instruction word, bits numbered 0 to 11]
Input/Output Instructions
  bits 0-2: 1 1 0; bits 3-8: Device; bits 9-11: Opcode
Group 2 Microinstructions
  bits 0-3: 1 1 1 1; bits 4-10: CLA SMA SZA SNL RSS OSR HLT; bit 11: 0
Group 3 Microinstructions
  bits 0-3: 1 1 1 1; bits 4-11: CLA MQA 0 MQL 0 0 0 1
I = indirect bit
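Decoding a fixed format like the input/output layout above is a matter of shifting and masking. The sketch below assumes bit 0 is the leftmost (most significant) bit of the 12-bit word, as the bit numbering in the figure suggests; the function name and the sample word are invented.

```python
def decode_io_instruction(word):
    """Split a 12-bit word into the fields of the input/output format."""
    if not 0 <= word < (1 << 12):
        raise ValueError("expected a 12-bit word")
    return {
        "class": (word >> 9) & 0b111,       # bits 0-2, 110 for I/O instructions
        "device": (word >> 3) & 0b111111,   # bits 3-8
        "opcode": word & 0b111,             # bits 9-11
    }

fields = decode_io_instruction(0b110_000011_001)
print(fields)  # {'class': 6, 'device': 3, 'opcode': 1}
```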
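[Figure residue: instruction format diagrams. Recoverable labels: an 8-bit format holding the opcode 05 for RSB (return from subroutine); a variable-length format whose fields are 0, 1, 2, 3, or 4 bytes; 1, 2, or 3 bytes; 0 or 1 bytes; 0 or 1 bytes; 0, 1, 2, or 4 bytes; and 0, 1, 2, or 4 bytes]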
Delivers overall code density comparable with Thumb, together with the
performance levels associated with the ARM ISA
Before Thumb-2, developers had to choose between Thumb for size and
ARM for performance