Co Notes Module 3
FUNCTIONAL UNITS
• A computer consists of 5 functionally independent main parts:
1) Input
2) Memory
3) ALU
4) Output &
5) Control units.
BUS STRUCTURE
• A bus is a group of lines that serves as a connecting path for several devices.
• A bus may be lines or wires.
• The lines carry data or address or control signal.
• There are 2 types of Bus structures: 1) Single Bus Structure and 2) Multiple Bus Structure.
1) Single Bus Structure
Because the bus can be used for only one transfer at a time, only 2 units can actively use the
bus at any given time.
Bus control lines are used to arbitrate multiple requests for use of the bus.
Advantages:
1) Low cost &
2) Flexibility for attaching peripheral devices.
2) Multiple Bus Structure
Systems that contain multiple buses achieve more concurrency in operations.
Two or more transfers can be carried out at the same time.
Advantage: Better performance.
Disadvantage: Increased cost.
PERFORMANCE
• The most important measure of performance of a computer is how quickly it can execute programs.
• The speed of a computer is affected by the design of
1) Instruction-set.
2) Hardware & the technology in which the hardware is implemented.
3) Software including the operating system.
• Because programs are usually written in a high-level language (HLL), performance is also affected by the
compiler that translates programs into machine language.
• For best performance, it is necessary to design the compiler, machine instruction set and hardware in
a co-ordinated way.
• Let us examine the flow of program instructions and data between the memory & the processor.
• At the start of execution, all program instructions are stored in the main-memory.
• As execution proceeds, instructions are fetched into the processor, and a copy is placed in the cache.
• Later, if the same instruction is needed a second time, it is read directly from the cache.
• A program will be executed faster if the movement of instructions/data between the main-memory and the
processor is minimized, which is achieved by using the cache.
PROCESSOR CLOCK
• Processor circuits are controlled by a timing signal called a Clock.
• The clock defines regular time intervals called Clock Cycles.
• To execute a machine instruction, the processor divides the action to be performed into a sequence
of basic steps such that each step can be completed in one clock cycle.
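• Let P = Length of one clock cycle
R = Clock rate.
• Relation between P and R is given by
R = 1/P
• Let T = Processor time required to execute a program
N = Number of machine instructions executed
S = Average number of basic steps needed to execute one machine instruction.
• The program execution time is given by
T = (N × S)/R (1)
• Equ1 is referred to as the basic performance equation.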
• To achieve high performance, the computer designer must reduce the value of T, which means
reducing N and S, and increasing R.
The value of N is reduced if the source program is compiled into fewer machine instructions.
The value of S is reduced if instructions have a smaller number of basic steps to perform.
The value of R can be increased by using a higher frequency clock.
• Care has to be taken while modifying values since changes in one parameter may affect the other.
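The effect of each parameter on T can be checked with a minimal sketch of the basic performance equation T = (N × S)/R. The function name and the workload numbers below are hypothetical, chosen only for illustration.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Basic performance equation: T = (N * S) / R, with T in seconds."""
    return (n_instructions * steps_per_instruction) / clock_rate_hz

# Hypothetical workload: N = 2 million instructions, S = 4 steps, R = 500 MHz.
base = execution_time(2_000_000, 4, 500e6)
# Same N and S, but the clock rate R is doubled to 1 GHz.
faster_clock = execution_time(2_000_000, 4, 1e9)

print(base, faster_clock)
```

With N and S held fixed, doubling R halves T. In practice the parameters interact, so a change that raises R may also raise S.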
DIGITAL DESIGN & COMPUTER ORGANIZATION
CLOCK RATE
• There are 2 possibilities for increasing the clock rate R:
1) Improving the integrated-circuit (IC) technology makes logic-circuits faster.
This reduces the time needed to complete a basic step.
This allows the clock period P to be reduced and the clock rate R to be increased.
2) Reducing the amount of processing done in one basic step also reduces the clock period P.
• In the presence of a cache, the percentage of accesses to the main-memory is small.
Hence, much of the performance gain expected from the use of faster technology can be realized.
The value of T will be reduced by the same factor as R is increased, because S & N are not affected.
PERFORMANCE MEASUREMENT
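• A benchmark is a standard task used to measure how well a processor performs.
• The Performance Measure is the time taken by a computer to execute a given benchmark.
• SPEC selects & publishes the standard programs along with their test results for different application
domains. (SPEC: Standard Performance Evaluation Corporation).
• SPEC Rating is given by
SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)
• The overall SPEC rating for a computer is the geometric mean of the ratings of the n programs in the suite:
Overall SPEC rating = (SPEC1 × SPEC2 × ... × SPECn)^(1/n)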
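A minimal sketch of the rating calculation follows, using the usual definition: the rating for one program is the reference running time divided by the running time on the computer under test, and the overall rating is the geometric mean over all programs. The function names and running times are hypothetical.

```python
import math

def spec_rating(reference_time, test_time):
    """Rating for one benchmark program (larger means faster)."""
    return reference_time / test_time

def overall_spec_rating(ratings):
    """Geometric mean of the individual ratings."""
    return math.prod(ratings) ** (1.0 / len(ratings))

# Hypothetical running times in seconds: (reference computer, computer under test).
times = [(100.0, 50.0), (200.0, 25.0)]
ratings = [spec_rating(ref, test) for ref, test in times]
overall = overall_spec_rating(ratings)

print(ratings, overall)
```

The geometric mean is used so that no single program with a very large ratio dominates the overall figure.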
Problem 1:
List the steps needed to execute the machine instruction:
Load R2, LOC
in terms of transfers between the components of processor and some simple control commands.
Assume that the address of the memory-location containing this instruction is initially in register PC.
Solution:
1. Transfer the contents of register PC to register MAR.
2. Issue a Read command to memory, and then wait until it has transferred the requested word into
register MDR.
3. Transfer the instruction from MDR into IR and decode it.
4. Transfer the address LOC from IR to MAR.
5. Issue a Read command and wait until MDR is loaded.
6. Transfer contents of MDR to register R2.
7. Transfer contents of PC to ALU.
8. Add 1 to operand in ALU and transfer incremented address to PC.
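The register transfers for Load R2, LOC can be traced with a small simulation. This is only a sketch: the addresses (100 for the instruction, 200 for LOC), the stored value, and the tuple encoding of the instruction are all hypothetical, and the PC is incremented by 1 as in the steps listed in these notes.

```python
# Hypothetical machine state; addresses and values are made up for illustration.
memory = {
    100: ("Load", "R2", 200),  # the instruction, stored at address 100
    200: 42,                   # the data word stored at LOC = 200
}
regs = {"PC": 100, "MAR": 0, "MDR": 0, "IR": None, "R2": 0}

def read_memory():
    """Read command: memory places the word addressed by MAR into MDR."""
    regs["MDR"] = memory[regs["MAR"]]

regs["MAR"] = regs["PC"]        # PC -> MAR
read_memory()                   # Read, then wait until MDR is loaded
regs["IR"] = regs["MDR"]        # MDR -> IR, then decode
opcode, dest, loc = regs["IR"]
regs["MAR"] = loc               # address field of IR -> MAR
read_memory()                   # Read the operand
regs[dest] = regs["MDR"]        # MDR -> R2
regs["PC"] = regs["PC"] + 1     # PC -> ALU, add 1, result -> PC

print(regs["R2"], regs["PC"])
```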
Problem 2:
List the steps needed to execute the machine instruction:
Add R4, R2, R3
in terms of transfers between the components of processor and some simple control commands.
Assume that the address of the memory-location containing this instruction is initially in register PC.
Solution:
1. Transfer the contents of register PC to register MAR.
2. Issue a Read command to memory, and then wait until it has transferred the requested word into
register MDR.
3. Transfer the instruction from MDR into IR and decode it.
4. Transfer contents of R2 and R3 to the ALU.
5. Perform addition of the two operands in the ALU and transfer the answer into R4.
6. Transfer contents of PC to ALU.
7. Add 1 to operand in ALU and transfer incremented address to PC.
Problem 3:
(a) Give a short sequence of machine instructions for the task “Add the contents of memory-location A
to those of location B, and place the answer in location C”. Instructions:
Load Ri, LOC
and
Store Ri, LOC
are the only instructions available to transfer data between memory and the general purpose registers.
Add instructions are described in Section 1.3. Do not change contents of either location A or B.
(b) Suppose that Move and Add instructions are available with the formats:
Move Location1, Location2
and
Add Location1, Location2
These instructions move or add a copy of the operand at the second location to the first location,
overwriting the original operand at the first location. Either or both of the operands can be in the memory
or the general-purpose registers. Is it possible to use fewer instructions of these types to accomplish the
task in part (a)? If yes, give the sequence.
Solution:
(a)
Load R2, A
Load R3, B
Add R4, R2, R3
Store R4, C
(b) Yes;
Move C, B
Add C, A
Problem 4:
A program contains 1000 instructions. Of these, 25% require 4 clock cycles, 40% require 5 clock cycles
and the remaining instructions require 3 clock cycles for execution. Find the total time
required to execute the program on a 1 GHz machine.
Solution:
N = 1000
25% of N = 250 instructions require 4 clock cycles.
40% of N = 400 instructions require 5 clock cycles.
35% of N = 350 instructions require 3 clock cycles.
T = (N × S)/R = (250 × 4 + 400 × 5 + 350 × 3)/(1 × 10^9) = (1000 + 2000 + 1050)/(1 × 10^9) = 4.05 μs.
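The arithmetic for this instruction mix can be checked with a short sketch. The function name and the (count, cycles) representation of the mix are assumptions made for illustration.

```python
def program_time(mix, clock_rate_hz):
    """mix: list of (instruction_count, cycles_per_instruction) pairs."""
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles / clock_rate_hz

mix = [(250, 4), (400, 5), (350, 3)]       # the instruction mix from Problem 4
t = program_time(mix, 1e9)                 # 1 GHz clock
average_steps = sum(c * s for c, s in mix) / sum(c for c, _ in mix)

print(f"T = {t * 1e6:.2f} microseconds, average S = {average_steps:.2f}")
```

The average number of steps per instruction works out to 4.05, so the same result follows from T = (N × S)/R with N = 1000 and S = 4.05.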
Problem 5:
For the following processor, obtain the performance.
Clock rate = 800 MHz
No. of instructions executed = 1000
Average no of steps needed / machine instruction = 20
Solution:
T = (N × S)/R = (1000 × 20)/(800 × 10^6) = 25 μs.
Problem 6:
(a) Program execution time T is to be examined for a certain high-level language program. The program
can be run on a RISC or a CISC computer. Both computers use pipelined instruction execution, but
pipelining in the RISC machine is more effective than in the CISC machine. Specifically, the effective
value of S in the T expression for the RISC machine is 1.2, but it is only 1.5 for the CISC machine. Both
machines have the same clock rate R. What is the largest allowable value for N, the number of
instructions executed on the CISC machine, expressed as a percentage of the N value for the RISC
machine, if time for execution on the CISC machine is to be no longer than that on the RISC machine?
(b) Repeat Part (a) if the clock rate R for the RISC machine is 15 percent higher than that for the CISC
machine.
Solution:
(a) Let TR = (NR × SR)/RR & TC = (NC × SC)/RC be execution times on RISC and CISC processors.
Setting TC = TR with equal clock rates (RC = RR), we have
1.2 NR = 1.5 NC
Then
NC/NR = 1.2/1.5 = 0.8
Therefore, the largest allowable value for NC is 80% of NR.
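Both parts can be worked with one small function. From TC ≤ TR, with TR = (NR × SR)/RR and TC = (NC × SC)/RC, and writing RR = g × RC, the bound is NC/NR ≤ SR/(SC × g). The function name and the parameter g (risc_clock_gain) are assumptions introduced for this sketch.

```python
def max_cisc_instruction_ratio(s_risc, s_cisc, risc_clock_gain=1.0):
    """Largest NC/NR for which TC <= TR, where RR = risc_clock_gain * RC."""
    return s_risc / (s_cisc * risc_clock_gain)

part_a = max_cisc_instruction_ratio(1.2, 1.5)         # equal clock rates
part_b = max_cisc_instruction_ratio(1.2, 1.5, 1.15)   # RISC clock 15 percent higher

print(f"(a) {part_a:.1%}  (b) {part_b:.1%}")
```

For part (b) this gives NC/NR = 1.2/(1.15 × 1.5), or about 69.6% of NR.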
Problem 7:
(a) Suppose that execution time for a program is proportional to instruction fetch time. Assume that
fetching an instruction from the cache takes 1 time unit, but fetching it from the main-memory takes 10
time units. Also, assume that a requested instruction is found in the cache with probability 0.96. Finally,
assume that if an instruction is not found in the cache it must first be fetched from the main- memory
into the cache and then fetched from the cache to be executed. Compute the ratio of program execution
time without the cache to program execution time with the cache. This ratio is called the speedup
resulting from the presence of the cache.
(b) If the size of the cache is doubled, assume that the probability of not finding a requested instruction
there is cut in half. Repeat part (a) for a doubled cache size.
Solution:
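(a) Let cache access time be 1 and main-memory access time be 10. Every instruction that is
executed must be fetched from the cache, and an additional fetch from the main-memory must
be performed for 4% of these cache accesses.
Therefore,
Speedup = (1.0 × 10)/((1.0 × 1) + (0.04 × 10)) = 10/1.4 ≈ 7.1
(b) With the probability of a miss cut in half to 0.02,
Speedup = (1.0 × 10)/((1.0 × 1) + (0.02 × 10)) = 10/1.2 ≈ 8.3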
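Using the timing assumptions in the problem statement (1 time unit for the cache, 10 for the main-memory), the speedup for both parts can be checked with a short sketch. The function name and its default arguments are assumptions made for illustration.

```python
def cache_speedup(hit_rate, cache_time=1.0, memory_time=10.0):
    """Ratio of fetch time without a cache to fetch time with a cache."""
    miss_rate = 1.0 - hit_rate
    time_without_cache = memory_time
    # Every fetch goes through the cache; a miss adds one main-memory fetch.
    time_with_cache = cache_time + miss_rate * memory_time
    return time_without_cache / time_with_cache

speedup_a = cache_speedup(0.96)   # part (a): miss probability 0.04
speedup_b = cache_speedup(0.98)   # part (b): miss probability halved to 0.02

print(f"(a) {speedup_a:.1f}  (b) {speedup_b:.1f}")
```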
BYTE-ADDRESSABILITY
• In byte-addressable memory, successive addresses refer to successive byte locations in the memory.
• Byte locations have addresses 0, 1, 2, ...
• If the word-length is 32 bits, successive words are located at addresses 0, 4, 8, ... with each word
having 4 bytes.
• Consider a 32-bit integer (in hex): 0x12345678 which consists of 4 bytes: 12, 34, 56, and 78.
Hence this integer will occupy 4 bytes in memory.
Assume that we store it starting at memory address 1000.
On a little-endian machine, where the least-significant byte is stored at the lowest address, memory will look like
Address Value
1000 78
1001 56
1002 34
1003 12
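The byte layout above can be reproduced with Python's built-in int.to_bytes. The big-endian layout is not listed in these notes; it is included here only for comparison, and the variable names are made up for the sketch.

```python
value = 0x12345678
base_address = 1000

little = value.to_bytes(4, byteorder="little")  # least-significant byte first
big = value.to_bytes(4, byteorder="big")        # most-significant byte first

print("Address  Little  Big")
for offset in range(4):
    print(base_address + offset, f"{little[offset]:02X}", f"{big[offset]:02X}")
```

In the little-endian column the byte 78 appears at address 1000, matching the table; in the big-endian column the byte 12 appears there instead.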
WORD ALIGNMENT
• Words are said to be Aligned in memory if they begin at a byte-address that is a multiple of the
number of bytes in a word.
• For example,
If the word length is 16 bits (2 bytes), aligned words begin at byte-addresses 0, 2, 4, ...
If the word length is 64 bits (8 bytes), aligned words begin at byte-addresses 0, 8, 16, ...
• Words are said to have Unaligned Addresses, if they begin at an arbitrary byte-address.
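The alignment rule reduces to a remainder test on the byte-address. A minimal sketch, with a hypothetical function name:

```python
def is_aligned(byte_address, word_length_bits):
    """True if the address is a multiple of the number of bytes in a word."""
    word_bytes = word_length_bits // 8
    return byte_address % word_bytes == 0

print(is_aligned(6, 16))      # 2-byte words: 6 is a multiple of 2
print(is_aligned(16, 64))     # 8-byte words: 16 is a multiple of 8
print(is_aligned(4, 64))      # 8-byte words: 4 is not a multiple of 8
```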
MEMORY OPERATIONS
• Two memory operations are:
1) Load (Read/Fetch) &
2) Store (Write).
• The Load operation transfers a copy of the contents of a specific memory-location to the processor.
The memory contents remain unchanged.
• Steps for Load operation:
1) Processor sends the address of the desired location to the memory.
2) Processor issues 'read' signal to memory to fetch the data.
3) Memory reads the data stored at that address.
4) Memory sends the read data to the processor.
• The Store operation transfers the information from the register to the specified memory-location.
This will destroy the original contents of that memory-location.
• Steps for Store operation are:
1) Processor sends the address of the memory-location where it wants to store data.
2) Processor issues 'write' signal to memory to store the data.
3) Content of register (MDR) is written into the specified memory-location.
One-address instruction
Syntax: Opcode Source/Destination
Load A: Copy contents of memory-location A into accumulator.
Add B: Add contents of memory-location B to contents of accumulator register &
place sum back into accumulator.
Example sequence:
Load A
Add B
Store C
Program Explanation
• Consider the program for adding a list of n numbers (Figure 2.9).
• The addresses of the memory-locations containing the n numbers are symbolically given as NUM1,
NUM2, ..., NUMn.
• A separate Add instruction is used to add each number to the contents of register R0.
• After all the numbers have been added, the result is placed in memory-location SUM.
BRANCHING
• Consider the task of adding a list of 'n' numbers (Figure 2.10).
• Number of entries in the list 'n' is stored in memory-location N.
• Register R1 is used as a counter to determine the number of times the loop is executed.
• The contents of location N are loaded into register R1 at the beginning of the program.
• The Loop is a straight-line sequence of instructions executed as many times as needed.
The loop starts at location LOOP and ends at the instruction Branch>0.
• During each pass,
→ address of the next list entry is determined and
→ that entry is fetched and added to R0.
• The instruction Decrement R1 reduces the contents of R1 by 1 each time through the loop.
• The Branch instruction then loads a new value into the program counter. As a result, the processor
fetches and executes the instruction at this new address, called the Branch Target.
• A Conditional Branch Instruction causes a branch only if a specified condition is satisfied. If the
condition is not satisfied, the PC is incremented in the normal way, and the next instruction in sequential
address order is fetched and executed.
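How a conditional branch changes the PC can be sketched with a tiny interpreter. Everything here is hypothetical: the operation names, the use of list indices as instruction addresses, and the data values. Only Decrement R1 and Branch>0 correspond directly to instructions named in these notes.

```python
# Hypothetical program; instruction addresses are list indices.
program = [
    ("load_counter",),   # 0: R1 <- [N]
    ("clear_sum",),      # 1: R0 <- 0
    ("add_next",),       # 2: LOOP: add the next list entry to R0
    ("decrement",),      # 3: Decrement R1
    ("branch_gt0", 2),   # 4: Branch>0 LOOP (branch target = address 2)
    ("store_sum",),      # 5: SUM <- R0
]
numbers = [10, 20, 30, 40]
memory = {"N": len(numbers), "SUM": 0}

r0 = r1 = 0
next_entry = 0
pc = 0
while pc < len(program):
    op = program[pc]
    pc += 1                              # PC incremented in the normal way
    if op[0] == "load_counter":
        r1 = memory["N"]
    elif op[0] == "clear_sum":
        r0 = 0
    elif op[0] == "add_next":
        r0 += numbers[next_entry]
        next_entry += 1
    elif op[0] == "decrement":
        r1 -= 1
    elif op[0] == "branch_gt0" and r1 > 0:
        pc = op[1]                       # condition satisfied: load branch target into PC
    elif op[0] == "store_sum":
        memory["SUM"] = r0

print(memory["SUM"])
```

When R1 reaches 0 the branch condition fails, the PC keeps its incremented value, and execution falls through to the instruction after the loop.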
CONDITION CODES
• The processor keeps track of information about the results of various operations. This is
accomplished by recording the required information in individual bits, called Condition Code Flags.
• These flags are grouped together in a special processor-register called the condition code register (or
status register).
• Four commonly used flags are:
1) N (negative) set to 1 if the result is negative; otherwise, cleared to 0.
2) Z (zero) set to 1 if the result is 0; otherwise, cleared to 0.
3) V (overflow) set to 1 if arithmetic overflow occurs; otherwise, cleared to 0.
4) C (carry) set to 1 if a carry-out results from the operation; otherwise, cleared to 0.
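The four flags can be illustrated for an addition. This is a sketch under assumptions: an 8-bit word, 2's-complement interpretation for N and V, and a hypothetical function name.

```python
def add_with_flags(a, b, bits=8):
    """Add two words of the given width and return (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    flags = {
        "N": int(bool(result & sign)),   # sign bit of the result
        "Z": int(result == 0),           # result is zero
        # Overflow: operands have the same sign but the result's sign differs.
        "V": int(bool(~(a ^ b) & (a ^ result) & sign)),
        "C": int(full > mask),           # carry-out of the most-significant bit
    }
    return result, flags

print(add_with_flags(0x7F, 0x01))   # 127 + 1 overflows the signed range
print(add_with_flags(0xFF, 0x01))   # 255 + 1 produces a carry-out and a zero result
```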
ADDRESSING MODES
• The different ways in which the location of an operand is specified in an instruction are referred to as
Addressing Modes (Table 2.1).
• To execute the Add instruction in fig 2.11 (a), the processor uses the value B, which is in register R1, as
the effective address (EA) of the operand.
• It requests a read operation from the memory to read the contents of location B. The value read is the
desired operand, which the processor adds to the contents of register R0.
• Indirect addressing through a memory-location is also possible as shown in fig 2.11(b). In this case,
the processor first reads the contents of memory-location A, then requests a second read operation using
the value B as an address to obtain the operand.
Program Explanation
• In above program, Register R2 is used as a pointer to the numbers in the list, and the operands are accessed
indirectly through R2.
• The initialization-section of the program loads the counter-value n from memory-location N into R1 and uses the
immediate addressing-mode to place the address value NUM1, which is the address of the first number in the list,
into R2. Then it clears R0 to 0.
• The first two instructions in the loop implement the unspecified instruction block starting at LOOP.
• The first time through the loop, the instruction Add (R2), R0 fetches the operand at location NUM1 and adds it to
R0.
• The second Add instruction adds 4 to the contents of the pointer R2, so that it will contain the address value
NUM2 when the above instruction is executed in the second pass through the loop.
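The pointer-based loop described above can be traced with a short simulation. This is a sketch under assumptions: the list starts at a hypothetical address NUM1 = 1000, the words are 4 bytes apart, and the data values are made up.

```python
# Hypothetical byte-addressed memory: four 32-bit numbers starting at NUM1.
NUM1 = 1000
memory = {NUM1 + 4 * i: value for i, value in enumerate([5, 6, 7, 8])}
n = 4

r1 = n        # counter-value n loaded from memory-location N
r2 = NUM1     # immediate mode: R2 holds the address of the first number
r0 = 0        # R0 cleared to 0
while True:                 # LOOP:
    r0 += memory[r2]        # Add (R2), R0: operand accessed indirectly through R2
    r2 += 4                 # second Add: advance the pointer to the next word
    r1 -= 1                 # Decrement R1
    if not r1 > 0:          # Branch>0 LOOP is not taken once R1 reaches 0
        break

print(r0, r2)
```

After the loop, R2 points just past the last number in the list, which shows that the pointer is advanced once per pass.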
• The figure illustrates two ways of using the Index mode. In fig(a), the index register, R1, contains the
address of a memory-location, and the value X defines an offset (also called a displacement) from this
address to the location where the operand is found.
• To find EA of operand:
Eg: Add 20(R1), R2, with R1 containing 1000
EA = [R1] + 20 = 1000 + 20 = 1020
• An alternative use is illustrated in fig(b). Here, the constant X corresponds to a memory address, and
the contents of the index register define the offset to the operand. In either case, the effective-address
is the sum of two values; one is given explicitly in the instruction, and the other is stored in a register.
Base with Index Mode
• Another version of the Index mode uses 2 registers which can be denoted as
(Ri, Rj)
• Here, a second register may be used to contain the offset X.
• The second register is usually called the base register.
• The effective-address of the operand is given by EA=[Ri]+[Rj]
• This form of indexed addressing provides more flexibility in accessing operands because
both components of the effective-address can be changed.
Base with Index & Offset Mode
• Another version of the Index mode uses 2 registers plus a constant, which can be denoted as
X(Ri, Rj)
• The effective-address of the operand is given by EA=X+[Ri]+[Rj]
• This added flexibility is useful in accessing multiple components inside each item in a record, where
the beginning of an item is specified by the (Ri, Rj) part of the addressing-mode. In other words, this
mode implements a 3-dimensional array.
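The three index-mode variants differ only in which terms contribute to the effective-address. A minimal sketch follows; the function name and the register contents are hypothetical.

```python
def effective_address(registers, offset=0, ri=None, rj=None):
    """EA = X + [Ri] + [Rj]; a term that is not supplied contributes nothing."""
    ea = offset
    if ri is not None:
        ea += registers[ri]
    if rj is not None:
        ea += registers[rj]
    return ea

regs = {"R1": 1000, "R2": 16}    # hypothetical register contents

index_ea = effective_address(regs, offset=20, ri="R1")                    # 20(R1)
base_index_ea = effective_address(regs, ri="R1", rj="R2")                 # (R1, R2)
base_index_offset_ea = effective_address(regs, offset=20, ri="R1", rj="R2")  # 20(R1, R2)

print(index_ea, base_index_ea, base_index_offset_ea)
```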
RELATIVE MODE
• This is similar to index-mode with one difference:
The effective-address is determined using the PC in place of the general purpose register Ri.
• The operation is indicated as X(PC).
• X(PC) denotes an effective-address of the operand which is X locations above or below the current
contents of PC.
• Since the addressed-location is identified "relative" to the PC, the name Relative mode is associated
with this type of addressing.
• This mode is used commonly in conditional branch instructions.
• An instruction such as
Branch>0 LOOP
causes program execution to go to the branch target location identified by the name LOOP
if the branch condition is satisfied.
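The offset X used in Relative mode can be sketched as the difference between the branch target and the contents of the PC at the time the branch is executed. The addresses below are hypothetical: LOOP at 1000, the branch instruction at 1012, and 4-byte instructions, so the PC already holds 1016 when the offset is applied.

```python
def branch_offset(updated_pc, target):
    """Offset X to store in the instruction so that X(PC) reaches the target."""
    return target - updated_pc

def branch_target(updated_pc, offset):
    """Effective-address in Relative mode: EA = [PC] + X."""
    return updated_pc + offset

updated_pc = 1012 + 4                   # PC after fetching the branch instruction
x = branch_offset(updated_pc, 1000)     # LOOP is at address 1000
target = branch_target(updated_pc, x)

print(x, target)
```

A negative offset corresponds to a backward branch, which is the usual case at the end of a loop.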