Week4 Slides
Lecture 17: DESIGN OF CONTROL UNIT (PART 1)
DR. KAMALIKA DATTA
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, NIT MEGHALAYA

Instructions
• Instructions are stored in main memory.
• Program Counter (PC) points to the next instruction.
– MIPS instructions are 4 bytes (32 bits) long.
– All instructions start from an address that is a multiple of 4 (last 2 bits 00).
– Normally, PC is incremented by 4 to point to the next instruction.
[Figure: instruction words stored in memory at addresses 0, 4, 8, 12]
12/08/17
• The instruction decoder and control unit is responsible for performing the actions specified by the instruction loaded into IR.
• The decoder generates all the control signals in the proper sequence required to execute the instruction specified by the IR.
• The registers, the ALU and the interconnecting bus are collectively referred to as the datapath.

Kinds of Operations
• Transfer of data from one register to another.
    MOVE R1, R2
• Perform arithmetic or logic operation on data loaded into registers.
    ADD R1, R2
• Fetch the content of a memory location and load it into a register.
    MOVE R1, LOCA
• Store a word of data from a register into a given memory location.
    MOVE LOCA, R1

• A typical 3-bus architecture for the processor datapath is shown in the next slide.
– The 3-bus organization is internal to the CPU.
– Three buses allow three parallel data transfer operations to be carried out.
[Figure: datapath with PC, Register File, constant 4, MUX and ALU]
END OF LECTURE

Lecture 18: DESIGN OF CONTROL UNIT (PART 2)

Organization of a Register
• A register is used for temporary storage of data (parallel-in, parallel-out, etc.).
• A register Ri typically has two control signals.
– Riin : used to load the register with data from the bus.
– Riout : used to place the data stored in the register on the bus.
• Input and output lines of the register Ri are connected to the bus via controlled switches.
• When (Riin = 1), the data available on the bus is loaded into Ri.
[Figure: register Ri connected to the bus through switches controlled by Riin and Riout]
• When (Riout = 1), the data from register Ri are placed on the bus.

Register Transfer
MOVE R1, R2 // R1 ← R2
– Enable the output of R2 by setting R2out = 1.
– Enable the input of register R1 by setting R1in = 1.
– All operations are performed in synchronism with the processor clock.
• The control signals are asserted at the start of the clock cycle.
• After data transfer the control signals will return to 0.
– We write as T1: R2out , R1in
  where T1 is the time step and R2out, R1in are the control signals.
[Figure: registers R1 and R2 on the bus, with control signals R1in, R1out, R2in, R2out]
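
The transfer step T1: R2out, R1in can be sketched in Python as follows. This is only an illustrative model: the bus is a plain variable, and the `Register` class and `time_step` function are hypothetical names that do not appear in the slides.

```python
class Register:
    """Register Ri with control signals Riin / Riout (hypothetical model)."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value


def time_step(registers, signals):
    """Simulate one clock cycle given the set of asserted control signals."""
    bus = None
    # A register whose Riout is asserted places its content on the bus.
    for reg in registers.values():
        if f"{reg.name}out" in signals:
            assert bus is None, "only one register may drive the bus"
            bus = reg.value
    # Every register whose Riin is asserted loads the value on the bus.
    for reg in registers.values():
        if f"{reg.name}in" in signals and bus is not None:
            reg.value = bus
    return bus


regs = {"R1": Register("R1", 5), "R2": Register("R2", 42)}
time_step(regs, {"R2out", "R1in"})  # T1: R2out, R1in
print(regs["R1"].value)
```

After the step, R1 holds a copy of R2 and R2 is unchanged, which is the behaviour MOVE R1, R2 describes.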

ALU Operation
ADD R1, R2 // R1 = R1 + R2
– Bring the two operands (R1 and R2) to the two inputs of the ALU.
  One through Y (R1) and another (R2) directly from internal bus.
– Result is stored in Z and finally transferred to R1.
    T1: R1out , Yin
    T2: R2out , SelectY, ADD, Zin
    T3: Zout , R1in
[Figure: register Ri (Riin, Riout), register Y (Yin), MUX selecting between constant 4 and Y, ALU, and register Z (Zin, Zout) on the internal bus]

Fetching a Word from Memory
• The steps involved to fetch a word from memory:
– The processor specifies the address of the memory location where the data or instruction is stored.
– The processor requests a read operation.
– The information to be fetched can either be an instruction or an operand of the instruction.
– The data read is brought from the memory to MDR.
– Then it can be transferred to the required register or ALU for further operation.
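
The three time steps of ADD R1, R2 can be traced with a small Python sketch. The dictionary-based state and the 32-bit wraparound are assumptions made for illustration; the slides only specify the control signals.

```python
def run_add(r1, r2):
    """Trace ADD R1, R2 as three time steps on a single internal bus."""
    state = {"R1": r1, "R2": r2, "Y": 0, "Z": 0}

    # T1: R1out, Yin -> R1 drives the bus and Y latches the value.
    bus = state["R1"]
    state["Y"] = bus

    # T2: R2out, SelectY, ADD, Zin -> the ALU adds Y and the bus value into Z.
    bus = state["R2"]
    state["Z"] = (state["Y"] + bus) & 0xFFFFFFFF  # 32-bit width is an assumption

    # T3: Zout, R1in -> the result travels over the bus back into R1.
    bus = state["Z"]
    state["R1"] = bus
    return state


result = run_add(7, 35)
print(result["R1"])
```

Y and Z are needed because only one value can be on the single bus in any time step.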

Storing a Word into Memory
• The steps involved to store a word into the memory:
– The processor specifies the address of the memory location where the data is to be written.
– The data to be written is loaded into MDR.
– The processor requests a write operation.

Connecting MDR to Memory Bus and Internal Bus
[Figure: MDR placed between the memory bus and the internal processor bus, with control signals MDRinE and MDRin, and the MFC signal from memory]

• Memory read/write operation:
– The address of the memory location is transferred to MAR.
– At the same time a read/write control signal is provided to indicate the operation.
– For read, the data from the memory data bus comes to MDR by activating MDRinE.
– For write, the data from MDR goes to the memory data bus by activating the signal MDRoutE.

• When the processor sends a read request, it has to wait until the data is read from the memory and written into MDR.
• To accommodate the variability in response time, the processor has to wait until it receives an indication from the memory that the read operation has been completed.
• A control signal called Memory Function Complete (MFC) is used for this purpose.
– When this signal is 1, it indicates that the content of the specified location has been read and is available on the data lines of the memory bus.
– Then the data can be made available to MDR.
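
The wait-for-MFC handshake can be sketched in Python as below. The `Memory` class, its fixed `latency` parameter and the `read_word` helper are hypothetical; a real memory may respond after a variable number of cycles, which is exactly why the processor polls MFC.

```python
class Memory:
    """Toy memory that raises MFC some cycles after a read request."""

    def __init__(self, contents, latency=3):
        self.contents = contents
        self.latency = max(1, latency)  # assumed response time in clock cycles
        self.remaining = 0
        self.address = None
        self.mfc = 0
        self.data_bus = None

    def start_read(self, address):
        """Accept the address from MAR together with the Read signal."""
        self.address = address
        self.remaining = self.latency
        self.mfc = 0

    def tick(self):
        """Advance one clock cycle; raise MFC when the data is ready."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.data_bus = self.contents[self.address]
                self.mfc = 1


def read_word(memory, address):
    """Return (word, cycles waited) for a read of the given address."""
    mar = address                 # address is transferred to MAR
    memory.start_read(mar)        # Read signal asserted
    cycles = 0
    while memory.mfc == 0:        # WMFC: wait until MFC becomes 1
        memory.tick()
        cycles += 1
    mdr = memory.data_bus         # MDRinE: memory data bus -> MDR
    return mdr, cycles


mem = Memory({100: 1234}, latency=3)
print(read_word(mem, 100))
```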

[Figure: processor datapath showing PC, Register File, constant 4, MUX, ALU, Instruction Decoder, IR, MDR (data), MAR (address) and Memory]

END OF LECTURE 18

Lecture 19: DESIGN OF CONTROL UNIT (PART 3)

Introduction
• We select a set of 12 instructions.
• Discuss the control signals required to execute these instructions on the single-bus processor architecture.

Various instructions:
1. ADD R1, R2 // R1 = R1 + R2
2. ADD R1, LOCA // R1 = R1 + Mem[LOCA]
3. LOAD R1, LOCA // R1 = Mem[LOCA]
4. STORE LOCA, R1 // Mem[LOCA] = R1
5. MOVE R1, R2 // R1 = R2
6. MOVE R1, #10 // R1 = 10
7. BR LOCA // PC = LOCA
8. BZ LOCA // PC = LOCA if Zero flag is set
9. INC R1 // R1 = R1 + 4
10. DEC R1 // R1 = R1 – 4
11. CMP R1, R2 // R1 – R2
12. HALT // Machine Halt

Control sequence
1. ADD R1, R2 (R1 = R1 + R2)
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      R1out, Yin
5      R2out, SelectY, Add, Zin
6      Zout, R1in, End

2. ADD R1, LOCA (R1 = R1 + Mem[LOCA])
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Address field of IRout, MARin, Read
5      R1out, Yin, WMFC
6      MDRout, SelectY, Add, Zin
7      Zout, R1in, End

3. LOAD R1, LOCA (R1 = Mem[LOCA])
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Address field of IRout, MARin, Read
5      WMFC
6      MDRout, R1in, End

4. STORE LOCA, R1 (Mem[LOCA] = R1)
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Address field of IRout, MARin
5      R1out, MDRin, Write
6      MDRoutE, WMFC, End

5. MOVE R1, R2 (R1 = R2)
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      R2out, R1in, End

6. MOVE R1, #10 (R1 = 10)
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin

7. BRANCH Label (PC = PC + offset)
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin

8. BZ Label (if Z=1 PC = PC + offset)
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, SelectY, Add, Zin, If Z=0 then End
5     Zout, PCin, End

9. INC R1 (R1 = R1 + 4)
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      R1out, Select4, Add, Zin
5      Zout, R1in, End

10. DEC R1 (R1 = R1 – 4)
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      R1out, Select4, Sub, Zin
5      Zout, R1in, End

11. CMP R1, R2
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      R1out, Yin
5      R2out, SelectY, Sub, Zin, End

12. HALT
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      End

END OF LECTURE
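
Every control sequence in this lecture starts with the same three fetch steps, so the sequences can be written down as data. The Python sketch below is one possible encoding; the names `FETCH`, `EXECUTE` and `control_sequence` are illustrative and not part of the slides, and only four of the twelve instructions are included.

```python
# Steps 1-3 are common to all instructions: fetch the instruction and
# increment PC by 4 while waiting for the memory.
FETCH = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
]

# Instruction-specific steps, copied from the tables above.
EXECUTE = {
    "ADD R1, R2": [
        ["R1out", "Yin"],
        ["R2out", "SelectY", "Add", "Zin"],
        ["Zout", "R1in", "End"],
    ],
    "MOVE R1, R2": [["R2out", "R1in", "End"]],
    "INC R1": [
        ["R1out", "Select4", "Add", "Zin"],
        ["Zout", "R1in", "End"],
    ],
    "HALT": [["End"]],
}


def control_sequence(instruction):
    """Return the full list of time steps for one instruction."""
    return FETCH + EXECUTE[instruction]


for name in EXECUTE:
    steps = control_sequence(name)
    print(f"{name}: {len(steps)} steps")
```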

Lecture 20: DESIGN OF CONTROL UNIT (PART 4)

Introduction
• To execute an instruction, the processor must generate control signals for the data path in proper sequence.
– Example: ADD R1, R2
    a) R1out, Yin
    b) R2out, SelectY, ADD, Zin
    c) Zout, R1in
• Two alternate approaches:
1. Hardwired control unit design
2. Microprogrammed control unit design

Hardwired Control Unit
[Figure: the clock (CLK) drives a control step counter (with RESET) feeding a step decoder with outputs T1, T2, ..., Tn; IR feeds an instruction decoder with outputs INS1, INS2, ..., INSm; the encoder combines these with external inputs and condition codes to generate the control signals, including Run and End]

Sequence of control signals for ADD R1, LOCA
Steps  Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Address field of IRout, MARin, Read
5      R1out, Yin, WMFC
6      MDRout, SelectY, Add, Zin
7      Zout, R1in, End

Hardwired Control Unit Design
• Assumption:
– Each step in this sequence is completed in one clock cycle.
• A counter is used to keep track of the time step.
• The control signals are determined by the following information:
– Content of control step counter
– Content of instruction register
– Content of conditional code flags
– External input signals such as MFC (Memory Function Complete)

• The encoder/decoder circuit is a combinational circuit which generates control signals depending on the inputs provided.
• The step decoder generates a separate signal line for each step in the control sequence (T1, T2, T3, etc.).
– Depending on the maximum steps required for an instruction, the step decoder is designed.
– If a maximum of 10 steps are required, then a 4 x 16 step decoder is used.
• Among the total set of instructions, the instruction decoder is used to select one of them. (That particular line will be 1 and the rest will be 0.)
– If a maximum of 100 instructions are present in the ISA then a 7 x 128 instruction decoder is used.
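
The decoder sizes quoted above follow from choosing the smallest n with 2^n at least the number of lines needed. A short Python check, where `decoder_size` is a hypothetical helper name:

```python
def decoder_size(lines_needed):
    """Return (n, 2**n) for the smallest n x 2^n decoder with enough outputs."""
    # (lines_needed - 1).bit_length() equals ceil(log2(lines_needed)) for
    # lines_needed >= 2; at least one input bit is assumed.
    n = max(1, (lines_needed - 1).bit_length())
    return n, 2 ** n


print(decoder_size(10))   # step decoder for at most 10 steps
print(decoder_size(100))  # instruction decoder for at most 100 instructions
```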

• … corresponding micro routine from CS.
• The μPC is used to read CWs sequentially from CS.
• Every time a new instruction is loaded into IR, output of Starting Address Generator is loaded into μPC.
• Then, μPC is automatically incremented by clock causing successive microinstructions to be read from CS.
[Figure: IR feeds the Starting Address Generator, which loads the μPC (driven by Clock); the μPC addresses the Control Store, which outputs the CW]

Micro-instr.  …  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R2out  WMFC  End  …
1             0  0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0    0
2             0  1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0    0
3             0  0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0    0
4             0  0     0      0      0     0       0     1    0       0    0    0     1      0     0      0     0    0
5             0  0     0      0      0     0       0     0    1       1    1    0     0      0     1      0     0    0
6             0  0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1    0
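
The sequencing described above (load μPC with a starting address, then increment it each clock) can be sketched in Python. The six control words are taken from the listing above with the two "…" columns dropped; the column order in `SIGNALS` is an assumption, chosen so that the decoded rows match the control sequence for ADD R1, R2. The function names are illustrative.

```python
# Assumed column order of the control word (CW), left to right.
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R2out", "WMFC", "End"]

# Control store (CS): one CW per micro-instruction, one bit per signal.
CONTROL_STORE = [
    "0111000111000000",
    "1000001000100010",
    "0000110000000000",
    "0000001000010000",
    "0000000111000100",
    "0000000000101001",
]


def decode(cw):
    """List the control signals that are 1 in a control word."""
    return [name for name, bit in zip(SIGNALS, cw) if bit == "1"]


def run_microroutine(start_address):
    """Read CWs sequentially from CS until a word asserts End."""
    upc = start_address  # value supplied by the starting address generator
    trace = []
    while True:
        active = decode(CONTROL_STORE[upc])
        trace.append(active)
        if "End" in active:
            break
        upc += 1  # the uPC is incremented by the clock
    return trace


for step, active in enumerate(run_microroutine(0), start=1):
    print(step, ", ".join(active))
```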

Control Store for “BRANCH LOCN”
[Table: control words for BRANCH LOCN, one row per micro-instruction; columns include PCin, PCout, MARin, Read, MDRout, IRin, IRout, Yin, Select, Add, Zin, Zout, WMFC and End]

Horizontal versus Vertical Microinstruction Encoding
• Broadly there are two alternate schemes to code the control signals in the control memory.

• Advantage:
– Unlimited parallelism is possible in the activation of the micro-operations.
• Disadvantage:
– Size of the control memory is large (word size is much longer).
– Cost of implementation is higher.

• Again consider that there are k control signals: c1, c2, …, ck.
• We encode the control signals in an m-bit word in the control memory, where k ≤ 2^m.
• Depending on the m-bit control word, exactly one control signal will be activated (= 1), while all others will remain de-activated (= 0).
• At most one control signal can be activated in a time step.
[Figure: the m-bit word feeds a DECODER whose outputs are c1, c2, c3, c4, …, ck]

• Disadvantage:
– More than one control signal cannot be activated at a time.
– Requires sequential activation of the control signals, and hence a larger number of time steps.

[Figure: one decoder per group, with outputs c1,1 … c1,k1, c2,1 … c2,k2, …, cs,1 … cs,ks]
• Suppose we group the set of k control signals into s groups, containing k1, k2, …, ks signals.
• We encode the control signals in groups as shown, where ki ≤ 2^mi.
• Within a group, at most one control signal can be activated in a time step.
• Parallelism across groups is allowed.

• Advantages:
– Maximum parallelism as required by the micro-programs can be supported.
– Word size of control memory is less than that for horizontal encoding.
– Used in practice.
• Disadvantages:
– Multiple decoders (though smaller in size) are required.

Example 1
• Suppose there are 100 control signals in a processor data path.
a) For horizontal encoding, control word size = 100 bits.
b) For vertical encoding, control word size = ⌈log2 100⌉ = 7 bits.
c) For diagonal encoding, suppose after analysis of the micro-programs, we divide the control signals into 5 groups, containing 25, 15, 40, 5 and 15 control signals respectively.
• We have: m1 = 5, m2 = 4, m3 = 6, m4 = 3, m5 = 4, since 25 ≤ 2^5, 15 ≤ 2^4, 40 ≤ 2^6, 5 ≤ 2^3 and 15 ≤ 2^4.
• Control word size = 5 + 4 + 6 + 3 + 4 = 22 bits.
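
The three word sizes in Example 1 can be recomputed with a few lines of Python. The function names are illustrative; the rule used is the one stated in the slides, namely the smallest m with k ≤ 2^m for each decoder.

```python
def bits_for(k):
    """Smallest m such that k <= 2**m (at least one bit is assumed)."""
    return max(1, (k - 1).bit_length())


def horizontal_bits(k):
    """One bit per control signal."""
    return k


def vertical_bits(k):
    """A single decoder selects one of the k control signals."""
    return bits_for(k)


def diagonal_bits(group_sizes):
    """One decoder per group; the fields are concatenated."""
    return sum(bits_for(k) for k in group_sizes)


groups = [25, 15, 40, 5, 15]
print(horizontal_bits(100), vertical_bits(100), diagonal_bits(groups))
```

The grouped encoding sits between the two extremes: 22 bits instead of 100, while still allowing up to five signals (one per group) in the same time step.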

END OF LECTURE 20

Lecture 21: MIPS IMPLEMENTATION (PART 1)

• A Naïve Approach:
– After fetching and decoding an instruction, identify the exact register(s) and/or immediate operands to use, and handle them accordingly.
– The number of register fetches and immediate operand processing will vary from instruction to instruction.
– We do not utilize the possible overlapping of operations to make instruction execution faster.
• Before instruction decoding is complete, fetch the register operands and immediate data in case they are required later.

An Assumption
• An instruction can have up to two source operands:
    ADD R1, R5, R10
    LW R5, 100(R6)
• There are 32 32-bit integer registers, R0 to R31.
– We design the register bank in such a way that two registers can be read simultaneously (i.e. there are 2 read ports).
– We shall later see that performance can be improved by adding a write port (i.e. 2 reads and 1 write operations are possible per cycle).

[Figure: REGISTER BANK with Read Port 1 (Source Register 1, 5 bits; Register Data, 32 bits), Read Port 2 (Source Register 2, 5 bits; Register Data, 32 bits) and a Write Port (Destination Register, 5 bits; Register Data, 32 bits)]

• A Speculative Approach:
– Here we try to eliminate the time required to fetch the register operands and process the immediate data.
– When an instruction is decoded, at the same time we fetch the register operands and also process the immediate data (i.e. sign extend).
    • Possible because their locations in the instruction word are fixed.
    • If the operands are required, they are already available (no extra time required).
    • If the operands are not required, they are ignored.
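
Because the register and immediate fields sit at fixed bit positions in a MIPS32 instruction word (rs in bits 25 to 21, rt in bits 20 to 16, immediate in bits 15 to 0), they can be read before the opcode is examined. The Python sketch below illustrates this; the function names and the list used as a register bank are assumptions for the example.

```python
def sign_extend_16(value):
    """Sign-extend the low 16 bits of value to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def speculative_decode(instruction, register_bank):
    """Read both register operands and the immediate without checking the opcode."""
    rs = (instruction >> 21) & 0x1F   # source register 1 field
    rt = (instruction >> 16) & 0x1F   # source register 2 field
    a = register_bank[rs]             # read port 1
    b = register_bank[rt]             # read port 2
    imm = sign_extend_16(instruction) # immediate, sign extended
    return a, b, imm


# LW R5, 100(R6): opcode 35, rs = 6, rt = 5, immediate = 100
lw = (35 << 26) | (6 << 21) | (5 << 16) | 100
registers = [10 * i for i in range(32)]  # made-up register contents
print(speculative_decode(lw, registers))
```

For LW only the first operand and the immediate are actually used; the second register value is simply ignored, as the last bullet above describes.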

Branch instruction:
    if (cond) PC ← ALUOut;
    else PC ← NPC;

END OF LECTURE 21

Lecture 22: MIPS IMPLEMENTATION (PART 2)

– Assume that there is no pipelining.
– Also known as single-cycle implementation: only after one instruction is finished can the next instruction start.
• Later on we shall extend the data path for pipelined implementation.
– We shall discuss various pipelining related issues and techniques for faster execution of instructions.

    IR ← Mem [PC];
    NPC ← PC + 4;
[Figure: PC addresses the Instruction Memory, whose output is loaded into IR; an adder computes PC + 4]
• 32-bit PC
• 32-bit NPC
• 32-bit IR
• 32-bit adder
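
The two register transfers of the fetch step can be written as a short Python function. Modelling the instruction memory as a list of 32-bit words indexed by PC / 4 is an assumption of this sketch, as is the function name.

```python
def instruction_fetch(pc, instruction_memory):
    """Return (IR, NPC) for the instruction at byte address pc."""
    assert pc % 4 == 0, "instructions start at addresses that are multiples of 4"
    ir = instruction_memory[pc // 4]  # IR <- Mem[PC]
    npc = (pc + 4) & 0xFFFFFFFF       # NPC <- PC + 4 (32-bit adder)
    return ir, npc


imem = [0x11, 0x22, 0x33]  # made-up instruction words
print(instruction_fetch(4, imem))
```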

[Figure: data path fragments showing the sign Extend unit, the value from IMM shifted 2 places, MUXes, the func input to the ALU, and the value from WB]
• 32-bit A
• 32-bit B
• 32-bit IMM
• 32-bit 2x1 MUX
• 32-bit ALUOut
• 1-bit cond

Branch:
    ALUOut ← NPC + (Imm << 2);
    cond ← (A op 0);
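
The branch transfers, together with the PC selection rule (PC ← ALUOut if cond, else NPC), can be sketched in Python. The function names are illustrative, and passing the comparison `op` as a Python callable is a modelling choice rather than something the slides specify.

```python
import operator


def branch_execute(npc, imm, a, op):
    """Compute the branch target and the 1-bit condition."""
    alu_out = (npc + (imm << 2)) & 0xFFFFFFFF  # ALUOut <- NPC + (Imm << 2)
    cond = 1 if op(a, 0) else 0                # cond <- (A op 0)
    return alu_out, cond


def next_pc(alu_out, cond, npc):
    """Select the next PC: the branch target if cond is set, else NPC."""
    return alu_out if cond else npc


# Branch-if-zero with a word offset of -3, evaluated at NPC = 104.
target, taken = branch_execute(104, -3, 0, operator.eq)
print(target, taken, next_pc(target, taken, 104))
```

The shift by 2 converts the word offset in the instruction into a byte offset, since every instruction is 4 bytes long.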

[Figure: complete data path with PC, Instruction Memory, IR (fields rs, rt, rd), Register Bank, A, B, MUXes, ALU, ALUOut, Data Memory and LMD]

… set, the design of the control unit becomes very easy.
• Control signals in the data path:
    a) LoadPC     i) LoadIMM    q) LoadLMD
    b) LoadNPC    j) MuxALU1    r) MuxWB
    c) ReadIM     k) MuxALU2    s) WriteReg

END OF LECTURE 22