1.Week
◼ Instruction sets
◼ Instruction Sets: Characteristics and Functions
◼ Instruction Sets: Addressing Modes and Formats
◼ Assembly Language and Related Topics
Course Syllabus-2
◼ The central processing unit
◼ Processor Structure and Function
◼ Reduced Instruction Set Computers (RISCs)
◼ Instruction-Level Parallelism and Superscalar Processors
◼ Control Unit Operation and Microprogrammed Control
◼ Parallel organization
◼ Parallel Processing
◼ Multicore Computers
1.1 Basic Concepts and Computer Evolution
1.1 Outline
• Organization and Architecture
• Structure and Function
• The IAS Computer
• Gates, Memory Cells, Chips, and Multichip Modules
• The Evolution of the Intel x86 Architecture
• Embedded Systems
• ARM Architecture
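Computer Architecture
• Attributes of a system visible to the programmer
• Have a direct impact on the logical execution of a program
• Architectural attributes include:
– Instruction set, number of bits used to represent various data types, I/O mechanisms, techniques for addressing memory
Computer Organization
• The operational units and their interconnections that realize the architectural specifications
• Organizational attributes include:
– Hardware details transparent to the programmer, control signals, interfaces between the computer and peripherals, the memory technology used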
[Figure: Top-level structure of a computer. Computer: CPU, main memory, I/O, system bus. CPU: registers, ALU, internal bus, control unit. Control unit: sequencing logic, control unit registers and decoders, control memory.]
• Registers
– Provide storage internal to the CPU
• CPU Interconnection
– Some mechanism that provides for communication among the
control unit, ALU, and registers
Multicore Computer Structure
• Central processing unit (CPU)
– Portion of the computer that fetches and executes instructions
– Consists of an ALU, a control unit, and registers
– Referred to as a processor in a system with a single processing unit
• Core
– An individual processing unit on a processor chip
– May be equivalent in functionality to a CPU on a single-CPU system
– Specialized processing units are also referred to as cores
• Processor
– A physical piece of silicon containing one or more cores
– Is the computer component that interprets and executes instructions
– Referred to as a multicore processor if it contains multiple cores
Cache Memory
• Multiple layers of memory between the processor and main
memory
• Is smaller and faster than main memory
• Used to speed up memory access by placing in the cache data
from main memory that is likely to be used in the near future
• A greater performance improvement may be obtained by using
multiple levels of cache, with level 1 (L1) closest to the core and
additional levels (L2, L3, etc.) progressively farther from the core
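The effect of keeping recently used data close to the processor can be sketched with a toy model. The example below assumes a direct-mapped cache (each memory block maps to exactly one cache line); that placement policy, the class name `DirectMappedCache`, and the sizes used are illustrative assumptions, not something the slides specify.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each memory block maps to exactly one line."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # tag held by each line; None means empty
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one memory access; return True on a hit, False on a miss."""
        block = address // self.block_size   # which memory block holds the address
        index = block % self.num_lines       # the only line this block may occupy
        tag = block // self.num_lines        # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag               # load the block, evicting the old one
        self.misses += 1
        return False


# Hypothetical workload: sweep addresses 0..15 twice.
cache = DirectMappedCache(num_lines=4, block_size=4)
for _ in range(2):
    for address in range(16):
        cache.access(address)

print(cache.hits, cache.misses)  # prints: 28 4
```

Only the first touch of each 4-address block misses. Every later access hits because the block is already in the cache, which is the locality effect the bullets describe.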
[Figure: Major elements of a multicore computer. Motherboard: main memory chips, processor chip, I/O chips. Processor chip: cores and L3 cache. Core: instruction logic, arithmetic and logic unit (ALU), load/store logic, L2 instruction cache, L2 data cache.]
[Figure: IAS structure. The arithmetic-logic circuits (with AC, MQ, and MBR) and the program control unit (CC, with PC, IBR, MAR, IR, and the control circuits) exchange instructions and data with main memory (M), locations M(0) to M(4095), and with the input-output equipment (I, O). The control circuits issue control signals, and the MAR supplies addresses to memory.]
• AC: accumulator register
• MQ: multiplier-quotient register
• MBR: memory buffer register
• IBR: instruction buffer register
• PC: program counter
• MAR: memory address register
• IR: instruction register
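[Figure: IAS instruction word (40 bits, bit 0 leftmost). Left instruction: opcode in bits 0–7 (8 bits), address in bits 8–19 (12 bits). Right instruction: opcode in bits 20–27 (8 bits), address in bits 28–39 (12 bits).]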
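The word layout above (two 20-bit instructions per 40-bit word, each an 8-bit opcode followed by a 12-bit address) can be unpacked with shifts and masks. This is a minimal sketch, assuming bit 0 is the most significant bit of the word; the function name `split_word` and the sample word are hypothetical.

```python
WORD_BITS = 40


def split_word(word):
    """Split a 40-bit IAS word into ((opcode, address), (opcode, address))
    for the left and right instructions, respectively."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 40 bits")

    def fields(instruction):
        # Each 20-bit instruction is an 8-bit opcode followed by a 12-bit address.
        return (instruction >> 12) & 0xFF, instruction & 0xFFF

    left = (word >> 20) & 0xFFFFF   # bits 0..19 (most significant half)
    right = word & 0xFFFFF          # bits 20..39 (least significant half)
    return fields(left), fields(right)


# Hypothetical word: left opcode 00000110 with address 0x0FA,
# right opcode 00010100 with address 0.
word = (0b00000110 << 32) | (0x0FA << 20) | (0b00010100 << 12) | 0x000
print(split_word(word))  # prints: ((6, 250), (20, 0))
```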
• Memory address register (MAR)
– Specifies the address in memory of the word to be written from or read into the MBR
• Instruction register (IR)
– Contains the 8-bit opcode of the instruction being executed
• Accumulator (AC) and multiplier quotient (MQ)
– Employed to temporarily hold operands and results of ALU operations
The IAS Instruction Set
Instruction Type      Opcode    Symbolic Representation  Description
Unconditional branch  00001101  JUMP M(X,0:19)           Take next instruction from left half of M(X)
Unconditional branch  00001110  JUMP M(X,20:39)          Take next instruction from right half of M(X)
Conditional branch    00001111  JUMP + M(X,0:19)         If number in the accumulator is nonnegative, take next instruction from left half of M(X)
Conditional branch    00010000  JUMP + M(X,20:39)        If number in the accumulator is nonnegative, take next instruction from right half of M(X)
Arithmetic            00000110  SUB M(X)                 Subtract M(X) from AC; put the result in AC
Arithmetic            00001000  SUB |M(X)|               Subtract |M(X)| from AC; put the remainder in AC
Arithmetic            00001011  MUL M(X)                 Multiply M(X) by MQ; put most significant bits of result in AC, put least significant bits in MQ
Arithmetic            00001100  DIV M(X)                 Divide AC by M(X); put the quotient in MQ and the remainder in AC
Arithmetic            00010100  LSH                      Multiply accumulator by 2; that is, shift left one bit position
Arithmetic            00010101  RSH                      Divide accumulator by 2; that is, shift right one bit position
Address modify        00010010  STOR M(X,8:19)           Replace left address field at M(X) by 12 rightmost bits of AC
Address modify        00010011  STOR M(X,28:39)          Replace right address field at M(X) by 12 rightmost bits of AC
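A few of the listed instructions can be turned into a small executable model. The sketch below is not a full IAS simulator. It covers only SUB, SUB |M(X)|, LSH, RSH, and the two conditional branches, uses unbounded Python integers rather than 40-bit words, and treats `JUMP + M(X,0:19)` as the left-half counterpart of `JUMP + M(X,20:39)`. The function name `step` and the `(address, half)` program-counter model are assumptions made for illustration.

```python
# Opcodes taken from the instruction set listing; only a subset is modelled.
SUB, SUB_ABS = 0b00000110, 0b00001000
LSH, RSH = 0b00010100, 0b00010101
JUMP_POS_LEFT, JUMP_POS_RIGHT = 0b00001111, 0b00010000


def step(opcode, x, ac, memory, next_pc):
    """Execute one instruction and return (new_ac, new_pc).

    memory maps addresses to integers; next_pc is the fall-through
    location, modelled as (address, "left" or "right").
    """
    if opcode == SUB:
        return ac - memory[x], next_pc
    if opcode == SUB_ABS:
        return ac - abs(memory[x]), next_pc
    if opcode == LSH:
        return ac << 1, next_pc          # multiply the accumulator by 2
    if opcode == RSH:
        return ac >> 1, next_pc          # divide the accumulator by 2
    if opcode == JUMP_POS_LEFT:
        return ac, ((x, "left") if ac >= 0 else next_pc)
    if opcode == JUMP_POS_RIGHT:
        return ac, ((x, "right") if ac >= 0 else next_pc)
    raise ValueError(f"opcode {opcode:08b} is not modelled in this sketch")


memory = {5: -3}                          # hypothetical memory contents
fall_through = (1, "left")
print(step(SUB, 5, 10, memory, fall_through))            # prints: (13, (1, 'left'))
print(step(JUMP_POS_RIGHT, 9, 7, memory, fall_through))  # prints: (7, (9, 'right'))
```

A negative accumulator leaves the program counter at the fall-through location, so the conditional branches can be used to terminate loops that count down past zero.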
[Figure: Fundamental computer elements. (a) Gate: input, Boolean logic function, output, with an activate signal. (b) Memory cell: input, binary storage cell, output, with read and write signals.]
• Data processing – provided by gates
• Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
• Control – the paths among components can carry control signals
• The gates and memory cells are constructed of simple digital electronic components
• Exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
• Many transistors can be produced at the same time on a single wafer of silicon
• Transistors can be connected with a process of metallization to form circuits
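The division of labor in the bullets above (gates process data, memory cells store it, and paths move data from memory through gates to memory) can be mimicked in a few lines. This is only an illustrative sketch. The choice of a NAND gate and the names `nand_gate` and `MemoryCell` are assumptions, not part of the slides.

```python
def nand_gate(a, b):
    """Boolean logic function: the output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


class MemoryCell:
    """Binary storage cell holding one bit, with explicit read and write."""

    def __init__(self, bit=0):
        self.bit = bit

    def write(self, value):
        self.bit = 1 if value else 0

    def read(self):
        return self.bit


# Data movement: two source cells -> gate -> destination cell.
a, b, result = MemoryCell(1), MemoryCell(0), MemoryCell()
result.write(nand_gate(a.read(), b.read()))
print(result.read())  # prints: 1
```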
Transistors
• The fundamental building block of digital circuits used to construct
processors, memories, and other digital logic devices
• Discrete component
– A single, self-contained transistor
– Discrete components were manufactured separately, packaged in their own containers, and soldered or wired together onto Masonite-like circuit boards
(a) Close-up of packaged chip (b) Chip on motherboard
[Figure: Growth in transistor count on integrated circuits, 1947 to 2011, on a logarithmic scale from 1 to 100 bn, with markers for the first working transistor, the invention of the integrated circuit, and the promulgation of Moore's law.]
(a)                    4004           8008         8080         8086                  8088
Clock speeds           108 kHz        108 kHz      2 MHz        5 MHz, 8 MHz, 10 MHz  5 MHz, 8 MHz

(b)                    80286          386 DX       386 SX       486 DX
Clock speeds           6–12.5 MHz     16–33 MHz    16–33 MHz    25–50 MHz
Addressable memory     16 MB          4 GB         16 MB        4 GB
Virtual memory         1 GB           64 TB        64 TB        64 TB
Cache                  –              –            –            8 kB

(c)                    486 SX         Pentium      Pentium Pro            Pentium II
Clock speeds           16–33 MHz      60–166 MHz   150–200 MHz            200–300 MHz
Number of transistors  1.185 million  3.1 million  5.5 million            7.5 million
Addressable memory     4 GB           4 GB         64 GB                  64 GB
Virtual memory         64 TB          64 TB        64 TB                  64 TB
Cache                  8 kB           8 kB         512 kB L1 and 1 MB L2  512 kB L2

(d)                    Pentium III    Pentium 4    Core 2 Duo    Core i7 EE 4960X    Core i9-7900X
Clock speeds           450–660 MHz    1.3–1.8 GHz  1.06–1.2 GHz  4 GHz               4.3 GHz
Number of transistors  9.5 million    42 million   167 million   1.86 billion        7.2 billion
Virtual memory         64 TB          64 TB        64 TB         64 TB               64 TB
Cache                  512 kB L2      256 kB L2    2 MB L2       1.5 MB L2/15 MB L3  14 MB L3
Number of cores        1              1            2             6                   10
Pentium Pro
• Continued the move into superscalar organization with aggressive use of register renaming, branch
prediction, data flow analysis, and speculative execution
Pentium II
• Incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics
data efficiently
Pentium III
• Incorporated additional floating-point instructions
• Streaming SIMD Extensions (SSE)
Pentium 4
• Includes additional floating-point and other enhancements for multimedia
Core
• First Intel x86 microprocessor with a dual core
Core 2
• Extends the Core architecture to 64 bits
• Core 2 Quad provides four cores on a single chip
• More recent Core offerings have up to 10 cores per chip
• An important addition to the architecture was the Advanced Vector Extensions instruction set
Embedded Systems
• The use of electronics and software within a product
• Billions of computer systems are produced each year that are
embedded within larger devices
• Today many devices that use electric power have an embedded
computing system
• Often embedded systems are tightly coupled to their environment
– This can give rise to real-time constraints imposed by the need to interact with the
environment
▪ Constraints such as required speeds of motion, required precision of measurement, and required time durations dictate the timing of software operations
– If multiple activities must be managed simultaneously, this imposes more complex real-time constraints
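One way to see why several simultaneous activities tighten the timing constraints is to add up how much processor time each periodic activity demands. The sketch below is an illustration under stated assumptions, not a method from the slides. It models each activity as a hypothetical `(execution_time, period)` pair on a single processor and applies only the necessary condition that total utilization cannot exceed 1.

```python
def total_utilization(tasks):
    """Sum execution_time / period over all periodic tasks (same time unit)."""
    return sum(execution / period for execution, period in tasks)


def may_be_schedulable(tasks):
    """Necessary check only: if utilization exceeds 1, a single processor
    cannot possibly finish every activity within its period."""
    return total_utilization(tasks) <= 1.0


# Hypothetical workloads, times in milliseconds: (execution time, period).
light = [(2, 10), (5, 20), (10, 50)]   # utilization is about 0.65
overloaded = [(6, 10), (5, 10)]        # utilization is about 1.1

print(may_be_schedulable(light))       # prints: True
print(may_be_schedulable(overloaded))  # prints: False
```

Passing this check does not guarantee that every deadline is met. It only rules out workloads that are impossible on one processor.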
[Figure: Possible organization of an embedded system: processor, memory, custom logic, human interface, diagnostic port, A/D conversion, D/A conversion, sensors, actuators/indicators.]
Deeply Embedded Systems
• Has a processor whose behavior is difficult to observe by both the programmer and the user
• Is not programmable once the program logic for the device has been burned into ROM
• Cortex-A
• Cortex-R
• Cortex-M
– Cortex-M0
– Cortex-M0+
– Cortex-M3
– Cortex-M4
– Cortex-M7
– Cortex-M23
– Cortex-M33