61c Cheatsheet 2
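C Generics
- void *: generic pointer to data of unknown type
- can't dereference a void * directly (the compiler doesn't know how many bytes to read); cast it to a concrete pointer type first, e.g. *(int *) ptr
- for byte-by-byte pointer arithmetic on generic data, cast to char * (each step = 1 byte)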
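A minimal sketch of the void * rules above, using a hypothetical helper named generic_swap: because a void * can't be dereferenced, it casts both pointers to unsigned char * and swaps the objects one byte at a time.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical helper: swap two objects of nbytes bytes each.
 * A void * can't be dereferenced (unknown # of bytes), so cast to
 * unsigned char * and swap one byte at a time. */
void generic_swap(void *a, void *b, size_t nbytes) {
    unsigned char *pa = (unsigned char *) a;
    unsigned char *pb = (unsigned char *) b;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

Usage: generic_swap(&x, &y, sizeof(int)); the caller has to pass the size, since the function has no way to know it from a void *.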
CALL: Compiler -> Assembler -> Linker -> Loader
Compiler
- input: source files (foo.c); output: assembly code (foo.s)
- optimizes the code; its output may still contain pseudo-instructions
Assembler (foo.s -> foo.o)
- Pass 1: remember the position (in bytes) of each label (symbol table); replace pseudo-instructions
- Pass 2: resolve forward references using the label addresses; references to labels outside the file are left for the linker
Object file (foo.o)
- header: size + position of the other parts of the object file
- text: machine code (instructions)
- data: binary representation of static data (constants, etc.)
- relocation table: lines of code to be fixed later (by the linker)
- symbol table: current file's labels and their addresses
- debugging info
Linker (foo.o files + libs -> a.out)
- combines the object files into one executable; changing one file doesn't require recompilation of the entire program
Executable file (a.out)
- code + data + debugging info
Loader (OS)
1) read the executable header for the text + data sizes
2) create a new address space and load text + data into memory
3) init registers, load libs, set the PC and start execution

Memory Access
- arrays: arr[i] == *(arr + i)
- in assembly, remember to scale the index by the element size (for int: slli by 2, i.e. * 4) and add it to the array pointer, then access with 0(t0)
- lb (1 byte), lh (2 bytes): sign-extend the loaded value to 32 bits
- sb (1 byte), sh (2 bytes): store the least significant byte(s) of the register
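The load/store rules above can be checked in plain C. This sketch uses hypothetical helper names and models what lb and lh do to the loaded value, what sb keeps from the register, and the index scaling for 4-byte ints; it imitates the semantics, not real RISC-V execution.

```c
#include <stdint.h>

/* Like lb: sign-extend one byte to 32 bits (bit 7 fills bits 8-31). */
int32_t sign_extend_byte(uint8_t b) {
    return (b & 0x80) ? (int32_t) b - 0x100 : (int32_t) b;
}

/* Like lh: sign-extend a 2-byte halfword to 32 bits. */
int32_t sign_extend_half(uint16_t h) {
    return (h & 0x8000) ? (int32_t) h - 0x10000 : (int32_t) h;
}

/* Like sb: only the least significant byte of the register is stored. */
uint8_t stored_byte(uint32_t reg) {
    return (uint8_t) (reg & 0xFF);
}

/* Byte offset of arr[i] for 4-byte ints: i << 2 (same as i * 4). */
uint32_t int_byte_offset(uint32_t i) {
    return i << 2;
}
```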
LOGIC
- bitwise operations: a & b (and), a | b (or), a ^ b (xor), ~a (not)
- a << 1: left shift by 1 (multiply by 2)
- a >> 1: right shift by 1 (divide by 2)
- info-preserving (the input can be recovered from the output): ~a, xor with any constant (a ^ k ^ k == a)
- not info-preserving: and with 0's (turns those bits to 0), or with 1's (turns those bits to 1), shifts (bits fall off the end)
- xor as an equality test:
  bool helper(uint32_t input) {
      uint32_t favnum = ...;
      return (input ^ favnum) == 0;   // true iff input == favnum
  }

Addressing
- PC-relative addressing (offsets in assembly): beq, bne, jal, auipc/addi -> no relocation needed
- external references (function calls via jal) and static addresses (data segment: static data via lw/sw, lui/addi; code in the text segment) -> yes, relocate
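A sketch of the Logic notes: XOR compares two words bit by bit, so (input ^ favnum) == 0 exactly when they are equal. Here favnum is passed in as a parameter (an assumption; on the sheet it is a fixed constant), and xor_mask/and_mask are hypothetical helpers contrasting an info-preserving mask with an info-destroying one.

```c
#include <stdbool.h>
#include <stdint.h>

/* True iff input == favnum: the XOR is 0 only if every bit matches. */
bool helper(uint32_t input, uint32_t favnum) {
    return (input ^ favnum) == 0;
}

/* XOR with a constant is info-preserving: applying it twice undoes it. */
uint32_t xor_mask(uint32_t x, uint32_t k) {
    return x ^ k;
}

/* AND with 0s is not: bits cleared by the mask can't be recovered,
 * so different inputs can give the same output. */
uint32_t and_mask(uint32_t x, uint32_t k) {
    return x & k;
}
```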
Dec  Binary  Hex
0    0000    0
1    0001    1
2    0010    2
3    0011    3
4    0100    4
5    0101    5
6    0110    6
7    0111    7
8    1000    8
9    1001    9
10   1010    A
11   1011    B
12   1100    C
13   1101    D
14   1110    E
15   1111    F

- sizes: char = 1 byte, int = 4 bytes
- strlen(char *x): # of chars before the null terminator (not including it)
- storing 257 in 1 byte keeps only the low 8 bits: 257 = 1 0000 0001 (binary) -> 0000 0001 = 1

Calling Convention
① callee-saved (preserved) registers: s0-s11, sp
- when you call a function, preserved registers are unchanged afterward
- if a function wants to use a preserved register, it saves the register on the stack at function start and restores it at the end:
  # Prologue
  addi sp sp -4   # decrement stack
  sw s0 0(sp)     # store value on stack
  # do function stuff
  # Epilogue
  lw s0 0(sp)     # restore value from stack
  addi sp sp 4    # increment stack
② caller-saved (non-preserved) registers: a, t registers
- when you call a function, non-preserved registers may be changed: afterward they can contain garbage

Endianness (storing 0xDEADBEEF)
- big endian: MSB stored first, at the lowest address: 0xDE 0xAD 0xBE 0xEF
- little endian: LSB stored first, at the lowest address: 0xEF 0xBE 0xAD 0xDE
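The two byte orders for 0xDEADBEEF above can be produced with shifts, independent of the host machine's own endianness. to_big_endian and to_little_endian are hypothetical helper names; out[0] plays the role of the lowest address.

```c
#include <stdint.h>

/* Big endian: MSB at the lowest address (out[0]). */
void to_big_endian(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t) (word >> (24 - 8 * i));
    }
}

/* Little endian: LSB at the lowest address (out[0]). */
void to_little_endian(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t) (word >> (8 * i));
    }
}
```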
Caller-saved (a, t) registers around a call
- if you want to continue using a/t registers after calling a function, save them before the call and restore them after:
  # Prologue
  addi sp sp -4   # decrement stack
  sw t0 0(sp)     # store t0's value on the stack
  # function call
  jal ra func
  # Epilogue
  lw t0 0(sp)     # restore value from the stack
  addi sp sp 4    # increment stack

Control Logic Truth Table (X = don't care)
Inst   BrEq BrLT PCSel ImmSel BrUn ASel BSel ALUSel MemRW RegWEn WBSel
add    X    X    +4    X      X    Reg  Reg  Add    Read  1      ALU
sub    X    X    +4    X      X    Reg  Reg  Sub    Read  1      ALU
addi   X    X    +4    I      X    Reg  Imm  Add    Read  1      ALU
lw     X    X    +4    I      X    Reg  Imm  Add    Read  1      Mem
sw     X    X    +4    S      X    Reg  Imm  Add    Write 0      X
beq    0    X    +4    B      X    PC   Imm  Add    Read  0      X
beq    1    X    ALU   B      X    PC   Imm  Add    Read  0      X
bne    0    X    ALU   B      X    PC   Imm  Add    Read  0      X
bne    1    X    +4    B      X    PC   Imm  Add    Read  0      X
blt    X    1    ALU   B      0    PC   Imm  Add    Read  0      X
jalr   X    X    ALU   I      X    Reg  Imm  Add    Read  1      PC+4
jal    X    X    ALU   J      X    PC   Imm  Add    Read  1      PC+4

Gate Design
- gates: AND, OR, XOR, NOT, NAND, NOR

Number Representation (N bits)
Unsigned integer
- Range: [0, 2^N - 1]
- issues: no negative numbers
Bias (value = unsigned value + bias b)
- Range: [b, 2^N - 1 + b]
Sign-magnitude
- first bit = sign (1 = negative, 0 = positive), remaining bits = magnitude
- Range: [-(2^(N-1) - 1), 2^(N-1) - 1]
- issues: two zeros (+0 and -0), difficult arithmetic
Two's complement
- Range: [-2^(N-1), 2^(N-1) - 1]
- overflow happens when:
  ① adding 2 positive nums gives a negative: 0b0111 + 0b0001 -> 0b1000
  ② adding 2 negative nums gives a positive: 0b1000 + 0b1001 -> 0b0001
- overflow doesn't happen when adding a positive and a negative number
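A sketch of the two's complement overflow rule for the 4-bit examples above (helper names are hypothetical): the sum is truncated to 4 bits like the hardware does, and overflow is reported only when both operands have the same sign and the truncated sum has the other sign.

```c
#include <stdbool.h>

/* Sign bit of a 4-bit two's complement pattern (bit 3). */
static int sign4(unsigned x) {
    return (x >> 3) & 1;
}

/* Value of a 4-bit pattern; range [-2^3, 2^3 - 1] = [-8, 7]. */
int value4(unsigned x) {
    x &= 0xF;
    return sign4(x) ? (int) x - 16 : (int) x;
}

/* Add two 4-bit patterns keeping only 4 bits, and report overflow:
 * same-sign operands whose sum has the opposite sign. */
bool add4_overflows(unsigned a, unsigned b) {
    unsigned sum = (a + b) & 0xF;
    return sign4(a) == sign4(b) && sign4(sum) != sign4(a);
}
```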
Jal/jalr key facts
- branch comparator output decides whether the PCSel mux takes PC + 4 or ALUout (PC + offset)
- jal/jalr write PC + 4 into rd
- j label = jal x0 label
- jal label = jal ra label

Heap (Dynamic Allocation)
- data that must persist through function calls -> put it on the heap, not the stack!
- void *malloc(size_t size): allocates memory of size bytes
- void *calloc(size_t num, size_t size): allocates num * size bytes and inits all bytes to 0 (or NULL)
- void *realloc(void *ptr, size_t size): reallocates memory from a previous allocation, changing the buffer's size to "size"; frees the old pointer if the block moves
- void free(void *ptr): frees memory allocated by malloc/calloc/realloc
- char *strcpy(char *dest, char *src): copies src, including the null terminator, into dest

Example question: allocate a Cheatsheet
  Cheatsheet *sheet = malloc(sizeof(Cheatsheet));   // allocate space for sheet
  sheet->student_id = student_id;                   // plain field: don't need to allocate
  for (int i = 0; i < NUM_PAGES; i++) {
      // pages is an ACTUAL ARRAY inside the struct, so no need to allocate space for pages;
      // &(sheet->pages[i]) is a pointer to the ith page, not raw data
      // allocate memory for the string; consider the null terminator: add 1
      sheet->pages[i].data = malloc(sizeof(char) * (strlen(contents[i]) + 1));
      strcpy(sheet->pages[i].data, contents[i]);
  }

Memory sections
- string literals: read-only, in STATIC memory
- macros: expanded by the preprocessor before compilation
- segfault: accessing memory outside what was allocated
- mem leak: when you don't free allocated memory you no longer need

Boolean Algebra (+ = OR, juxtaposition = AND)
- De Morgan's Laws: ~(A + B) = ~A ~B;  ~(A B) = ~A + ~B
- X + ~X Y = X + Y
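The "allocate strlen + 1 for the null terminator" rule and the heap-vs-stack point combine in this sketch. copy_string is a hypothetical helper, similar in spirit to POSIX strdup; the returned buffer lives on the heap, so it outlives the caller's stack frame.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, strcpy */

/* Hypothetical helper: heap copy of src.
 * strlen doesn't count '\0', so allocate strlen + 1 bytes. */
char *copy_string(const char *src) {
    char *dest = malloc(sizeof(char) * (strlen(src) + 1));
    if (dest == NULL) {
        return NULL;        /* allocation failed */
    }
    strcpy(dest, src);      /* copies the '\0' too */
    return dest;            /* caller must free() it, or it's a mem leak */
}
```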
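Example question: vector slices
  typedef struct vector_t {
      size_t size;
      int *data;
      size_t num_slices;          // # of this vector's child slices
      struct vector_t **slices;
      bool is_slice;              // is this a child slice?
  } vector_t;

  vector_t *vector_slice(vector_t *parent, int start_idx, int end_idx) {
      vector_t *slice = calloc(1, sizeof(vector_t));
      if (slice == NULL) allocation_failed();
      ...                         // set up the slice's fields from start_idx, end_idx
      parent->slices = realloc(parent->slices, sizeof(vector_t *) * (parent->num_slices + 1));
      parent->slices[parent->num_slices] = slice;
      parent->num_slices += 1;
      return slice;
  }

  void vector_delete(vector_t *v) {
      for (size_t i = 0; i < v->num_slices; i++) {
          vector_delete(v->slices[i]);   // free child slices first
      }
      if (v->num_slices > 0) {
          free(v->slices);               // free the array of slice pointers
      }
      if (!v->is_slice) {
          free(v->data);                 // a slice's data points into its parent's array: cannot free that individually
      }
      free(v);
  }

Memory allocation examples
① array of k ints: int *arr = (int *) malloc(sizeof(int) * k);
② string w/ p chars: char *string = (char *) malloc(sizeof(char) * (p + 1));   // + 1 for the null terminator
③ n x m matrix, init to 0: int *matrix = (int *) calloc(n * m, sizeof(int));
- resize: arr = (int *) realloc(arr, new_k * sizeof(int));

Units
- clock cycle time (period) = 1 / clock frequency
- Giga (G) = 10^9, Mega (M) = 10^6

Datapath Stages
1. IF: Instruction Fetch
- send the address in the PC to instruction memory (IMEM); read IMEM at that address
- units: PC register, +4 adder, PCSel mux, IMEM
2. ID: Instruction Decode
- read rs1, rs2 from the Regfile; generate the immediate
- units: Regfile, ImmGen
3. EX: Execute
- ALU operations, branch comparisons
- units: ASel mux, BSel mux, branch comparator, ALU
4. MEM: Memory
- read/write data memory (DMEM) for loads/stores
5. WB: Write Back
- write PC + 4, the ALU operation result, or memory data into rd
- units: WBSel mux, Regfile

Adding new instructions
① long jump/branch (byte offsets): 1) add new ... 2) add a new input to ImmGen 3) add a new input (PC + 8) to the WB mux
② new load-store word instruction (rd = Mem[R[rs1] + imm], and Mem[R[rs1] + imm] = R[rs2]): 1) create a new instruction type w/ rd, rs1, rs2, imm; update ImmGen 2) allow DMEM to read and write in the same clock cycle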
Floating Point
- layout: sign bit | exponent bits | significand (mantissa) bits
  - 32-bit float: 1 sign bit, 8 exponent bits (bias 127), 23 significand bits
- normalized floats: value = (-1)^Sign * 2^(Exponent - Bias) * 1.Significand
- denormalized floats (exponent = 0): value = (-1)^Sign * 2^(1 - Bias) * 0.Significand
- LSB = rightmost bit; step size between neighboring floats = value of the LSB (lowest mantissa bit)
- binary fractions: 2^-1 = 0.5, 2^-2 = 0.25, 2^-3 = 0.125, 2^-4 = 0.0625

Floating Point Rules (8 exponent bits)
Exponent   Significand   Meaning
0          0             +/-0 (two representations)
0          nonzero       Denorm
1-254      anything      Normal
255        0             +/-Infinity
255        nonzero       NaN

Floating point example (bias = 63): -2.25
- sign bit = 1 (negative)
- convert 2.25 to binary: 10.01 = 1.001 * 2^1
- exponent = 1 + 63 = 64 = 01000000
- mantissa = 0010000...0 (drop the implicit leading 1)
Example: 1.75 = 1.11 * 2^0 in binary -> exponent = 0 + 63 = 63

RISC-V Example
  # Prologue
  addi sp sp -20
  sw s0 4(sp)
  sw ra 16(sp)        # ra can be saved anywhere in the frame
  mv s0 a0            # keep the function input in a saved register
  ...
  jal ra func
  ...
- imm in J/B instructions = OFFSET from the current PC (in bytes)
- divide a signed number by 2^k with srai (rounds toward -infinity); srli only works for nonnegative numbers

Loop example:
  addi s0 x0 0        # counter
loop_start:
  beq ... loop_end    # exit condition
  ...                 # loop body
  addi s0 s0 1        # update counter
  j loop_start
loop_end:
  # Epilogue
  lw s0 4(sp)
  lw ra 16(sp)
  addi sp sp 20
  jr ra
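A sketch that hand-decodes a 32-bit pattern using the normalized/denormalized formulas and the rules table above. It assumes the 1/8/23 bit layout and takes the bias as a parameter: 127 for a standard float, or 63 for the sheet's example format (assuming that format also uses 8 exponent and 23 significand bits, which the sheet doesn't confirm). decode_float is a hypothetical name.

```c
#include <math.h>    /* INFINITY and NAN macros only; no libm function is called */
#include <stdint.h>

/* Decode 1 sign bit | 8 exponent bits | 23 significand bits
 * with the given exponent bias (127 for an IEEE 754 float). */
double decode_float(uint32_t bits, int bias) {
    int sign = (int) ((bits >> 31) & 1);
    int exponent = (int) ((bits >> 23) & 0xFF);
    uint32_t sig = bits & 0x7FFFFF;
    double s = sign ? -1.0 : 1.0;

    if (exponent == 255) {                  /* all-ones exponent */
        return sig == 0 ? s * INFINITY : NAN;
    }
    double frac = (double) sig / 8388608.0; /* sig / 2^23 */
    int e;
    double mant;
    if (exponent == 0) {                    /* denorm (or +/-0): 0.Significand */
        e = 1 - bias;
        mant = frac;
    } else {                                /* normal: 1.Significand */
        e = exponent - bias;
        mant = 1.0 + frac;
    }
    double scale = 1.0;                     /* 2^e by repeated doubling/halving */
    for (int i = 0; i < e; i++) scale *= 2.0;
    for (int i = 0; i > e; i--) scale /= 2.0;
    return s * mant * scale;
}
```

Usage: decode_float(0x40100000, 127) gives 2.25, and decode_float(0xA0100000, 63) gives -2.25 (sign 1, exponent 01000000, mantissa 001 followed by zeros), matching the worked example.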
OpenMP (multithreading)
- Thread-level parallelism (TLP)
- Public vs. Private variables
  - public: declared outside the parallel block; every thread has RW access to the same variable
  - private: declared inside the parallel block; each thread has its own copy
① #pragma omp parallel: every thread runs a copy of the code within the block (e.g. a for loop inside: every thread runs every iteration of the loop)
② #pragma omp parallel for: splits up for loop iterations over the threads; the order of / number of iterations given to each thread is nondeterministic
  #pragma omp parallel for
  for (int i = 0; i < n; i++) { ... }   // iterations executed independently by all threads
  same as:
  #pragma omp parallel
  {
      #pragma omp for
      for (int i = 0; i < n; i++) { ... }
  }
- int omp_get_thread_num(): returns the number of the thread executing the code
- int omp_get_num_threads(): returns how many total threads are executing the code

Pipelining & Hazards
- latency: time it takes to run one instruction
- throughput: # of instructions run per unit time
- pipelining: one instruction per clock cycle (use unused HW for the next instruction); add a register for each wire carrying a value when stages change
- single-cycle: clock period = full datapath delay
- N-stage pipeline: clock period = max(stage delay); latency = clock period * N; throughput = 1 instruction per clock cycle (ideally)
Hazards
1) Structural: insufficient HW for the instructions in flight
   - ID and WB both use the Regfile -> separate R/W ports
   - IF and MEM both use main memory -> separate IMEM/DMEM
2) Data: data dependency; an instruction reads a register before the previous instruction finishes writing it
   - soln: double-pumping (the register is written, then read, in the same cycle)
   - soln: forwarding (wire from the result to the start of EX of the following instruction)
   - otherwise: stall (nop)
3) Control: branches; default assume PC + 4 and execute the next line
   - soln: branch prediction; if the branch is taken, flush the instructions fetched after it
- example (pipeline diagram): beq x0 x0 Label is always taken -> control hazard, flush the two instructions fetched after it; data hazard (lines 7-8) fixed by forwarding line 7's result to line 8's EX stage
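The single-cycle vs. N-stage timing rules from the pipelining notes, as a small calculator. It ignores pipeline-register overhead (an assumption; real pipeline registers add delay to every stage), and the function names are hypothetical.

```c
/* Single-cycle datapath: one clock period must cover the whole datapath. */
int single_cycle_period(const int *stage_ps, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += stage_ps[i];
    }
    return total;
}

/* N-stage pipeline: clock period = slowest stage. */
int pipelined_period(const int *stage_ps, int n) {
    int max = 0;
    for (int i = 0; i < n; i++) {
        if (stage_ps[i] > max) {
            max = stage_ps[i];
        }
    }
    return max;
}

/* Latency of one instruction = clock period * N stages. */
int pipelined_latency(const int *stage_ps, int n) {
    return n * pipelined_period(stage_ps, n);
}

/* Ideal throughput: one instruction per clock period, in instructions per ns. */
double throughput_per_ns(int period_ps) {
    return 1000.0 / period_ps;
}
```

With made-up stage delays {200, 100, 200, 200, 100} ps: the single-cycle period is 800 ps, the pipelined period is 200 ps, and the pipelined latency is 5 * 200 = 1000 ps. Latency gets worse, but throughput improves 4x.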
OpenMP: critical sections and reduction
③ critical sections: #pragma omp critical lets only one thread at a time run the section; use it when multiple threads write the same public variable
  int x = 0;                              // public variable
  #pragma omp parallel
  {
      int tid = omp_get_thread_num();     // private variable
      #pragma omp critical
      x = x + tid;
  }
④ reduction(operation: var): creates and optimizes the critical section for a for loop; each thread works on a private copy, and the copies are combined with the operation at the end
  #pragma omp parallel for reduction(+: sum)
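A sketch contrasting a critical section with a reduction for the same sum; sum_squares and sum_squares_critical are hypothetical names. If compiled without -fopenmp, the pragmas are ignored and both loops simply run serially, giving the same result.

```c
/* Reduction: each thread sums into a private copy of sum;
 * the copies are combined with + at the end of the loop. */
long long sum_squares(int n) {
    long long sum = 0;
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < n; i++) {
        sum += (long long) i * i;
    }
    return sum;
}

/* Critical section: correct too, but every update is serialized. */
long long sum_squares_critical(int n) {
    long long sum = 0;                      /* public (shared) variable */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        long long sq = (long long) i * i;   /* private: declared inside */
        #pragma omp critical
        sum += sq;                          /* one thread at a time */
    }
    return sum;
}
```

The reduction is usually preferred: threads only synchronize once at the end instead of on every iteration.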
Threads vs. Processes
- threads share the same memory
- processes have distinct address spaces
- fork(): creates a copy of the current process = new child process; returns 0 in the child and the child's PID in the parent

Manager-worker framework
worker:
  while true:
    send to manager: "Ready for work"
    receive message from manager
    if message == "Here's more work": do work
    else: break
manager:
  while there's work to do:
    find next task to do
    wait until a worker is ready for work
    send the task to that worker
  repeat #workers times:
    wait until a worker is ready for work
    send to worker: "All work done"

SIMD (Single Instruction, Multiple Data)
- key idea: vectorized calculation, applying an operation to several items as part of a single vector simultaneously (the operation is applied to all pieces of data in the entire vector at the same time)
  source 1:     x3         x2         x1         x0
  source 2:     y3         y2         y1         y0
  destination:  x3 op y3   x2 op y2   x1 op y1   x0 op y0
- load: __m128i _mm_loadu_si128(__m128i const *p)
- store: void _mm_storeu_si128(__m128i *p, __m128i a)
- intrinsics require the SIMD vector input type: cast int * to __m128i *
- you can't index into an __m128i like an array; store it into an int array first
Example: add two vectors
  int arr[4] = {1, 2, 3, 4};
  __m128i adder_one = _mm_loadu_si128((__m128i *) arr);
  __m128i sum = _mm_add_epi32(adder_one, adder_two);   // adder_two: another __m128i loaded the same way
Example: product of n ints
  static int product_vectorized(int n, int *a) {
      int result[4];
      __m128i prod_v = _mm_set1_epi32(1);
      // VECTORIZED LOOP: n / 4 * 4 rounds down to a multiple of 4; grab 4 elements at a time
      for (int i = 0; i < n / 4 * 4; i += 4) {
          prod_v = _mm_mullo_epi32(prod_v, _mm_loadu_si128((__m128i *) (a + i)));
      }
      _mm_storeu_si128((__m128i *) result, prod_v);
      // TAIL CASE: from the multiple of 4 to the end
      for (int i = n / 4 * 4; i < n; i++) {
          result[0] *= a[i];
      }
      return result[0] * result[1] * result[2] * result[3];
  }
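The vectorized-loop / tail-case structure of product_vectorized, sketched in portable C without intrinsics: a 4-int array stands in for the __m128i register, so this shows the loop shape only, not an actual SIMD speedup. product_blocked is a hypothetical name.

```c
/* 4 "lanes" emulated with an array; like _mm_set1_epi32(1). */
int product_blocked(int n, const int *a) {
    int result[4] = {1, 1, 1, 1};
    int i;
    /* "Vectorized" loop: n / 4 * 4 rounds down to a multiple of 4. */
    for (i = 0; i < n / 4 * 4; i += 4) {
        for (int lane = 0; lane < 4; lane++) {
            result[lane] *= a[i + lane];    /* one lane-wise multiply */
        }
    }
    /* Tail case: leftover elements from the multiple of 4 to the end. */
    for (; i < n; i++) {
        result[0] *= a[i];
    }
    return result[0] * result[1] * result[2] * result[3];
}
```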
OpenMPI functions
① Set up & teardown
- int MPI_Init(int *argc, char ***argv): call at the start of the program
- int MPI_Finalize(void): call at the end of the program
- int MPI_Comm_rank(MPI_Comm comm, int *rank), e.g. MPI_Comm_rank(MPI_COMM_WORLD, &rank): gets the ID of the current process (0 to num_processes - 1) and puts it in rank
② Sending & Receiving
- int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
  - e.g. MPI_Recv(buf, count, datatype, source, 0, MPI_COMM_WORLD, &status)
  - if you already know the message source / don't need the status information, set the status address to MPI_STATUS_IGNORE
Caches
Types
1. Fully associative: blocks can go anywhere in the cache
2. Direct-mapped: blocks go into specific indices in the cache
- replacement POLICY: LRU (evict the least recently used block)
- store data in blocks (size = block size of the cache)
Efficiency via Locality
① temporal locality: recently accessed data is likely to be accessed again soon
② spatial locality: data near a recent access is likely to be accessed soon
# bits (useful formulas)
- offset: log2(size of block)
- index: log2(# indices); # indices = num blocks for a direct-mapped cache
- tag: address bits - index bits - offset bits
3 C's of cache misses
① compulsory miss: first access to a block
② conflict miss: two blocks map to the same index and evict each other
③ capacity miss: the cache is too small to hold everything the program uses; soln: increase cache size

Virtual Memory
Translation
1. program executes a load specifying a virtual address (VA)
2. a) extract the virtual page number (VPN)
   b) look up the physical page number (PPN) in the TLB (TLB miss -> page table)
   c) if valid bit = 0 -> page fault
   d) construct the physical address (PA): PPN + offset
- split address: VPN | offset; offset bits = log2(page size)
- page table entry: valid / dirty / permission bits + PPN
  - valid = 1: the virtual page is in main memory, not just on disk
  - dirty = 1: wrote to the page in main memory, so it's necessary to update disk (a read doesn't change the dirty bit)
- page table: # entries = 2^(# VPN bits)
- flow: look up in TLB -> on a TLB miss, read the page table entry -> valid bit 0? page fault, read the page from disk -> check PROTECTION (permission bits) -> read/write
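A sketch of the translation steps above using a hypothetical flat page table: arrays ppn[] and valid[] indexed by VPN. There is no TLB, dirty bit, or permission check in this model; translate is a hypothetical name.

```c
#include <stdint.h>

/* Translate va -> pa with a one-level page table (no TLB modeled).
 * ppn[vpn] holds the physical page number, valid[vpn] the valid bit.
 * Returns 0 on success, -1 on a page fault (invalid or out-of-range VPN). */
int translate(uint32_t va, int offset_bits,
              const uint32_t *ppn, const int *valid, uint32_t num_entries,
              uint32_t *pa) {
    uint32_t offset = va & ((1u << offset_bits) - 1);  /* low bits: page offset */
    uint32_t vpn = va >> offset_bits;                   /* high bits: VPN */
    if (vpn >= num_entries || !valid[vpn]) {
        return -1;                                      /* page fault */
    }
    *pa = (ppn[vpn] << offset_bits) | offset;           /* PA = PPN | offset */
    return 0;
}
```

For example, with 4 KiB pages (12 offset bits), VA 0x12345 splits into VPN 0x12 and offset 0x345; if entry 0x12 is valid with PPN 0x7, the PA is 0x7345.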
Cache Performance
- average memory access time (AMAT) = hit time + miss rate * miss penalty (MP), in cycles
- MP = time to access memory on a miss:
  1) single-level cache: MP = time to access main memory
  2) multi-level cache: MP = AMAT of the next level (can be recursive!)
- 2-level: AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 local miss rate * main memory access time)
Miss rate types
① Global: # accesses missed at that level / total # accesses to the cache system
② Local: # accesses missed at that level / # accesses to that cache
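The recursive AMAT formula above as code, assuming the miss rate given for each level is its local miss rate; amat and global_miss_rate are hypothetical helper names.

```c
/* AMAT = hit time + miss rate * miss penalty, where the miss penalty of
 * one level is the AMAT of the next level; after the last cache level
 * the miss penalty is the main memory access time. */
double amat(const double *hit_time, const double *local_miss_rate,
            int levels, double mem_time) {
    if (levels == 0) {
        return mem_time;                    /* reached main memory */
    }
    return hit_time[0] +
           local_miss_rate[0] * amat(hit_time + 1, local_miss_rate + 1,
                                     levels - 1, mem_time);
}

/* Global miss rate of level k (1-based): fraction of ALL accesses that
 * miss in levels 1..k = product of the local miss rates. */
double global_miss_rate(const double *local_miss_rate, int k) {
    double rate = 1.0;
    for (int i = 0; i < k; i++) {
        rate *= local_miss_rate[i];
    }
    return rate;
}
```

With made-up numbers (L1 hit time 1, L1 local miss rate 0.1, L2 hit time 10, L2 local miss rate 0.5, memory 100 cycles): AMAT = 1 + 0.1 * (10 + 0.5 * 100) = 7 cycles, and the L2 global miss rate is 0.1 * 0.5 = 0.05.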
Example: direct-mapped cache
- computer total memory: 1 MiB = 2^20 bytes -> 20-bit addresses
- 16 KiB direct-mapped cache w/ 1 KiB (1024 B) blocks -> 16 blocks
  - offset = log2(1024) = 10 bits, index = log2(16) = 4 bits, tag = 20 - 4 - 10 = 6 bits
- first accesses: A[0], A[128], A[256], A[384], ...
- on a miss, we pull in the whole block: addresses addr through addr + 1023 (for a block-aligned addr)
- at the start of the second loop we start at the top of the array again -> repeat the process; hit rate = ...
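The tag/index/offset breakdown for the example above, computed with hypothetical helpers. The sketch assumes byte addressing, power-of-two sizes, and a direct-mapped cache (# indices = # blocks).

```c
/* log2 of an exact power of two. */
static int log2_exact(unsigned long x) {
    int bits = 0;
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

typedef struct {
    int tag;
    int index;
    int offset;
} addr_bits;

/* Direct-mapped: offset = log2(block size); # indices = # blocks. */
addr_bits direct_mapped_bits(unsigned long mem_bytes,
                             unsigned long cache_bytes,
                             unsigned long block_bytes) {
    addr_bits b;
    int total = log2_exact(mem_bytes);          /* address width */
    b.offset = log2_exact(block_bytes);
    b.index = log2_exact(cache_bytes / block_bytes);
    b.tag = total - b.index - b.offset;         /* whatever is left */
    return b;
}

/* Split an address into tag | index | offset. */
void split_address(unsigned long addr, addr_bits b,
                   unsigned long *tag, unsigned long *index,
                   unsigned long *offset) {
    *offset = addr & ((1UL << b.offset) - 1);
    *index = (addr >> b.offset) & ((1UL << b.index) - 1);
    *tag = addr >> (b.offset + b.index);
}
```

For 1 MiB of memory, a 16 KiB cache, and 1 KiB blocks this gives 6 tag, 4 index, and 10 offset bits; address 0x12345 splits into tag 0x4, index 0x8, offset 0x345.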