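Each thread has its own state (PC, page table); the functional units are shared between threads.

FINE GRAINED
- switches thread at each instruction
- the CPU must be able to change thread at every clock cycle
- can hide both short and long stalls, since instructions from other threads are executed when one thread stalls
- slows down the execution of individual threads, since a thread that is ready to execute without stalls will be delayed by instructions from other threads

COARSE GRAINED
- switches thread only on long stalls
- in normal conditions the active thread is not slowed down: other threads are executed only when it hits a long stall
- because of the pipeline start-up cost on each switch, short stalls are not hidden; it only reduces the throughput loss from long stalls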
SIMULTANEOUS MULTITHREADING
- exploits TLP in hardware: it is the hardware that manages the threads
- it may be more flexible, but it requires more complex hardware: more functional units (EUs) and more resources
- a large register set is needed
- adaptive: the system can be dynamically adapted to the execution environment, allowing (if possible) the execution of instructions from each thread, and allowing the instructions of a single thread to use all the functional units if the other thread incurs a long-latency event
DEEP PIPELINE
- more and smaller stages: higher clock frequency
- higher heat dissipation
- more faults, transmission delay
- harder design
SIMD
- a central controller broadcasts instructions to multiple processing elements
- each processor has its own data memory
- processors are special purpose, with a simple programming model
- 1 PC and only 1 copy of the code
- computation is fully synchronized
- execution units have their own registers and memory addressing
- 3 variations: vector architectures, multimedia extensions (x86: MMX, SSE, AVX), GPUs
· alls
#Amples and
11 comel (3-core)
vener
=
VECTOR ARCHITECTURE
- 1 instruction operates on vectors of data, which results in dozens of register-to-register operations (vector processing)
- used to hide memory latency
- vector processor = scalar unit + pipelined vector unit
- memory-memory vector processor: all vector operations are memory-to-memory
- vector-register processor: all vector operations are between vector registers, except load and store
EXAMPLE: VMIPS
+ no loop in the written code
+ no control hazards
+ no WAW and WAR hazards
MMX VS VECTORS
- limited instruction set
- limited vector register length
- trend towards fuller vector support in microprocessors
SUPERCOMPUTER
- fastest machine at a given task
- a device to turn a compute-bound problem into an I/O-bound problem
- MIMD machines are flexible. They can be used as:
  - single-user machines that focus on high performance for a specific application
  - multiprogrammed multiprocessors that run many tasks simultaneously
- to exploit n processors there must be at least n independent threads or processes to execute
- cost/performance advantages: they can be built from off-the-shelf processors
- each processor fetches its own instructions and operates on its own data
- the parallelism is identified by software, not by hardware as in superscalar CPUs
Parallelism is achieved by:
- Data Level Parallelism: many data items can be processed at the same time
- Task Level Parallelism: tasks can be executed independently and in parallel
Classes of multiprocessors, depending on the number of processors involved:
CENTRALIZED/SYMMETRIC MULTIPROCESSORS
- large caches
- symmetric = Uniform Memory Access (UMA)

DISTRIBUTED MULTIPROCESSORS
- to support a large processor count
- require a high-bandwidth interconnection
- high volume of data communication between processors
- Non-Uniform Memory Access (Non-UMA)
SIMD VS MIMD
- SIMD exploits DLP for:
  - matrix-oriented scientific computing
  - media-oriented image and sound processing
- it is more energy efficient, because it only needs to fetch 1 instruction per data operation (compared to MIMD)
- it allows the programmer to continue to think sequentially
PIPELINE
The CPI is calculated taking into account:
- the ideal pipeline CPI
- structural stalls (limited resources)
- data hazards (dependences), solved with compiler scheduling or forwarding
- control hazards (caused by branches), solved by early evaluation of the PC, delayed branch, or branch predictors
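The CPI breakdown above can be written as a sum: the ideal pipeline CPI plus the average stall cycles per instruction from each hazard class. The sketch below is only an illustration. The function name `pipeline_cpi` and the stall values are hypothetical and do not come from the notes.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_stalls, control_stalls):
    """Average CPI = ideal CPI + average stall cycles per instruction.

    Every stall argument is an average number of stall cycles per
    instruction (hypothetical inputs, chosen by the caller).
    """
    return ideal_cpi + structural_stalls + data_stalls + control_stalls


# Hypothetical averages, used only to show how the terms add up.
cpi = pipeline_cpi(ideal_cpi=1.0, structural_stalls=0.0,
                   data_stalls=0.25, control_stalls=0.5)
print(cpi)  # 1.75
```

Forwarding and compiler scheduling lower the data term. Early evaluation of the PC, delayed branch, and branch prediction lower the control term.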
Features:
- higher throughput through multiple simultaneously active tasks
- stages: IF, ID, EX, ME, WB
- the time to fill and empty the pipeline reduces the speedup
- I
fetch a access
endeevenute activities
Memors anc
Write at
5 numming
↳ ↳
register rad
also niregister once
PIPELINE HAZARDS
HAZARD = a fault in the pipeline that arises where there is a dependence between instructions
- Structural: attempt to use the same resource from different instructions at the same time
- Data: attempt to use a result before it is ready
  - RAW: true dependence
  - WAR: anti dependence (by compiler)
  - WAW: output dependence (by compiler)
- Control: attempt to make a decision on the next instruction to execute before the condition is evaluated
Solutions:
- compilation techniques: instruction scheduling, nop insertion
- hardware techniques: insertion of stalls or bubbles, data forwarding or bypassing
COMPLEX PIPELINE
- used to have high performance where there are:
  - long-latency floating point operations
  - memory systems with variable access time
  - multiple functional units and memory units
- IN-ORDER execution: simple, but an issue stage is inserted to detect conflicts on the FUs and delay the issue, so there are more stalls
- main issues are:
  - structural conflicts at the execution stage
  - structural conflicts at the write stage
  - out-of-order write hazards
  - exceptions are hard to handle (precise exceptions)
DEPENDENCES
DEPENDENCE = two instructions are close enough that the overlap would change the order of access to the operands involved
- Name: the two instructions use the same name (register or memory location); easy to detect for registers, difficult to detect for memory locations. 2 types:
  - Anti: WAR
  - Output: WAW
- Data: RAW
- Control: determine the ordering of the instructions
Dependences are a property of the program; hazards are a property of the pipeline.
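As a rough illustration of the three dependence types, the sketch below classifies the dependences between two instructions from their register names. The `(dest, sources)` tuple encoding and the function name `classify_dependences` are assumptions made for this example, not something the notes define, and only register names are checked (not memory locations).

```python
def classify_dependences(first, second):
    """Return the dependences of `second` on `first`.

    `first` comes earlier in program order. Each instruction is a
    hypothetical (dest, sources) pair of register names.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = []
    if dest1 is not None and dest1 in srcs2:
        found.append("RAW")  # true data dependence: second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        found.append("WAR")  # anti dependence: second overwrites what first reads
    if dest1 is not None and dest1 == dest2:
        found.append("WAW")  # output dependence: both write the same name
    return found


add = ("r1", ("r2", "r3"))  # add r1, r2, r3
sub = ("r4", ("r1", "r5"))  # sub r4, r1, r5
print(classify_dependences(add, sub))  # ['RAW']
```

Whether a dependence found this way becomes a hazard depends on the pipeline, for example on how far apart the two instructions are and on whether forwarding is available.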
BRANCH PREDICTION
CONDITIONAL BRANCH INSTRUCTION: the branch is taken if the condition is satisfied. In that case the branch target address (BTA) is stored in the PC instead of the address of the next instruction in the sequential stream.
The branch outcome and the BTA are ready at the end of the EX stage, but the branch is solved when the PC is updated, at the end of the ME stage.
BRANCH HAZARDS SOLUTIONS
Stall the pipeline until the branch decision is taken, and then fetch the correct instruction.
- without forwarding: 3 stalls
  branch: IF ID EX ME WB
  next:      S  S  S  IF ID EX ME WB
- with forwarding: 2 stalls
  branch: IF ID EX ME WB
  next:      S  S  IF ID EX ME WB
We can do better with the early evaluation of the PC. During the ID stage of a branch we need to:
1. compare the registers
2. compute the BTA
3. update the PC
In MIPS processors that do these steps in the ID stage, a branch costs 1 stall:
  branch: IF ID EX ME WB
  next:      S  IF ID EX ME WB
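To see what the stall counts mean for performance, the sketch below adds the branch stalls to a base CPI. It assumes 3, 2, and 1 stall cycles per branch for stalling without forwarding, stalling with forwarding, and early evaluation in ID. The base CPI of 1.0 and the 25% branch frequency are hypothetical values picked for illustration.

```python
def cpi_with_branches(base_cpi, branch_fraction, stalls_per_branch):
    """CPI once every branch pays a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stalls_per_branch


BRANCH_FRACTION = 0.25  # hypothetical: 1 instruction out of 4 is a branch

schemes = [
    ("stall, without forwarding", 3),
    ("stall, with forwarding", 2),
    ("early evaluation in ID", 1),
]
for label, stalls in schemes:
    print(label, cpi_with_branches(1.0, BRANCH_FRACTION, stalls))
```

With these assumed numbers the CPI drops from 1.75 to 1.5 to 1.25 as the branch is resolved earlier.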
BRANCH PREDICTION TECHNIQUES
The performance of branch prediction depends on:
- the accuracy of the predictions
- the cost of an incorrect one
- the frequency of branches
STATIC: the prediction is fixed for each branch during the entire execution.
1. branch always taken: makes sense for pipelines in which the BTA is known before the actual branch outcome (not MIPS)
2. branch always not taken: if the branch condition is not satisfied, performance is preserved; if the branch is taken, the instruction fetched next is flushed (turned into a nop) and fetching restarts at the BTA (1 cycle penalty)
3. backward taken, forward not taken: backward-going branches (e.g., at the end of loops) are predicted as taken, forward-going branches as not taken
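The backward taken, forward not taken rule needs only the branch address and its target address. A minimal sketch follows; the function name `predict_btfnt` and the example addresses are hypothetical.

```python
def predict_btfnt(branch_pc, target_pc):
    """Static prediction: True means 'predict taken'.

    A backward-going branch (target below the branch, as at the end of a
    loop) is predicted taken; a forward-going branch is predicted not taken.
    """
    return target_pc < branch_pc


print(predict_btfnt(0x40, 0x20))  # True: backward branch, e.g. a loop
print(predict_btfnt(0x40, 0x60))  # False: forward branch
```

The prediction never changes at run time, which is what makes the scheme static.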
4. profile-driven prediction: the prediction is based on profiling information collected in earlier runs; it is complicated and needs additional runs
5. delayed branch: the compiler schedules an independent instruction in the branch delay slot. The instruction in the delay slot is always executed; then:
   - taken: execution continues at the BTA
   - not taken: execution continues with the instruction after the branch
There are 4 ways to schedule the Branch Delay Slot:
- From before: an independent instruction from before the branch