Error signals

• They can be generated
» by the memory control circuits, for example when a parity error occurs
» by the instruction decoding circuit, when the opcode of the fetched instruction does not match any legal instruction
» by the arithmetic circuits, when an error condition occurs (for example, division by 0).

Architettura dei Sistemi di Elaborazione (a.a. 2005-06)

Debug signals

• They are used by debuggers.
• Many processors provide a trace mode; when it is enabled, the processor raises an exception after executing each instruction.
• This exception is used to implement single-step execution.
• In addition, processors often provide dedicated instructions that deliberately raise an exception.
• These can be used to implement breakpoints.

Privilege exceptions

• If the processor provides several operating modes based on the privilege level of the running program (for example user or supervisor), an exception is raised on any attempt to execute an instruction not permitted at the current privilege level.
• Similarly, an exception of the same kind is raised on any attempt to access a memory area without the required rights.

Interrupts in the 68000

» 8 interrupt priority levels, encoded on 3 bits
» the current level is stored in the status register
» an interrupt request is accepted only if its priority is higher than the current level
» level-7 requests are always accepted (non-maskable interrupt)
» when an interrupt request is accepted, the return address and the status register are saved on the stack
» the RTE (Return from Exception) instruction ends a service routine, restoring the status register.
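The acceptance rule of the 68000 can be sketched in a few lines of Python. This is only an illustrative model: the function name `accepts_interrupt` is hypothetical, and treating level 0 as "no request pending" is an assumption of the sketch rather than something stated in the slides.

```python
def accepts_interrupt(request_level: int, current_level: int) -> bool:
    """Model of the 68000 acceptance rule (hypothetical helper).

    request_level: priority encoded on the 3 request lines (0..7).
    current_level: 3-bit level stored in the status register (0..7).
    """
    for value in (request_level, current_level):
        if not 0 <= value <= 7:
            raise ValueError("levels are encoded on 3 bits (0..7)")
    if request_level == 0:
        # Assumption of this sketch: level 0 means no request is pending.
        return False
    # Level 7 is non-maskable; any other level must exceed the current one.
    return request_level == 7 or request_level > current_level
```

For example, with the current level at 3, a level-4 request is accepted while a level-3 request is held off; a level-7 request is accepted even when the current level is 7.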

68000: interrupt signals

• The 68000 has 3 signals for interrupt requests. The value on these three signals determines the priority of the request.
• Each level may correspond to several devices, handled in a daisy chain.

[Figure: a priority ENCODER receives the request lines IRQ1 to IRQ7 and drives the three inputs IPL2, IPL1 and IPL0 of the 68000.]

68000: identifying the device

• It can be done in two ways:
» vectored mode: the device places an 8-bit code on the bus, which the processor uses to index an interrupt table holding the start addresses of the service routines
» autovectored mode: through a dedicated signal, the device indicates that it will not place its code on the bus; in that case the interrupt code is automatically derived from the level of the request (IRQ1=25, IRQ2=26, etc.).

The M68000 solution

• The M68000 processor uses the vectored interrupt mechanism.
• Memory holds 256 consecutive locations called interrupt vectors.
• Each of these locations contains the address of an ISR.
• When a device requests an interrupt, it sends the processor an 8-bit number that identifies the interrupt vector to use.

[Figure: the DEVICE raises INT and sends an 8-bit number to the CPU; the number selects one of the entries 0 to 255 of the vector table starting at $XXXX, and each entry points to an ISR.]

Priority management

• Problems:
» Masking
» Enabling
• 68K solution:
» Interrupt Priority Level
» Processor Priority Level
» Priority-7 interrupts are not maskable

[Figure: the DEVICE drives the lines IPL2, IPL1 and IPL0 into the CPU, where they are compared with bits I2, I1 and I0 of the status register (SR).]

Exception Vector Table

Vector    Assignment
0         RESET (SSP)
1         RESET (PC)
16-23     UNASSIGNED, RESERVED
25        LEVEL 1 AUTOVECTOR
31        LEVEL 7 AUTOVECTOR
32-47     TRAP #0-15 INSTRUCTIONS
64-255    USER DEVICE INTERRUPTS

Service through autovectors

• Vector 25, LEVEL 1 AUTOVECTOR: address 64 hex = 100 dec
• Vector 31, LEVEL 7 AUTOVECTOR: address 7C hex = 124 dec
• In general, the autovector for level n is at address (60 + 4 * n) hex

[Figure: the DEVICE drives IPL2, IPL1 and IPL0 into the CPU; bits I2, I1 and I0 are held in the SR.]
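The autovector addresses follow directly from the vector numbers, as this small Python sketch shows. The helper names and the constant names are hypothetical; the only facts used are that each table entry holds a 4-byte address and that level n maps to vector 24 + n, both of which are consistent with the figures on the slide.

```python
VECTOR_SIZE = 4        # bytes per table entry (one 32-bit address)
FIRST_AUTOVECTOR = 24  # level 1 -> vector 25, ..., level 7 -> vector 31

def autovector_number(level: int) -> int:
    """Vector number used for an autovectored request of the given level."""
    if not 1 <= level <= 7:
        raise ValueError("autovectors exist for levels 1..7")
    return FIRST_AUTOVECTOR + level

def vector_address(vector_number: int) -> int:
    """Memory address of a table entry, with the table starting at address 0."""
    if not 0 <= vector_number <= 255:
        raise ValueError("vector numbers are 8 bits wide")
    return vector_number * VECTOR_SIZE

print(hex(vector_address(autovector_number(1))))  # level 1
print(hex(vector_address(autovector_number(7))))  # level 7
```

Level 1 gives 0x64 and level 7 gives 0x7c, matching the (60 + 4 * n) hex rule.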

Interrupts in the PowerPC

» a single line for interrupt requests
» a bit in the status register (Machine State Register, MSR) enables or disables interrupts
» the return address and the status register are saved automatically in two registers (SRR0 and SRR1)
» a table contains the start addresses of the service routines.

Use of interrupts in operating systems

• Operating systems rely heavily on the interrupt mechanism:
» the routines that make up the operating system (for example those managing the peripherals) are invoked
  - through software interrupts by programs (system calls)
  - through hardware interrupts by the peripherals themselves
» switching to the execution of an interrupt routine often coincides with the processor switching to a special mode (supervisor).

Corso di Architettura dei Sistemi di Elaborazione

Virtual memory

Prof. B. Fadini (fadini@unina.it)
a.y. 2006/2007

Virtual memory: comparison with caches and hardware aspects

Virtual memory

• It introduces virtual addressing of main memory, independent of the actual physical memory capacity
• It introduces protection mechanisms useful for managing concurrent processes
• It divides memory into pages (or segments)
• Virtual memory is managed by
» the MMU (Memory Management Unit), hardware inside the CPU
» the operating system (OS), software closely tied to the hardware

bruno.fadini@unina.it, Università degli Studi di Napoli

History

• It was introduced so that programs larger than the physical memory could be executed
• Without virtual memory, the programmer had to fit the program within the limits of memory capacity, using two techniques:
» execution split into job steps, each producing data used by the next
» program structured into parts (overlays), whose order of execution and simultaneous presence in memory are decided in advance
• Severe limitations in terms of
» programming effort
» portability
• Overcome by virtual memory
» one programs as if memory had no limits

The principle of virtual memory

• Virtual memory and the principle of spatial and temporal locality
» applied between the "disk" level of the hierarchy and main memory
• It keeps in main memory the parts of the program close to the instruction being executed and to the data being addressed
• When needed, program and data are fetched from disk and placed in main memory
• The program is divided into blocks

The mechanism is conceptually identical to cache memory

Principles of the virtual memory technique

• Address space of the program (virtual addresses): V
• Address space of the memory (physical addresses): M
• Translation function f: V → M
» at any instant, f(v) = m or f(v) = ø
» f(v) = ø is a fault, the analogue of a cache miss
• It is obtained with a Page Mapping Table
• Memory is divided into blocks
» Pages: fixed-length blocks
» Segments: variable-size blocks
» Segments divided into pages
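The translation function can be sketched as follows. This is a minimal Python model under stated assumptions: the page size of 4096 bytes, the dictionary used as Page Mapping Table and the name `translate` are all choices made for the example, and `None` stands for ø.

```python
PAGE_SIZE = 4096  # assumed page size, chosen only for this sketch

def translate(page_mapping_table: dict, virtual_address: int):
    """Return the physical address f(v), or None when f(v) = ø (fault)."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_mapping_table.get(page)  # None if the page is not in memory
    if frame is None:
        return None
    return frame * PAGE_SIZE + offset

pmt = {0: 5}                 # virtual page 0 is held in physical frame 5
print(translate(pmt, 10))    # an address inside page 0
print(translate(pmt, 4096))  # first address of page 1, which is not mapped
```

The first call yields frame 5 plus the same offset; the second returns None, which is the situation the slide calls a fault.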

Virtual memory, compiler and linker

• The compiler translates using the virtual address space
• The linker divides the code into pages
• At run time, dedicated cache hardware (the Translation Lookaside Buffer) performs the translation immediately
• …or else it refers to a Page Mapping Table managed by the OS

Virtual memory vs. cache

Aspect            VIRTUAL MEMORY                               CACHE
Objectives        functional                                   performance
Gap bridged       size (and speed), main memory vs.            speed, CPU vs. main memory
                  secondary memory
Implementation    Hw + Sw                                      Hw
Addressing        by pages                                     by blocks
Failed lookup     fault                                        miss
Control bits      valid and modified bits                      valid and modified bits

They differ in some implementation aspects

Virtual memory + cache

[Figure: the CPU issues a virtual address to the MMU; the MMU produces the physical address used to access the cache and main memory; main memory is filled from secondary (disk) memory by DMA transfers.]

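Translation hardware + software

[Figure: the virtual address is split into page number and offset; the page number is added to a base register to index the PMT; the selected PMT entry supplies the control bits and the frame, which is joined with the offset to form the physical address.]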

Control bits and fields

• Page-in-memory bit (= valid)
• Page-used bit
• Page-modified bit
• "Access rights" field (protection and control)
» read only
» belonging to a specific process
» accessible to all

TLB

• The table should reside in the MMU, but it is large
• The table is kept in memory, with a part of it in the MMU
• Translation Lookaside Buffer, TLB: a small associative or set-associative cache of the table, holding recently used pages
» it contains the virtual page number (it is associative!)
» it must be kept coherent with the table
• TLB row: virtual page number | frame | bits

TLB and table

• The MMU consults the TLB
» Hit: OK
» TLB miss: the entry is fetched from the table
  - found in the table: the TLB is updated
  - page fault: the MMU raises a page-fault interrupt
• On a page fault:
» the OS steps in to transfer the page from disk to main memory
» the current process is suspended and another one starts
• Problem: a page-fault interruption caused by a data access during the execution of an instruction violates the assumptions of the interrupt mechanism, which normally acts between instructions
» the instruction restarts from the beginning
» the instruction resumes from the point of interruption
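The lookup sequence (TLB first, then the table, then a page fault) can be sketched as below. This is an illustrative Python model, not the behaviour of any specific MMU: the class names, the dictionary-based table, the TLB capacity and the use of least-recently-used eviction inside the TLB are all assumptions of the example.

```python
from collections import OrderedDict

class PageFault(Exception):
    """Raised when the page is neither in the TLB nor in the table."""

class MMU:
    def __init__(self, page_table: dict, tlb_entries: int = 4):
        self.page_table = page_table   # virtual page -> frame, pages in memory only
        self.tlb = OrderedDict()       # small cache of the table
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def frame_of(self, page: int) -> int:
        if page in self.tlb:                 # TLB hit
            self.hits += 1
            self.tlb.move_to_end(page)       # mark as most recently used
            return self.tlb[page]
        self.misses += 1                     # TLB miss: go to the table
        frame = self.page_table.get(page)
        if frame is None:
            raise PageFault(page)            # the OS would now load the page
        if len(self.tlb) >= self.tlb_entries:
            self.tlb.popitem(last=False)     # evict the least recently used row
        self.tlb[page] = frame               # update the TLB
        return frame
```

In a real system the page-fault path ends with the OS updating the table, after which the access is retried; the sketch stops at the exception to keep the three outcomes visible.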

More analogies and differences with caches

• Replacement problem
» as in caches (LRU)
• Problem of writing to disk
» as in caches, but write-through is unsuitable: frequent disk accesses should be avoided

MMU with TLB

[Figure: the virtual address is split into page number and offset; the page number is compared associatively with the page numbers stored in the TLB rows; on a hit, the matching row supplies the frame, which is joined with the offset to form the physical address; a hit/miss signal is also produced.]

An idea of the sizes

From a 1993 publication

• Page size: 0.5 to 8 Kbyte (from the '70s to the '90s)
• Hit time: 1 to 10 clock cycles (cc)
• Fault penalty: 10^5 to 6x10^5 cc
» Access time: 10^5 to 5x10^5 cc
» Transfer time: 10^4 to 10^5 cc
• Miss rate: 10^-5 % to 10^-3 %
• Main memory size: 4 Mbyte to 2 Gbyte

Some TLB data

Data from the Pentium and other processors

• MC 68851: pages from 512 bytes to 32 Kbyte
• 80486: 4 Kbyte pages, 4-way set-associative, 32 entries
• Pentium: 4 Kbyte or 4 Mbyte pages, 4-way set-associative
» for instructions: 32 entries
» for data: 64 entries (4 Kbyte pages) or 8 entries (4 Mbyte pages)

Hardware-software cooperation

Page fault

• Interrupt: the process is suspended
• The OS steps in to determine the position of the page in mass storage (MM); this position can be kept in the PMT, provided that compiler, linker, mass-storage allocator and OS act in a coordinated way
• The OS chooses the page to replace and, once the new page is loaded, updates the PMT
• The CPU takes care of the TLB
• The interrupted process resumes

Paging, interrupts and pipelining

Page-fault interrupt and pipelining
• Preceding instructions are completed
• The faulting instruction and those following it are cancelled
• The PC of the faulting instruction is saved

Interrupt during a block-move instruction
• Non-interruptible instruction
• The whole instruction is repeated

Interrupt for an instruction with 2 main-memory accesses (e.g. push)
• It could require 3 page faults
• Measures are taken to avoid this

Segmentation

• Used as an alternative to paging, or upstream of it
• Segments are separated by the programmer (or by the compiler)
» typical of the 8086: code, data, stack
• Variable-length segments
» 286: up to 64 Kbyte; from the 386 on, 4 Gbyte
• It is a logical subdivision; self-referential
• Address: segment selector + offset
• Selector: gives access to base address + protection
• The TLB is replaced by a "segment table" held in a (small) cache

Paging vs. segmentation

• I/O: paging is better (segmentation transfers large blocks that are unrelated to I/O)
• Logical structure and misses: segmentation is better
» self-referentiality
» few segments (3) in a program (this involves the linker)
• Protection: segmentation is better
• Segmentation provides a "descriptor register" for each segment (there are few of them), held in a cache instead of a TLB

Intel: segmentation + paging

Fragmentation

• Allocations and deallocations cause fragmentation
• OS intervention is not recommended

Corso di Architettura dei Sistemi di Elaborazione

Protection and miscellaneous topics

Prof. B. Fadini (fadini@unina.it)
a.y. 2006/2007

Performance

Performance indices (severe limitations)

• MIPS
» depends on the instruction set
» varies with the program
• MFLOPS
• Benchmarks

Protection

Reference: the Intel series

Protection and virtual memory

• Protection = activation of virtual memory
• In CR0 (at boot PE=PG=0):
» PE=1 enables protection (protected mode vs. real mode)
» PG=1 enables paging; PG=0 leaves segmentation only
• A protection violation raises an interrupt (exception)
• Protection of pages and of "system objects" (code, tables, etc.)
• Switching between processes (saving and restoring the state) is handled automatically on Intel: process management = protection

Processes

• A process is identified by
» a state vector (TSS)
» the set of private segments, listed in its LDT
» the set of global segments, listed in the GDT
• Activating a process:
» load the registers with their images in the TSS
» load the TSS selector into TR
» load the selector of the process's LDT into LDTR
• Deactivating a process:
» copy the registers above into the TSS

Registers for a task

[Figure: the registers TR, LDTR, GDTR and IDTR, with their selector, base and limit fields.]

Gates

• Entities in memory that manage the transfer of control

Descriptors

• ISRs, task states and gates are segments
• Descriptor = 8 bytes with variable structure; 1st byte, ARB (access rights byte):
» S (1 bit): code and data / system and gate
» P (1 bit): valid or present / not valid or absent
» DPL, privilege level (2 bits): access rights
» the second nibble depends on S

"User" nibble

• A (1 bit): page-used bit
• TYPE (3 bits):
» E (1 bit): nature of the segment

E=1, executable                                   E=0, data
- R = read permission (never write)               - W = write permission
- C = shareable depending on the privilege        - ED = expansion direction (data upward,
  level relative to DPL                             stack downward)

"System" nibble

• TYPE (4 bits):

Value      Meaning
0 / 8      Not used
1 / 9      TSS: task state segment, available (16-32 bit)
2 / 10     LDT: local descriptor table segment
3 / 11     TSS: task state segment, busy (16-32 bit)
4 / 12     Call gate (16-32 bit)
5 / 13     Task gate
6 / 14     Interrupt gate (16-32 bit)
7 / 15     Trap gate (16-32 bit)

• Some bits are reserved

Protected mode

Memory is accessed through a descriptor selector (SD Sel):

bits 15-3: INDEX | bit 2: TI | bits 1-0: RPL

• INDEX: index into the GDT (TI=0) or into the LDT (TI=1)
• GDT = Global Descriptor Table, LDT = Local Descriptor Table
• RPL: privilege level; it can be changed only by a jump through a GATE
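The selector layout can be unpacked with a few shifts and masks. The Python sketch below is illustrative only; the function name `decode_selector` and the dictionary keys are hypothetical, while the bit positions are the ones given on the slide.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit selector into INDEX (bits 15-3), TI (bit 2), RPL (bits 1-0)."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector is 16 bits wide")
    return {
        "index": selector >> 3,        # 13-bit index into the GDT or LDT
        "ti": (selector >> 2) & 0x1,   # 0 -> GDT, 1 -> LDT
        "rpl": selector & 0x3,         # privilege level
    }

fields = decode_selector(0x001B)
# Each descriptor is 8 bytes long, so the byte offset in the table is index * 8.
print(fields, fields["index"] * 8)
```

For the value 0x001B this gives index 3, TI 0 (GDT) and RPL 3, so the descriptor starts 24 bytes into the GDT.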

GDT and LDT tables

[Figure: the 13-bit INDEX of the selector (SD sel) picks a descriptor inside the GDT or LDT; the table itself is located through the Base and Limit held in GDTR or LDTR.]

Descriptor

• It includes:
» base address
» limit
» the fields described earlier

Interrupt description

• The IDT contains the descriptors that point to the ISRs

Number    Meaning
0         division by 0
1         debugger
6         invalid opcode
10        invalid TSS
11        segment not present
12        stack limit violation
13        protection violation
14        page fault

Pentium

PENTIUM (outline)

• Internal architecture of the Pentium
» internal pipelines (U and V)
» Branch Prediction Logic
» internal registers
» system registers
• Operating modes of the Pentium
» Real Mode
» Protected Mode
» privilege levels
» segmentation
» segment selectors and descriptors
» call gates, interrupts
» tasks, Task State Segment
» paging
» logical, linear and physical addresses
» Translation Lookaside Buffer

Pentium architecture

[Figure: block diagram with a 64-bit bus interface, code cache, branch prediction, prefetch buffers fed with 256 bits (32 bytes), the U and V pipelines each with a 32-bit integer ALU, the register set, a pipelined floating point unit (Mul, Add, Div) and the data cache.]

Pentium: pipeline

486 pipeline: PF (prefetch), D1 (instruction decode), D2 (address generation), EX (execute), WB (write back)

N.B. the memory stage is absent because the cache is used

PENTIUM: instruction pairing (IF = instruction fetch/align). After IF, the U and V pipelines each go through D1, D2, EX and WB.

• The U pipeline executes all instructions, while the V pipeline executes only simple hardwired instructions
• In general, the instructions in the two pipelines cannot have interdependent data (RAW, WAW, etc.)
• The instruction in V always follows the one in U
• Most instructions are microprogrammed, but some (ALU, MOV, etc.) are hardwired
• There are two alternating 32-byte prefetch buffers, which operate in parallel with the Branch Target Buffer
• Hazards are checked in stage D1 (within the same instruction)
• The two pipelines operate synchronously: stages D1 and D2 start and end together; if pipe V is stalled in EX, U can advance; no instruction can enter EX until U and V are both in WB
• Some instructions can be executed by only one of the two pipelines

Pentium: branch prediction

BRANCH TARGET BUFFER: each entry holds the Branch Instruction Address, the Branch Target Address and the Branch History.

• The prefetch stage has two 32-byte buffers, filled alternately and sequentially up to a branch. If the prediction is TAKEN, the other buffer is filled with the new code. In case of error, the instruction pipelines are flushed.
• If the BTB predicts correctly there is no penalty; otherwise there are 6 or 7 additional clocks (depending on which of the two pipelines is involved).

Example: loading 100 double words with VALORE

            mov edx, VALORE      ; value to store in 100 double words
            lea eax, VETTORE     ; start address
            lea ecx, [eax+396]   ; last address (VETTORE + 99*4)
init_loop:  mov [eax], edx       ; (1)
            add eax, 04h         ; (2)
            cmp eax, ecx         ; (3)
            jbe init_loop        ; (4)

On the 486 (1 pipeline): (1,2,3) take 3 clocks and (4) takes 3 cycles, for a total of 600 clock cycles.

On the Pentium:

U pipeline                 V pipeline
mov [eax], edx   (1)       add eax, 04h    (2)
cmp eax, ecx     (3)       jbe init_loop   (4)

(1,2) take 1 clock; with a correct branch prediction (3,4) take 1 clock, while with a wrong prediction (3,4) take 4 clocks.
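The cycle counts for the initialization loop can be reproduced with a short calculation. This Python sketch only restates the per-iteration figures given on the slide; the function names are hypothetical, and the Pentium total assumes that every pair issues together and that only the stated number of iterations is mispredicted.

```python
def i486_cycles(iterations: int) -> int:
    # (1,2,3) take 3 clocks and the branch (4) takes 3 more, per iteration.
    return iterations * (3 + 3)

def pentium_cycles(iterations: int, mispredicted: int = 0) -> int:
    if not 0 <= mispredicted <= iterations:
        raise ValueError("mispredicted must be between 0 and iterations")
    pair_12 = iterations * 1                   # (1,2) issue together: 1 clock
    pair_34_ok = (iterations - mispredicted)   # (3,4) predicted correctly: 1 clock
    pair_34_bad = mispredicted * 4             # (3,4) mispredicted: 4 clocks
    return pair_12 + pair_34_ok + pair_34_bad

print(i486_cycles(100))         # the 600 cycles quoted for the 486
print(pentium_cycles(100))      # all branches predicted correctly
print(pentium_cycles(100, 1))   # one mispredicted branch
```

Under these assumptions the Pentium needs 200 cycles when every branch is predicted correctly, and 3 more for each misprediction.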

Pentium: registers

[Figure: register set.
General-purpose registers (bits 31-0, with 16-bit and 8-bit parts): EAX (AX: AH, AL), EBX (BX: BH, BL), ECX (CX: CH, CL), EDX (DX: DH, DL), ESI (SI), EDI (DI), EBP (BP), ESP (SP).
IP (bits 31-0) and FLAGS / EF (bits 31-0).
Segment registers (bits 15-0): CS, SS, DS, ES, FS, GS.
Floating point registers are also present: 8 registers of 80 bits.
TR and LDTR: selector (15-0), base (31-0), limit (19-0).
GDTR and IDTR: base (31-0), limit (19-0).
Control registers (bits 31-0): CR0, CR1 (RESERVED), CR2, CR3, CR4.]

Pentium: system registers CR4, CR0

CR4 (bits 31-6 not used)

Bit    Name    Meaning
0      VME     Virtual-8086 mode extension
1      PVI     Protected mode virtual interrupt
2      TSD     Time stamp disable
3      DE      Debugging extension
4      PSE     Page size extension (4KB-4MB)
5      MCE     Machine check exception

CR0

Bit    Name    Meaning
31     PG      Paging enable
30     CD      Cache disable (cache control bit)
29     NW      Not write-through (cache control bit)
18     AM      Alignment mask (alignment check)
16     WP      Write protect
5      NE      Reserved (numeric error in 486)
4      ET      Reserved
3      TS      Task switched
2      EM      Reserved
1      MP      Monitor coprocessor
0      PE      Protection enabled

Pentium: system registers CR2, CR3

CR3: bits 31-12 PAGE DIRECTORY BASE (physical address); bit 4 PCD, Page-level cache disable; bit 3 PWT, Page-level write-through

The PCD and PWT signals of CR3 are the ones that appear externally if paging is disabled (see CR0) or if the access is to the first-level page table (DIR).

CR2: bits 31-0 PAGE FAULT LINEAR ADDRESS

Pentium: memory management

• Protection and control of the access to the various segments, to check both the right to access and the correctness of the address (e.g. stack)
• Several "environments", each of which must be separate and protected from the others (multi-user operation)
• Paging: a memory optimization mechanism that allows several processes to run at the same time
• Support for the operating system

[Figure: in the 8086 the address goes directly to memory; in the PENTIUM the address goes through control and translation before reaching memory.]

Pentium: operating modes

REAL MODE
• emulates the operation of the 8086
• addressing with segment x 16 + offset, even though at 32 bits
• linear address of 4 Gbyte
• handles interrupts like the 8086, starting from address 0

PROTECTED MODE manages
• protection with 4 privilege levels
• multitasking with task descriptors
• segmentation with segment selectors and descriptors
• paging with page tables
• interrupts with interrupt descriptors

VIRTUAL 8086 MODE
• in Protected Mode, the tasks running in Virtual 8086 Mode are configured to emulate the 8086 (e.g. a DOS shell in a multitasking environment)

Pentium: Real Mode

• It is the mode at RESET and allows the system to be configured by accessing all locations without protection
• The addressing mode is identical to that of the 8086 but gives access to the whole real memory. The first access is at address FFFFFFF0h, but as soon as the first intersegment JUMP or CALL is executed, address bits A20-A31 are set to zero for CS.
• Segments are 64K long
• Each segment register contains the base address of the real segment
• Protected mode is entered by setting the PE bit in register CR0 to 1

Pentium: Protected mode

• Memory is accessed through the SEGMENTS, which in protected mode are SELECTORS of SEGMENT DESCRIPTORS

SEGMENT SELECTOR: bits 15-3 INDEX | bit 2 TI | bits 1-0 RPL

• TI: Table Indicator (0 -> GDT, 1 -> LDT)
• INDEX: index pointing to an 8-byte descriptor; 13 bits -> 8192 selectors
• Two possible descriptor tables:
» GLOBAL DESCRIPTOR TABLE (describes the segments common to all tasks)
» LOCAL DESCRIPTOR TABLE (describes the private segments of a task)
• RPL: Request Privilege Level (00 -> most privileged, 11 -> least privileged): the privilege level of the task (current program) can be changed only by a jump through a GATE

Pentium: privilege levels

[Figure: concentric rings. Level 0: Kernel; levels 1 and 2: Op. Sys. Services; level 3: Applications.]

• CPL (current PL): privilege level of the current code segment
• A task with a given CPL can use a segment only if RPL >= CPL
• EPL (effective PL) = max(CPL, RPL)

Pentium: descriptor tables

[Figure: the INDEX of the SEGMENT SELECTOR, multiplied by 8, is added to the BASE held in GDTR (LDTR) to select one of the descriptors 0 to N of the GDT (LDT); LIMIT bounds the table.]

• GDT, LDT and IDT work in the same way
• The LDT is part of the GDT
• Descriptors are loaded into the cache when they are addressed

Segment descriptor cache registers: for each of CS, DS, SS, ES, FS and GS, the selector (bits 15-0) is paired with a cached descriptor (bits 63-0) holding access, base and limit.

Pentium: segment descriptor

Bits 63-32:  Base 31:24 | G | DB | 0 | AV | Limit 19:16 | P | PL | S | Type | Base 23:16
Bits 31-0:   Base 15:00 | Limit 15:00

• AV: available to software
• BASE: base address of the segment
• DB: width of the segment (16/32 bit)
• PL: privilege level (00....11)
• G: granularity (limit as a multiple of 1 or 4096 bytes)
• LIMIT: limit of the segment (length)
• P: segment present in memory (paging)
• S: descriptor type (system/application)
• TYPE: segment type (data/code, R/W, expand down/up)
• 0: reserved

The fields can be interpreted differently depending on the TYPE and on other parameters.
The size of a segment is a multiple of 1 byte (max 1 Mbyte) or of 4 Kbyte (max 4 Gbyte).

Pentium: addressing with segmentation

[Figure: the LOGICAL ADDRESS consists of a selector (e.g. DS) and a 32-bit offset; the 13-bit index of the selector, through LDTR, picks a 64-bit segment descriptor in the LDT; the 32-bit base of the descriptor (with its 20-bit limit) is added to the offset to give the LINEAR ADDRESS.]

• LOGICAL ADDRESS: the address produced by segmentation
• The resulting LINEAR ADDRESS may be a VIRTUAL ADDRESS which, if PAGING is active, must be translated into a physical address

Pentium: addressing

[Figure: a 48-bit pointer is made of a SELECTOR (bits 47-32) and an OFFSET (bits 31-0); the selector picks, in the GDT (LDT), a descriptor holding Rights, Limit and Base address; the base address is added to the offset to reach the OPERAND within the segment, bounded by LIMIT.]

• The resulting address is a VIRTUAL ADDRESS which, if PAGING is active, must be translated into a physical address
• The descriptors are kept in a cache inside the CPU and are loaded whenever they are referenced in the two tables that contain all the descriptors

Pentium: addressing (effective address)

[Figure: Displ. + Base Register + Index Register x Scale gives the offset; the selectors SS, GS, FS, ES, DS and CS choose among the DESCRIPTOR REGISTERS (CACHE), each holding Access Rights, Limit and Base Address; base address + offset gives the LINEAR ADDRESS (virtual address).]

Pentium: system segments and call gates

There are three types of system segment descriptors:
• TASK STATE SEGMENT: describes a task
• LOCAL DESCRIPTOR TABLE: points to an LDT
• GATE: used to move to a higher privilege level

A segment can use only code at its own privilege level or at a more privileged one (the more privileged the level, the more trusted the code).

• CPL: PL of the current CS
• RPL: PL of the operand's segment
• EPL = max(RPL, CPL)
• DPL: PL of the descriptor pointed to by the operand's segment
• The data can be accessed only if EPL <= DPL
• Far calls, interrupt gates and trap gates allow a routine at a different privilege level to be executed

Pentium: call gates

[Figure: code at levels PL0, PL1, PL2 and PL3, with a GATE at each boundary between levels.]
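The data-access rule stated above (EPL = max(RPL, CPL), access only if EPL <= DPL) can be written out as a small check. This Python sketch is illustrative: the function names are hypothetical, and it models only the comparison on the slide, not the full set of checks a processor performs.

```python
def effective_privilege(cpl: int, rpl: int) -> int:
    """EPL = max(CPL, RPL); numerically larger means less privileged."""
    for level in (cpl, rpl):
        if not 0 <= level <= 3:
            raise ValueError("privilege levels range from 0 to 3")
    return max(cpl, rpl)

def can_access_data(cpl: int, rpl: int, dpl: int) -> bool:
    """Data access is allowed only if EPL <= DPL."""
    if not 0 <= dpl <= 3:
        raise ValueError("privilege levels range from 0 to 3")
    return effective_privilege(cpl, rpl) <= dpl

print(can_access_data(cpl=3, rpl=0, dpl=3))  # application data from level 3
print(can_access_data(cpl=3, rpl=0, dpl=0))  # kernel data from level 3
print(can_access_data(cpl=0, rpl=3, dpl=2))  # kernel using a level-3 selector
```

The third case shows why RPL matters: even code running at level 0 is refused when the selector it uses carries a less privileged RPL than the descriptor allows.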

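Pentium: call gates (example of a far call)

[Figure: the far pointer consists of a SELECTOR (bits 15-0) and an OFFSET (bits 31-0, not used); the selector picks a GATE descriptor in the GDT, holding OFFSET, DPL, COUNT and SELECTOR; the selector of the gate picks a code segment descriptor (BASE, DPL); BASE + OFFSET gives the linear address of the entry point of the procedure.]

• The selector of the GATE indicates whether the GDT or the LDT is used

Pentium: interrupts

[Figure: the IDTR register holds the IDT base address and the IDT limit; for int n (0 <= n < 256), interrupt type * 8 is added to the IDT base to select an interrupt gate in memory, which leads to the interrupt handler.]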

Pentium: interrupts (faults, traps)

[Figure: the INTERRUPT VECTOR selects an INTERRUPT GATE in the IDT; the gate's selector picks a SEGMENT DESCRIPTOR in the GDT or LDT, which gives the DESTINATION CODE SEGMENT; the base of that segment plus the OFFSET of the gate gives the interrupt procedure (INT. PROC.).]

Pentium: task

• A task is the process in execution; it is defined by a TASK STATE SEGMENT, which defines its CONTEXT, and it has a set of segments addressable through its own LDT (in addition to the GDT)
• Activating a task means initializing the processor registers with the registers contained in the TSS
• Switching tasks (triggered by the task itself or by an interrupt) means saving the context into the current TSS and activating a new task

Pentium: Task State Segment

TSS layout (104 bytes, offsets 00 to 64h):
BACK LINK (points to the previous TSS); ESP0, SS0; ESP1, SS1; ESP2, SS2; CR3; EIP; EFLAGS; EAX; ECX; EDX; EBX; ESP; EBP; ESI; EDI; ES; CS; SS; DS; FS; GS; LDT; I/O map base; T

• The Task Register (TR) contains the selector of the TSS descriptor

Pentium: nested tasks

[Figure: a TOP LEVEL TASK (NT=0) followed by two NESTED TASKs (NT=1); each nested TSS has a LINK to the TSS of the task that invoked it; the Task Register points to the current TSS; NT is a bit of EFLAGS; tasks are invoked through a task gate descriptor.]

Pentium: I/O

• The ability to perform I/O depends on the IOPL bits in the flags
• The bits of the I/O map in the TSS control access to the I/O ports
• Access to the I/O ports is possible if:
» CPL ≤ IOPL, or
» the I/O PERMISSION MAP allows it
• The I/O map contains one bit for each port
• Example: the control bit for port 41 (decimal) is bit 41 of the map, that is, bit 1 of the sixth byte (41 = 5 x 8 + 1)

Pentium: paging

[Figure: the PAGEs of the LINEAR ADDRESS space are mapped (MAPPING) onto the PAGEs of the PHYSICAL ADDRESS space.]
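The I/O access rule can be sketched as follows. This Python model is illustrative only: the function name is hypothetical, the map is represented as a `bytearray`, and the convention that a cleared bit grants access is the one used by x86 I/O permission bitmaps, labeled here as an assumption because the slide does not state the polarity.

```python
def io_allowed(cpl: int, iopl: int, io_map: bytearray, port: int) -> bool:
    """Access is granted if CPL <= IOPL, or if the I/O map permits the port."""
    if cpl <= iopl:
        return True
    byte_index, bit_index = divmod(port, 8)  # port 41 -> byte 5, bit 1
    if byte_index >= len(io_map):
        return False                         # ports beyond the map are denied
    # Assumption: a bit equal to 0 means the port may be accessed.
    return (io_map[byte_index] >> bit_index) & 1 == 0

io_map = bytearray([0xFF] * 8)  # every port denied...
io_map[5] &= ~(1 << 1)          # ...except port 41 (bit 1 of the sixth byte)

print(io_allowed(cpl=3, iopl=0, io_map=io_map, port=41))
print(io_allowed(cpl=3, iopl=0, io_map=io_map, port=40))
print(io_allowed(cpl=0, iopl=0, io_map=io_map, port=40))
```

The first call succeeds through the map, the second is refused, and the third succeeds because CPL ≤ IOPL makes the map irrelevant.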

Pentium: address spaces

• LOGICAL ADDRESS SPACE: 16 bits (selector) + 32 bits (offset) = 48 bits; 64 Tbyte in protected mode
• LINEAR ADDRESS: after segmentation, a 32-bit address (BASE in the descriptor + OFFSET) that addresses 4 Gbyte
• PHYSICAL ADDRESS: 32 bits; real address space of 4, 16, ..., 256 Mbyte...
• PAGING allows pages of 4 Kbyte or 4 Mbyte, addressed by a VIRTUAL address, to be allocated within the available physical address space
• PAGING is independent of SEGMENTATION

Pentium: paging

[Figure: the linear (virtual) address is split into virtual page number and offset within the page; the page number is added to the base of the page table (held in a register) to select an entry of the PAGE TABLE, which holds the protection bits, the dirty bit, the reference bit, the in-memory bit and the physical page number; physical page number and offset form the physical address.]

Pentium: paging in the Pentium

Linear (virtual) address: bits 31-22 DIR | bits 21-12 PAGE | bits 11-0 OFFSET

[Figure: CR3 locates the PAGE DIRECTORY (entries 0 to 1023); the DIR field selects the entry that locates a PAGE TABLE; the PAGE field selects the entry that gives the page frame address; frame address and OFFSET form the physical address (physical page number + offset within the page).]

• Two memory accesses are needed for the two table levels of PAGING, in addition to the access to the GDT or LDT for SEGMENTATION

Pentium: tables for 4K pages

DIR ELEMENT:         Page frame address 31:12 | Av. | 0 | 0 | 0 | A | CD | WT | U | W | P
PAGE TABLE ELEMENT:  Page frame address 31:12 | Av. | 0 | 0 | D | A | CD | WT | U | W | P

• Av.: available to software
• Page size bit (DIR element only; 0 for 4K pages)
• D: dirty (page table element only)
• A: accessed (used)
• CD: cache disable
• WT: write through
• U: User/Supervisor
• W: writable
• P: present in memory
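The two-level lookup for 4K pages can be sketched as below. This Python model is illustrative only: the function names are hypothetical and nested dictionaries stand in for the page directory and page tables, so control bits such as P, W or U are not modeled (a missing key plays the role of "not present").

```python
def split_linear(address: int):
    """Split a 32-bit linear address into (DIR, PAGE, OFFSET)."""
    if not 0 <= address < 2**32:
        raise ValueError("a linear address is 32 bits wide")
    return (address >> 22) & 0x3FF, (address >> 12) & 0x3FF, address & 0xFFF

def walk(page_directory: dict, address: int):
    """Return the physical address, or None if a table or page is absent."""
    dir_index, page_index, offset = split_linear(address)
    page_table = page_directory.get(dir_index)  # first memory access
    if page_table is None:
        return None
    frame = page_table.get(page_index)          # second memory access
    if frame is None:
        return None
    return (frame << 12) | offset               # frame address 31:12 + offset

directory = {1: {3: 0x12345}}  # DIR entry 1 -> table whose entry 3 -> frame 0x12345
print(split_linear(0x00403025))
print(hex(walk(directory, 0x00403025)))
```

The address 0x00403025 splits into DIR 1, PAGE 3 and OFFSET 0x25, and the walk returns 0x12345025.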

Pentium: TLB

Translation lookaside buffer (TLB)
• a cache that stores the most recently used page entries (Intel guarantees a 98% hit rate)

[Figure: the virtual page number of the address is looked up in the TLB (tag + page address); the tags are compared, the matching element is selected and its data is joined with the offset within the page.]

Pentium: PENTIUM TLB

• TLBs of the PENTIUM: two for DATA (4-way, 64 entries for 4K pages; 4-way, 8 entries for 4 MB pages) and one for INSTRUCTIONS (4-way, 32 entries for both page types)
• The data TLBs are dual-ported (two pipelines)
• The data and instruction TLBs are protected by parity bits
• Replacement uses an LRU algorithm that requires 3 bits per set

PENTIUM TLB

CODE TLB (4 KB pages): 4-way, 32 entries

(Diagram: the linear address is split into TAG, SEL and OFFSET; bit labels on the slide: 31, 17, 16, 12, 11, 0. SEL selects a set; each of the entries holds TAG, Prot., Phys. Addr. and Attr.)

A second TLB (8 elements, 4-way) exists for 4 MB pages

Pseudo LRU

Replacement mechanism (for the TLB; the caches work in the same way, but they are two-way):

» the 4 elements of a set are called I0, I1, I2 and I3
» if a line is not valid, it is the one replaced
» there are three bits (B0, B1 and B2) per set
» if the last access to the set was to I0 or I1 then B0=1, otherwise B0=0
» if the last access to the pair I0:I1 was to I0 then B1=1, otherwise B1=0
» if the last access to the pair I2:I3 was to I2 then B2=1, otherwise B2=0

On replacement, the cache first selects which of I0:I1 and I2:I3 was accessed less recently (B0), then selects within that pair:

» B0=0 (I0:I1 older): replace I0 if B1=0, otherwise I1
» B0=1 (I2:I3 older): replace I2 if B2=0, otherwise I3
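The pseudo-LRU rules on this slide translate almost directly into code. The following Python sketch models a single 4-way set. The class and method names are invented for illustration, and the sketch tracks only validity and the three bits, not tags or data.

```python
class PseudoLRUSet:
    """One 4-way set (I0..I3) with the three bits B0, B1, B2 described above."""

    def __init__(self):
        self.valid = [False] * 4
        self.b0 = self.b1 = self.b2 = 0

    def touch(self, way):
        """Record an access to element `way` (0..3)."""
        self.valid[way] = True
        if way in (0, 1):
            self.b0 = 1                       # last access hit the pair I0:I1
            self.b1 = 1 if way == 0 else 0    # 1 means I0 was the last one used
        else:
            self.b0 = 0                       # last access hit the pair I2:I3
            self.b2 = 1 if way == 2 else 0    # 1 means I2 was the last one used

    def victim(self):
        """Choose the element to replace."""
        for way, is_valid in enumerate(self.valid):
            if not is_valid:
                return way                    # invalid lines are replaced first
        if self.b0 == 0:                      # I0:I1 accessed less recently
            return 0 if self.b1 == 0 else 1
        return 2 if self.b2 == 0 else 3       # I2:I3 accessed less recently


s = PseudoLRUSet()
for way in (0, 1, 2, 3):
    s.touch(way)
first_victim = s.victim()    # I0, which is also the true LRU element here
s.touch(0)
second_victim = s.victim()   # I2, although true LRU would pick I1
```

The second case shows why the scheme is called pseudo LRU: three bits per set cannot record the full access order, so the choice is only an approximation of the least recently used element.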

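Pentium: complete addressing

» logical address: SELECTOR (bits 15:0) and OFFSET (bits 31:0)
» the SELECTOR picks a descriptor; descriptor and OFFSET together give the linear address
» linear address: DIR (bits 31:22) | TABLE (bits 21:12) | OFFSET (bits 11:0)
» CR3 and DIR locate the DIR ENTRY; the DIR ENTRY and TABLE locate the P.TBL.ENTRY; the P.TBL.ENTRY and OFFSET locate the OPERAND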

PowerPC User Instruction Set Architecture

Book I

Version 2.01

September 2003

Manager:

Joe Wetzel/Poughkeepsie/IBM

Technical Content:

Ed Silha/Austin/IBM Cathy May/Watson/IBM Brad Frey/Austin/IBM

The following paragraph does not apply to the United Kingdom or any country or state where such provisions are

inconsistent with local law.

The specifications in this manual are subject to change without notice. This manual is provided “AS IS”. International Business Machines Corp. makes no warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.

International Business Machines Corp. does not warrant that the contents of this publication or the accompanying

source code examples, whether individually or as one or more groups, will meet your requirements or that the

publication or the accompanying source code examples are error-free.

This publication could include technical inaccuracies or typographical errors. Changes are periodically made to

the information herein; these changes will be incorporated in new editions of the publication.

Address comments to IBM Corporation, Internal Zip 9630, 11400 Burnett Road, Austin, Texas 78758-3493. IBM

may use or distribute whatever information you supply in any way it believes appropriate without incurring any

obligation to you.

The following terms are trademarks of the International Business Machines Corporation in the United States

and/or other countries:

IBM, PowerPC, RISC/System 6000, POWER, POWER2, POWER4, POWER4+, IBM System/370

Notice to U.S. Government Users—Documentation Related to Restricted Rights—Use, duplication or disclosure is

subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corporation.

© Copyright International Business Machines Corporation, 1994, 2003. All rights reserved.


Preface

This document defines the PowerPC User Instruction Set Architecture. It covers the base instruction set and related facilities available to the application programmer.

Other related documents define the PowerPC Virtual Environment Architecture, the PowerPC Operating Environment Architecture, and PowerPC Implementation Features. Book II, PowerPC Virtual Environment Architecture defines the storage model and related instructions and facilities available to the application programmer, and the time-keeping facilities available to the application programmer. Book III, PowerPC Operating Environment Architecture defines the system (privileged) instructions and related facilities. Book IV, PowerPC Implementation Features defines the implementation-dependent aspects of a particular implementation.

As used in this document, the term “PowerPC Architecture” refers to the instructions and facilities described in Books I, II, and III. The description of the instantiation of the PowerPC Architecture in a given implementation includes also the material in Book IV for that implementation.

Table of Contents

Chapter 1. Introduction . . . 1
  1.1 Overview . . . 1
  1.2 Computation Modes . . . 1
  1.3 Instruction Mnemonics and Operands . . . 1
  1.4 Compatibility with the POWER Architecture . . . 2
  1.5 Document Conventions . . . 2
    1.5.1 Definitions and Notation . . . 2
    1.5.2 Reserved Fields . . . 3
    1.5.3 Description of Instruction Operation . . . 4
  1.6 Processor Overview . . . 6
  1.7 Instruction Formats . . . 7
    1.7.1 I-Form . . . 8
    1.7.2 B-Form . . . 8
    1.7.3 SC-Form . . . 8
    1.7.4 D-Form . . . 8
    1.7.5 DS-Form . . . 8
    1.7.6 X-Form . . . 9
    1.7.7 XL-Form . . . 9
    1.7.8 XFX-Form . . . 9
    1.7.9 XFL-Form . . . 9
    1.7.10 XS-Form . . . 9
    1.7.11 XO-Form . . . 9
    1.7.12 A-Form . . . 10
    1.7.13 M-Form . . . 10
    1.7.14 MD-Form . . . 10
    1.7.15 MDS-Form . . . 10
    1.7.16 Instruction Fields . . . 10
  1.8 Classes of Instructions . . . 12
    1.8.1 Defined Instruction Class . . . 12
    1.8.2 Illegal Instruction Class . . . 12
    1.8.3 Reserved Instruction Class . . . 12
  1.9 Forms of Defined Instructions . . . 13
    1.9.1 Preferred Instruction Forms . . . 13
    1.9.2 Invalid Instruction Forms . . . 13
  1.10 Optionality . . . 13
  1.11 Exceptions . . . 13
  1.12 Storage Addressing . . . 14
    1.12.1 Storage Operands . . . 14
    1.12.2 Effective Address Calculation . . . 14

Chapter 2. Branch Processor . . . 17
  2.1 Branch Processor Overview . . . 17
  2.2 Instruction Execution Order . . . 17
  2.3 Branch Processor Registers . . . 18
    2.3.1 Condition Register . . . 18
    2.3.2 Link Register . . . 19
    2.3.3 Count Register . . . 19
  2.4 Branch Processor Instructions . . . 20
    2.4.1 Branch Instructions . . . 20
    2.4.2 System Call Instruction . . . 25
    2.4.3 Condition Register Logical Instructions . . . 26
    2.4.4 Condition Register Field Instruction . . . 28

Chapter 3. Fixed-Point Processor . . . 29
  3.1 Fixed-Point Processor Overview . . . 29
  3.2 Fixed-Point Processor Registers . . . 29
    3.2.1 General Purpose Registers . . . 29
    3.2.2 Fixed-Point Exception Register . . . 30
  3.3 Fixed-Point Processor Instructions . . . 31
    3.3.1 Fixed-Point Storage Access Instructions . . . 31
    3.3.2 Fixed-Point Load Instructions . . . 31
    3.3.3 Fixed-Point Store Instructions . . . 38
    3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions . . . 42
    3.3.5 Fixed-Point Load and Store Multiple Instructions . . . 44
    3.3.6 Fixed-Point Move Assist Instructions . . . 45
    3.3.7 Other Fixed-Point Instructions . . . 48
    3.3.8 Fixed-Point Arithmetic Instructions . . . 49
    3.3.9 Fixed-Point Compare Instructions . . . 58
    3.3.10 Fixed-Point Trap Instructions . . . 60
    3.3.11 Fixed-Point Logical Instructions . . . 62
    3.3.12 Fixed-Point Rotate and Shift Instructions . . . 68
    3.3.13 Move To/From System Register Instructions . . . 78

Chapter 4. Floating-Point Processor . . . 81
  4.1 Floating-Point Processor Overview . . . 81
  4.2 Floating-Point Processor Registers . . . 82
    4.2.1 Floating-Point Registers . . . 82

    4.2.2 Floating-Point Status and Control Register . . . 83
  4.3 Floating-Point Data . . . 85
    4.3.1 Data Format . . . 85
    4.3.2 Value Representation . . . 86
    4.3.3 Sign of Result . . . 87
    4.3.4 Normalization and Denormalization . . . 87
    4.3.5 Data Handling and Precision . . . 88
    4.3.6 Rounding . . . 89
  4.4 Floating-Point Exceptions . . . 89
    4.4.1 Invalid Operation Exception . . . 91
    4.4.2 Zero Divide Exception . . . 92
    4.4.3 Overflow Exception . . . 93
    4.4.4 Underflow Exception . . . 93
    4.4.5 Inexact Exception . . . 94
  4.5 Floating-Point Execution Models . . . 94
    4.5.1 Execution Model for IEEE Operations . . . 95
    4.5.2 Execution Model for Multiply-Add Type Instructions . . . 96
  4.6 Floating-Point Processor Instructions . . . 97
    4.6.1 Floating-Point Storage Access Instructions . . . 97
    4.6.2 Floating-Point Load Instructions . . . 97
    4.6.3 Floating-Point Store Instructions . . . 100
    4.6.4 Floating-Point Move Instructions . . . 104
    4.6.5 Floating-Point Arithmetic Instructions . . . 105
    4.6.6 Floating-Point Rounding and Conversion Instructions . . . 109
    4.6.7 Floating-Point Compare Instructions . . . 113
    4.6.8 Floating-Point Status and Control Register Instructions . . . 114

Chapter 5. Optional Facilities and Instructions . . . 117
  5.1 Fixed-Point Processor Instructions . . . 118
    5.1.1 Move To/From System Register Instructions . . . 118
  5.2 Floating-Point Processor Instructions . . . 119
    5.2.1 Floating-Point Arithmetic Instructions . . . 120
    5.2.2 Floating-Point Select Instruction . . . 121
  5.3 Little-Endian . . . 122
    5.3.1 Byte Ordering . . . 122
    5.3.2 Structure Mapping Examples . . . 122
    5.3.3 PowerPC Byte Ordering . . . 123
    5.3.4 PowerPC Data Addressing in Little-Endian Mode . . . 125
    5.3.5 PowerPC Instruction Addressing in Little-Endian Mode . . . 126
    5.3.6 PowerPC Cache Management Instructions in Little-Endian Mode . . . 127
    5.3.7 PowerPC I/O in Little-Endian Mode . . . 128
    5.3.8 Origin of Endian . . . 128

Chapter 6. Optional Facilities and Instructions that are being Phased Out of the Architecture . . . 131
  6.1 Move To Condition Register from XER . . . 131

Appendix A. Suggested Floating-Point Models . . . 133
  A.1 Floating-Point Round to Single-Precision Model . . . 133
  A.2 Floating-Point Convert to Integer Model . . . 138
  A.3 Floating-Point Convert from Integer Model . . . 141

Appendix B. Assembler Extended Mnemonics . . . 143
  B.1 Symbols . . . 143
  B.2 Branch Mnemonics . . . 144
    B.2.1 BO and BI Fields . . . 144
    B.2.2 Simple Branch Mnemonics . . . 144
    B.2.3 Branch Mnemonics Incorporating Conditions . . . 145
    B.2.4 Branch Prediction . . . 146
  B.3 Condition Register Logical Mnemonics . . . 147
  B.4 Subtract Mnemonics . . . 147
    B.4.1 Subtract Immediate . . . 147
    B.4.2 Subtract . . . 148
  B.5 Compare Mnemonics . . . 148
    B.5.1 Doubleword Comparisons . . . 149
    B.5.2 Word Comparisons . . . 149
  B.6 Trap Mnemonics . . . 150
  B.7 Rotate and Shift Mnemonics . . . 151
    B.7.1 Operations on Doublewords . . . 151
    B.7.2 Operations on Words . . . 152
  B.8 Move To/From Special Purpose Register Mnemonics . . . 153
  B.9 Miscellaneous Mnemonics . . . 153

Appendix C. Programming Examples . . . 155
  C.1 Multiple-Precision Shifts . . . 155
  C.2 Floating-Point Conversions . . . 158
    C.2.1 Conversion from Floating-Point Number to Floating-Point Integer . . . 158

    C.2.2 Conversion from Floating-Point Number to Signed Fixed-Point Integer Doubleword . . . 158
    C.2.3 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Doubleword . . . 158
    C.2.4 Conversion from Floating-Point Number to Signed Fixed-Point Integer Word . . . 158
    C.2.5 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Word . . . 159
    C.2.6 Conversion from Signed Fixed-Point Integer Doubleword to Floating-Point Number . . . 159
    C.2.7 Conversion from Unsigned Fixed-Point Integer Doubleword to Floating-Point Number . . . 159
    C.2.8 Conversion from Signed Fixed-Point Integer Word to Floating-Point Number . . . 159
    C.2.9 Conversion from Unsigned Fixed-Point Integer Word to Floating-Point Number . . . 159
  C.3 Floating-Point Selection . . . 160
    C.3.1 Comparison to Zero . . . 160
    C.3.2 Minimum and Maximum . . . 160
    C.3.3 Simple if-then-else Constructions . . . 160
    C.3.4 Notes . . . 160

Appendix D. Cross-Reference for Changed POWER Mnemonics . . . 161

Appendix E. Incompatibilities with the POWER Architecture . . . 163
  E.1 New Instructions, Formerly Privileged Instructions . . . 163
  E.2 Newly Privileged Instructions . . . 163
  E.3 Reserved Fields in Instructions . . . 163
  E.4 Reserved Bits in Registers . . . 163
  E.5 Alignment Check . . . 163
  E.6 Condition Register . . . 164
  E.7 LK and Rc Bits . . . 164
  E.8 BO Field . . . 164
  E.9 BH Field . . . 164
  E.10 Branch Conditional to Count Register . . . 164
  E.11 System Call . . . 164
  E.12 Fixed-Point Exception Register (XER) . . . 165
  E.13 Update Forms of Storage Access Instructions . . . 165
  E.14 Multiple Register Loads . . . 165
  E.15 Load/Store Multiple Instructions . . . 165
  E.16 Move Assist Instructions . . . 165
  E.17 Move To/From SPR . . . 165
  E.18 Effects of Exceptions on FPSCR Bits FR and FI . . . 166
  E.19 Store Floating-Point Single Instructions . . . 166
  E.20 Move From FPSCR . . . 166
  E.21 Zeroing Bytes in the Data Cache . . . 166
  E.22 Synchronization . . . 166
  E.23 Move To Machine State Register Instruction . . . 166
  E.24 Direct-Store Segments . . . 167
  E.25 Segment Register Manipulation Instructions . . . 167
  E.26 TLB Entry Invalidation . . . 167
  E.27 Alignment Interrupts . . . 167
  E.28 Floating-Point Interrupts . . . 167
  E.29 Timing Facilities . . . 167
    E.29.1 Real-Time Clock . . . 167
    E.29.2 Decrementer . . . 168
  E.30 Deleted Instructions . . . 168
  E.31 Discontinued Opcodes . . . 169
  E.32 POWER2 Compatibility . . . 170
    E.32.1 Cross-Reference for Changed POWER2 Mnemonics . . . 170
    E.32.2 Floating-Point Conversion to Integer . . . 170
    E.32.3 Floating-Point Interrupts . . . 170
    E.32.4 Trace . . . 170
    E.32.5 Deleted Instructions . . . 171
    E.32.6 Discontinued Opcodes . . . 171

Appendix F. New Instructions . . . 173

Appendix G. Illegal Instructions . . . 175

Appendix H. Reserved Instructions . . . 177

Appendix I. Opcode Maps . . . 179

Appendix J. PowerPC Instruction Set Sorted by Opcode . . . 193

Appendix K. PowerPC Instruction Set Sorted by Mnemonic . . . 199

Index . . . 205

Last Page - End of Document . . . 209

Figures

1. Logical processing model . . . 6
2. PowerPC user register set . . . 7
3. I instruction format . . . 8
4. B instruction format . . . 8
5. SC instruction format . . . 8
6. D instruction format . . . 8
7. DS instruction format . . . 8
8. X instruction format . . . 9
9. XL instruction format . . . 9
10. XFX instruction format . . . 9
11. XFL instruction format . . . 9
12. XS instruction format . . . 9
13. XO instruction format . . . 9
14. A instruction format . . . 10
15. M instruction format . . . 10
16. MD instruction format . . . 10
17. MDS instruction format . . . 10
18. Condition Register . . . 18
19. Link Register . . . 19
20. Count Register . . . 19
21. BO field encodings . . . 20
22. "at" bit encodings . . . 20
23. BH field encodings . . . 21
24. General Purpose Registers . . . 29
25. Fixed-Point Exception Register . . . 30
26. Floating-Point Registers . . . 83
27. Floating-Point Status and Control Register . . . 83
28. Floating-Point Result Flags . . . 85
29. Floating-point single format . . . 85
30. Floating-point double format . . . 85
31. IEEE floating-point fields . . . 85
32. Approximation to real numbers . . . 86
33. Selection of Z1 and Z2 . . . 89
34. IEEE 64-bit execution model . . . 95
35. Interpretation of G, R, and X bits . . . 95
36. Location of the Guard, Round, and Sticky bits in the IEEE execution model . . . 95
37. Multiply-add 64-bit execution model . . . 96
38. Location of the Guard, Round, and Sticky bits in the multiply-add execution model . . . 96
39. C structure 's', showing values of elements . . . 123
40. Big-Endian mapping of structure 's' . . . 123
41. Little-Endian mapping of structure 's' . . . 123
42. PowerPC Little-Endian, structure 's' in storage subsystem . . . 124
43. PowerPC Little-Endian, structure 's' as seen by processor . . . 125
44. Little-Endian mapping of word 'w' stored at address 5 . . . 125
45. PowerPC Little-Endian, word 'w' stored at address 5 in storage subsystem . . . 126
46. Assembly language program 'p' . . . 126
47. Big-Endian mapping of program 'p' . . . 126
48. Little-Endian mapping of program 'p' . . . 126
49. PowerPC Little-Endian, program 'p' in storage subsystem . . . 127

Chapter 1. Introduction


1.1 Overview

This chapter describes computation modes, compatibility with the POWER Architecture, document conventions, a processor overview, instruction formats, storage addressing, and instruction fetching.

1.2 Computation Modes

Processors provide two execution environments, 32-bit and 64-bit. In both of these environments (modes), instructions that set a 64-bit register affect all 64 bits, and the value placed into the register is independent of mode.

1.3 Instruction Mnemonics and Operands

The description of each instruction includes the mnemonic and a formatted list of operands. Some examples are the following.

stw RS,D(RA)
addis RT,RA,SI

PowerPC-compliant Assemblers will support the mnemonics and operand lists exactly as shown. They should also provide certain extended mnemonics, as described in Appendix B, “Assembler Extended Mnemonics” on page 143.

1.4 Compatibility with the POWER Architecture

The PowerPC Architecture provides binary compatibility for POWER application programs, except as described in Appendix E, “Incompatibilities with the POWER Architecture” on page 163.

Many of the PowerPC instructions are identical to POWER instructions. For some of these the PowerPC instruction name and/or mnemonic differs from that in POWER. To assist readers familiar with the POWER Architecture, POWER mnemonics are shown with the individual instruction descriptions when they differ from the PowerPC mnemonics. Also, Appendix D, “Cross-Reference for Changed POWER Mnemonics” on page 161 provides a cross-reference from POWER mnemonics to PowerPC mnemonics for the instructions in Books I, II, and III.

References to the POWER Architecture include POWER2 implementations of the POWER Architecture unless otherwise stated.

1.5 Document Conventions

1.5.1 Definitions and Notation

The following definitions and notation are used throughout the PowerPC Architecture documents.

■ A program is a sequence of related instructions.
■ Quadwords are 128 bits, doublewords are 64 bits, words are 32 bits, halfwords are 16 bits, and bytes are 8 bits.
■ All numbers are decimal unless specified in some special way.
  - 0bnnnn means a number expressed in binary format.
  - 0xnnnn means a number expressed in hexadecimal format.
  Underscores may be used between digits.
■ RT, RA, R1, ... refer to General Purpose Registers.
■ FRT, FRA, FR1, ... refer to Floating-Point Registers.
■ (x) means the contents of register x, where x is the name of an instruction field. For example, (RA) means the contents of register RA, and (FRA) means the contents of register FRA, where RA and FRA are instruction fields. Names such as LR and CTR denote registers, not fields, so parentheses are not used with them. Parentheses are also omitted when register x is the register into which the result of an operation is placed.
■ (RA|0) means the contents of register RA if the RA field has the value 1-31, or the value 0 if the RA field is 0.
■ Bits in registers, instructions, and fields are specified as follows.
  - Bits are numbered left to right, starting with bit 0.
  - Ranges of bits are specified by two numbers separated by a colon (:). The range p:q consists of bits p through q.
■ $X_p$ means bit p of register/field X.
■ $X_{p:q}$ means bits p through q of register/field X.
■ $X_{p\ q\ \ldots}$ means bits p, q, ... of register/field X.
■ ¬(RA) means the one's complement of the contents of register RA.
■ Field i refers to bits 4×i through 4×i+3 of a register.
■ A period (.) as the last character of an instruction mnemonic means that the instruction records status information in certain fields of the Condition Register as a side effect of execution, as described in Chapter 2 through Chapter 4.
■ The symbol || is used to describe the concatenation of two values. For example, 010 || 111 is the same as 010111.
■ $x^n$ means x raised to the nth power.
■ ${}^{n}x$ means the replication of x, n times (i.e., x concatenated to itself n−1 times). ${}^{n}0$ and ${}^{n}1$ are special cases:
  - ${}^{n}0$ means a field of n bits with each bit equal to 0. Thus ${}^{5}0$ is equivalent to 0b00000.
  - ${}^{n}1$ means a field of n bits with each bit equal to 1. Thus ${}^{5}1$ is equivalent to 0b11111.
■ Positive means greater than zero.
■ Negative means less than zero.
■ A system library program is a component of the system software that can be called by an application program using a Branch instruction.
■ A system service program is a component of the system software that can be called by an application program using a System Call instruction.
■ The system trap handler is a component of the system software that receives control when the conditions specified in a Trap instruction are satisfied.
■ The system error handler is a component of the system software that receives control when an error occurs. The system error handler includes a component for each of the various kinds of error. These error-specific components are referred to as the system alignment error handler, the system data storage error handler, etc.
■ Each bit and field in instructions, and in status and control registers (e.g., XER, FPSCR) and Special Purpose Registers, is either defined or reserved.
■ /, //, ///, ... denotes a reserved field in an instruction.
■ Latency refers to the interval from the time an instruction begins execution until it produces a result that is available for use by a subsequent instruction.
■ Unavailable refers to a resource that cannot be used by the program. For example, storage is unavailable if access to it is denied. See Book III, PowerPC Operating Environment Architecture.
■ The results of executing a given instruction are said to be boundedly undefined if they could have been achieved by executing an arbitrary finite sequence of instructions (none of which yields boundedly undefined results) in the state the processor was in before executing the given instruction. Boundedly undefined results may include the presentation of inconsistent state to the system error handler as described in the section entitled “Concurrent Modification and Execution of Instructions” in Book II. Boundedly undefined results for a given instruction may vary between implementations, and between different executions on the same implementation, and are not further defined in this document.
■ The sequential execution model is the model of program execution described in Section 2.2, “Instruction Execution Order” on page 17.

1.5.2 Reserved Fields

Reserved fields in instructions are ignored by the processor.

The handling of reserved bits in System Registers (e.g., XER, FPSCR) is implementation-dependent. Unless otherwise stated, software is permitted to write any value to such a bit. A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.

Programming Note

Reserved fields in instructions should be coded as zero, and reserved bits in System Registers should be set to zero, because these fields and bits may be assigned a meaning in some future version of the architecture, such that the value zero will be consistent with the “old behavior”.

Programming Note

It is the responsibility of software to preserve bits that are now reserved in System Registers, as they may be assigned a meaning in some future version of the architecture.

In order to accomplish this preservation in implementation-independent fashion, software should do the following.

■ Initialize each such register supplying zeros for all reserved bits.
■ Alter (defined) bit(s) in the register by reading the register, altering only the desired bit(s), and then writing the new value back to the register.

The XER and FPSCR are partial exceptions to this recommendation. Software can alter the status bits in these registers, preserving the reserved bits, by executing instructions that have the side effect of altering the status bits. Similarly, software can alter any defined bit in the FPSCR by executing a Floating-Point Status and Control Register instruction. Using such instructions is likely to yield better performance than using the method described in the second item above.

When a currently reserved bit is subsequently assigned a meaning, every effort will be made to have the value to which the system initializes the bit correspond to the “old behavior”.
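The left-to-right bit numbering of Section 1.5.1 and the read-modify-write rule of Section 1.5.2 are easy to get wrong when moving to a language that numbers bits from the least significant end. The following Python sketch illustrates both. The function names and the plain-integer register model are assumptions made for the example and are not part of the architecture.

```python
REG_BITS = 64  # register width assumed for this sketch


def bits(x, p, q, width=REG_BITS):
    """Return bits p:q of x using left-to-right numbering (bit 0 is the most significant)."""
    n = q - p + 1
    return (x >> (width - 1 - q)) & ((1 << n) - 1)


def field(x, i, width=REG_BITS):
    """Field i is bits 4*i through 4*i+3."""
    return bits(x, 4 * i, 4 * i + 3, width)


def replicate(x, x_bits, n):
    """Replication: x (x_bits wide) concatenated n times."""
    out = 0
    for _ in range(n):
        out = (out << x_bits) | x
    return out


def update_bits(old, p, q, value, width=REG_BITS):
    """Read-modify-write: change only bits p:q, preserving every other bit."""
    n = q - p + 1
    shift = width - 1 - q
    mask = ((1 << n) - 1) << shift
    return (old & ~mask) | ((value << shift) & mask)


# Bit 0 is the leftmost bit, so field 0 of a 32-bit value is its top nibble.
top_nibble = field(0x12345678, 0, width=32)
# Replicating a single 1 bit five times gives 0b11111.
five_ones = replicate(1, 1, 5)
# Altering bits 28:31 leaves the upper 28 bits (reserved or not) untouched.
updated = update_bits(0xFFFF0000, 28, 31, 0x5, width=32)
```

`update_bits` mirrors the second item of the Programming Note: the old value is read, only the chosen bits are altered, and the result is what software would write back to the register.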

1.5.3 Description of Instruction Operation

A formal description is given of the operation of each instruction. In addition, the operation of most instructions is described by a semiformal language at the register transfer level (RTL). This RTL uses the notation given below, in addition to the definitions and notation described in Section 1.5.1, “Definitions and Notation” on page 2. Some of this notation is also used in the formal descriptions of instructions. RTL notation not summarized here should be self-explanatory.

The RTL descriptions cover the normal execution of the instruction, except that “standard” setting of the Condition Register, Fixed-Point Exception Register, and Floating-Point Status and Control Register are not shown. (“Non-standard” setting of these registers, such as the setting of the Condition Register by the Compare instructions, is shown.) The RTL descriptions do not cover cases in which the system error handler is invoked, or for which the results are boundedly undefined.

The RTL descriptions specify the architectural transformation performed by the execution of an instruction. They do not imply any particular implementation.

Notation: Meaning

←: Assignment
←iea: Assignment of an instruction effective address. In 32-bit mode the high-order 32 bits of the 64-bit target address are set to 0.
¬: NOT logical operator
+: Two's complement addition
−: Two's complement subtraction, unary minus
×: Multiplication
÷: Division (yielding quotient)
√: Square root
=, ≠: Equals, Not Equals relations
<, ≤, >, ≥: Signed comparison relations
$<^{u}$, $>^{u}$: Unsigned comparison relations
?: Unordered comparison relation
&, |: AND, OR logical operators
⊕, ≡: Exclusive OR, Equivalence logical operators ((a≡b) = (a⊕¬b))
ABS(x): Absolute value of x
CEIL(x): Least integer ≥ x
DOUBLE(x): Result of converting x from floating-point single format to floating-point double format, using the model shown on page 97
EXTS(x): Result of extending x on the left with sign bits
FLOOR(x): Greatest integer ≤ x
MEM(x, y): Contents of y bytes of storage starting at address x. In 32-bit mode the high-order 32 bits of the 64-bit value x are ignored.
$\mathrm{ROTL}_{64}$(x, y): Result of rotating the 64-bit value x left y positions
$\mathrm{ROTL}_{32}$(x, y): Result of rotating the 64-bit value x||x left y positions, where x is 32 bits long
SINGLE(x): Result of converting x from floating-point double format to floating-point single format, using the model shown on page 100
SPREG(x): Special Purpose Register x
TRAP: Invoke the system trap handler
characterization: Reference to the setting of status bits, in a standard way that is explained in the text
undefined: An undefined value. The value may vary between implementations, and between different executions on the same implementation.
CIA: Current Instruction Address, which is the 64-bit address of the instruction being described by a sequence of RTL. Used by relative branches to set the Next Instruction Address (NIA), and by Branch instructions with LK=1 to set the Link Register. In 32-bit mode the high-order 32 bits of CIA are always set to 0. Does not correspond to any architected register.
NIA: Next Instruction Address, which is the 64-bit address of the next instruction to be executed. For a successful branch, the next instruction address is the branch target address: in RTL, this is indicated by assigning a value to NIA. For other instructions that cause non-sequential instruction fetching (see Book III, PowerPC Operating Environment Architecture), the RTL is similar. For instructions that do not branch, and do not otherwise cause instruction fetching to be non-sequential, the next instruction address is CIA+4. In 32-bit mode the high-order 32 bits of NIA are always set to 0. Does not correspond to any architected register.
if ... then ... else ...: Conditional execution, indenting shows range; else is optional.
do: Do loop, indenting shows range. “To” and/or “by” clauses specify incrementing an iteration variable, and a “while” clause gives termination conditions.

GPR(x) General Purpose Register x leave Leave innermost do loop, or do loop

MASK(x, y) Mask having 1s in positions x described in leave statement.

through y (wrapping if x y) and 0s

>

elsewhere
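The bit-manipulation helpers in the notation list can be made concrete with a short sketch. The Python below is an illustration only, not part of the architecture: the function names, the explicit width argument to exts, and the use of Python integers masked to 64 bits are assumptions made for the example. Bit positions follow the numbering implied by the definitions above, with bit 0 as the most significant bit of a 64-bit value.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def exts(value, bits):
    """EXTS: sign-extend a `bits`-wide value to 64 bits (width is an assumed parameter)."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit of the narrow value is set
        value -= 1 << bits
    return value & MASK64


def rotl64(x, y):
    """ROTL64: rotate the 64-bit value x left by y positions."""
    x &= MASK64
    y %= 64
    return ((x << y) | (x >> (64 - y))) & MASK64


def rotl32(x, y):
    """ROTL32: rotate the 64-bit value x||x left by y positions, x being 32 bits long."""
    x &= 0xFFFFFFFF
    return rotl64((x << 32) | x, y)


def mask(x, y):
    """MASK(x, y): 1s in positions x through y (bit 0 = MSB), wrapping if x > y."""
    def ones_from(start):              # 1s in positions start..63
        return MASK64 >> start
    if x <= y:
        return ones_from(x) & ~ones_from(y + 1) & MASK64
    return (ones_from(x) | ~ones_from(y + 1)) & MASK64
```

For instance, mask(62, 1) wraps around and selects bits 62, 63, 0, and 1, which is the case the definition calls “wrapping if x > y”.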

4 PowerPC User Instruction Set Architecture

Version 2.01

for               For loop, indenting shows range. Clause after “for” specifies the entities for which to execute the body of the loop.

The precedence rules for RTL operators are summarized in Table 1. Operators higher in the table are applied before those lower in the table. Operators at the same level in the table associate from left to right, from right to left, or not at all, as shown. (For example, − associates from left to right, so a−b−c = (a−b)−c.) Parentheses are used to override the evaluation order implied by the table or to increase clarity; parenthesized expressions are evaluated before serving as operands.

Table 1. Operator precedence

Operators                                                          Associativity
subscript, function evaluation                                     left to right
pre-superscript (replication), post-superscript (exponentiation)   right to left
unary −, ¬                                                         right to left
×, ÷                                                               left to right
+, −                                                               left to right
||                                                                 left to right
=, ≠, <, ≤, >, ≥, <u, >u, ?                                        left to right
&, ⊕, ≡                                                            left to right
|                                                                  left to right
: (range)                                                          none
←                                                                  none


1.6 Processor Overview

The processor implements the instruction set, the storage model, and other facilities defined in this document. Instructions that the processor can execute fall into three classes:

■ branch instructions
■ fixed-point instructions
■ floating-point instructions

Branch instructions are described in Section 2.4, “Branch Processor Instructions” on page 20. Fixed-point instructions are described in Section 3.3, “Fixed-Point Processor Instructions” on page 31. Floating-point instructions are described in Section 4.6, “Floating-Point Processor Instructions” on page 97.

Fixed-point instructions operate on byte, halfword, word, and doubleword operands. Floating-point instructions operate on single-precision and double-precision floating-point operands. The PowerPC Architecture uses instructions that are four bytes long and word-aligned. It provides for byte, halfword, word, and doubleword operand fetches and stores between storage and a set of 32 General Purpose Registers (GPRs). It also provides for word and doubleword operand fetches and stores between storage and a set of 32 Floating-Point Registers (FPRs).

Signed integers are represented in two's complement form.

There are no computational instructions that modify storage. To use a storage operand in a computation and then modify the same or another storage location, the contents of the storage operand must be loaded into a register, modified, and then stored back to the target location. Figure 1 is a logical representation of instruction processing. Figure 2 on page 7 shows the registers of the PowerPC User Instruction Set Architecture.

                   ┌────────────┐
        ┌─────────>│   Branch   │
        │          │ Processing │
        │          └─────┬──────┘
        │                │   Fixed-Point and
        │                │   Floating-Point
        │                │   Instructions
        │         ┌──────┴───────┐
        │         ↓              ↓
        │   ┌────────────┐ ┌────────────┐
        │   │  Fixed-Pt  │ │  Float-Pt  │
        │   │ Processing │ │ Processing │
        │   └────────────┘ └────────────┘
        │         ↑              ↑
        │         │              │   Data to/from
        │         ↓              ↓   Storage
        │   ──────────────────────────
        │                ↑
        │                │
        │                ↓
        │          ┌────────────┐
        └──────────┤  Storage   │
   Instructions    └────────────┘
   from Storage

Figure 1. Logical processing model


Register            Name                                                   Bits
CR                  Condition Register (page 18)                           0:31
LR                  Link Register (page 19)                                0:63
CTR                 Count Register (page 19)                               0:63
GPR 0 ... GPR 31    General Purpose Registers (page 29)                    0:63
XER                 Fixed-Point Exception Register (page 30)               0:63
FPR 0 ... FPR 31    Floating-Point Registers (page 82)                     0:63
FPSCR               Floating-Point Status and Control Register (page 83)   0:31

Figure 2. PowerPC user register set

1.7 Instruction Formats

All instructions are four bytes long and word-aligned. Thus, whenever instruction addresses are presented to the processor (as in Branch instructions) the low-order two bits are ignored. Similarly, whenever the processor develops an instruction address the low-order two bits are zero.

Bits 0:5 always specify the opcode (OPCD, below). Many instructions also have an extended opcode (XO, below). The remaining bits of the instruction contain one or more fields as shown below for the different instruction formats.

The format diagrams given below show horizontally all valid combinations of instruction fields. The diagrams include instruction fields that are used only by instructions defined in Book II, PowerPC Virtual Environment Architecture, or in Book III, PowerPC Operating Environment Architecture.

If an instruction is coded such that a field contains a value that is not valid for the field (e.g., the values 0, 1, and 2 are defined for the field but it contains a value of 3), the instruction form is invalid (see Section 1.9.2, “Invalid Instruction Forms” on page 13).

Split Field Notation

In some cases an instruction field occupies more than one contiguous sequence of bits, or occupies one contiguous sequence of bits that are used in permuted order. Such a field is called a split field. In the format diagrams given below and in the individual instruction layouts, the name of a split field is shown in small letters, once for each of the contiguous sequences. In the RTL description of an instruction having a split field, and in certain other places where individual bits of a split field are identified, the name of the field in small letters represents the concatenation of the sequences from left to right. In all other places, the name of the field is capitalized and represents the concatenation of the sequences in some order, which need not be left to right, as described for each affected instruction.
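The field and split-field conventions above can be illustrated with a small decoding sketch. It is a minimal example under stated assumptions, not a decoder defined by the architecture: the helper names field, decode_i_form, and xs_sh are hypothetical, and bit 0 is taken to be the most significant bit of the 32-bit instruction word. The bit ranges used (OPCD 0:5, LI 6:29, AA 30, LK 31, and the XS-form sh field in bits 16:20 and 30) are the ones listed in this section.

```python
def field(instr, first, last):
    """Return bits first..last of a 32-bit instruction word (bit 0 = MSB)."""
    width = last - first + 1
    return (instr >> (31 - last)) & ((1 << width) - 1)


def decode_i_form(instr):
    """Split an I-form instruction word into its four fields."""
    return {
        "OPCD": field(instr, 0, 5),    # primary opcode
        "LI": field(instr, 6, 29),     # 24-bit immediate, before 0b00 is appended
        "AA": field(instr, 30, 30),    # Absolute Address bit
        "LK": field(instr, 31, 31),    # LINK bit
    }


def xs_sh(instr):
    """Lower-case sh of an XS-form word: bits 16:20 followed by bit 30,
    i.e. the two sequences concatenated from left to right."""
    return (field(instr, 16, 20) << 1) | field(instr, 30, 30)
```

As a usage example, decode_i_form(0x48000011) yields OPCD 18, LI 4, AA 0, and LK 1. How a capitalized split field such as SH orders its sequences is left to the individual instruction descriptions, so the sketch deliberately stops at the left-to-right concatenation.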


1.7.1 I-Form

0     6                       30  31
OPCD  LI                      AA  LK

Figure 3. I instruction format

1.7.2 B-Form

0     6       11      16      30  31
OPCD  BO      BI      BD      AA  LK

Figure 4. B instruction format

1.7.3 SC-Form

0     6       11      16      20   27   30  31
OPCD  ///     ///     //      LEV  //   1   /

Figure 5. SC instruction format

1.7.4 D-Form

0     6       11      16                31
OPCD  RT      RA      D
OPCD  RT      RA      SI
OPCD  RS      RA      D
OPCD  RS      RA      UI
OPCD  BF / L  RA      SI
OPCD  BF / L  RA      UI
OPCD  TO      RA      SI
OPCD  FRT     RA      D
OPCD  FRS     RA      D

Figure 6. D instruction format

1.7.5 DS-Form

0     6       11      16      30  31
OPCD  RT      RA      DS      XO
OPCD  RS      RA      DS      XO

Figure 7. DS instruction format


1.7.6 X-Form

0     6       11      16      21   31
OPCD  RT      RA      RB      XO   /
OPCD  RT      RA      NB      XO   /
OPCD  RT      / SR    ///     XO   /
OPCD  RT      ///     RB      XO   /
OPCD  RT      ///     ///     XO   /
OPCD  RS      RA      RB      XO   Rc
OPCD  RS      RA      RB      XO   1
OPCD  RS      RA      RB      XO   /
OPCD  RS      RA      NB      XO   /
OPCD  RS      RA      SH      XO   Rc
OPCD  RS      RA      ///     XO   Rc
OPCD  RS      / SR    ///     XO   /
OPCD  RS      ///     RB      XO   /
OPCD  RS      ///     ///     XO   /
OPCD  RS      /// L   ///     XO   /
OPCD  BF / L  RA      RB      XO   /
OPCD  BF //   FRA     FRB     XO   /
OPCD  BF //   BFA //  ///     XO   /
OPCD  BF //   ///     U /     XO   Rc
OPCD  BF //   ///     ///     XO   /
OPCD  /// TH  RA      RB      XO   /
OPCD  /// L   ///     RB      XO   /
OPCD  /// L   ///     ///     XO   /
OPCD  TO      RA      RB      XO   /
OPCD  FRT     RA      RB      XO   /
OPCD  FRT     ///     FRB     XO   Rc
OPCD  FRT     ///     ///     XO   Rc
OPCD  FRS     RA      RB      XO   /
OPCD  BT      ///     ///     XO   Rc
OPCD  ///     RA      RB      XO   /
OPCD  ///     ///     RB      XO   /
OPCD  ///     ///     ///     XO   /

Figure 8. X instruction format

1.7.7 XL-Form

0     6       11      16      21   31
OPCD  BT      BA      BB      XO   /
OPCD  BO      BI      /// BH  XO   LK
OPCD  BF //   BFA //  ///     XO   /
OPCD  ///     ///     ///     XO   /

Figure 9. XL instruction format

1.7.8 XFX-Form

0     6       11              21   31
OPCD  RT      spr             XO   /
OPCD  RT      tbr             XO   /
OPCD  RT      0 ///           XO   /
OPCD  RT      1 FXM /         XO   /
OPCD  RS      0 FXM /         XO   /
OPCD  RS      1 FXM /         XO   /
OPCD  RS      spr             XO   /

Figure 10. XFX instruction format

1.7.9 XFL-Form

0     6   7     15  16    21   31
OPCD  /   FLM   /   FRB   XO   Rc

Figure 11. XFL instruction format

1.7.10 XS-Form

0     6       11      16      21   30  31
OPCD  RS      RA      sh      XO   sh  Rc

Figure 12. XS instruction format

1.7.11 XO-Form

0     6       11      16      21   22   31
OPCD  RT      RA      RB      OE   XO   Rc
OPCD  RT      RA      RB      /    XO   Rc
OPCD  RT      RA      ///     OE   XO   Rc

Figure 13. XO instruction format


1.7.12 A-Form

0     6       11      16      21      26   31
OPCD  FRT     FRA     FRB     FRC     XO   Rc
OPCD  FRT     FRA     FRB     ///     XO   Rc
OPCD  FRT     FRA     ///     FRC     XO   Rc
OPCD  FRT     ///     FRB     ///     XO   Rc

Figure 14. A instruction format

1.7.13 M-Form

0     6       11      16      21      26   31
OPCD  RS      RA      RB      MB      ME   Rc
OPCD  RS      RA      SH      MB      ME   Rc

Figure 15. M instruction format

1.7.14 MD-Form

0     6       11      16      21      27   30  31
OPCD  RS      RA      sh      mb      XO   sh  Rc
OPCD  RS      RA      sh      me      XO   sh  Rc

Figure 16. MD instruction format

1.7.15 MDS-Form

0     6       11      16      21      27   31
OPCD  RS      RA      RB      mb      XO   Rc
OPCD  RS      RA      RB      me      XO   Rc

Figure 17. MDS instruction format

1.7.16 Instruction Fields

AA (30)
    Absolute Address bit.
    0   The immediate field represents an address relative to the current instruction address. For I-form branches the effective address of the branch target is the sum of the LI field sign-extended to 64 bits and the address of the branch instruction. For B-form branches the effective address of the branch target is the sum of the BD field sign-extended to 64 bits and the address of the branch instruction.
    1   The immediate field represents an absolute address. For I-form branches the effective address of the branch target is the LI field sign-extended to 64 bits. For B-form branches the effective address of the branch target is the BD field sign-extended to 64 bits.

BA (11:15)
    Field used to specify a bit in the CR to be used as a source.

BB (16:20)
    Field used to specify a bit in the CR to be used as a source.

BD (16:29)
    Immediate field used to specify a 14-bit signed two's complement branch displacement which is concatenated on the right with 0b00 and sign-extended to 64 bits.

BF (6:8)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a target.

BFA (11:13)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a source.

BH (19:20)
    Field used to specify a hint in the Branch Conditional to Link Register and Branch Conditional to Count Register instructions. The encoding is described in Section 2.4.1, “Branch Instructions” on page 20.

BI (11:15)
    Field used to specify a bit in the CR to be tested by a Branch Conditional instruction.

BO (6:10)
    Field used to specify options for the Branch Conditional instructions. The encoding is described in Section 2.4.1, “Branch Instructions” on page 20.

BT (6:10)
    Field used to specify a bit in the CR or in the FPSCR to be used as a target.


D (16:31)
    Immediate field used to specify a 16-bit signed two's complement integer which is sign-extended to 64 bits.

DS (16:29)
    Immediate field used to specify a 14-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

FLM (7:14)
    Field mask used to identify the FPSCR fields that are to be updated by the mtfsf instruction.

FRA (11:15)
    Field used to specify an FPR to be used as a source.

FRB (16:20)
    Field used to specify an FPR to be used as a source.

FRC (21:25)
    Field used to specify an FPR to be used as a source.

FRS (6:10)
    Field used to specify an FPR to be used as a source.

FRT (6:10)
    Field used to specify an FPR to be used as a target.

FXM (12:19)
    Field mask used to identify the CR fields that are to be written by the mtcrf and mtocrf instructions, or read by the mfocrf instruction.

L (10 or 15)
    Field used to specify whether a fixed-point Compare instruction is to compare 64-bit numbers or 32-bit numbers.
    Field used by the Move To Machine State Register and TLB Invalidate Entry instructions (see Book III, PowerPC Operating Environment Architecture).

L (9:10)
    Field used by the Synchronize instruction (see Book II, PowerPC Virtual Environment Architecture).

LEV (20:26)
    Field used by the System Call instruction.

LI (6:29)
    Immediate field used to specify a 24-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

LK (31)
    LINK bit.
    0   Do not set the Link Register.
    1   Set the Link Register. The address of the instruction following the Branch instruction is placed into the Link Register.

MB (21:25) and ME (26:30)
    Fields used in M-form instructions to specify a 64-bit mask consisting of 1-bits from bit MB+32 through bit ME+32 inclusive and 0-bits elsewhere, as described in Section 3.3.12, “Fixed-Point Rotate and Shift Instructions” on page 68.

MB (21:26)
    Field used in MD-form and MDS-form instructions to specify the first 1-bit of a 64-bit mask, as described in Section 3.3.12, “Fixed-Point Rotate and Shift Instructions” on page 68.

ME (21:26)
    Field used in MD-form and MDS-form instructions to specify the last 1-bit of a 64-bit mask, as described in Section 3.3.12, “Fixed-Point Rotate and Shift Instructions” on page 68.

NB (16:20)
    Field used to specify the number of bytes to move in an immediate Move Assist instruction.

OPCD (0:5)
    Primary opcode field.

OE (21)
    Field used by XO-form instructions to enable setting OV and SO in the XER.

RA (11:15)
    Field used to specify a GPR to be used as a source or as a target.

RB (16:20)
    Field used to specify a GPR to be used as a source.

Rc (31)
    RECORD bit.
    0   Do not alter the Condition Register.
    1   Set Condition Register Field 0 or Field 1 as described in Section 2.3.1, “Condition Register” on page 18.

RS (6:10)
    Field used to specify a GPR to be used as a source.

RT (6:10)
    Field used to specify a GPR to be used as a target.

SH (16:20, or 16:20 and 30)
    Field used to specify a shift amount.

SI (16:31)
    Immediate field used to specify a 16-bit signed integer.

SPR (11:20)
    Field used to specify a Special Purpose Register for the mtspr and mfspr instructions.


SR (12:15)
    Field used by the Segment Register Manipulation instructions (see Book III, PowerPC Operating Environment Architecture).

TBR (11:20)
    Field used by the Move From Time Base instruction (see Book II, PowerPC Virtual Environment Architecture).

TH (9:10)
    Field used by the optional data stream variant of the dcbt instruction (see Book II, PowerPC Virtual Environment Architecture).

TO (6:10)
    Field used to specify the conditions on which to trap. The encoding is described in Section 3.3.10, “Fixed-Point Trap Instructions” on page 60.

U (16:19)
    Immediate field used as the data to be placed into a field in the FPSCR.

UI (16:31)
    Immediate field used to specify a 16-bit unsigned integer.

XO (21:29, 21:30, 22:30, 26:30, 27:29, 27:30, or 30:31)
    Extended opcode field.

1.8 Classes of Instructions

An instruction falls into exactly one of the following three classes:

    Defined
    Illegal
    Reserved

The class is determined by examining the opcode, and the extended opcode if any. If the opcode, or combination of opcode and extended opcode, is not that of a defined instruction or of a reserved instruction, the instruction is illegal.

A given instruction is in the same class for all implementations of the PowerPC Architecture. In future versions of this architecture, instructions that are now illegal may become defined (by being added to the architecture) or reserved (by being assigned to one of the special purposes described in Appendix H, “Reserved Instructions” on page 177). Similarly, instructions that are now reserved may become defined.

1.8.1 Defined Instruction Class

This class of instructions contains all the instructions defined in the PowerPC User Instruction Set Architecture, PowerPC Virtual Environment Architecture, and PowerPC Operating Environment Architecture.

In general, defined instructions are guaranteed to be provided in all implementations. The only exceptions are instructions that are optional instructions. These exceptions are identified in the instruction descriptions.

A defined instruction can have preferred and/or invalid forms, as described in Section 1.9.1, “Preferred Instruction Forms” on page 13 and Section 1.9.2, “Invalid Instruction Forms” on page 13.

1.8.2 Illegal Instruction Class

This class of instructions contains the set of instructions described in Appendix G, “Illegal Instructions” on page 175. Illegal instructions are available for future extensions of the PowerPC Architecture; that is, some future version of the PowerPC Architecture may define any of these instructions to perform new functions.

Any attempt to execute an illegal instruction will cause the system illegal instruction error handler to be invoked and will have no other effect.

An instruction consisting entirely of binary 0s is guaranteed always to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized storage will result in the invocation of the system illegal instruction error handler.

1.8.3 Reserved Instruction Class

This class of instructions contains the set of instructions described in Appendix H, “Reserved Instructions” on page 177.

Reserved instructions are allocated to specific purposes that are outside the scope of the PowerPC Architecture.


Any attempt to execute a reserved instruction will:

■ perform the actions described in Book IV, PowerPC Implementation Features for the implementation if the instruction is implemented; or
■ cause the system illegal instruction error handler to be invoked if the instruction is not implemented.

1.9 Forms of Defined Instructions

1.9.1 Preferred Instruction Forms

Some of the defined instructions have preferred forms. For such an instruction, the preferred form will execute in an efficient manner, but any other form may take significantly longer to execute than the preferred form.

Instructions having preferred forms are:

■ the Condition Register Logical instructions
■ the Load Quadword instruction
■ the Load/Store Multiple instructions
■ the Load/Store String instructions
■ the Or Immediate instruction (preferred form of no-op)
■ the Move To Condition Register Fields instruction

1.9.2 Invalid Instruction Forms

Some of the defined instructions can be coded in a form that is invalid. An instruction form is invalid if one or more fields of the instruction, excluding the opcode fields and reserved fields, are coded incorrectly in a manner that can be deduced by examining only the instruction encoding.

Any attempt to execute an invalid form of an instruction will either cause the system illegal instruction error handler to be invoked or yield boundedly undefined results. Exceptions to this rule are stated in the instruction descriptions.

If a field (excluding opcode fields) of one of the instructions identified below contains a value other than that specified in the layout diagram, the instruction form is invalid.

■ the Store Conditional instructions (see Book II, PowerPC Virtual Environment Architecture)

These invalid forms are not discussed further. The invalid forms of the instructions in the following list are identified in the instruction descriptions.

■ the Branch Conditional to Count Register (bcctr[l]) instruction
■ the Load/Store with Update instructions
■ the Load Multiple instruction
■ the Load String instructions
■ the Load/Store Floating-Point with Update instructions

Assembler Note

Assemblers should report uses of invalid instruction forms as errors.

1.10 Optionality

Some of the defined instructions are optional. The optional instructions are defined in Chapter 5, “Optional Facilities and Instructions” on page 117. Additional optional instructions may be defined in Books II and III (e.g., see the section entitled “Lookaside Buffer Management” in Book III, and the chapters entitled “Optional Facilities and Instructions” in Book II and Book III).

Any attempt to execute an optional instruction that is not provided by the implementation will cause the system illegal instruction error handler to be invoked.

In addition to instructions, other kinds of optional facilities, such as registers, may be defined in Books II and III. The effects of attempting to use an optional facility that is not provided by the implementation are described in Books II and III as appropriate.

1.11 Exceptions

There are two kinds of exception, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several components of the system software to be invoked.

The exceptions that can be caused directly by the execution of an instruction include the following:

■ an attempt to execute an illegal instruction, or an attempt by an application program to execute a “privileged” instruction (see Book III, PowerPC Operating Environment Architecture) (system illegal instruction error handler or system privileged instruction error handler)
■ the execution of a defined instruction using an invalid form (system illegal instruction error handler or system privileged instruction error handler)
■ the execution of an optional instruction that is not provided by the implementation (system illegal instruction error handler)
■ an attempt to access a storage location that is unavailable (system instruction storage error handler or system data storage error handler)

■ an attempt to access storage with an effective address alignment that is invalid for the instruction (system alignment error handler)
■ the execution of a System Call instruction (system service program)
■ the execution of a Trap instruction that traps (system trap handler)
■ the execution of a floating-point instruction that causes a floating-point enabled exception to exist (system floating-point enabled exception error handler)

The exceptions that can be caused by an asynchronous event are described in Book III, PowerPC Operating Environment Architecture.

The invocation of the system error handler is precise, except that if one of the imprecise modes for invoking the system floating-point enabled exception error handler is in effect (see page 90) then the invocation of the system floating-point enabled exception error handler may be imprecise. When the system error handler is invoked imprecisely, the excepting instruction does not appear to complete before the next instruction starts (because one of the effects of the excepting instruction, namely the invocation of the system error handler, has not yet occurred).

Additional information about exception handling can be found in Book III, PowerPC Operating Environment Architecture.

1.12 Storage Addressing

A program references storage using the effective address computed by the processor when it executes a Storage Access or Branch instruction (or certain other instructions described in Book II, PowerPC Virtual Environment Architecture, and Book III, PowerPC Operating Environment Architecture), or when it fetches the next sequential instruction.

1.12.1 Storage Operands

Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.

Storage operands may be bytes, halfwords, words, or doublewords, or, for the Load/Store Multiple and Move Assist instructions, a sequence of bytes or words. The address of a storage operand is the address of its first byte (i.e., of its lowest-numbered byte). Byte ordering is Big-Endian. However, if the optional Little-Endian facility is implemented the system can be operated in a mode in which byte ordering is Little-Endian; see Section 5.3.

Operand length is implicit for each instruction.

The operand of a single-register Storage Access instruction has a “natural” alignment boundary equal to the operand length. In other words, the “natural” address of an operand is an integral multiple of the operand length. A storage operand is said to be aligned if it is aligned at its natural boundary; otherwise it is said to be unaligned.

Storage operands for single-register Storage Access instructions have the following characteristics. (Although not permitted as storage operands, quadwords are shown because quadword alignment is desirable for certain storage operands.)

Operand       Length      Addr(60:63) if aligned
Byte          8 bits      xxxx
Halfword      2 bytes     xxx0
Word          4 bytes     xx00
Doubleword    8 bytes     x000
Quadword      16 bytes    0000

Note: An “x” in an address bit position indicates that the bit can be 0 or 1 independent of the state of other bits in the address.

The concept of alignment is also applied more generally, to any datum in storage. For example, a 12-byte datum in storage is said to be word-aligned if its address is an integral multiple of 4.

Some instructions require their storage operands to have certain alignments. In addition, alignment may affect performance. For single-register Storage Access instructions the best performance is obtained when storage operands are aligned. Additional effects of data placement on performance are described in Book II, PowerPC Virtual Environment Architecture.

Instructions are always four bytes long and word-aligned.

1.12.2 Effective Address Calculation

An effective address is computed by the processor when executing a Storage Access or Branch instruction (or certain other instructions described in Book II, PowerPC Virtual Environment Architecture, and Book III, PowerPC Operating Environment Architecture) or when fetching the next sequential instruction. The following provides an overview of this process. More detail is provided in the individual instruction descriptions.

Effective address calculations, for both data and instruction accesses, use 64-bit two's complement addition. All 64 bits of each address component participate in the calculation regardless of mode (32-bit


or 64-bit). In this computation one operand is an the GPR designated by RB (or the value zero for

address (which is by definition an unsigned number) lswi, lsdi, stswi, and stsdi) are added to the con-

and the second is a signed offset. Carries out of the tents of the GPR designated by RA or to zero if

most significant bit are ignored. RA=0.

■ With D-form instructions, the 16-bit D field is sign-

In 64-bit mode, the entire 64-bit result comprises the extended to form a 64-bit address component. In

64-bit effective address. The effective address arith- computing the effective address of a data

metic wraps around from the maximum address, element, this address component is added to the

64

2 1, to address 0. contents of the GPR designated by RA or to zero

if R A = 0 .

In 32-bit mode, the low-order 32 bits of the 64-bit ■

result comprise the effective address for the purpose With DS-form instructions, the 14-bit DS field is

of addressing storage. The high-order 32 bits of the concatenated on the right with 0b00 and sign-

64-bit effective address are ignored for the purpose of extended to form a 64-bit address component. In

accessing data, but are included whenever an effec- computing the effective address of a data

tive address is placed into a GPR by Load with Update element, this address component is added to the

and Store with Update instructions. The high-order 32 contents of the GPR designated by RA or to zero

bits of the 64-bit effective address are effectively set if R A = 0 .

to 0 for the purpose of fetching instructions, and ■ With I-form Branch instructions, the 24-bit LI field

explicitly so whenever an effective address is placed is concatenated on the right with 0b00 and sign-

into the Link Register by Branch instructions having extended to form a 64-bit address component. If

L K = 1 . The high-order 32 bits of the 64-bit effective A A = 0 , this address component is added to the

address are set to 0 in Special Purpose Registers address of the Branch instruction to form the

when the system error handler is invoked. As used to address storage, the effective address arithmetic appears to wrap around from the maximum address, 2^32 − 1, to address 0 in 32-bit mode.

The 64-bit current instruction address and next instruction address are not affected by a change from 32-bit mode to 64-bit mode, but they are affected by a change from 64-bit mode to 32-bit mode. In the latter case, the high-order 32 bits are set to 0.

RA is a field in the instruction which specifies an address component in the computation of an effective address. A zero in the RA field indicates the absence of the corresponding address component. A value of zero is substituted for the absent component of the effective address computation. This substitution is shown in the instruction descriptions as (RA|0).

Effective addresses are computed as follows. In the descriptions below, it should be understood that “the contents of a GPR” refers to the entire 64-bit contents, independent of mode, but that in 32-bit mode only bits 32:63 of the 64-bit result of the computation are used to address storage.

■ With X-form instructions, in computing the effective address of a data element, the contents of

effective address of the next instruction. If AA=1, this address component is the effective address of the next instruction.

■ With B-form Branch instructions, the 14-bit BD field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. If AA=0, this address component is added to the address of the Branch instruction to form the effective address of the next instruction. If AA=1, this address component is the effective address of the next instruction.

■ With XL-form Branch instructions, bits 0:61 of the Link Register or the Count Register are concatenated on the right with 0b00 to form the effective address of the next instruction.

■ With sequential instruction fetching, the value 4 is added to the address of the current instruction to form the effective address of the next instruction.

■ For an exception to sequential addressing when a change from 32- to 64-bit mode occurs, see the section entitled “Address Wrapping Combined with Changing MSR Bit SF” in Book III.

If the size of the operand of a storage access instruction is more than one byte, the effective address for each byte after the first is computed by adding 1 to the effective address of the preceding byte.
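The branch and sequential cases above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the architecture: the names `exts` and `next_instruction_address` are hypothetical, `field` stands for the raw LI or BD field value, and the special case of changing between 32-bit and 64-bit mode (described in Book III) is ignored.

```python
MASK64 = (1 << 64) - 1


def exts(value, bits):
    """Sign-extend a `bits`-wide unsigned field to a Python int."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def next_instruction_address(cia, field=0, field_bits=14, aa=0,
                             taken=True, mode64=True):
    """Hypothetical helper: effective address of the next instruction.

    cia        address of the current (Branch) instruction
    field      raw BD (14-bit) or LI (24-bit) field value
    aa         AA bit: 1 = absolute, 0 = relative to cia
    taken      False models sequential instruction fetching (cia + 4)
    """
    if taken:
        # field || 0b00, sign-extended to form the address component
        component = exts(field << 2, field_bits + 2)
        nia = component if aa else cia + component
    else:
        nia = cia + 4
    nia &= MASK64
    if not mode64:
        nia &= 0xFFFF_FFFF  # high-order 32 bits set to 0 in 32-bit mode
    return nia
```

For instance, a B-form branch at address 0x1000 with BD = 0x3FFF (that is, −1 words) and AA=0 yields 0x0FFC, and sequential fetching from 0xFFFF_FFFC in 32-bit mode wraps to address 0.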


Chapter 2. Branch Processor

2.1 Branch Processor Overview
2.2 Instruction Execution Order
2.3 Branch Processor Registers
2.3.1 Condition Register
2.3.2 Link Register
2.3.3 Count Register
2.4 Branch Processor Instructions
2.4.1 Branch Instructions
2.4.2 System Call Instruction
2.4.3 Condition Register Logical Instructions
2.4.4 Condition Register Field Instruction

2.1 Branch Processor Overview

This chapter describes the registers and instructions that make up the Branch Processor facility. Section 2.3, “Branch Processor Registers” on page 18 describes the registers associated with the Branch Processor. Section 2.4, “Branch Processor Instructions” on page 20 describes the instructions associated with the Branch Processor.

2.2 Instruction Execution Order

In general, instructions appear to execute sequentially, in the order in which they appear in storage. The exceptions to this rule are listed below.

■ Branch instructions for which the branch is taken cause execution to continue at the target address specified by the Branch instruction.

■ Trap instructions for which the trap conditions are satisfied, and System Call instructions, cause the appropriate system handler to be invoked.

■ Exceptions can cause the system error handler to be invoked, as described in Section 1.11, “Exceptions” on page 13.

■ Returning from a system service program, system trap handler, or system error handler causes execution to continue at a specified address.

The model of program execution in which the processor appears to execute one instruction at a time, completing each instruction before beginning to execute the next instruction, is called the “sequential execution model”. In general, the processor obeys the sequential execution model. For the instructions and facilities defined in this Book, the only exceptions to this rule are the following.

■ A floating-point exception occurs when the processor is running in one of the Imprecise floating-point exception modes (see Section 4.4, “Floating-Point Exceptions” on page 89). The instruction that causes the exception does not complete before the next instruction begins execution, with respect to setting exception bits and (if the exception is enabled) invoking the system error handler.

■ A Store instruction modifies one or more bytes in an area of storage that contains instructions that will subsequently be executed. Before an instruction in that area of storage is executed, software synchronization is required to ensure that the instructions executed are consistent with the results produced by the Store instruction.

Programming Note
This software synchronization will generally be provided by system library programs (see the section entitled “Instruction Storage” in Book II). Application programs should call the appropriate system library program before attempting to execute modified instructions.

2.3 Branch Processor Registers

2.3.1 Condition Register

The Condition Register (CR) is a 32-bit register which reflects the result of certain operations, and provides a mechanism for testing (and branching).

[Register diagram: CR, bits 0:31]
Figure 18. Condition Register

The bits in the Condition Register are grouped into eight 4-bit fields, named CR Field 0 (CR0), ..., CR Field 7 (CR7), which are set in one of the following ways.

■ Specified fields of the CR can be set by a move to the CR from a GPR (mtcrf, mtocrf).

■ A specified field of the CR can be set by a move to the CR from another CR field (mcrf), from XER_{32:35} (mcrxr), or from the FPSCR (mcrfs).

■ CR Field 0 can be set as the implicit result of a fixed-point instruction.

■ CR Field 1 can be set as the implicit result of a floating-point instruction.

■ A specified CR field can be set as the result of either a fixed-point or a floating-point Compare instruction.

Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits.

For all fixed-point instructions in which Rc=1, and for addic., andi., and andis., the first three bits of CR Field 0 (bits 0:2 of the Condition Register) are set by signed comparison of the result to zero, and the fourth bit of CR Field 0 (bit 3 of the Condition Register) is copied from the SO field of the XER. “Result” here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the 64-bit value placed into the target register in 32-bit mode.

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if (target_register)_{M:63} < 0 then c ← 0b100
    else if (target_register)_{M:63} > 0 then c ← 0b010
    else c ← 0b001
    CR0 ← c || XER_{SO}

If any portion of the result is undefined, then the value placed into the first three bits of CR Field 0 is undefined.

The bits of CR Field 0 are interpreted as follows.

Bit  Description
0    Negative (LT)
     The result is negative.
1    Positive (GT)
     The result is positive.
2    Zero (EQ)
     The result is zero.
3    Summary Overflow (SO)
     This is a copy of the final state of XER_{SO} at the completion of the instruction.

Programming Note
CR Field 0 may not reflect the “true” (infinitely precise) result if overflow occurs; see Section 3.3.8, “Fixed-Point Arithmetic Instructions” on page 49.

The stwcx. and stdcx. instructions (see Book II, PowerPC Virtual Environment Architecture) also set CR Field 0.

For all floating-point instructions in which Rc=1, CR Field 1 (bits 4:7 of the Condition Register) is set to the Floating-Point exception status, copied from bits 0:3 of the Floating-Point Status and Control Register. These bits are interpreted as follows.

Bit  Description
4    Floating-Point Exception Summary (FX)
     This is a copy of the final state of FPSCR_{FX} at the completion of the instruction.
5    Floating-Point Enabled Exception Summary (FEX)
     This is a copy of the final state of FPSCR_{FEX} at the completion of the instruction.
6    Floating-Point Invalid Operation Exception Summary (VX)
     This is a copy of the final state of FPSCR_{VX} at the completion of the instruction.
7    Floating-Point Overflow Exception (OX)
     This is a copy of the final state of FPSCR_{OX} at the completion of the instruction.

For Compare instructions, a specified CR field is set to reflect the result of the comparison. The bits of the specified CR field are interpreted as follows. A complete description of how the bits are set is given in the instruction descriptions in Section 3.3.9, “Fixed-Point Compare Instructions” on page 58 and Section 4.6.7, “Floating-Point Compare Instructions” on page 113.

Bit  Description
0    Less Than, Floating-Point Less Than (LT, FL)
     For fixed-point Compare instructions, (RA) < SI or (RB) (signed comparison) or (RA) <u UI or (RB) (unsigned comparison). For floating-point Compare instructions, (FRA) < (FRB).

1    Greater Than, Floating-Point Greater Than (GT, FG)
     For fixed-point Compare instructions, (RA) > SI or (RB) (signed comparison) or (RA) >u UI or (RB) (unsigned comparison). For floating-point Compare instructions, (FRA) > (FRB).
2    Equal, Floating-Point Equal (EQ, FE)
     For fixed-point Compare instructions, (RA) = SI, UI, or (RB). For floating-point Compare instructions, (FRA) = (FRB).
3    Summary Overflow, Floating-Point Unordered (SO, FU)
     For fixed-point Compare instructions, this is a copy of the final state of XER_{SO} at the completion of the instruction. For floating-point Compare instructions, one or both of (FRA) and (FRB) is a NaN.

2.3.2 Link Register

The Link Register (LR) is a 64-bit register. It can be used to provide the branch target address for the Branch Conditional to Link Register instruction, and it holds the return address after Branch instructions for which LK=1.

[Register diagram: LR, bits 0:63]
Figure 19. Link Register

2.3.3 Count Register

The Count Register (CTR) is a 64-bit register. It can be used to hold a loop count that can be decremented during execution of Branch instructions that contain an appropriately coded BO field. If the value in the Count Register is 0 before being decremented, it is −1 afterward. The Count Register can also be used to provide the branch target address for the Branch Conditional to Count Register instruction.

[Register diagram: CTR, bits 0:63]
Figure 20. Count Register
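The rule in Section 2.3.1 for setting CR Field 0 from a fixed-point result can be sketched as follows. This is an illustrative Python model, not architected behavior: the function name `cr0_field` and its parameters are hypothetical, the target register is passed as an unsigned 64-bit integer, and the 4-bit return value holds LT, GT, EQ, SO from most significant to least significant bit.

```python
def cr0_field(target_register, xer_so, mode64=True):
    """Hypothetical model of CR0 <- c || XER_SO.

    target_register  unsigned 64-bit value placed into the target register
    xer_so           current XER[SO] bit (0 or 1)
    """
    m = 0 if mode64 else 32
    width = 64 - m
    result = target_register & ((1 << width) - 1)  # bits M:63 of the register
    if result >> (width - 1):   # sign bit of bits M:63 is set: negative
        c = 0b100
    elif result != 0:           # nonzero with clear sign bit: positive
        c = 0b010
    else:                       # zero
        c = 0b001
    return (c << 1) | (xer_so & 1)
```

The same register value can compare differently in the two modes: 0x0000_0000_FFFF_FFFF is positive in 64-bit mode but negative in 32-bit mode, because only bits 32:63 are examined there.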

2.4 Branch Processor Instructions

2.4.1 Branch Instructions

The sequence of instruction execution can be changed by the Branch instructions. Because all instructions are on word boundaries, bits 62 and 63 of the generated branch target address are ignored by the processor in performing the branch.

The Branch instructions compute the effective address (EA) of the target in one of the following four ways, as described in Section 1.12.2, “Effective Address Calculation” on page 14.

1. Adding a displacement to the address of the Branch instruction (Branch or Branch Conditional with AA=0).

2. Specifying an absolute address (Branch or Branch Conditional with AA=1).

3. Using the address contained in the Link Register (Branch Conditional to Link Register).

4. Using the address contained in the Count Register (Branch Conditional to Count Register).

In all four cases, in 32-bit mode the final step in the address computation is setting the high-order 32 bits of the target address to 0.

For the first two methods, the target addresses can be computed sufficiently ahead of the Branch instruction that instructions can be prefetched along the target path. For the third and fourth methods, prefetching instructions along the target path is also possible provided the Link Register or the Count Register is loaded sufficiently ahead of the Branch instruction.

Branching can be conditional or unconditional, and the return address can optionally be provided. If the return address is to be provided (LK=1), the effective address of the instruction following the Branch instruction is placed into the Link Register after the branch target address has been computed; this is done regardless of whether the branch is taken.

For Branch Conditional instructions, the BO field specifies the conditions under which the branch is taken, as shown in Figure 21. In the figure, M=0 in 64-bit mode and M=32 in 32-bit mode. If the BO field specifies that the CTR is to be decremented, the entire 64-bit CTR is decremented regardless of the mode.

BO     Description
0000z  Decrement the CTR, then branch if the decremented CTR_{M:63} ≠ 0 and CR_{BI} = 0
0001z  Decrement the CTR, then branch if the decremented CTR_{M:63} = 0 and CR_{BI} = 0
001at  Branch if CR_{BI} = 0
0100z  Decrement the CTR, then branch if the decremented CTR_{M:63} ≠ 0 and CR_{BI} = 1
0101z  Decrement the CTR, then branch if the decremented CTR_{M:63} = 0 and CR_{BI} = 1
011at  Branch if CR_{BI} = 1
1a00t  Decrement the CTR, then branch if the decremented CTR_{M:63} ≠ 0
1a01t  Decrement the CTR, then branch if the decremented CTR_{M:63} = 0
1z1zz  Branch always

Notes:
1. “z” denotes a bit that is ignored.
2. The “a” and “t” bits are used as described below.

Figure 21. BO field encodings

The “a” and “t” bits of the BO field can be used by software to provide a hint about whether the branch is likely to be taken or is likely not to be taken, as shown in Figure 22.

at  Hint
00  No hint is given
01  Reserved
10  The branch is very likely not to be taken
11  The branch is very likely to be taken

Figure 22. “at” bit encodings

Programming Note
Many implementations have dynamic mechanisms for predicting whether a branch will be taken. Because the dynamic prediction is likely to be very accurate, and is likely to be overridden by any hint provided by the “at” bits, the “at” bits should be set to 0b00 unless the static prediction implied by at=0b10 or at=0b11 is highly likely to be correct.
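The BO encodings of Figure 21 can be read as two independent tests, one on the Count Register and one on the Condition Register bit selected by BI. The following Python sketch illustrates that reading. It is a simplified model under stated assumptions: `resolve_branch` is a hypothetical name, `bo` is the 5-bit BO value with BO_0 as its most significant bit, `cr_bit` is the already-extracted value of CR_{BI}, and the hint bits (“a”, “t”) and ignored bits (“z”) play no role in the outcome.

```python
MASK64 = (1 << 64) - 1


def resolve_branch(bo, cr_bit, ctr, mode64=True):
    """Return (taken, new_ctr) for a Branch Conditional, following Figure 21."""
    bo0, bo1, bo2, bo3 = ((bo >> shift) & 1 for shift in (4, 3, 2, 1))
    if not bo2:
        # The entire 64-bit CTR is decremented regardless of the mode;
        # a CTR of 0 becomes -1 (all ones).
        ctr = (ctr - 1) & MASK64
    tested = ctr if mode64 else ctr & 0xFFFF_FFFF  # CTR bits M:63
    ctr_ok = bool(bo2) or ((tested != 0) != bool(bo3))
    cond_ok = bool(bo0) or (cr_bit == bo1)
    return ctr_ok and cond_ok, ctr
```

With BO=0b10000 (decrement, branch if the decremented count is nonzero), a CTR of 1 gives a not-taken branch and leaves CTR at 0, while a CTR of 0 wraps to all ones and the branch is taken.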

For Branch Conditional to Link Register and Branch Conditional to Count Register instructions, the BH field provides a hint about the use of the instruction, as shown in Figure 23.

BH  Hint
00  bclr[l]: The instruction is a subroutine return
    bcctr[l]: The instruction is not a subroutine return; the target address is likely to be the same as the target address used the preceding time the branch was taken
01  bclr[l]: The instruction is not a subroutine return; the target address is likely to be the same as the target address used the preceding time the branch was taken
    bcctr[l]: Reserved
10  Reserved
11  bclr[l] and bcctr[l]: The target address is not predictable

Figure 23. BH field encodings

Programming Note
The hint provided by the BH field is independent of the hint provided by the “at” bits (e.g., the BH field provides no indication of whether the branch is likely to be taken).

Extended mnemonics for branches

Many extended mnemonics are provided so that Branch Conditional instructions can be coded with portions of the BO and BI fields as part of the mnemonic rather than as part of a numeric operand. Some of these are shown as examples with the Branch instructions. See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

Programming Note
The hints provided by the “at” bits and by the BH field do not affect the results of executing the instruction.

The “z” bits should be set to 0, because they may be assigned a meaning in some future version of the architecture.

Programming Note
Many implementations have dynamic mechanisms for predicting the target addresses of bclr[l] and bcctr[l] instructions. These mechanisms may cache return addresses (i.e., Link Register values set by Branch instructions for which LK=1 and for which the branch was taken) and recently used branch target addresses. To obtain the best performance across the widest range of implementations, the programmer should obey the following rules.

■ Use Branch instructions for which LK=1 only as subroutine calls (including function calls, etc.).

■ Pair each subroutine call (i.e., each Branch instruction for which LK=1 and the branch is taken) with a bclr instruction that returns from the subroutine and has BH=0b00.

■ Do not use bclrl as a subroutine call. (Some implementations access the return address cache at most once per instruction; such implementations are likely to treat bclrl as a subroutine return, and not as a subroutine call.)

■ For bclr[l] and bcctr[l], use the appropriate value in the BH field.

The following are examples of programming conventions that obey these rules. In the examples, BH is assumed to contain 0b00 unless otherwise stated. In addition, the “at” bits are assumed to be coded appropriately.

Let A, B, and Glue be specific programs.

■ Loop counts:
Keep them in the Count Register, and use a bc instruction (LK=0) to decrement the count and to branch back to the beginning of the loop if the decremented count is nonzero.

■ Computed goto's, case statements, etc.:
Use the Count Register to hold the address to branch to, and use a bcctr instruction (LK=0, and BH=0b11 if appropriate) to branch to the selected address.

■ Direct subroutine linkage:
Here A calls B and B returns to A. The two branches should be as follows.
  - A calls B: use a bl or bcl instruction (LK=1).
  - B returns to A: use a bclr instruction (LK=0) (the return address is in, or can be restored to, the Link Register).

■ Indirect subroutine linkage:
Here A calls Glue, Glue calls B, and B returns to A rather than to Glue. (Such a calling sequence is common in linkage code used when the subroutine that the programmer wants to call, here B, is in a different module from the caller; the Binder inserts “glue” code to mediate the branch.) The three branches should be as follows.
  - A calls Glue: use a bl or bcl instruction (LK=1).
  - Glue calls B: place the address of B into the Count Register, and use a bcctr instruction (LK=0).
  - B returns to A: use a bclr instruction (LK=0) (the return address is in, or can be restored to, the Link Register).

■ Function call:
Here A calls a function, the identity of which may vary from one instance of the call to another, instead of calling a specific program B. This case should be handled using the conventions of the preceding two bullets, depending on whether the call is direct or indirect, with the following differences.
  - If the call is direct, place the address of the function into the Count Register, and use a bcctrl instruction (LK=1) instead of a bl or bcl instruction.
  - For the bcctr[l] instruction that branches to the function, use BH=0b11 if appropriate.

Compatibility Note
The bits corresponding to the current “a” and “t” bits, and to the current “z” bits except in the “branch always” BO encoding, had different meanings in versions of the architecture that precede Version 2.00.

■ The bit corresponding to the “t” bit was called the “y” bit. The “y” bit indicated whether to use the architected default prediction (y=0) or to use the complement of the default prediction (y=1). The default prediction was defined as follows.
  - If the instruction is bc[l][a] with a negative value in the displacement field, the branch is taken. (This is the only case in which the prediction corresponding to the “y” bit differs from the prediction corresponding to the “t” bit.)
  - In all other cases (bc[l][a] with a nonnegative value in the displacement field, bclr[l], or bcctr[l]), the branch is not taken.

■ The BO encodings that test both the Count Register and the Condition Register had a “y” bit in place of the current “z” bit. The meaning of the “y” bit was as described in the preceding item.

■ The “a” bit was a “z” bit.

Because these bits have always been defined either to be ignored or to be treated as hints, a given program will produce the same result on any implementation regardless of the values of the bits. Also, because even the “y” bit is ignored, in practice, by most processors that implement versions of the architecture that precede Version 2.00, the performance of a given program on those processors will not be affected by the values of the bits.
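The pre-Version 2.00 “y” bit rule in the Compatibility Note amounts to a default prediction that y=1 inverts. A small Python sketch of that rule follows; the function name `predicted_taken` and the `form` strings are hypothetical labels chosen for illustration, and the result is only a static hint with no effect on program results.

```python
def predicted_taken(form, displacement=0, y=0):
    """Static prediction implied by the old "y" bit.

    form          "bc" (covers bc, bca, bcl, bcla), "bclr", or "bcctr"
    displacement  signed displacement field value (used for "bc" only)
    y             0 = use the default prediction, 1 = use its complement
    """
    if form not in ("bc", "bclr", "bcctr"):
        raise ValueError(f"unknown branch form: {form!r}")
    # Default: only backward bc branches are predicted taken.
    default_taken = form == "bc" and displacement < 0
    return default_taken != bool(y)
```

A backward `bc` with y=0 is predicted taken, which matches the usual shape of a loop-closing branch; setting y=1 on the same instruction predicts it not taken.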

Branch I-form

b    target_addr    (AA=0 LK=0)
ba   target_addr    (AA=1 LK=0)
bl   target_addr    (AA=0 LK=1)
bla  target_addr    (AA=1 LK=1)

Field:      18 | LI | AA | LK
Start bit:   0 |  6 | 30 | 31

    if AA then NIA ←iea EXTS(LI || 0b00)
    else NIA ←iea CIA + EXTS(LI || 0b00)
    if LK then LR ←iea CIA + 4

target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of LI || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value LI || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR    (if LK=1)

Branch Conditional B-form

bc    BO,BI,target_addr    (AA=0 LK=0)
bca   BO,BI,target_addr    (AA=1 LK=0)
bcl   BO,BI,target_addr    (AA=0 LK=1)
bcla  BO,BI,target_addr    (AA=1 LK=1)

Field:      16 | BO | BI | BD | AA | LK
Start bit:   0 |  6 | 11 | 16 | 30 | 31

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if ¬BO_2 then CTR ← CTR − 1
    ctr_ok ← BO_2 | ((CTR_{M:63} ≠ 0) ⊕ BO_3)
    cond_ok ← BO_0 | (CR_{BI} ≡ BO_1)
    if ctr_ok & cond_ok then
        if AA then NIA ←iea EXTS(BD || 0b00)
        else NIA ←iea CIA + EXTS(BD || 0b00)
    if LK then LR ←iea CIA + 4

The BI field specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 21. target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of BD || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value BD || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    CTR   (if BO_2=0)
    LR    (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional:

Extended:          Equivalent to:
blt   target       bc  12,0,target
bne   cr2,target   bc  4,10,target
bdnz  target       bc  16,0,target
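The I-form and B-form layouts above map directly onto a 32-bit instruction word. The Python sketch below packs the fields according to the bit positions in the format diagrams (bit 0 is the most significant bit). The helper names `encode_i_form`, `encode_b_form`, and `cr_bit` are hypothetical, displacements are given in bytes rather than as raw LI or BD field values, and no real assembler syntax is parsed.

```python
def encode_i_form(disp_bytes, aa=0, lk=0):
    """Encode b/ba/bl/bla (primary opcode 18) from a byte displacement."""
    if disp_bytes % 4 or not -(1 << 25) <= disp_bytes < (1 << 25):
        raise ValueError("displacement must be word-aligned and fit LI || 0b00")
    # LI occupies bits 6:29; masking keeps LI and the implied 0b00.
    return (18 << 26) | (disp_bytes & 0x03FF_FFFC) | (aa << 1) | lk


def encode_b_form(bo, bi, disp_bytes, aa=0, lk=0):
    """Encode bc/bca/bcl/bcla (primary opcode 16) from a byte displacement."""
    if disp_bytes % 4 or not -(1 << 15) <= disp_bytes < (1 << 15):
        raise ValueError("displacement must be word-aligned and fit BD || 0b00")
    return ((16 << 26) | (bo << 21) | (bi << 16)
            | (disp_bytes & 0xFFFC) | (aa << 1) | lk)


def cr_bit(field, bit):
    """BI value for a bit (0=LT, 1=GT, 2=EQ, 3=SO) within CR field 0..7."""
    return 4 * field + bit
```

The extended mnemonic `bne cr2,target` tests the EQ bit of CR Field 2, which is why its BI operand is `cr_bit(2, 2)`, that is, 10, in the equivalent `bc 4,10,target`.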

Branch Conditional to Link Register XL-form

bclr   BO,BI,BH    (LK=0)
bclrl  BO,BI,BH    (LK=1)

[POWER mnemonics: bcr, bcrl]

Field:      19 | BO | BI | /// | BH | 16 | LK
Start bit:   0 |  6 | 11 |  16 | 19 | 21 | 31

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if ¬BO_2 then CTR ← CTR − 1
    ctr_ok ← BO_2 | ((CTR_{M:63} ≠ 0) ⊕ BO_3)
    cond_ok ← BO_0 | (CR_{BI} ≡ BO_1)
    if ctr_ok & cond_ok then NIA ←iea LR_{0:61} || 0b00
    if LK then LR ←iea CIA + 4

The BI field specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 21. The BH field is used as described in Figure 23. The branch target address is LR_{0:61} || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    CTR   (if BO_2=0)
    LR    (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional to Link Register:

Extended:      Equivalent to:
bclr    4,6    bclr  4,6,0
bltlr          bclr  12,0,0
bnelr   cr2    bclr  4,10,0
bdnzlr         bclr  16,0,0

Programming Note
bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mnemonic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00.

Branch Conditional to Count Register XL-form

bcctr   BO,BI,BH    (LK=0)
bcctrl  BO,BI,BH    (LK=1)

[POWER mnemonics: bcc, bccl]

Field:      19 | BO | BI | /// | BH | 528 | LK
Start bit:   0 |  6 | 11 |  16 | 19 |  21 | 31

    cond_ok ← BO_0 | (CR_{BI} ≡ BO_1)
    if cond_ok then NIA ←iea CTR_{0:61} || 0b00
    if LK then LR ←iea CIA + 4

The BI field specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 21. The BH field is used as described in Figure 23. The branch target address is CTR_{0:61} || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

If the “decrement and test CTR” option is specified (BO_2=0), the instruction form is invalid.

Special Registers Altered:
    LR    (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional to Count Register:

Extended:      Equivalent to:
bcctr   4,6    bcctr  4,6,0
bltctr         bcctr  12,0,0
bnectr  cr2    bcctr  4,10,0
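The two XL-form branches differ only in their extended opcode (16 for bclr, 528 for bcctr) and in which register supplies the target. The Python sketch below packs the fields from the format diagrams and forms the target address from bits 0:61 of the register. The names `encode_xl_branch` and `register_branch_target` are hypothetical, and the check for the invalid bcctr form follows the BO_2=0 rule stated above.

```python
BCLR_XO = 16     # extended opcode, bits 21:30, Branch Conditional to Link Register
BCCTR_XO = 528   # extended opcode, bits 21:30, Branch Conditional to Count Register


def encode_xl_branch(xo, bo, bi, bh=0, lk=0):
    """Encode bclr[l] or bcctr[l] (primary opcode 19)."""
    if xo == BCCTR_XO and not bo & 0b00100:
        # "decrement and test CTR" (BO_2 = 0) is an invalid form for bcctr
        raise ValueError("bcctr with BO_2=0 is an invalid instruction form")
    return (19 << 26) | (bo << 21) | (bi << 16) | (bh << 11) | (xo << 1) | lk


def register_branch_target(reg, mode64=True):
    """Target address reg_{0:61} || 0b00 from a 64-bit LR or CTR value."""
    target = reg & 0xFFFF_FFFF_FFFF_FFFC      # force the two low-order bits to 0
    return target if mode64 else target & 0xFFFF_FFFF
```

Using BO=0b10100, a “branch always” encoding from Figure 21, `encode_xl_branch(BCLR_XO, 0b10100, 0)` produces the word 0x4E800020, an unconditional return through the Link Register.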

2.4.2 System Call Instruction

This instruction provides the means by which a program can call upon the system to perform a service.

System Call SC-form

sc  LEV

[POWER mnemonic: svca]

Field:      17 | /// | /// | // | LEV | // |  1 | /
Start bit:   0 |   6 |  11 | 16 |  20 | 27 | 30 | 31

This instruction calls the system to perform a service. A complete description of this instruction can be found in Book III, PowerPC Operating Environment Architecture.

The use of the LEV field is described in Book III. The value contained in the LEV field is effectively limited to 0 or 1, bits 0:5 of the LEV field being treated as a reserved field in this form.

When control is returned to the program that executed the System Call instruction, the contents of the registers will depend on the register conventions used by the program providing the system service.

This instruction is context synchronizing (see Book III, PowerPC Operating Environment Architecture).

Special Registers Altered:
    Dependent on the system service

Programming Note
sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.

In application programs the value of the LEV operand for sc should be 0.

Compatibility Note
For a discussion of POWER compatibility with respect to instruction bits 16:29, see Appendix E, “Incompatibilities with the POWER Architecture” on page 163.

2.4.3 Condition Register Logical Instructions

The Condition Register Logical instructions have preferred forms; see Section 1.9.1, “Preferred Instruction Forms” on page 13. In the preferred forms, the BT and BB fields satisfy the following rule.

■ The bit specified by BT is in the same Condition Register field as the bit specified by BB.

Extended mnemonics for Condition Register logical operations

A set of extended mnemonics is provided that allow additional Condition Register logical operations, beyond those provided by the basic Condition Register Logical instructions, to be coded easily. Some of these are shown as examples with the Condition Register Logical instructions. See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

Condition Register AND XL-form

crand  BT,BA,BB

Field:      19 | BT | BA | BB | 257 | /
Start bit:   0 |  6 | 11 | 16 |  21 | 31

    CR_{BT} ← CR_{BA} & CR_{BB}

The bit in the Condition Register specified by BA is ANDed with the bit in the Condition Register specified by BB, and the result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}

Condition Register OR XL-form

cror  BT,BA,BB

Field:      19 | BT | BA | BB | 449 | /
Start bit:   0 |  6 | 11 | 16 |  21 | 31

    CR_{BT} ← CR_{BA} | CR_{BB}

The bit in the Condition Register specified by BA is ORed with the bit in the Condition Register specified by BB, and the result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}

Extended Mnemonics:

Example of extended mnemonics for Condition Register OR:

Extended:        Equivalent to:
crmove  Bx,By    cror  Bx,By,By

Condition Register XOR XL-form

crxor  BT,BA,BB

Field:      19 | BT | BA | BB | 193 | /
Start bit:   0 |  6 | 11 | 16 |  21 | 31

    CR_{BT} ← CR_{BA} ⊕ CR_{BB}

The bit in the Condition Register specified by BA is XORed with the bit in the Condition Register specified by BB, and the result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}

Extended Mnemonics:

Example of extended mnemonics for Condition Register XOR:

Extended:     Equivalent to:
crclr  Bx     crxor  Bx,Bx,Bx

Condition Register NAND XL-form

crnand  BT,BA,BB

Field:      19 | BT | BA | BB | 225 | /
Start bit:   0 |  6 | 11 | 16 |  21 | 31

    CR_{BT} ← ¬(CR_{BA} & CR_{BB})

The bit in the Condition Register specified by BA is ANDed with the bit in the Condition Register specified by BB, and the complemented result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}

Condition Register NOR XL-form

crnor  BT,BA,BB

Field:      19 | BT | BA | BB | 33 | /
Start bit:   0 |  6 | 11 | 16 | 21 | 31

    CR_{BT} ← ¬(CR_{BA} | CR_{BB})

The bit in the Condition Register specified by BA is ORed with the bit in the Condition Register specified by BB, and the complemented result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}

Extended Mnemonics:

Example of extended mnemonics for Condition Register NOR:

Extended:       Equivalent to:
crnot  Bx,By    crnor  Bx,By,By

Condition Register Equivalent XL-form

creqv  BT,BA,BB

Field:      19 | BT | BA | BB | 289 | /
Start bit:   0 |  6 | 11 | 16 |  21 | 31

    CR_{BT} ← CR_{BA} ≡ CR_{BB}

The bit in the Condition Register specified by BA is XORed with the bit in the Condition Register specified by BB, and the complemented result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}

Extended Mnemonics:

Example of extended mnemonics for Condition Register Equivalent:

Extended:     Equivalent to:
crset  Bx     creqv  Bx,Bx,Bx

Condition Register AND with Complement XL-form

crandc  BT,BA,BB

Field:      19 | BT | BA | BB | 129 | /
Start bit:   0 |  6 | 11 | 16 |  21 | 31

    CR_{BT} ← CR_{BA} & ¬CR_{BB}

The bit in the Condition Register specified by BA is ANDed with the complement of the bit in the Condition Register specified by BB, and the result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}

Condition Register OR with Complement XL-form

crorc  BT,BA,BB

Field:      19 | BT | BA | BB | 417 | /
Start bit:   0 |  6 | 11 | 16 |  21 | 31

    CR_{BT} ← CR_{BA} | ¬CR_{BB}

The bit in the Condition Register specified by BA is ORed with the complement of the bit in the Condition Register specified by BB, and the result is placed into the bit in the Condition Register specified by BT.

Special Registers Altered:
    CR_{BT}
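The eight Condition Register Logical instructions share one pattern: read bits BA and BB, combine them, write bit BT. The Python sketch below models the CR as a 32-bit integer with bit 0 as the most significant bit, matching the numbering used in this Book. The names `get_bit`, `set_bit`, `CR_OPS`, and `cr_logical` are hypothetical and exist only for this illustration.

```python
def get_bit(cr, n):
    """Value of CR bit n, where bit 0 is the most significant of 32 bits."""
    return (cr >> (31 - n)) & 1


def set_bit(cr, n, value):
    """Return cr with bit n replaced by value (0 or 1)."""
    mask = 1 << (31 - n)
    return (cr | mask) if value else (cr & ~mask)


# Each operation maps (CR_BA, CR_BB) to the new CR_BT.
CR_OPS = {
    "crand":  lambda a, b: a & b,
    "cror":   lambda a, b: a | b,
    "crxor":  lambda a, b: a ^ b,
    "crnand": lambda a, b: 1 - (a & b),
    "crnor":  lambda a, b: 1 - (a | b),
    "creqv":  lambda a, b: 1 - (a ^ b),
    "crandc": lambda a, b: a & (1 - b),
    "crorc":  lambda a, b: a | (1 - b),
}


def cr_logical(op, cr, bt, ba, bb):
    """Apply one Condition Register Logical instruction to a CR value."""
    return set_bit(cr, bt, CR_OPS[op](get_bit(cr, ba), get_bit(cr, bb)))
```

The extended mnemonics fall out of the truth tables: `creqv Bx,Bx,Bx` always writes 1 (crset), `crxor Bx,Bx,Bx` always writes 0 (crclr), `cror Bx,By,By` copies a bit (crmove), and `crnor Bx,By,By` copies its complement (crnot).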

2.4.4 Condition Register Field Instruction

Move Condition Register Field XL-form

mcrf  BF,BFA

Field:      19 | BF | // | BFA | // | /// |  0 | /
Start bit:   0 |  6 |  9 |  11 | 14 |  16 | 21 | 31

    CR_{4×BF:4×BF+3} ← CR_{4×BFA:4×BFA+3}

The contents of Condition Register field BFA are copied to Condition Register field BF.

Special Registers Altered:
    CR field BF
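The field move performed by mcrf can be sketched in Python on a 32-bit CR value. This is an illustrative model only: the function name `mcrf` mirrors the mnemonic but is otherwise hypothetical, and CR field n is taken to occupy bits 4n:4n+3 with bit 0 as the most significant bit.

```python
def mcrf(cr, bf, bfa):
    """Return cr with field BFA copied into field BF (fields numbered 0..7)."""
    src = (cr >> (28 - 4 * bfa)) & 0xF      # extract CR bits 4*BFA : 4*BFA+3
    shift = 28 - 4 * bf                     # position of CR bits 4*BF : 4*BF+3
    cleared = cr & ~(0xF << shift) & 0xFFFF_FFFF
    return cleared | (src << shift)
```

For example, copying CR Field 7 into CR Field 0 of the value 0x0000_000A gives 0xA000_000A; the source field is left unchanged.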

Chapter 3. Fixed-Point Processor

3.1 Fixed-Point Processor Overview 3.3.6 Fixed-Point Move Assist

29

. . 29 Instructions

3.2 Fixed-Point Processor Registers 45

. . . . . . . . . . . . . . . .

. . 3.3.7 Other Fixed-Point Instructions

3.2.1 General Purpose Registers 48

29

. . . . . .

30 3.3.8 Fixed-Point Arithmetic Instructions 49

3.2.2 Fixed-Point Exception Register .

3.3 Fixed-Point Processor Instructions 31 3.3.9 Fixed-Point Compare Instructions 58

3.3.1 Fixed-Point Storage Access 3.3.10 Fixed-Point Trap Instructions 60

. .

Instructions 3.3.11 Fixed-Point Logical Instructions 62

31

. . . . . . . . . . . . . . . .

3.3.1.1 Storage Access Exceptions 31 3.3.12 Fixed-Point Rotate and Shift

. . . 31

3.3.2 Fixed-Point Load Instructions Instructions 68

. . . . . . . . . . . . . . . . . .

3.3.3 Fixed-Point Store Instructions 38 3.3.12.1 Fixed-Point Rotate Instructions 68

. . 74

3.3.4 Fixed-Point Load and Store with 3.3.12.2 Fixed-Point Shift Instructions .

3.3.13 Move To/From System Register

Byte Reversal Instructions 42

. . . . . . . Instructions

3.3.5 Fixed-Point Load and Store 78

. . . . . . . . . . . . . . . .

Multiple Instructions 44

. . . . . . . . . . .

3.1 Fixed-Point Processor Overview

This chapter describes the registers and instructions that make up the Fixed-Point Processor facility. Section 3.2, “Fixed-Point Processor Registers” describes the registers associated with the Fixed-Point Processor. Section 3.3, “Fixed-Point Processor Instructions” on page 31 describes the instructions associated with the Fixed-Point Processor.

3.2 Fixed-Point Processor Registers

3.2.1 General Purpose Registers

All manipulation of information is done in registers internal to the Fixed-Point Processor. The principal storage internal to the Fixed-Point Processor is a set of 32 General Purpose Registers (GPRs). See Figure 24.

| GPR 0 |
| GPR 1 |
| ... |
| ... |
| GPR 30 |
| GPR 31 |
| 0 63 |

Figure 24. General Purpose Registers

Each GPR is a 64-bit register.


3.2.2 Fixed-Point Exception Register

The Fixed-Point Exception Register (XER) is a 64-bit register.

| XER |
| 0 63 |

Figure 25. Fixed-Point Exception Register

The bit definitions for the Fixed-Point Exception Register are shown below. Here M = 0 in 64-bit mode and M = 32 in 32-bit mode.

The bits are set based on the operation of an instruction considered as a whole, not on intermediate results (e.g., the Subtract From Carrying instruction, the result of which is specified as the sum of three values, sets bits in the Fixed-Point Exception Register based on the entire operation, not on an intermediate sum).

Bit(s)  Description

0:31    Reserved

32      Summary Overflow (SO)
        The Summary Overflow bit is set to 1 whenever an instruction (except mtspr) sets the Overflow bit. Once set, the SO bit remains set until it is cleared by an mtspr instruction (specifying the XER) or an mcrxr instruction. It is not altered by Compare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow. Executing an mtspr instruction to the XER, supplying the values 0 for SO and 1 for OV, causes SO to be set to 0 and OV to be set to 1.

33      Overflow (OV)
        The Overflow bit is set to indicate that an overflow has occurred during execution of an instruction. XO-form Add, Subtract From, and Negate instructions having OE = 1 set it to 1 if the carry out of bit M is not equal to the carry out of bit M+1, and set it to 0 otherwise. XO-form Multiply Low and Divide instructions having OE = 1 set it to 1 if the result cannot be represented in 64 bits (mulld, divd, divdu) or in 32 bits (mullw, divw, divwu), and set it to 0 otherwise. The OV bit is not altered by Compare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow.

34      Carry (CA)
        The Carry bit is set as follows, during execution of certain instructions. Add Carrying, Subtract From Carrying, Add Extended, and Subtract From Extended types of instructions set it to 1 if there is a carry out of bit M, and set it to 0 otherwise. Shift Right Algebraic instructions set it to 1 if any 1-bits have been shifted out of a negative operand, and set it to 0 otherwise. The CA bit is not altered by Compare instructions, nor by other instructions (except Shift Right Algebraic, mtspr to the XER, and mcrxr) that cannot carry.

35:56   Reserved

57:63   This field specifies the number of bytes to be transferred by a Load String Indexed or Store String Indexed instruction.

Compatibility Note
For a discussion of POWER compatibility with respect to XER, see Appendix E, “Incompatibilities with the POWER Architecture” on page 163.
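The OV and CA rules for an add can be illustrated with a short Python sketch. The function name `add_with_flags` and its parameters are hypothetical and not part of the architecture; the sketch assumes operands are unsigned 64-bit register images and uses M = 0 in 64-bit mode and M = 32 in 32-bit mode, as defined for the XER.

```python
MASK64 = (1 << 64) - 1

def add_with_flags(a, b, carry_in=0, mode64=True):
    """Model a 64-bit register add; return (result, CA, OV).

    Hypothetical helper: CA is the carry out of bit M, and OV is 1 when the
    carry out of bit M differs from the carry out of bit M+1.
    """
    width = 64 if mode64 else 32          # bits M..63 take part in the flags
    mask = (1 << width) - 1
    result = (a + b + carry_in) & MASK64  # the GPR always receives 64 bits

    # Carry out of bit M: carry out of the most significant bit in this mode.
    full = (a & mask) + (b & mask) + carry_in
    carry_out_m = (full >> width) & 1

    # Carry out of bit M+1: carry produced by all bits below bit M.
    low_mask = mask >> 1
    low = (a & low_mask) + (b & low_mask) + carry_in
    carry_out_m1 = (low >> (width - 1)) & 1

    ca = carry_out_m
    ov = int(carry_out_m != carry_out_m1)
    return result, ca, ov
```

Adding 1 to the largest positive 64-bit value sets OV but not CA, while adding 1 to an all-ones register sets CA but not OV. SO is not modelled here; per the definition of bit 32 it would simply be set whenever such an instruction sets OV.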


3.3 Fixed-Point Processor Instructions

3.3.1 Fixed-Point Storage Access Instructions

The Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Section 1.12.2, “Effective Address Calculation” on page 14.

Programming Note
The la extended mnemonic permits computing an effective address as a Load or Store instruction would, but loads the address itself into a GPR rather than loading the value that is in storage at that address. This extended mnemonic is described in Section B.9, “Miscellaneous Mnemonics” on page 153.

Programming Note
The DS field in DS-form Storage Access instructions is a word offset, not a byte offset like the D field in D-form Storage Access instructions. However, for programming convenience, Assemblers should support the specification of byte offsets for both forms of instruction.

3.3.1.1 Storage Access Exceptions

Storage accesses will cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.
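The relationship between a byte offset written in assembler source and the DS field can be sketched as follows. The helper names `encode_ds` and `ds_to_byte_offset` are hypothetical; the sketch assumes the DS field is 14 bits wide (instruction bits 16:29) and that the effective displacement is EXTS(DS || 0b00), as in the DS-form instruction descriptions.

```python
def encode_ds(byte_offset):
    """Convert a byte offset to a 14-bit DS field (hypothetical assembler helper)."""
    if byte_offset % 4 != 0:
        raise ValueError("DS-form offsets must be a multiple of 4")
    if not -(1 << 15) <= byte_offset < (1 << 15):
        raise ValueError("offset does not fit in DS || 0b00")
    # Drop the two low-order zero bits and keep 14 bits (two's complement).
    return (byte_offset >> 2) & 0x3FFF

def ds_to_byte_offset(ds):
    """Compute EXTS(DS || 0b00) as a signed Python integer."""
    value = (ds & 0x3FFF) << 2            # DS || 0b00 is a 16-bit quantity
    return value - (1 << 16) if value & 0x8000 else value
```

An assembler following the Programming Note would accept `ld RT,8(RA)` and place `encode_ds(8)`, that is 2, in the DS field.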

3.3.2 Fixed-Point Load Instructions

The byte, halfword, word, or doubleword in storage addressed by EA is loaded into register RT.

Many of the Load instructions have an “update” form, in which register RA is updated with the effective address. For these forms, if RA≠0 and RA≠RT, the effective address is placed into register RA and the storage element (byte, halfword, word, or doubleword) addressed by EA is loaded into RT.

In the preferred form of the Load Quadword instruction RA ≠ RT+1.

Programming Note
In some implementations, the Load Algebraic and Load with Update instructions may have greater latency than other types of Load instructions. Moreover, Load with Update instructions may take longer to execute in some implementations than the corresponding pair of a non-update Load instruction and an Add instruction.

Load Byte and Zero D-form

lbz RT,D(RA)

| 34 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+D. The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

Special Registers Altered:
    None

Load Byte and Zero Indexed X-form

lbzx RT,RA,RB

| 31 | RT | RA | RB | 87 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+(RB). The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

Special Registers Altered:
    None

Load Byte and Zero with Update D-form

lbzu RT,D(RA)

| 35 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    EA ← (RA) + EXTS(D)
    RT ← ⁵⁶0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA)+D. The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Byte and Zero with Update Indexed X-form

lbzux RT,RA,RB

| 31 | RT | RA | RB | 119 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None
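The D-form "Load and Zero" behaviour, with and without update, can be sketched in Python. All names here (`exts`, `load_and_zero`, `load_and_zero_update`) are hypothetical, `gpr` is a 32-entry list of unsigned 64-bit values, `mem` is a bytes-like object indexed by effective address, and Big-Endian byte order is assumed for storage.

```python
MASK64 = (1 << 64) - 1

def exts(value, bits):
    """Sign-extend a `bits`-wide field to a 64-bit two's-complement image."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value & MASK64

def load_and_zero(gpr, mem, rt, ra, d, size):
    """lbz/lhz/lwz style load: EA = (RA|0) + EXTS(D); RT is zero-extended."""
    base = 0 if ra == 0 else gpr[ra]       # RA = 0 means the value 0, not GPR 0
    ea = (base + exts(d, 16)) & MASK64
    gpr[rt] = int.from_bytes(mem[ea:ea + size], "big")
    return ea

def load_and_zero_update(gpr, mem, rt, ra, d, size):
    """lbzu/lhzu/lwzu style load: as above, then RA receives EA."""
    if ra == 0 or ra == rt:
        raise ValueError("invalid instruction form: RA=0 or RA=RT")
    ea = (gpr[ra] + exts(d, 16)) & MASK64
    gpr[rt] = int.from_bytes(mem[ea:ea + size], "big")
    gpr[ra] = ea
    return ea
```

Passing `size` as 1, 2, or 4 corresponds to the byte, halfword, and word variants; the indexed (X-form) variants differ only in adding (RB) instead of EXTS(D).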


Load Halfword and Zero D-form

lhz RT,D(RA)

| 40 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0)+D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

Special Registers Altered:
    None

Load Halfword and Zero Indexed X-form

lhzx RT,RA,RB

| 31 | RT | RA | RB | 279 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

Special Registers Altered:
    None

Load Halfword and Zero with Update D-form

lhzu RT,D(RA)

| 41 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    EA ← (RA) + EXTS(D)
    RT ← ⁴⁸0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA)+D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword and Zero with Update Indexed X-form

lhzux RT,RA,RB

| 31 | RT | RA | RB | 311 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None


Load Halfword Algebraic D-form

lha RT,D(RA)

| 42 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0)+D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
    None

Load Halfword Algebraic Indexed X-form

lhax RT,RA,RB

| 31 | RT | RA | RB | 343 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
    None

Load Halfword Algebraic with Update D-form

lhau RT,D(RA)

| 43 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    EA ← (RA) + EXTS(D)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA)+D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword Algebraic with Update Indexed X-form

lhaux RT,RA,RB

| 31 | RT | RA | RB | 375 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None
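The difference between the "and Zero" and "Algebraic" halfword loads is only in how RT_{0:47} is filled. A minimal Python sketch, using a hypothetical `load_halfword` helper, a bytes-like `mem`, and assuming Big-Endian storage:

```python
MASK64 = (1 << 64) - 1

def load_halfword(mem, ea, algebraic):
    """Return the 64-bit value placed in RT by lhz (algebraic=False) or lha (True)."""
    half = int.from_bytes(mem[ea:ea + 2], "big")   # lands in RT bits 48:63
    if algebraic and half & 0x8000:
        # Bit 0 of the loaded halfword is 1: copy it into RT bits 0:47.
        half |= MASK64 & ~0xFFFF
    return half
```

For the storage bytes 0xFF 0xFE, the zero form yields 0x000000000000FFFE and the algebraic form yields 0xFFFFFFFFFFFFFFFE.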


Load Word and Zero D-form

lwz RT,D(RA)

[POWER mnemonic: l]

| 32 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+D. The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

Special Registers Altered:
    None

Load Word and Zero Indexed X-form

lwzx RT,RA,RB

[POWER mnemonic: lx]

| 31 | RT | RA | RB | 23 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

Special Registers Altered:
    None

Load Word and Zero with Update D-form

lwzu RT,D(RA)

[POWER mnemonic: lu]

| 33 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    EA ← (RA) + EXTS(D)
    RT ← ³²0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA)+D. The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Word and Zero with Update Indexed X-form

lwzux RT,RA,RB

[POWER mnemonic: lux]

| 31 | RT | RA | RB | 55 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    RT ← ³²0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None


Load Word Algebraic DS-form

lwa RT,DS(RA)

| 58 | RT | RA | DS | 2 |
| 0 | 6 | 11 | 16 | 30 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+(DS||0b00). The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
    None

Load Word Algebraic Indexed X-form

lwax RT,RA,RB

| 31 | RT | RA | RB | 341 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
    None

Load Word Algebraic with Update Indexed X-form

lwaux RT,RA,RB

| 31 | RT | RA | RB | 373 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    RT ← EXTS(MEM(EA, 4))
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are filled with a copy of bit 0 of the loaded word.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None


Load Doubleword DS-form

ld RT,DS(RA)

| 58 | RT | RA | DS | 0 |
| 0 | 6 | 11 | 16 | 30 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

Special Registers Altered:
    None

Load Doubleword Indexed X-form

ldx RT,RA,RB

| 31 | RT | RA | RB | 21 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

Special Registers Altered:
    None

Load Doubleword with Update DS-form

ldu RT,DS(RA)

| 58 | RT | RA | DS | 1 |
| 0 | 6 | 11 | 16 | 30 31 |

    EA ← (RA) + EXTS(DS || 0b00)
    RT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Doubleword with Update Indexed X-form

ldux RT,RA,RB

| 31 | RT | RA | RB | 53 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    RT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None


3.3.3 Fixed-Point Store Instructions

The contents of register RS are stored into the byte, halfword, word, or doubleword in storage addressed by EA.

Many of the Store instructions have an “update” form, in which register RA is updated with the effective address. For these forms, the following rules apply.

■ If RA≠0, the effective address is placed into register RA.
■ If RS=RA, the contents of register RS are copied to the target storage element and then EA is placed into RA (RS).

Store Byte D-form

stb RS,D(RA)

| 38 | RS | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 1) ← (RS)_{56:63}

Let the effective address (EA) be the sum (RA|0)+D. (RS)_{56:63} are stored into the byte in storage addressed by EA.

Special Registers Altered:
    None

Store Byte Indexed X-form

stbx RS,RA,RB

| 31 | RS | RA | RB | 215 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 1) ← (RS)_{56:63}

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_{56:63} are stored into the byte in storage addressed by EA.

Special Registers Altered:
    None

Store Byte with Update D-form

stbu RS,D(RA)

| 39 | RS | RA | D |
| 0 | 6 | 11 | 16 31 |

    EA ← (RA) + EXTS(D)
    MEM(EA, 1) ← (RS)_{56:63}
    RA ← EA

Let the effective address (EA) be the sum (RA)+D. (RS)_{56:63} are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Byte with Update Indexed X-form

stbux RS,RA,RB

| 31 | RS | RA | RB | 247 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    MEM(EA, 1) ← (RS)_{56:63}
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). (RS)_{56:63} are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None


Store Halfword D-form

sth RS,D(RA)

| 44 | RS | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 2) ← (RS)_{48:63}

Let the effective address (EA) be the sum (RA|0)+D. (RS)_{48:63} are stored into the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword Indexed X-form

sthx RS,RA,RB

| 31 | RS | RA | RB | 407 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)_{48:63}

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_{48:63} are stored into the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword with Update D-form

sthu RS,D(RA)

| 45 | RS | RA | D |
| 0 | 6 | 11 | 16 31 |

    EA ← (RA) + EXTS(D)
    MEM(EA, 2) ← (RS)_{48:63}
    RA ← EA

Let the effective address (EA) be the sum (RA)+D. (RS)_{48:63} are stored into the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Halfword with Update Indexed X-form

sthux RS,RA,RB

| 31 | RS | RA | RB | 439 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    MEM(EA, 2) ← (RS)_{48:63}
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). (RS)_{48:63} are stored into the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None


Store Word D-form

stw RS,D(RA)

[POWER mnemonic: st]

| 36 | RS | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 4) ← (RS)_{32:63}

Let the effective address (EA) be the sum (RA|0)+D. (RS)_{32:63} are stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word Indexed X-form

stwx RS,RA,RB

[POWER mnemonic: stx]

| 31 | RS | RA | RB | 151 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)_{32:63}

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_{32:63} are stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word with Update D-form

stwu RS,D(RA)

[POWER mnemonic: stu]

| 37 | RS | RA | D |
| 0 | 6 | 11 | 16 31 |

    EA ← (RA) + EXTS(D)
    MEM(EA, 4) ← (RS)_{32:63}
    RA ← EA

Let the effective address (EA) be the sum (RA)+D. (RS)_{32:63} are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Word with Update Indexed X-form

stwux RS,RA,RB

[POWER mnemonic: stux]

| 31 | RS | RA | RB | 183 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    MEM(EA, 4) ← (RS)_{32:63}
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). (RS)_{32:63} are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None


Store Doubleword DS-form

std RS,DS(RA)

| 62 | RS | RA | DS | 0 |
| 0 | 6 | 11 | 16 | 30 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

Special Registers Altered:
    None

Store Doubleword Indexed X-form

stdx RS,RA,RB

| 31 | RS | RA | RB | 149 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

Special Registers Altered:
    None

Store Doubleword with Update DS-form

stdu RS,DS(RA)

| 62 | RS | RA | DS | 1 |
| 0 | 6 | 11 | 16 | 30 31 |

    EA ← (RA) + EXTS(DS || 0b00)
    MEM(EA, 8) ← (RS)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Doubleword with Update Indexed X-form

stdux RS,RA,RB

| 31 | RS | RA | RB | 181 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    EA ← (RA) + (RB)
    MEM(EA, 8) ← (RS)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None


3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions

Programming Note
These instructions have the effect of loading and storing data in Little-Endian byte order.

In some implementations, the Load Byte-Reverse instructions may have greater latency than other Load instructions.

Load Halfword Byte-Reverse Indexed X-form

lhbrx RT,RA,RB

| 31 | RT | RA | RB | 790 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA+1, 1) || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+(RB). Bits 0:7 of the halfword in storage addressed by EA are loaded into RT_{56:63}. Bits 8:15 of the halfword in storage addressed by EA are loaded into RT_{48:55}. RT_{0:47} are set to 0.

Special Registers Altered:
    None

Load Word Byte-Reverse Indexed X-form

lwbrx RT,RA,RB

[POWER mnemonic: lbrx]

| 31 | RT | RA | RB | 534 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA+3, 1) || MEM(EA+2, 1) || MEM(EA+1, 1) || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+(RB). Bits 0:7 of the word in storage addressed by EA are loaded into RT_{56:63}. Bits 8:15 of the word in storage addressed by EA are loaded into RT_{48:55}. Bits 16:23 of the word in storage addressed by EA are loaded into RT_{40:47}. Bits 24:31 of the word in storage addressed by EA are loaded into RT_{32:39}. RT_{0:31} are set to 0.

Special Registers Altered:
    None


Store Halfword Byte-Reverse Indexed X-form

sthbrx RS,RA,RB

| 31 | RS | RA | RB | 918 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)_{56:63} || (RS)_{48:55}

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_{56:63} are stored into bits 0:7 of the halfword in storage addressed by EA. (RS)_{48:55} are stored into bits 8:15 of the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Word Byte-Reverse Indexed X-form

stwbrx RS,RA,RB

[POWER mnemonic: stbrx]

| 31 | RS | RA | RB | 662 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)_{56:63} || (RS)_{48:55} || (RS)_{40:47} || (RS)_{32:39}

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)_{56:63} are stored into bits 0:7 of the word in storage addressed by EA. (RS)_{48:55} are stored into bits 8:15 of the word in storage addressed by EA. (RS)_{40:47} are stored into bits 16:23 of the word in storage addressed by EA. (RS)_{32:39} are stored into bits 24:31 of the word in storage addressed by EA.

Special Registers Altered:
    None
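The byte-reversal loads and stores can be modelled compactly in Python, because placing the byte at EA in the low-order position is the same as interpreting the storage bytes in Little-Endian order. The function names below mirror the mnemonics but are hypothetical helpers; `mem` is assumed to be a `bytearray` and the effective address `ea` is assumed to be already computed.

```python
def lhbrx(mem, ea):
    """RT <- 48 zero bits || MEM(EA+1, 1) || MEM(EA, 1)."""
    return int.from_bytes(mem[ea:ea + 2], "little")

def lwbrx(mem, ea):
    """RT <- 32 zero bits || MEM(EA+3, 1) || MEM(EA+2, 1) || MEM(EA+1, 1) || MEM(EA, 1)."""
    return int.from_bytes(mem[ea:ea + 4], "little")

def sthbrx(mem, ea, rs):
    """MEM(EA, 2) <- (RS)[56:63] || (RS)[48:55]: low-order halfword, bytes swapped."""
    mem[ea:ea + 2] = (rs & 0xFFFF).to_bytes(2, "little")

def stwbrx(mem, ea, rs):
    """MEM(EA, 4) <- the low-order word of RS with its four bytes reversed."""
    mem[ea:ea + 4] = (rs & 0xFFFFFFFF).to_bytes(4, "little")
```

With storage bytes 0x11 0x22 0x33 0x44 at EA, `lwbrx` yields 0x44332211, whereas a Load Word and Zero of the same location would yield 0x11223344.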


3.3.5 Fixed-Point Load and Store Multiple Instructions

The Load/Store Multiple instructions have preferred forms; see Section 1.9.1, “Preferred Instruction Forms” on page 13. In the preferred forms, storage alignment satisfies the following rule.

■ The combination of the EA and RT (RS) is such that the low-order byte of GPR 31 is loaded (stored) from (into) the last byte of an aligned quadword in storage.

Compatibility Note
For a discussion of POWER compatibility with respect to the alignment of the EA for the Load Multiple Word and Store Multiple Word instructions, see Appendix E, “Incompatibilities with the POWER Architecture” on page 163. For compatibility with future versions of the PowerPC Architecture, these EAs should be word-aligned.

Load Multiple Word D-form

lmw RT,D(RA)

[POWER mnemonic: lm]

| 46 | RT | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    r ← RT
    do while r ≤ 31
        GPR(r) ← ³²0 || MEM(EA, 4)
        r ← r + 1
        EA ← EA + 4

Let n = (32−RT). Let the effective address (EA) be the sum (RA|0)+D.

n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The high-order 32 bits of these GPRs are set to zero.

If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Multiple Word D-form

stmw RS,D(RA)

[POWER mnemonic: stm]

| 47 | RS | RA | D |
| 0 | 6 | 11 | 16 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    r ← RS
    do while r ≤ 31
        MEM(EA, 4) ← GPR(r)_{32:63}
        r ← r + 1
        EA ← EA + 4

Let n = (32−RS). Let the effective address (EA) be the sum (RA|0)+D.

n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31.

Special Registers Altered:
    None
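The register loops of Load Multiple Word and Store Multiple Word can be sketched in Python as follows. The function names mirror the mnemonics but are hypothetical helpers; `gpr` is a 32-entry list, `mem` is a `bytearray`, `d` is taken as an already sign-extended Python integer, and Big-Endian storage is assumed. The invalid-form test reads "RA is in the range of registers to be loaded" as RA ≥ RT, which is an interpretation of the text rather than a quotation from it.

```python
def lmw(gpr, mem, rt, ra, d):
    """Load GPRs RT..31 from consecutive words starting at EA = (RA|0) + d."""
    if ra >= rt:
        # RA (including RA = 0 when RT = 0) would be overwritten by the load.
        raise ValueError("invalid instruction form: RA in range RT..31")
    ea = (0 if ra == 0 else gpr[ra]) + d
    for r in range(rt, 32):
        # Low-order 32 bits receive the word; high-order 32 bits become 0.
        gpr[r] = int.from_bytes(mem[ea:ea + 4], "big")
        ea += 4

def stmw(gpr, mem, rs, ra, d):
    """Store the low-order words of GPRs RS..31 to consecutive words at EA."""
    ea = (0 if ra == 0 else gpr[ra]) + d
    for r in range(rs, 32):
        mem[ea:ea + 4] = (gpr[r] & 0xFFFFFFFF).to_bytes(4, "big")
        ea += 4
```

In both functions the number of words transferred is n = 32 − RT (or 32 − RS), matching the description of each instruction.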


3.3.6 Fixed-Point Move Assist Instructions

The Move Assist instructions allow movement of data from storage to registers or from registers to storage without concern for alignment. These instructions can be used for a short move between arbitrary storage locations or to initiate a long move between unaligned storage fields.

The Load/Store String instructions have preferred forms; see Section 1.9.1, “Preferred Instruction Forms” on page 13. In the preferred forms, register usage satisfies the following rules.

■ RS = 4 or 5
■ RT = 4 or 5
■ last register loaded/stored ≤ 12

For some implementations, using GPR 4 for RS and RT may result in slightly faster execution than using GPR 5; see Book IV, PowerPC Implementation Features.

Load String Word Immediate X-form

lswi RT,RA,NB

[POWER mnemonic: lsi]

| 31 | RT | RA | NB | 597 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then EA ← 0
    else EA ← (RA)
    if NB = 0 then n ← 32
    else n ← NB
    r ← RT − 1
    i ← 32
    do while n > 0
        if i = 32 then
            r ← r + 1 (mod 32)
            GPR(r) ← 0
        GPR(r)_{i:i+7} ← MEM(EA, 1)
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n − 1

Let the effective address (EA) be (RA|0). Let n = NB if NB≠0, n = 32 if NB=0; n is the number of bytes to load. Let nr = CEIL(n÷4); nr is the number of registers to receive data.

n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr−1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0.

Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr−1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0.

If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load String Word Indexed X-form

lswx RT,RA,RB

[POWER mnemonic: lsx]

| 31 | RT | RA | RB | 533 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    n ← XER_{57:63}
    r ← RT − 1
    i ← 32
    RT ← undefined
    do while n > 0
        if i = 32 then
            r ← r + 1 (mod 32)
            GPR(r) ← 0
        GPR(r)_{i:i+7} ← MEM(EA, 1)
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n − 1

Let the effective address (EA) be the sum (RA|0)+(RB). Let n = XER_{57:63}; n is the number of bytes to load. Let nr = CEIL(n÷4); nr is the number of registers to receive data.

If n>0, n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr−1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0.

Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr−1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0.

If n=0, the contents of register RT are undefined.

If RA or RB is in the range of registers to be loaded, including the case in which RA=0, either the system illegal instruction error handler is invoked or the results are boundedly undefined. If RT=RA or RT=RB, the instruction form is invalid.

Special Registers Altered:
    None
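The byte-placement loop of Load String Word Immediate can be followed step by step in a Python sketch. The function name `lswi` mirrors the mnemonic but is a hypothetical helper; `gpr` is a 32-entry list of 64-bit values, `mem` is a bytes-like object, and the invalid-form check on RA is deliberately omitted to keep the loop readable.

```python
def lswi(gpr, mem, rt, ra, nb):
    """Load n bytes (n = NB, or 32 if NB = 0) into GPRs starting at RT."""
    ea = 0 if ra == 0 else gpr[ra]
    n = 32 if nb == 0 else nb
    r = rt - 1
    i = 32                          # next bit position within the low-order word
    while n > 0:
        if i == 32:
            r = (r + 1) % 32        # advance to the next register, wrapping to GPR 0
            gpr[r] = 0              # clears the high word and any unfilled low bytes
        # Bits i:i+7 (bit 0 is the most significant) sit 56 - i bits above bit 63.
        gpr[r] |= mem[ea] << (56 - i)
        i += 8
        if i == 64:
            i = 32
        ea += 1
        n -= 1
```

Loading six bytes 01 02 03 04 05 06 with RT = 30 leaves 0x01020304 in GPR 30 and 0x05060000 in GPR 31; with RT = 31 the second register used is GPR 0. The indexed form differs in taking n from XER_{57:63} and EA from (RA|0)+(RB).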


Store String Word Immediate X-form

stswi RS,RA,NB

[POWER mnemonic: stsi]

| 31 | RS | RA | NB | 725 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then EA ← 0
    else EA ← (RA)
    if NB = 0 then n ← 32
    else n ← NB
    r ← RS − 1
    i ← 32
    do while n > 0
        if i = 32 then r ← r + 1 (mod 32)
        MEM(EA, 1) ← GPR(r)_{i:i+7}
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n − 1

Let the effective address (EA) be (RA|0). Let n = NB if NB≠0, n = 32 if NB=0; n is the number of bytes to store. Let nr = CEIL(n÷4); nr is the number of registers to supply data.

n consecutive bytes starting at EA are stored from GPRs RS through RS+nr−1. Data are stored from the low-order four bytes of each GPR.

Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required.

Special Registers Altered:
    None

Store String Word Indexed X-form

stswx RS,RA,RB

[POWER mnemonic: stsx]

| 31 | RS | RA | RB | 661 | / |
| 0 | 6 | 11 | 16 | 21 | 31 |

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    n ← XER_{57:63}
    r ← RS − 1
    i ← 32
    do while n > 0
        if i = 32 then r ← r + 1 (mod 32)
        MEM(EA, 1) ← GPR(r)_{i:i+7}
        i ← i + 8
        if i = 64 then i ← 32
        EA ← EA + 1
        n ← n − 1

Let the effective address (EA) be the sum (RA|0)+(RB). Let n = XER_{57:63}; n is the number of bytes to store. Let nr = CEIL(n÷4); nr is the number of registers to supply data.

If n>0, n consecutive bytes starting at EA are stored from GPRs RS through RS+nr−1. Data are stored from the low-order four bytes of each GPR.

Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required.

If n=0, no bytes are stored.

Special Registers Altered:
    None
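As a companion illustration for the indexed store, the following Python sketch takes the byte count from the low-order seven bits of an XER image, which correspond to XER_{57:63}. The function name `stswx` mirrors the mnemonic but is a hypothetical helper; `gpr` is a 32-entry list, `mem` is a `bytearray`, and `xer` is an integer holding the 64-bit XER value.

```python
def stswx(gpr, mem, xer, rs, ra, rb):
    """Store n = XER[57:63] bytes from GPRs starting at RS to EA = (RA|0) + (RB)."""
    base = 0 if ra == 0 else gpr[ra]
    ea = base + gpr[rb]
    n = xer & 0x7F                  # bits 57:63 are the seven low-order bits
    r = rs - 1
    i = 32
    while n > 0:
        if i == 32:
            r = (r + 1) % 32        # next source register, wrapping to GPR 0
        # Extract bits i:i+7 of the register (bit 0 is the most significant).
        mem[ea] = (gpr[r] >> (56 - i)) & 0xFF
        i += 8
        if i == 64:
            i = 32
        ea += 1
        n -= 1
```

When the count field is 0 the loop body never runs, so no bytes are stored, in line with the description of the indexed form.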


3.3.7 Other Fixed-Point Instructions

The remainder of the fixed-point instructions use the contents of the General Purpose Registers (GPRs) as source operands, and place results into GPRs, into the Fixed-Point Exception Register (XER), and into Condition Register fields. In addition, the Trap instructions test the contents of a GPR or XER bit, invoking the system trap handler if the result of the specified test is true.

These instructions treat the source operands as signed integers unless the instruction is explicitly identified as performing an unsigned operation.

The X-form and XO-form instructions with Rc=1, and the D-form instructions addic., andi., and andis., set the first three bits of CR Field 0 to characterize the result placed into the target register. In 64-bit mode, these bits are set by signed comparison of the result to zero. In 32-bit mode, these bits are set by signed comparison of the low-order 32 bits of the result to zero.

Unless otherwise noted and when appropriate, when CR Field 0 and the XER are set they reflect the value placed into the target register.

Programming Note
Instructions with the OE bit set or that set CA may execute slowly or may prevent the execution of subsequent instructions until the instruction has completed.
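The mode dependence of the first three CR Field 0 bits can be shown with a small Python sketch. The function name `cr0_compare_bits` is hypothetical, and the bit names LT, GT, and EQ used in the comments are the usual Condition Register names, which are not spelled out in this section.

```python
def cr0_compare_bits(result, mode64=True):
    """Return (LT, GT, EQ) for a 64-bit result by signed comparison to zero.

    In 32-bit mode only the low-order 32 bits of the result are compared.
    """
    width = 64 if mode64 else 32
    value = result & ((1 << width) - 1)
    if value >> (width - 1):        # sign bit set: interpret as negative
        value -= 1 << width
    return int(value < 0), int(value > 0), int(value == 0)
```

The same register image can therefore be classified differently in the two modes: 0x0000000100000000 is positive in 64-bit mode but compares equal to zero in 32-bit mode.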


3.3.8 Fixed-Point Arithmetic Instructions

The XO-form Arithmetic instructions with Rc=1, and the D-form Arithmetic instruction addic., set the first three bits of CR Field 0 as described in Section 3.3.7, “Other Fixed-Point Instructions” on page 48.

addic, addic., subfic, addc, subfc, adde, subfe, addme, subfme, addze, and subfze always set CA, to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode. The XO-form Arithmetic instructions set SO and OV when OE=1 to reflect overflow of the result. Except for the Multiply Low and Divide instructions, the setting of these bits is mode-dependent, and reflects overflow of the 64-bit result in 64-bit mode and overflow of the low-order 32-bit result in 32-bit mode. For XO-form Multiply Low and Divide instructions, the setting of these bits is mode-independent, and reflects overflow of the 64-bit result for mulld, divd, and divdu, and overflow of the low-order 32-bit result for mullw, divw, and divwu.

Programming Note
Notice that CR Field 0 may not reflect the “true” (infinitely precise) result if overflow occurs.

Extended mnemonics for addition and subtraction

Several extended mnemonics are provided that use the Add Immediate and Add Immediate Shifted instructions to load an immediate value or an address into a target register. Some of these are shown as examples with the two instructions.

The PowerPC Architecture supplies Subtract From instructions, which subtract the second operand from the third. A set of extended mnemonics is provided that use the more “normal” order, in which the third operand is subtracted from the second, with the third operand being either an immediate field or a register. Some of these are shown as examples with the appropriate Add and Subtract From instructions.

See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

Add Immediate D-form Add Immediate Shifted D-form

addi RT,RA,SI addis RT,RA,SI

POWER mnemonic: cal] POWER mnemonic: cau]

[ [

14 RT RA SI 15 RT RA SI

0 6 11 16 31 0 6 11 16 31

← ← 16

if RA = 0 then RT if RA = 0 then RT 0)

EXTS(SI) EXTS(SI ||

← ← 16

else RT 0)

(RA) + EXTS(SI) (RA) + EXTS(SI

else RT ||

The sum (RA|0) + SI is placed into register RT. The sum (RA|0) + (SI 0x0000) is placed into reg-

||

ister RT.

Special Registers Altered:

None Special Registers Altered:

None

Extended Mnemonics: Extended Mnemonics:

Examples of extended mnemonics for Add Immediate: Examples of extended mnemonics for Add Immediate

Shifted:

Extended: Equivalent to:

li Rx,value addi Rx,0,value Extended: Equivalent to:

la Rx,disp(Ry) addi Rx,Ry,disp

subi Rx,Ry,value addi Rx,Ry,− lis Rx,value addis Rx,0,value

value subis Rx,Ry,value addis Rx,Ry,− value

Programming Note

addi, addis, add, and subf are the preferred

instructions for addition and subtraction, because

they set few status bits.

Notice that addi and addis use the value 0, not the

contents of GPR 0, if R A = 0 . Chapter 3. Fixed-Point Processor 49
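The (RA|0) rule and the sign extension of SI can be made concrete with a short Python model. It is a sketch under stated assumptions: the register file is a plain list, and `exts`, `addi`, `addis`, and `load_const32` are hypothetical helper names, not part of any assembler or toolchain.

```python
MASK64 = (1 << 64) - 1


def exts(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def addi(gpr, rt, ra, si):
    """RT <- (RA|0) + EXTS(SI). RA = 0 selects the value 0, not GPR 0."""
    base = 0 if ra == 0 else gpr[ra]
    gpr[rt] = (base + exts(si, 16)) & MASK64


def addis(gpr, rt, ra, si):
    """RT <- (RA|0) + EXTS(SI || 16 zero bits)."""
    base = 0 if ra == 0 else gpr[ra]
    gpr[rt] = (base + exts((si & 0xFFFF) << 16, 32)) & MASK64


def load_const32(gpr, rt, value):
    """Build a sign-extended 32-bit constant with lis followed by addi."""
    assert rt != 0, "addi would read the value 0 instead of GPR 0"
    low = value & 0xFFFF
    # addi sign-extends its immediate, so compensate in the high half.
    high = ((value >> 16) + (1 if low & 0x8000 else 0)) & 0xFFFF
    addis(gpr, rt, 0, high)   # lis  rt,high
    addi(gpr, rt, rt, low)    # addi rt,rt,low


gpr = [0] * 32
gpr[0] = 99
addi(gpr, 3, 0, 5)            # li r3,5: GPR 0 is ignored
load_const32(gpr, 4, 0x12348765)
print(hex(gpr[3]), hex(gpr[4]))
```

The adjustment of the high half in `load_const32` is needed whenever bit 15 of the low half is 1, because the second instruction then adds a negative value.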


Add XO-form

add    RT,RA,RB (OE=0 Rc=0)
add.   RT,RA,RB (OE=0 Rc=1)
addo   RT,RA,RB (OE=1 Rc=0)
addo.  RT,RA,RB (OE=1 Rc=1)

[POWER mnemonics: cax, cax., caxo, caxo.]

31 RT RA RB OE 266 Rc

0 6 11 16 21 22 31

    RT ← (RA) + (RB)

The sum (RA) + (RB) is placed into register RT.

Special Registers Altered:

CR0 (if Rc=1)

SO OV (if OE=1)

Subtract From XO-form

subf   RT,RA,RB (OE=0 Rc=0)
subf.  RT,RA,RB (OE=0 Rc=1)
subfo  RT,RA,RB (OE=1 Rc=0)
subfo. RT,RA,RB (OE=1 Rc=1)

31 RT RA RB OE 40 Rc

0 6 11 16 21 22 31

    RT ← ¬(RA) + (RB) + 1

The sum ¬(RA) + (RB) + 1 is placed into register RT.

Special Registers Altered:

CR0 (if Rc=1)

SO OV (if OE=1)

Extended Mnemonics:

Example of extended mnemonics for Subtract From:

    Extended:       Equivalent to:
    sub Rx,Ry,Rz    subf Rx,Rz,Ry

Add Immediate Carrying D-form

addic RT,RA,SI

[POWER mnemonic: ai]

12 RT RA SI

0 6 11 16 31

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered:

CA

Extended Mnemonics:

Example of extended mnemonics for Add Immediate Carrying:

    Extended:            Equivalent to:
    subic Rx,Ry,value    addic Rx,Ry,−value

Add Immediate Carrying and Record D-form

addic. RT,RA,SI

[POWER mnemonic: ai.]

13 RT RA SI

0 6 11 16 31

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered:

CR0 CA

Extended Mnemonics:

Example of extended mnemonics for Add Immediate Carrying and Record:

    Extended:             Equivalent to:
    subic. Rx,Ry,value    addic. Rx,Ry,−value


Subtract From Immediate Carrying D-form

subfic RT,RA,SI

[POWER mnemonic: sfi]

8 RT RA SI

0 6 11 16 31

    RT ← ¬(RA) + EXTS(SI) + 1

The sum ¬(RA) + SI + 1 is placed into register RT.

Special Registers Altered:

CA

Add Carrying XO-form

addc   RT,RA,RB (OE=0 Rc=0)
addc.  RT,RA,RB (OE=0 Rc=1)
addco  RT,RA,RB (OE=1 Rc=0)
addco. RT,RA,RB (OE=1 Rc=1)

[POWER mnemonics: a, a., ao, ao.]

31 RT RA RB OE 10 Rc

0 6 11 16 21 22 31

    RT ← (RA) + (RB)

The sum (RA) + (RB) is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)

Subtract From Carrying XO-form

subfc   RT,RA,RB (OE=0 Rc=0)
subfc.  RT,RA,RB (OE=0 Rc=1)
subfco  RT,RA,RB (OE=1 Rc=0)
subfco. RT,RA,RB (OE=1 Rc=1)

[POWER mnemonics: sf, sf., sfo, sfo.]

31 RT RA RB OE 8 Rc

0 6 11 16 21 22 31

    RT ← ¬(RA) + (RB) + 1

The sum ¬(RA) + (RB) + 1 is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)

Extended Mnemonics:

Example of extended mnemonics for Subtract From Carrying:

    Extended:        Equivalent to:
    subc Rx,Ry,Rz    subfc Rx,Rz,Ry
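Because Subtract From is defined as the sum ¬(RA) + (RB) + 1, the carry it produces behaves as a "no borrow" flag. The following Python sketch models this for 64-bit mode; `addc` and `subfc` here are hypothetical model functions that return the result and CA as a pair.

```python
MASK64 = (1 << 64) - 1


def addc(ra, rb):
    """Return (RT, CA) for RT <- (RA) + (RB) in 64-bit mode."""
    total = (ra & MASK64) + (rb & MASK64)
    return total & MASK64, total >> 64


def subfc(ra, rb):
    """Return (RT, CA) for RT <- ¬(RA) + (RB) + 1 in 64-bit mode."""
    total = (~ra & MASK64) + (rb & MASK64) + 1
    return total & MASK64, total >> 64


print(subfc(3, 10))   # 10 - 3: no borrow, so CA is 1
print(subfc(10, 3))   # 3 - 10: borrow, so CA is 0
```

Note the operand order: `subfc(ra, rb)` computes (RB) minus (RA), which is why the extended mnemonic subc swaps its last two operands.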


Add Extended XO-form

adde   RT,RA,RB (OE=0 Rc=0)
adde.  RT,RA,RB (OE=0 Rc=1)
addeo  RT,RA,RB (OE=1 Rc=0)
addeo. RT,RA,RB (OE=1 Rc=1)

[POWER mnemonics: ae, ae., aeo, aeo.]

31 RT RA RB OE 138 Rc

0 6 11 16 21 22 31

    RT ← (RA) + (RB) + CA

The sum (RA) + (RB) + CA is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)

Subtract From Extended XO-form

subfe   RT,RA,RB (OE=0 Rc=0)
subfe.  RT,RA,RB (OE=0 Rc=1)
subfeo  RT,RA,RB (OE=1 Rc=0)
subfeo. RT,RA,RB (OE=1 Rc=1)

[POWER mnemonics: sfe, sfe., sfeo, sfeo.]

31 RT RA RB OE 136 Rc

0 6 11 16 21 22 31

    RT ← ¬(RA) + (RB) + CA

The sum ¬(RA) + (RB) + CA is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)

Add to Minus One Extended XO-form

addme   RT,RA (OE=0 Rc=0)
addme.  RT,RA (OE=0 Rc=1)
addmeo  RT,RA (OE=1 Rc=0)
addmeo. RT,RA (OE=1 Rc=1)

[POWER mnemonics: ame, ame., ameo, ameo.]

31 RT RA /// OE 234 Rc

0 6 11 16 21 22 31

    RT ← (RA) + CA − 1

The sum (RA) + CA + ^{64}1 is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)

Subtract From Minus One Extended XO-form

subfme   RT,RA (OE=0 Rc=0)
subfme.  RT,RA (OE=0 Rc=1)
subfmeo  RT,RA (OE=1 Rc=0)
subfmeo. RT,RA (OE=1 Rc=1)

[POWER mnemonics: sfme, sfme., sfmeo, sfmeo.]

31 RT RA /// OE 232 Rc

0 6 11 16 21 22 31

    RT ← ¬(RA) + CA − 1

The sum ¬(RA) + CA + ^{64}1 is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)


Add to Zero Extended XO-form

addze   RT,RA (OE=0 Rc=0)
addze.  RT,RA (OE=0 Rc=1)
addzeo  RT,RA (OE=1 Rc=0)
addzeo. RT,RA (OE=1 Rc=1)

[POWER mnemonics: aze, aze., azeo, azeo.]

31 RT RA /// OE 202 Rc

0 6 11 16 21 22 31

    RT ← (RA) + CA

The sum (RA) + CA is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)

Subtract From Zero Extended XO-form

subfze   RT,RA (OE=0 Rc=0)
subfze.  RT,RA (OE=0 Rc=1)
subfzeo  RT,RA (OE=1 Rc=0)
subfzeo. RT,RA (OE=1 Rc=1)

[POWER mnemonics: sfze, sfze., sfzeo, sfzeo.]

31 RT RA /// OE 200 Rc

0 6 11 16 21 22 31

    RT ← ¬(RA) + CA

The sum ¬(RA) + CA is placed into register RT.

Special Registers Altered:

CA

CR0 (if Rc=1)

SO OV (if OE=1)

Programming Note

The setting of CA by the Add and Subtract From

instructions, including the Extended versions

thereof, is mode-dependent. If a sequence of

these instructions is used to perform extended-

precision addition or subtraction, the same mode

should be used throughout the sequence.
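The extended-precision use mentioned in the note can be sketched in Python as a 128-bit addition built from an addc-style step followed by an adde-style step. The function names are hypothetical, and the `bits` parameter is a simplification that models only where the carry is taken from (bit 0 in 64-bit mode, bit 32 in 32-bit mode), returning the sum truncated to that width.

```python
MASK64 = (1 << 64) - 1


def add_with_carry(ra, rb, carry_in=0, bits=64):
    """Return (sum, CA), where CA is the carry out of a `bits`-wide addition."""
    mask = (1 << bits) - 1
    total = (ra & mask) + (rb & mask) + carry_in
    return total & mask, total >> bits


def add128(a, b):
    """Add two 128-bit values held as (high, low) doubleword pairs."""
    lo, ca = add_with_carry(a & MASK64, b & MASK64)                      # addc
    hi, ca = add_with_carry((a >> 64) & MASK64, (b >> 64) & MASK64, ca)  # adde
    return (hi << 64) | lo, ca


print(add128((1 << 64) - 1, 1))                 # carry ripples into the high half
print(add_with_carry(0xFFFF_FFFF, 1, bits=32))  # carry out of bit 32
print(add_with_carry(0xFFFF_FFFF, 1, bits=64))  # no carry out of bit 0
```

The last two calls show why the mode must not change in the middle of a sequence: the same operands produce a different CA in each mode.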

Negate XO-form

neg RT,RA (OE=0 Rc=0)

neg. RT,RA (OE=0 Rc=1)

nego RT,RA (OE=1 Rc=0)

nego. RT,RA (OE=1 Rc=1)

31 RT RA /// OE 104 Rc

0 6 11 16 21 22 31

    RT ← ¬(RA) + 1

The sum ¬(RA) + 1 is placed into register RT.

If the processor is in 64-bit mode and register RA contains the most negative 64-bit number (0x8000_0000_0000_0000), the result is the most negative number and, if OE=1, OV is set to 1. Similarly, if the processor is in 32-bit mode and (RA)_{32:63} contain the most negative 32-bit number (0x8000_0000), the low-order 32 bits of the result contain the most negative 32-bit number and, if OE=1, OV is set to 1.

Special Registers Altered:

CR0 (if Rc=1)

SO OV (if OE=1)
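The overflow case of Negate can be checked with a small Python model. This is a sketch: `neg` is a hypothetical function that returns the 64-bit result together with the value OV would take when OE=1.

```python
MASK64 = (1 << 64) - 1


def neg(ra, mode64=True):
    """Return (RT, OV) for RT <- ¬(RA) + 1, with OV as set when OE=1."""
    rt = ((~ra & MASK64) + 1) & MASK64
    bits = 64 if mode64 else 32
    most_negative = 1 << (bits - 1)
    # Overflow occurs only when the operand is the most negative number
    # representable in the current mode.
    ov = int((ra & ((1 << bits) - 1)) == most_negative)
    return rt, ov


print(neg(5))
print(neg(0x8000_0000_0000_0000))
print(neg(0x8000_0000, mode64=False))
```

Negating the most negative number yields the same bit pattern again, so software that needs an absolute value has to treat that operand separately.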


Multiply Low Immediate D-form

mulli RT,RA,SI

[POWER mnemonic: muli]

7 RT RA SI

0 6 11 16 31

    prod_{0:127} ← (RA) × EXTS(SI)
    RT ← prod_{64:127}

The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the SI field. The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:

None

Multiply Low Doubleword XO-form

mulld   RT,RA,RB (OE=0 Rc=0)
mulld.  RT,RA,RB (OE=0 Rc=1)
mulldo  RT,RA,RB (OE=1 Rc=0)
mulldo. RT,RA,RB (OE=1 Rc=1)

31 RT RA RB OE 233 Rc

0 6 11 16 21 22 31

    prod_{0:127} ← (RA) × (RB)
    RT ← prod_{64:127}

The 64-bit operands are (RA) and (RB). The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

If OE=1 then OV is set to 1 if the product cannot be represented in 64 bits.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:

CR0 (if Rc=1)

SO OV (if OE=1)

Programming Note

The XO-form Multiply instructions may execute faster on some implementations if RB contains the operand having the smaller absolute value.

Multiply Low Word XO-form

mullw   RT,RA,RB (OE=0 Rc=0)
mullw.  RT,RA,RB (OE=0 Rc=1)
mullwo  RT,RA,RB (OE=1 Rc=0)
mullwo. RT,RA,RB (OE=1 Rc=1)

[POWER mnemonics: muls, muls., mulso, mulso.]

31 RT RA RB OE 235 Rc

0 6 11 16 21 22 31

    RT ← (RA)_{32:63} × (RB)_{32:63}

The 32-bit operands are the low-order 32 bits of RA and of RB. The 64-bit product of the operands is placed into register RT.

If OE=1 then OV is set to 1 if the product cannot be represented in 32 bits.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:

CR0 (if Rc=1)

SO OV (if OE=1)

Programming Note

For mulli and mullw, the low-order 32 bits of the product are the correct 32-bit product for 32-bit mode.

For mulli and mulld, the low-order 64 bits of the product are independent of whether the operands are regarded as signed or unsigned 64-bit integers. For mulli and mullw, the low-order 32 bits of the product are independent of whether the operands are regarded as signed or unsigned 32-bit integers.
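The claim that the low-order bits of the product do not depend on signedness, and the OV rule for OE=1, can both be checked with a Python model. This is a sketch with hypothetical function names; each function returns the 64-bit register result together with the OV value.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulld(ra, rb):
    """Return (RT, OV): low 64 bits of the signed 128-bit product."""
    prod = to_signed(ra, 64) * to_signed(rb, 64)
    ov = int(not (-(1 << 63) <= prod < (1 << 63)))
    return prod & MASK64, ov


def mullw(ra, rb):
    """Return (RT, OV): signed 64-bit product of the low-order words."""
    prod = to_signed(ra, 32) * to_signed(rb, 32)
    ov = int(not (-(1 << 31) <= prod < (1 << 31)))
    return prod & MASK64, ov


rt, ov = mullw(0xFFFF_FFFF, 2)          # signed view: -1 * 2
unsigned_low = (0xFFFF_FFFF * 2) & 0xFFFF_FFFF
print(hex(rt), ov, rt & 0xFFFF_FFFF == unsigned_low)
print(mullw(0x1_0000, 0x1_0000))        # 2**32 does not fit in 32 bits
```

The first call shows the low 32 bits matching the unsigned product even though the operands were treated as signed.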


Multiply High Doubleword XO-form

mulhd  RT,RA,RB (Rc=0)
mulhd. RT,RA,RB (Rc=1)

31 RT RA RB / 73 Rc

0 6 11 16 21 22 31

    prod_{0:127} ← (RA) × (RB)
    RT ← prod_{0:63}

The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:

CR0 (if Rc=1)

Multiply High Word XO-form

mulhw  RT,RA,RB (Rc=0)
mulhw. RT,RA,RB (Rc=1)

31 RT RA RB / 75 Rc

0 6 11 16 21 22 31

    prod_{0:63} ← (RA)_{32:63} × (RB)_{32:63}
    RT_{32:63} ← prod_{0:31}
    RT_{0:31} ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT_{32:63}. The contents of RT_{0:31} are undefined.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:

CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)

Multiply High Doubleword Unsigned XO-form

mulhdu  RT,RA,RB (Rc=0)
mulhdu. RT,RA,RB (Rc=1)

31 RT RA RB / 9 Rc

0 6 11 16 21 22 31

    prod_{0:127} ← (RA) × (RB)
    RT ← prod_{0:63}

The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero.

Special Registers Altered:

CR0 (if Rc=1)

Multiply High Word Unsigned XO-form

mulhwu  RT,RA,RB (Rc=0)
mulhwu. RT,RA,RB (Rc=1)

31 RT RA RB / 11 Rc

0 6 11 16 21 22 31

    prod_{0:63} ← (RA)_{32:63} × (RB)_{32:63}
    RT_{32:63} ← prod_{0:31}
    RT_{0:31} ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT_{32:63}. The contents of RT_{0:31} are undefined.

Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero.

Special Registers Altered:

CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)
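The difference between the signed and unsigned Multiply High instructions shows up only in the high half of the product. The Python sketch below models the four instructions with hypothetical function names; for the word forms it returns only RT_{32:63}, since RT_{0:31} is undefined.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulhd(ra, rb):
    """High 64 bits of the signed 128-bit product."""
    return ((to_signed(ra, 64) * to_signed(rb, 64)) >> 64) & MASK64


def mulhdu(ra, rb):
    """High 64 bits of the unsigned 128-bit product."""
    return ((ra & MASK64) * (rb & MASK64)) >> 64


def mulhw(ra, rb):
    """RT[32:63]: high 32 bits of the signed 64-bit product of the low words."""
    return ((to_signed(ra, 32) * to_signed(rb, 32)) >> 32) & MASK32


def mulhwu(ra, rb):
    """RT[32:63]: high 32 bits of the unsigned 64-bit product of the low words."""
    return ((ra & MASK32) * (rb & MASK32)) >> 32


print(hex(mulhw(0xFFFF_FFFF, 2)), hex(mulhwu(0xFFFF_FFFF, 2)))
```

Combining `mulhdu` with the low 64 bits of the product gives the full 128-bit unsigned result, which is the usual building block for multi-precision multiplication.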


Divide Doubleword XO-form

divd   RT,RA,RB (OE=0 Rc=0)
divd.  RT,RA,RB (OE=0 Rc=1)
divdo  RT,RA,RB (OE=1 Rc=0)
divdo. RT,RA,RB (OE=1 Rc=1)

31 RT RA RB OE 489 Rc

0 6 11 16 21 22 31

    dividend_{0:63} ← (RA)
    divisor_{0:63} ← (RB)
    RT ← dividend ÷ divisor

The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient of the dividend and divisor is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and −|divisor| < r ≤ 0 if the dividend is negative.

If an attempt is made to perform any of the divisions

    0x8000_0000_0000_0000 ÷ −1
    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:

CR0 (if Rc=1)

SO OV (if OE=1)

Programming Note

The 64-bit signed remainder of dividing (RA) by (RB) can be computed as follows, except in the case that (RA) = −2^{63} and (RB) = −1.

    divd  RT,RA,RB    # RT = quotient
    mulld RT,RT,RB    # RT = quotient*divisor
    subf  RT,RT,RA    # RT = remainder

Divide Word XO-form

divw   RT,RA,RB (OE=0 Rc=0)
divw.  RT,RA,RB (OE=0 Rc=1)
divwo  RT,RA,RB (OE=1 Rc=0)
divwo. RT,RA,RB (OE=1 Rc=1)

31 RT RA RB OE 491 Rc

0 6 11 16 21 22 31

    dividend_{0:63} ← EXTS((RA)_{32:63})
    divisor_{0:63} ← EXTS((RB)_{32:63})
    RT_{32:63} ← dividend ÷ divisor
    RT_{0:31} ← undefined

The 64-bit dividend is the sign-extended value of (RA)_{32:63}. The 64-bit divisor is the sign-extended value of (RB)_{32:63}. The 64-bit quotient is formed. The low-order 32 bits of the 64-bit quotient are placed into RT_{32:63}. The contents of RT_{0:31} are undefined. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and −|divisor| < r ≤ 0 if the dividend is negative.

If an attempt is made to perform any of the divisions

    0x8000_0000 ÷ −1
    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:

CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)

SO OV (if OE=1)

Programming Note

The 32-bit signed remainder of dividing (RA)_{32:63} by (RB)_{32:63} can be computed as follows, except in the case that (RA)_{32:63} = −2^{31} and (RB)_{32:63} = −1.

    divw  RT,RA,RB    # RT = quotient
    mullw RT,RT,RB    # RT = quotient*divisor
    subf  RT,RT,RA    # RT = remainder
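The condition on r means the quotient is truncated toward zero and the remainder takes the sign of the dividend. The Python sketch below models divd and the three-instruction remainder sequence; the function names are hypothetical, and `None` stands in for the architecturally undefined results.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def divd(ra, rb):
    """Signed 64-bit quotient, truncated toward zero; None if undefined."""
    dividend, divisor = to_signed(ra, 64), to_signed(rb, 64)
    if divisor == 0 or (dividend == -(1 << 63) and divisor == -1):
        return None
    # Python's // rounds toward negative infinity, so divide magnitudes.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    return quotient & MASK64


def remainder_d(ra, rb):
    """Model of the divd / mulld / subf sequence."""
    quotient = divd(ra, rb)
    if quotient is None:
        return None
    product = (to_signed(quotient, 64) * to_signed(rb, 64)) & MASK64  # mulld
    return ((~product & MASK64) + (ra & MASK64) + 1) & MASK64          # subf


minus7 = -7 & MASK64
print(to_signed(divd(minus7, 2), 64), to_signed(remainder_d(minus7, 2), 64))
```

For −7 ÷ 2 the model gives quotient −3 and remainder −1, which satisfies −|divisor| < r ≤ 0 for a negative dividend.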


Divide Doubleword Unsigned XO-form

divdu   RT,RA,RB (OE=0 Rc=0)
divdu.  RT,RA,RB (OE=0 Rc=1)
divduo  RT,RA,RB (OE=1 Rc=0)
divduo. RT,RA,RB (OE=1 Rc=1)

31 RT RA RB OE 457 Rc

0 6 11 16 21 22 31

    dividend_{0:63} ← (RA)
    divisor_{0:63} ← (RB)
    RT ← dividend ÷ divisor

The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient of the dividend and divisor is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.

If an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1.

Special Registers Altered:

CR0 (if Rc=1)

SO OV (if OE=1)

Programming Note

The 64-bit unsigned remainder of dividing (RA) by (RB) can be computed as follows.

    divdu RT,RA,RB    # RT = quotient
    mulld RT,RT,RB    # RT = quotient*divisor
    subf  RT,RT,RA    # RT = remainder

Divide Word Unsigned XO-form

divwu   RT,RA,RB (OE=0 Rc=0)
divwu.  RT,RA,RB (OE=0 Rc=1)
divwuo  RT,RA,RB (OE=1 Rc=0)
divwuo. RT,RA,RB (OE=1 Rc=1)

31 RT RA RB OE 459 Rc

0 6 11 16 21 22 31

    dividend_{0:63} ← ^{32}0 || (RA)_{32:63}
    divisor_{0:63} ← ^{32}0 || (RB)_{32:63}
    RT_{32:63} ← dividend ÷ divisor
    RT_{0:31} ← undefined

The 64-bit dividend is the zero-extended value of (RA)_{32:63}. The 64-bit divisor is the zero-extended value of (RB)_{32:63}. The 64-bit quotient is formed. The low-order 32 bits of the 64-bit quotient are placed into RT_{32:63}. The contents of RT_{0:31} are undefined. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.

If an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1.

Special Registers Altered:

CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)

SO OV (if OE=1)

Programming Note

The 32-bit unsigned remainder of dividing (RA)_{32:63} by (RB)_{32:63} can be computed as follows.

    divwu RT,RA,RB    # RT = quotient
    mullw RT,RT,RB    # RT = quotient*divisor
    subf  RT,RT,RA    # RT = remainder
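The unsigned word divide and its remainder sequence can be modelled in the same way. In this Python sketch the function names are hypothetical, only the defined low-order 32 bits of each result are returned, and `None` marks the undefined division by zero.

```python
MASK32 = (1 << 32) - 1


def divwu(ra, rb):
    """RT[32:63] for the unsigned division of the low-order words."""
    dividend, divisor = ra & MASK32, rb & MASK32
    if divisor == 0:
        return None
    return dividend // divisor


def remainder_wu(ra, rb):
    """Low 32 bits produced by the divwu / mullw / subf sequence."""
    quotient = divwu(ra, rb)
    if quotient is None:
        return None
    product = (quotient * (rb & MASK32)) & MASK32   # low word of mullw
    return ((ra & MASK32) - product) & MASK32       # low word of subf


# The high-order 32 bits of RA do not take part in the division.
print(divwu(0xFFFF_FFFF_0000_0007, 2), remainder_wu(0xFFFF_FFFF_0000_0007, 2))
```

Only the low word of the final subf result is meaningful here, because the high word of the quotient register is undefined after divwu.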


3.3.9 Fixed-Point Compare Instructions

The fixed-point Compare instructions compare the contents of register RA with (1) the sign-extended value of the SI field, (2) the zero-extended value of the UI field, or (3) the contents of register RB. The comparison is signed for cmpi and cmp, and unsigned for cmpli and cmpl.

The L field controls whether the operands are treated as 64-bit or 32-bit quantities, as follows:

    L    Operand length
    0    32-bit operands
    1    64-bit operands

When the operands are treated as 32-bit signed quantities, bit 32 of the register (RA or RB) is the sign bit.

The Compare instructions set one bit in the leftmost three bits of the designated CR field to 1, and the other two to 0. XER_{SO} is copied to bit 3 of the designated CR field.

The CR field is set as follows.

    Bit  Name  Description
    0    LT    (RA) < SI or (RB) (signed comparison)
               (RA) <u UI or (RB) (unsigned comparison)
    1    GT    (RA) > SI or (RB) (signed comparison)
               (RA) >u UI or (RB) (unsigned comparison)
    2    EQ    (RA) = SI, UI, or (RB)
    3    SO    Summary Overflow from the XER

Extended mnemonics for compares

A set of extended mnemonics is provided so that compares can be coded with the operand length as part of the mnemonic rather than as a numeric operand. Some of these are shown as examples with the Compare instructions. See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

Compare Immediate D-form

cmpi BF,L,RA,SI

11 BF / L RA SI

0 6 9 10 11 16 31

    if L = 0 then a ← EXTS((RA)_{32:63})
    else          a ← (RA)
    if      a < EXTS(SI) then c ← 0b100
    else if a > EXTS(SI) then c ← 0b010
    else                      c ← 0b001
    CR_{4×BF:4×BF+3} ← c || XER_{SO}

The contents of register RA ((RA)_{32:63} sign-extended to 64 bits if L=0) are compared with the sign-extended value of the SI field, treating the operands as signed integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:

CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Immediate:

    Extended:             Equivalent to:
    cmpdi Rx,value        cmpi 0,1,Rx,value
    cmpwi cr3,Rx,value    cmpi 3,0,Rx,value

Compare X-form

cmp BF,L,RA,RB

31 BF / L RA RB 0 /

0 6 9 10 11 16 21 31

    if L = 0 then a ← EXTS((RA)_{32:63})
                  b ← EXTS((RB)_{32:63})
    else          a ← (RA)
                  b ← (RB)
    if      a < b then c ← 0b100
    else if a > b then c ← 0b010
    else               c ← 0b001
    CR_{4×BF:4×BF+3} ← c || XER_{SO}

The contents of register RA ((RA)_{32:63} if L=0) are compared with the contents of register RB ((RB)_{32:63} if L=0), treating the operands as signed integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:

CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare:

    Extended:          Equivalent to:
    cmpd Rx,Ry         cmp 0,1,Rx,Ry
    cmpw cr3,Rx,Ry     cmp 3,0,Rx,Ry


Compare Logical Immediate D-form

cmpli BF,L,RA,UI

10 BF / L RA UI

0 6 9 10 11 16 31

    if L = 0 then a ← ^{32}0 || (RA)_{32:63}
    else          a ← (RA)
    if      a <u (^{48}0 || UI) then c ← 0b100
    else if a >u (^{48}0 || UI) then c ← 0b010
    else                             c ← 0b001
    CR_{4×BF:4×BF+3} ← c || XER_{SO}

The contents of register RA ((RA)_{32:63} zero-extended to 64 bits if L=0) are compared with ^{48}0 || UI, treating the operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:

CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Logical Immediate:

    Extended:              Equivalent to:
    cmpldi Rx,value        cmpli 0,1,Rx,value
    cmplwi cr3,Rx,value    cmpli 3,0,Rx,value

Compare Logical X-form

cmpl BF,L,RA,RB

31 BF / L RA RB 32 /

0 6 9 10 11 16 21 31

    if L = 0 then a ← ^{32}0 || (RA)_{32:63}
                  b ← ^{32}0 || (RB)_{32:63}
    else          a ← (RA)
                  b ← (RB)
    if      a <u b then c ← 0b100
    else if a >u b then c ← 0b010
    else                c ← 0b001
    CR_{4×BF:4×BF+3} ← c || XER_{SO}

The contents of register RA ((RA)_{32:63} if L=0) are compared with the contents of register RB ((RB)_{32:63} if L=0), treating the operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:

CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Logical:

    Extended:           Equivalent to:
    cmpld Rx,Ry         cmpl 0,1,Rx,Ry
    cmplw cr3,Rx,Ry     cmpl 3,0,Rx,Ry
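The four Compare instructions differ only in operand length and signedness, so one Python function can model the 4-bit value written to CR field BF. This is a sketch: `compare` is a hypothetical name, `operand` stands for the already extended SI or UI value or for (RB), and `so` stands for the current XER SO bit.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def compare(ra, operand, l, signed, so=0):
    """Return the 4-bit CR field value LT || GT || EQ || SO."""
    bits = 64 if l == 1 else 32
    mask = (1 << bits) - 1
    a, b = ra & mask, operand & mask
    if signed:                       # cmpi, cmp
        a, b = to_signed(a, bits), to_signed(b, bits)
    if a < b:
        c = 0b100
    elif a > b:
        c = 0b010
    else:
        c = 0b001
    return (c << 1) | so


all_ones = 0xFFFF_FFFF_FFFF_FFFF
print(bin(compare(all_ones, 0, l=1, signed=True)))    # cmpdi:  -1 < 0
print(bin(compare(all_ones, 0, l=1, signed=False)))   # cmpldi: large > 0
```

The same register contents compare as less than zero with cmpdi and as greater than zero with cmpldi, which is the usual reason for choosing the logical forms when testing addresses or unsigned counters.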


3.3.10 Fixed-Point Trap Instructions

The Trap instructions are provided to test for a specified set of conditions. If any of the conditions tested by a Trap instruction are met, the system trap handler is invoked. If none of the tested conditions are met, instruction execution continues normally.

The contents of register RA are compared with either the sign-extended value of the SI field or the contents of register RB, depending on the Trap instruction. For tdi and td, the entire contents of RA (and RB) participate in the comparison; for twi and tw, only the contents of the low-order 32 bits of RA (and RB) participate in the comparison.

This comparison results in five conditions which are ANDed with TO. If the result is not 0 the system trap handler is invoked. These conditions are as follows.

    TO Bit   ANDed with Condition
    0        Less Than, using signed comparison
    1        Greater Than, using signed comparison
    2        Equal
    3        Less Than, using unsigned comparison
    4        Greater Than, using unsigned comparison

Extended mnemonics for traps

A set of extended mnemonics is provided so that traps can be coded with the condition as part of the mnemonic rather than as a numeric operand. Some of these are shown as examples with the Trap instructions. See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

Trap Doubleword Immediate D-form

tdi TO,RA,SI

2 TO RA SI

0 6 11 16 31

    a ← (RA)
    if (a <  EXTS(SI)) & TO_0 then TRAP
    if (a >  EXTS(SI)) & TO_1 then TRAP
    if (a =  EXTS(SI)) & TO_2 then TRAP
    if (a <u EXTS(SI)) & TO_3 then TRAP
    if (a >u EXTS(SI)) & TO_4 then TRAP

The contents of register RA are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

Special Registers Altered:

None

Extended Mnemonics:

Examples of extended mnemonics for Trap Doubleword Immediate:

    Extended:          Equivalent to:
    tdlti Rx,value     tdi 16,Rx,value
    tdnei Rx,value     tdi 24,Rx,value

Trap Word Immediate D-form

twi TO,RA,SI

[POWER mnemonic: ti]

3 TO RA SI

0 6 11 16 31

    a ← EXTS((RA)_{32:63})
    if (a <  EXTS(SI)) & TO_0 then TRAP
    if (a >  EXTS(SI)) & TO_1 then TRAP
    if (a =  EXTS(SI)) & TO_2 then TRAP
    if (a <u EXTS(SI)) & TO_3 then TRAP
    if (a >u EXTS(SI)) & TO_4 then TRAP

The contents of RA_{32:63} are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

Special Registers Altered:

None

Extended Mnemonics:

Examples of extended mnemonics for Trap Word Immediate:

    Extended:           Equivalent to:
    twgti  Rx,value     twi 8,Rx,value
    twllei Rx,value     twi 6,Rx,value


Trap Doubleword X-form

td TO,RA,RB

31 TO RA RB 68 /

0 6 11 16 21 31

    a ← (RA)
    b ← (RB)
    if (a <  b) & TO_0 then TRAP
    if (a >  b) & TO_1 then TRAP
    if (a =  b) & TO_2 then TRAP
    if (a <u b) & TO_3 then TRAP
    if (a >u b) & TO_4 then TRAP

The contents of register RA are compared with the contents of register RB. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

Special Registers Altered:

None

Extended Mnemonics:

Examples of extended mnemonics for Trap Doubleword:

    Extended:        Equivalent to:
    tdge  Rx,Ry      td 12,Rx,Ry
    tdlnl Rx,Ry      td 5,Rx,Ry

Trap Word X-form

tw TO,RA,RB

[POWER mnemonic: t]

31 TO RA RB 4 /

0 6 11 16 21 31

    a ← EXTS((RA)_{32:63})
    b ← EXTS((RB)_{32:63})
    if (a <  b) & TO_0 then TRAP
    if (a >  b) & TO_1 then TRAP
    if (a =  b) & TO_2 then TRAP
    if (a <u b) & TO_3 then TRAP
    if (a >u b) & TO_4 then TRAP

The contents of RA_{32:63} are compared with the contents of RB_{32:63}. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

Special Registers Altered:

None

Extended Mnemonics:

Examples of extended mnemonics for Trap Word:

    Extended:        Equivalent to:
    tweq  Rx,Ry      tw 4,Rx,Ry
    twlge Rx,Ry      tw 5,Rx,Ry
    trap             tw 31,0,0
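The TO values used by the extended mnemonics follow directly from the five-condition table. The Python sketch below evaluates a TO field against two operands; `trap_taken` is a hypothetical name, `operand` stands for EXTS(SI) or (RB), and TO bit 0 is taken to be the most significant bit of the 5-bit field, in line with the bit numbering used throughout.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def trap_taken(to, ra, operand, word=False):
    """Return True if a trap with the given TO field would be taken."""
    bits = 32 if word else 64        # twi/tw use only the low-order words
    mask = (1 << bits) - 1
    a_u, b_u = ra & mask, operand & mask
    a_s, b_s = to_signed(a_u, bits), to_signed(b_u, bits)
    conditions = (a_s < b_s, a_s > b_s, a_s == b_s, a_u < b_u, a_u > b_u)
    # Condition i is enabled by TO bit i, counted from the left.
    return any(cond and (to >> (4 - i)) & 1 for i, cond in enumerate(conditions))


minus_one = 0xFFFF_FFFF_FFFF_FFFF
print(trap_taken(16, minus_one, 0))          # tdlti: -1 < 0, trap
print(trap_taken(6, 1, 0xFFFF_FFFF, True))   # twllei: 1 <=u 0xFFFFFFFF, trap
print(trap_taken(31, 0, 0))                  # trap: equal operands always trap
```

Reading the TO values in binary makes the mnemonics self-explanatory: 24 is 0b11000 (less than or greater than, hence "not equal"), and 5 is 0b00101 (equal or logically greater than, hence "logically not less than").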


3.3.11 Fixed-Point Logical Instructions

The Logical instructions perform bit-parallel operations on 64-bit operands.

The X-form Logical instructions with Rc=1, and the D-form Logical instructions andi. and andis., set the first three bits of CR Field 0 as described in Section 3.3.7, “Other Fixed-Point Instructions” on page 48. The Logical instructions do not change the SO, OV, and CA bits in the XER.

Extended mnemonics for logical operations

An extended mnemonic is provided that generates the preferred form of “no-op” (an instruction that does nothing). This is shown as an example with the OR Immediate instruction.

Extended mnemonics are provided that use the OR and NOR instructions to copy the contents of one register to another, with and without complementing. These are shown as examples with the two instructions.

See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

AND Immediate D-form

andi. RA,RS,UI

[POWER mnemonic: andil.]

28 RS RA UI

0 6 11 16 31

    RA ← (RS) & (^{48}0 || UI)

The contents of register RS are ANDed with ^{48}0 || UI and the result is placed into register RA.

Special Registers Altered:

CR0

AND Immediate Shifted D-form

andis. RA,RS,UI

[POWER mnemonic: andiu.]

29 RS RA UI

0 6 11 16 31

    RA ← (RS) & (^{32}0 || UI || ^{16}0)

The contents of register RS are ANDed with ^{32}0 || UI || ^{16}0 and the result is placed into register RA.

Special Registers Altered:

CR0


OR Immediate D-form

ori RA,RS,UI

[POWER mnemonic: oril]

24 RS RA UI

0 6 11 16 31

    RA ← (RS) | (^{48}0 || UI)

The contents of register RS are ORed with ^{48}0 || UI and the result is placed into register RA.

The preferred “no-op” (an instruction that does nothing) is:

    ori 0,0,0

Special Registers Altered:

None

Extended Mnemonics:

Example of extended mnemonics for OR Immediate:

    Extended:    Equivalent to:
    nop          ori 0,0,0

OR Immediate Shifted D-form

oris RA,RS,UI

[POWER mnemonic: oriu]

25 RS RA UI

0 6 11 16 31

    RA ← (RS) | (^{32}0 || UI || ^{16}0)

The contents of register RS are ORed with ^{32}0 || UI || ^{16}0 and the result is placed into register RA.

Special Registers Altered:

None

XOR Immediate D-form

xori RA,RS,UI

[POWER mnemonic: xoril]

26 RS RA UI

0 6 11 16 31

    RA ← (RS) ⊕ (^{48}0 || UI)

The contents of register RS are XORed with ^{48}0 || UI and the result is placed into register RA.

Special Registers Altered:

None

XOR Immediate Shifted D-form

xoris RA,RS,UI

[POWER mnemonic: xoriu]

27 RS RA UI

0 6 11 16 31

    RA ← (RS) ⊕ (^{32}0 || UI || ^{16}0)

The contents of register RS are XORed with ^{32}0 || UI || ^{16}0 and the result is placed into register RA.

Special Registers Altered:

None
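The immediate operands of the D-form Logical instructions are zero-extended, unlike the sign-extended SI field of the arithmetic instructions. The Python sketch below models the six instructions as pure functions of (RS) and UI; the function names (with a trailing underscore standing in for the "." of the recording forms) are hypothetical.

```python
MASK64 = (1 << 64) - 1


def low_imm(ui):
    """48 zero bits || UI."""
    return ui & 0xFFFF


def shifted_imm(ui):
    """32 zero bits || UI || 16 zero bits."""
    return (ui & 0xFFFF) << 16


def andi_(rs, ui):  return rs & low_imm(ui) & MASK64
def andis_(rs, ui): return rs & shifted_imm(ui) & MASK64
def ori(rs, ui):    return (rs | low_imm(ui)) & MASK64
def oris(rs, ui):   return (rs | shifted_imm(ui)) & MASK64
def xori(rs, ui):   return (rs ^ low_imm(ui)) & MASK64
def xoris(rs, ui):  return (rs ^ shifted_imm(ui)) & MASK64


# Building a 32-bit pattern from two halves, starting from a zero register.
value = oris(ori(0, 0xBEEF), 0xDEAD)
print(hex(value))
# ori with UI = 0 leaves the source value unchanged, as in the no-op form.
print(ori(0x1234, 0) == 0x1234)
```

Because no sign extension takes place, the two halves can be combined in either order without the adjustment that an addis and addi pair would need.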


AND X-form

and  RA,RS,RB (Rc=0)
and. RA,RS,RB (Rc=1)

31 RS RA RB 28 Rc

0 6 11 16 21 31

    RA ← (RS) & (RB)

The contents of register RS are ANDed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:

CR0 (if Rc=1)

OR X-form

or  RA,RS,RB (Rc=0)
or. RA,RS,RB (Rc=1)

31 RS RA RB 444 Rc

0 6 11 16 21 31

    RA ← (RS) | (RB)

The contents of register RS are ORed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:

CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for OR:

    Extended:    Equivalent to:
    mr Rx,Ry     or Rx,Ry,Ry

XOR X-form

xor  RA,RS,RB (Rc=0)
xor. RA,RS,RB (Rc=1)

31 RS RA RB 316 Rc

0 6 11 16 21 31

    RA ← (RS) ⊕ (RB)

The contents of register RS are XORed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:

CR0 (if Rc=1)

NAND X-form

nand  RA,RS,RB (Rc=0)
nand. RA,RS,RB (Rc=1)

31 RS RA RB 476 Rc

0 6 11 16 21 31

    RA ← ¬((RS) & (RB))

The contents of register RS are ANDed with the contents of register RB and the complemented result is placed into register RA.

Special Registers Altered:

CR0 (if Rc=1)

Programming Note

nand or nor with RS=RB can be used to obtain the one's complement.


NOR X-form

nor RA,RS,RB (Rc=0)
nor. RA,RS,RB (Rc=1)

Field:      31  RS  RA  RB  124  Rc
Start bit:  0   6   11  16  21   31

   RA ← ¬((RS) | (RB))

The contents of register RS are ORed with the contents of register RB and the complemented result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for NOR:

   Extended:      Equivalent to:
   not Rx,Ry      nor Rx,Ry,Ry

Equivalent X-form

eqv RA,RS,RB (Rc=0)
eqv. RA,RS,RB (Rc=1)

Field:      31  RS  RA  RB  284  Rc
Start bit:  0   6   11  16  21   31

   RA ← (RS) ≡ (RB)

The contents of register RS are XORed with the contents of register RB and the complemented result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

AND with Complement X-form

andc RA,RS,RB (Rc=0)
andc. RA,RS,RB (Rc=1)

Field:      31  RS  RA  RB  60  Rc
Start bit:  0   6   11  16  21  31

   RA ← (RS) & ¬(RB)

The contents of register RS are ANDed with the complement of the contents of register RB and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

OR with Complement X-form

orc RA,RS,RB (Rc=0)
orc. RA,RS,RB (Rc=1)

Field:      31  RS  RA  RB  412  Rc
Start bit:  0   6   11  16  21   31

   RA ← (RS) | ¬(RB)

The contents of register RS are ORed with the complement of the contents of register RB and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)
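The complemented logical operations above can be sketched in Python as follows. This is only an illustrative model, not part of the architecture text: it assumes a GPR is represented as an unsigned 64-bit Python integer, and the function names simply mirror the mnemonics. It also checks the Programming Note that nand or nor with RS=RB yields the one's complement.

```python
MASK64 = (1 << 64) - 1  # assumption: a GPR is modeled as an unsigned 64-bit integer


def nand(rs, rb):
    return ~(rs & rb) & MASK64


def nor(rs, rb):
    return ~(rs | rb) & MASK64


def eqv(rs, rb):
    # XOR, then complement
    return ~(rs ^ rb) & MASK64


def andc(rs, rb):
    # AND with the complement of RB
    return rs & ~rb & MASK64


def orc(rs, rb):
    # OR with the complement of RB
    return (rs | ~rb) & MASK64


x = 0x00FF00FF00FF00FF
ones_complement = nor(x, x)  # same effect as the extended mnemonic "not Rx,Ry"
```

The explicit `& MASK64` is needed because Python integers are unbounded, so `~` would otherwise produce a negative number rather than a 64-bit pattern.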


Extend Sign Byte X-form

extsb RA,RS (Rc=0)
extsb. RA,RS (Rc=1)

Field:      31  RS  RA  ///  954  Rc
Start bit:  0   6   11  16   21   31

   s ← (RS)_{56}
   RA_{56:63} ← (RS)_{56:63}
   RA_{0:55} ← ^{56}s

(RS)_{56:63} are placed into RA_{56:63}. Bit 56 of register RS is placed into RA_{0:55}.

Special Registers Altered:
   CR0 (if Rc=1)

Extend Sign Halfword X-form

extsh RA,RS (Rc=0)
extsh. RA,RS (Rc=1)

[POWER mnemonics: exts, exts.]

Field:      31  RS  RA  ///  922  Rc
Start bit:  0   6   11  16   21   31

   s ← (RS)_{48}
   RA_{48:63} ← (RS)_{48:63}
   RA_{0:47} ← ^{48}s

(RS)_{48:63} are placed into RA_{48:63}. Bit 48 of register RS is placed into RA_{0:47}.

Special Registers Altered:
   CR0 (if Rc=1)

Extend Sign Word X-form

extsw RA,RS (Rc=0)
extsw. RA,RS (Rc=1)

Field:      31  RS  RA  ///  986  Rc
Start bit:  0   6   11  16   21   31

   s ← (RS)_{32}
   RA_{32:63} ← (RS)_{32:63}
   RA_{0:31} ← ^{32}s

(RS)_{32:63} are placed into RA_{32:63}. Bit 32 of register RS is placed into RA_{0:31}.

Special Registers Altered:
   CR0 (if Rc=1)
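The three sign-extension instructions differ only in the width of the field being extended. A minimal Python sketch, assuming registers are modeled as unsigned 64-bit integers (the helper name `exts` and its `width` parameter are illustrative, not taken from the architecture):

```python
MASK64 = (1 << 64) - 1  # assumption: a GPR is modeled as an unsigned 64-bit integer


def exts(rs, width):
    """Sign-extend the low-order `width` bits of rs to 64 bits."""
    field = (1 << width) - 1
    low = rs & field                 # the bits copied unchanged into RA
    sign = low >> (width - 1)        # leftmost bit of the field (s in the RTL)
    if sign:
        low |= MASK64 & ~field       # replicate s into all higher-order bits
    return low


def extsb(rs):
    return exts(rs, 8)


def extsh(rs):
    return exts(rs, 16)


def extsw(rs):
    return exts(rs, 32)
```

Bits of RS above the field are ignored, so `extsb(0x17F)` yields `0x7F`.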


Count Leading Zeros Doubleword X-form

cntlzd RA,RS (Rc=0)
cntlzd. RA,RS (Rc=1)

Field:      31  RS  RA  ///  58  Rc
Start bit:  0   6   11  16   21  31

   n ← 0
   do while n < 64
      if (RS)_{n} = 1 then leave
      n ← n + 1
   RA ← n

A count of the number of consecutive zero bits starting at bit 0 of register RS is placed into RA. This number ranges from 0 to 64, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

Special Registers Altered:
   CR0 (if Rc=1)

Count Leading Zeros Word X-form

cntlzw RA,RS (Rc=0)
cntlzw. RA,RS (Rc=1)

[POWER mnemonics: cntlz, cntlz.]

Field:      31  RS  RA  ///  26  Rc
Start bit:  0   6   11  16   21  31

   n ← 32
   do while n < 64
      if (RS)_{n} = 1 then leave
      n ← n + 1
   RA ← n − 32

A count of the number of consecutive zero bits starting at bit 32 of register RS is placed into RA. This number ranges from 0 to 32, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

Special Registers Altered:
   CR0 (if Rc=1)

Programming Note
For both Count Leading Zeros instructions, if Rc=1 then LT is set to 0 in CR Field 0.
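The loop in the instruction description translates almost directly into Python. The sketch below assumes a register is an unsigned 64-bit integer and uses the architecture's numbering, in which bit 0 is the most significant bit; the helper `bit` is an illustrative name.

```python
def bit(rs, n):
    """Return bit n of a 64-bit value, where bit 0 is the most significant bit."""
    return (rs >> (63 - n)) & 1


def cntlzd(rs):
    n = 0
    while n < 64:
        if bit(rs, n) == 1:
            break
        n += 1
    return n


def cntlzw(rs):
    n = 32                      # only bits 32:63 are examined
    while n < 64:
        if bit(rs, n) == 1:
            break
        n += 1
    return n - 32
```

Because the result is always in the range 0 to 64 (or 0 to 32), it is never negative, which is why LT is 0 when Rc=1.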


3.3.12 Fixed-Point Rotate and Shift Instructions

The Fixed-Point Processor performs rotation operations on data from a GPR and returns the result, or a portion of the result, to a GPR.

The rotation operations rotate a 64-bit quantity left by a specified number of bit positions. Bits that exit from position 0 enter at position 63.

Two types of rotation operation are supported.

For the first type, denoted rotate_{64} or ROTL_{64}, the value rotated is the given 64-bit value. The rotate_{64} operation is used to rotate a given 64-bit quantity.

For the second type, denoted rotate_{32} or ROTL_{32}, the value rotated consists of two copies of bits 32:63 of the given 64-bit value, one copy in bits 0:31 and the other in bits 32:63. The rotate_{32} operation is used to rotate a given 32-bit quantity.

The Rotate and Shift instructions employ a mask generator. The mask is 64 bits long, and consists of 1-bits from a start bit, mstart, through and including a stop bit, mstop, and 0-bits elsewhere. The values of mstart and mstop range from 0 to 63. If mstart > mstop, the 1-bits wrap around from position 63 to position 0. Thus the mask is formed as follows:

   if mstart ≤ mstop then
      mask_{mstart:mstop} = ones
      mask_{all other bits} = zeros
   else
      mask_{mstart:63} = ones
      mask_{0:mstop} = ones
      mask_{all other bits} = zeros

There is no way to specify an all-zero mask.

For instructions that use the rotate_{32} operation, the mask start and stop positions are always in the low-order 32 bits of the mask.

The use of the mask is described in following sections.

The Rotate and Shift instructions with Rc=1 set the first three bits of CR field 0 as described in Section 3.3.7, “Other Fixed-Point Instructions” on page 48. Rotate and Shift instructions do not change the OV and SO bits. Rotate and Shift instructions, except algebraic right shifts, do not change the CA bit.

Extended mnemonics for rotates and shifts

The Rotate and Shift instructions, while powerful, can be complicated to code (they have up to five operands). A set of extended mnemonics is provided that allow simpler coding of often-used functions such as clearing the leftmost or rightmost bits of a register, left justifying or right justifying an arbitrary field, and performing simple rotates and shifts. Some of these are shown as examples with the Rotate instructions. See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

3.3.12.1 Fixed-Point Rotate Instructions

These instructions rotate the contents of a register. The result of the rotation is

■ inserted into the target register under control of a mask (if a mask bit is 1 the associated bit of the rotated data is placed into the target register, and if the mask bit is 0 the associated bit in the target register remains unchanged); or
■ ANDed with a mask before being placed into the target register.

The Rotate Left instructions allow right-rotation of the contents of a register to be performed (in concept) by a left-rotation of 64−n, where n is the number of bits by which to rotate right. They allow right-rotation of the contents of the low-order 32 bits of a register to be performed (in concept) by a left-rotation of 32−n, where n is the number of bits by which to rotate right.
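The two rotation types and the mask generator described in this section can be modeled in a few lines of Python. This is a sketch under stated assumptions: registers are unsigned 64-bit integers, bit 0 is the most significant bit, and the function names (`rotl64`, `rotl32`, `mask`) are chosen here for illustration.

```python
MASK64 = (1 << 64) - 1  # assumption: a GPR is modeled as an unsigned 64-bit integer


def rotl64(x, n):
    """ROTL_64: rotate a 64-bit value left by n bit positions."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotl32(x, n):
    """ROTL_32: rotate two copies of bits 32:63 of x, concatenated, left by n."""
    low = x & 0xFFFFFFFF
    return rotl64((low << 32) | low, n)


def mask(mstart, mstop):
    """1-bits from mstart through mstop inclusive, wrapping if mstart > mstop."""
    def ones(first, last):
        return (MASK64 >> first) & (MASK64 << (63 - last)) & MASK64

    if mstart <= mstop:
        return ones(mstart, mstop)
    return ones(mstart, 63) | ones(0, mstop)


# A right-rotation by n is a left-rotation by 64 - n.
rotated_right_4 = rotl64(0x1, 64 - 4)
```

Note that `mask` can never return zero, since at least the bit at mstart is always set; this matches the statement that an all-zero mask cannot be specified.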


Rotate Left Doubleword Immediate then Clear Left MD-form

rldicl RA,RS,SH,MB (Rc=0)
rldicl. RA,RS,SH,MB (Rc=1)

Field:      30  RS  RA  sh  mb  0   sh  Rc
Start bit:  0   6   11  16  21  27  30  31

   n ← sh_{5} || sh_{0:4}
   r ← ROTL_{64}((RS), n)
   b ← mb_{5} || mb_{0:4}
   m ← MASK(b, 63)
   RA ← r & m

The contents of register RS are rotated_{64} left SH bits. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Left:

   Extended:            Equivalent to:
   extrdi Rx,Ry,n,b     rldicl Rx,Ry,b+n,64−n
   srdi Rx,Ry,n         rldicl Rx,Ry,64−n,n
   clrldi Rx,Ry,n       rldicl Rx,Ry,0,n

Programming Note
rldicl can be used to extract an n-bit field that starts at bit position b in register RS, right-justified into register RA (clearing the remaining 64−n bits of RA), by setting SH=b+n and MB=64−n. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64−n) and MB=0. It can be used to shift the contents of a register right by n bits, by setting SH=64−n and MB=n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n.

Extended mnemonics are provided for all of these uses; see Appendix B, “Assembler Extended Mnemonics” on page 143.

Rotate Left Doubleword Immediate then Clear Right MD-form

rldicr RA,RS,SH,ME (Rc=0)
rldicr. RA,RS,SH,ME (Rc=1)

Field:      30  RS  RA  sh  me  1   sh  Rc
Start bit:  0   6   11  16  21  27  30  31

   n ← sh_{5} || sh_{0:4}
   r ← ROTL_{64}((RS), n)
   e ← me_{5} || me_{0:4}
   m ← MASK(0, e)
   RA ← r & m

The contents of register RS are rotated_{64} left SH bits. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Right:

   Extended:            Equivalent to:
   extldi Rx,Ry,n,b     rldicr Rx,Ry,b,n−1
   sldi Rx,Ry,n         rldicr Rx,Ry,n,63−n
   clrrdi Rx,Ry,n       rldicr Rx,Ry,0,63−n

Programming Note
rldicr can be used to extract an n-bit field that starts at bit position b in register RS, left-justified into register RA (clearing the remaining 64−n bits of RA), by setting SH=b and ME=n−1. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64−n) and ME=63. It can be used to shift the contents of a register left by n bits, by setting SH=n and ME=63−n. It can be used to clear the low-order n bits of a register, by setting SH=0 and ME=63−n.

Extended mnemonics are provided for all of these uses (some devolve to rldicl); see Appendix B, “Assembler Extended Mnemonics” on page 143.
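To see why the extended mnemonics reduce to rldicl and rldicr, the following Python sketch models both instructions and expresses a few of the mnemonics in terms of them. It assumes registers are unsigned 64-bit integers with bit 0 as the most significant bit; the helper names are illustrative, and the 6-bit range limits on SH, MB and ME are not enforced.

```python
MASK64 = (1 << 64) - 1  # assumption: a GPR is modeled as an unsigned 64-bit integer


def rotl64(x, n):
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def mask(mstart, mstop):
    # Only the non-wrapping case (mstart <= mstop) arises for these two instructions.
    return (MASK64 >> mstart) & (MASK64 << (63 - mstop)) & MASK64


def rldicl(rs, sh, mb):
    return rotl64(rs, sh) & mask(mb, 63)


def rldicr(rs, sh, me):
    return rotl64(rs, sh) & mask(0, me)


def extrdi(ry, n, b):   # extract n-bit field at bit b, right-justified
    return rldicl(ry, b + n, 64 - n)


def srdi(ry, n):        # shift right by n
    return rldicl(ry, 64 - n, n)


def sldi(ry, n):        # shift left by n
    return rldicr(ry, n, 63 - n)


def clrrdi(ry, n):      # clear the low-order n bits
    return rldicr(ry, 0, 63 - n)
```

For example, with Ry = 0x0123456789ABCDEF, `extrdi(ry, 8, 8)` picks out bits 8:15, the second byte from the left.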


Rotate Left Doubleword Immediate then Clear MD-form

rldic RA,RS,SH,MB (Rc=0)
rldic. RA,RS,SH,MB (Rc=1)

Field:      30  RS  RA  sh  mb  2   sh  Rc
Start bit:  0   6   11  16  21  27  30  31

   n ← sh_{5} || sh_{0:4}
   r ← ROTL_{64}((RS), n)
   b ← mb_{5} || mb_{0:4}
   m ← MASK(b, ¬n)
   RA ← r & m

The contents of register RS are rotated_{64} left SH bits. A mask is generated having 1-bits from bit MB through bit 63−SH and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword Immediate then Clear:

   Extended:             Equivalent to:
   clrlsldi Rx,Ry,b,n    rldic Rx,Ry,n,b−n

Programming Note
rldic can be used to clear the high-order b bits of the contents of a register and then shift the result left by n bits, by setting SH=n and MB=b−n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n.

Extended mnemonics are provided for both of these uses (the second devolves to rldicl); see Appendix B, “Assembler Extended Mnemonics” on page 143.

Rotate Left Word Immediate then AND with Mask M-form

rlwinm RA,RS,SH,MB,ME (Rc=0)
rlwinm. RA,RS,SH,MB,ME (Rc=1)

[POWER mnemonics: rlinm, rlinm.]

Field:      21  RS  RA  SH  MB  ME  Rc
Start bit:  0   6   11  16  21  26  31

   n ← SH
   r ← ROTL_{32}((RS)_{32:63}, n)
   m ← MASK(MB+32, ME+32)
   RA ← r & m

The contents of register RS are rotated_{32} left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Word Immediate then AND with Mask:

   Extended:            Equivalent to:
   extlwi Rx,Ry,n,b     rlwinm Rx,Ry,b,0,n−1
   srwi Rx,Ry,n         rlwinm Rx,Ry,32−n,n,31
   clrrwi Rx,Ry,n       rlwinm Rx,Ry,0,0,31−n
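Because the mask for rlwinm always lies in the low-order 32 bits, the instruction can be modeled with purely 32-bit arithmetic. The Python sketch below makes that assumption explicit: `mask32(mb, me)` stands for the low-order half of MASK(MB+32, ME+32), and all names are illustrative rather than part of the architecture.

```python
WORD = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate the low-order 32 bits of x left by n; higher-order bits are ignored."""
    x &= WORD
    n %= 32
    return ((x << n) | (x >> (32 - n))) & WORD


def mask32(mb, me):
    """Low-order 32 bits of MASK(MB+32, ME+32); bit 0 is the leftmost of the word."""
    def ones(first, last):
        return (WORD >> first) & (WORD << (31 - last)) & WORD

    if mb <= me:
        return ones(mb, me)
    return ones(mb, 31) | ones(0, me)


def rlwinm(rs, sh, mb, me):
    # The high-order 32 bits of the result are always zero.
    return rotl32(rs, sh) & mask32(mb, me)


w = 0x89ABCDEF
shifted = rlwinm(w, 32 - 8, 8, 31)    # srwi Rx,Ry,8
left_field = rlwinm(w, 8, 0, 7)       # extlwi Rx,Ry,8,8
cleared = rlwinm(w, 0, 0, 31 - 4)     # clrrwi Rx,Ry,4
```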


Programming Note
Let RSL represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

rlwinm can be used to extract an n-bit field that starts at bit position b in RSL, right-justified into the low-order 32 bits of register RA (clearing the remaining 32−n bits of the low-order 32 bits of RA), by setting SH=b+n, MB=32−n, and ME=31. It can be used to extract an n-bit field that starts at bit position b in RSL, left-justified into the low-order 32 bits of register RA (clearing the remaining 32−n bits of the low-order 32 bits of RA), by setting SH=b, MB=0, and ME=n−1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by n bits, by setting SH=n (32−n), MB=0, and ME=31. It can be used to shift the contents of the low-order 32 bits of a register right by n bits, by setting SH=32−n, MB=n, and ME=31. It can be used to clear the high-order b bits of the low-order 32 bits of the contents of a register and then shift the result left by n bits, by setting SH=n, MB=b−n, and ME=31−n. It can be used to clear the low-order n bits of the low-order 32 bits of a register, by setting SH=0, MB=0, and ME=31−n.

For all the uses given above, the high-order 32 bits of register RA are cleared.

Extended mnemonics are provided for all of these uses; see Appendix B, “Assembler Extended Mnemonics” on page 143.

Rotate Left Doubleword then Clear Left MDS-form

rldcl RA,RS,RB,MB (Rc=0)
rldcl. RA,RS,RB,MB (Rc=1)

Field:      30  RS  RA  RB  mb  8   Rc
Start bit:  0   6   11  16  21  27  31

   n ← (RB)_{58:63}
   r ← ROTL_{64}((RS), n)
   b ← mb_{5} || mb_{0:4}
   m ← MASK(b, 63)
   RA ← r & m

The contents of register RS are rotated_{64} left the number of bits specified by (RB)_{58:63}. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword then Clear Left:

   Extended:            Equivalent to:
   rotld Rx,Ry,Rz       rldcl Rx,Ry,Rz,0

Programming Note
rldcl can be used to extract an n-bit field that starts at variable bit position b in register RS, right-justified into register RA (clearing the remaining 64−n bits of RA), by setting RB_{58:63}=b+n and MB=64−n. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB_{58:63}=n (64−n) and MB=0.

Extended mnemonics are provided for some of these uses; see Appendix B, “Assembler Extended Mnemonics” on page 143.


Rotate Left Doubleword then Clear Right MDS-form

rldcr RA,RS,RB,ME (Rc=0)
rldcr. RA,RS,RB,ME (Rc=1)

Field:      30  RS  RA  RB  me  9   Rc
Start bit:  0   6   11  16  21  27  31

   n ← (RB)_{58:63}
   r ← ROTL_{64}((RS), n)
   e ← me_{5} || me_{0:4}
   m ← MASK(0, e)
   RA ← r & m

The contents of register RS are rotated_{64} left the number of bits specified by (RB)_{58:63}. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Programming Note
rldcr can be used to extract an n-bit field that starts at variable bit position b in register RS, left-justified into register RA (clearing the remaining 64−n bits of RA), by setting RB_{58:63}=b and ME=n−1. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB_{58:63}=n (64−n) and ME=63.

Extended mnemonics are provided for some of these uses (some devolve to rldcl); see Appendix B, “Assembler Extended Mnemonics” on page 143.

Rotate Left Word then AND with Mask M-form

rlwnm RA,RS,RB,MB,ME (Rc=0)
rlwnm. RA,RS,RB,MB,ME (Rc=1)

[POWER mnemonics: rlnm, rlnm.]

Field:      23  RS  RA  RB  MB  ME  Rc
Start bit:  0   6   11  16  21  26  31

   n ← (RB)_{59:63}
   r ← ROTL_{32}((RS)_{32:63}, n)
   m ← MASK(MB+32, ME+32)
   RA ← r & m

The contents of register RS are rotated_{32} left the number of bits specified by (RB)_{59:63}. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word then AND with Mask:

   Extended:            Equivalent to:
   rotlw Rx,Ry,Rz       rlwnm Rx,Ry,Rz,0,31

Programming Note
Let RSL represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

rlwnm can be used to extract an n-bit field that starts at variable bit position b in RSL, right-justified into the low-order 32 bits of register RA (clearing the remaining 32−n bits of the low-order 32 bits of RA), by setting RB_{59:63}=b+n, MB=32−n, and ME=31. It can be used to extract an n-bit field that starts at variable bit position b in RSL, left-justified into the low-order 32 bits of register RA (clearing the remaining 32−n bits of the low-order 32 bits of RA), by setting RB_{59:63}=b, MB=0, and ME=n−1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by variable n bits, by setting RB_{59:63}=n (32−n), MB=0, and ME=31.

For all the uses given above, the high-order 32 bits of register RA are cleared.

Extended mnemonics are provided for some of these uses; see Appendix B, “Assembler Extended Mnemonics” on page 143.


Rotate Left Doubleword Immediate then Mask Insert MD-form

rldimi RA,RS,SH,MB (Rc=0)
rldimi. RA,RS,SH,MB (Rc=1)

Field:      30  RS  RA  sh  mb  3   sh  Rc
Start bit:  0   6   11  16  21  27  30  31

   n ← sh_{5} || sh_{0:4}
   r ← ROTL_{64}((RS), n)
   b ← mb_{5} || mb_{0:4}
   m ← MASK(b, ¬n)
   RA ← r&m | (RA)&¬m

The contents of register RS are rotated_{64} left SH bits. A mask is generated having 1-bits from bit MB through bit 63−SH and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword Immediate then Mask Insert:

   Extended:            Equivalent to:
   insrdi Rx,Ry,n,b     rldimi Rx,Ry,64−(b+n),b

Programming Note
rldimi can be used to insert an n-bit field that is right-justified in register RS, into register RA starting at bit position b, by setting SH=64−(b+n) and MB=b.

An extended mnemonic is provided for this use; see Appendix B, “Assembler Extended Mnemonics” on page 143.

Rotate Left Word Immediate then Mask Insert M-form

rlwimi RA,RS,SH,MB,ME (Rc=0)
rlwimi. RA,RS,SH,MB,ME (Rc=1)

[POWER mnemonics: rlimi, rlimi.]

Field:      20  RS  RA  SH  MB  ME  Rc
Start bit:  0   6   11  16  21  26  31

   n ← SH
   r ← ROTL_{32}((RS)_{32:63}, n)
   m ← MASK(MB+32, ME+32)
   RA ← r&m | (RA)&¬m

The contents of register RS are rotated_{32} left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask.

Special Registers Altered:
   CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word Immediate then Mask Insert:

   Extended:            Equivalent to:
   inslwi Rx,Ry,n,b     rlwimi Rx,Ry,32−b,b,b+n−1

Programming Note
Let RAL represent the low-order 32 bits of register RA, with the bits numbered from 0 through 31.

rlwimi can be used to insert an n-bit field that is left-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32−b, MB=b, and ME=(b+n)−1. It can be used to insert an n-bit field that is right-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32−(b+n), MB=b, and ME=(b+n)−1.

Extended mnemonics are provided for both of these uses; see Appendix B, “Assembler Extended Mnemonics” on page 143.
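The difference between the mask-insert instructions and the AND-with-mask instructions is that bits of RA outside the mask are preserved. A minimal Python sketch of rldimi and the insrdi mnemonic, assuming unsigned 64-bit registers and illustrative helper names; for a 6-bit shift count n, ¬n is modeled as 63 − n, which agrees with the prose "through bit 63−SH":

```python
MASK64 = (1 << 64) - 1  # assumption: a GPR is modeled as an unsigned 64-bit integer


def rotl64(x, n):
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def mask(mstart, mstop):
    def ones(first, last):
        return (MASK64 >> first) & (MASK64 << (63 - last)) & MASK64

    if mstart <= mstop:
        return ones(mstart, mstop)
    return ones(mstart, 63) | ones(0, mstop)


def rldimi(ra, rs, sh, mb):
    m = mask(mb, 63 - sh)
    # Rotated data where the mask is 1, old RA contents where it is 0.
    return (rotl64(rs, sh) & m) | (ra & ~m & MASK64)


def insrdi(rx, ry, n, b):
    """Insert the right-justified n-bit field of ry into rx starting at bit b."""
    return rldimi(rx, ry, 64 - (b + n), b)


result = insrdi(0xFFFFFFFFFFFFFFFF, 0xAB, 8, 8)
```

Here the 8-bit field 0xAB replaces bits 8:15 of the target and every other bit of the target is left as it was.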


3.3.12.2 Fixed-Point Shift Instructions

The instructions in this section perform left and right shifts.

Extended mnemonics for shifts

Immediate-form logical (unsigned) shift operations are obtained by specifying appropriate masks and shift values for certain Rotate instructions. A set of extended mnemonics is provided to make coding of such shifts simpler and easier to understand. Some of these are shown as examples with the Rotate instructions. See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

Programming Note
Any Shift Right Algebraic instruction, followed by addze, can be used to divide quickly by 2^{n}. The setting of the CA bit by the Shift Right Algebraic instructions is independent of mode.

Programming Note
Multiple-precision shifts can be programmed as shown in Section C.1, “Multiple-Precision Shifts” on page 155.

Shift Left Doubleword X-form

sld RA,RS,RB (Rc=0)
sld. RA,RS,RB (Rc=1)

Field:      31  RS  RA  RB  27  Rc
Start bit:  0   6   11  16  21  31

   n ← (RB)_{58:63}
   r ← ROTL_{64}((RS), n)
   if (RB)_{57} = 0 then
      m ← MASK(0, 63−n)
   else m ← ^{64}0
   RA ← r & m

The contents of register RS are shifted left the number of bits specified by (RB)_{57:63}. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on the right. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result.

Special Registers Altered:
   CR0 (if Rc=1)

Shift Left Word X-form

slw RA,RS,RB (Rc=0)
slw. RA,RS,RB (Rc=1)

[POWER mnemonics: sl, sl.]

Field:      31  RS  RA  RB  24  Rc
Start bit:  0   6   11  16  21  31

   n ← (RB)_{59:63}
   r ← ROTL_{32}((RS)_{32:63}, n)
   if (RB)_{58} = 0 then
      m ← MASK(32, 63−n)
   else m ← ^{64}0
   RA ← r & m

The contents of the low-order 32 bits of register RS are shifted left the number of bits specified by (RB)_{58:63}. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA_{32:63}. RA_{0:31} are set to zero. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
   CR0 (if Rc=1)
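One detail worth illustrating is how out-of-range shift amounts behave: sld looks at seven bits of RB and slw at six, so amounts at or beyond the operand width produce zero instead of wrapping around. The Python sketch below models the net effect described in the prose rather than the rotate-and-mask formulation of the RTL; it assumes unsigned 64-bit registers.

```python
MASK64 = (1 << 64) - 1  # assumption: a GPR is modeled as an unsigned 64-bit integer


def sld(rs, rb):
    n = rb & 0x7F                    # (RB) bits 57:63
    if n >= 64:                      # bit 57 set: shift amounts 64..127 give zero
        return 0
    return (rs << n) & MASK64


def slw(rs, rb):
    n = rb & 0x3F                    # (RB) bits 58:63
    if n >= 32:                      # bit 58 set: shift amounts 32..63 give zero
        return 0
    return ((rs & 0xFFFFFFFF) << n) & 0xFFFFFFFF   # high-order 32 bits are zero
```

Bits of RB above the examined field are ignored, so a shift amount of 128 in RB behaves like 0 for sld.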


Shift Right Doubleword X-form

srd RA,RS,RB (Rc=0)
srd. RA,RS,RB (Rc=1)

Field:      31  RS  RA  RB  539  Rc
Start bit:  0   6   11  16  21   31

   n ← (RB)_{58:63}
   r ← ROTL_{64}((RS), 64−n)
   if (RB)_{57} = 0 then
      m ← MASK(n, 63)
   else m ← ^{64}0
   RA ← r & m

The contents of register RS are shifted right the number of bits specified by (RB)_{57:63}. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result.

Special Registers Altered:
   CR0 (if Rc=1)

Shift Right Word X-form

srw RA,RS,RB (Rc=0)
srw. RA,RS,RB (Rc=1)

[POWER mnemonics: sr, sr.]

Field:      31  RS  RA  RB  536  Rc
Start bit:  0   6   11  16  21   31

   n ← (RB)_{59:63}
   r ← ROTL_{32}((RS)_{32:63}, 64−n)
   if (RB)_{58} = 0 then
      m ← MASK(n+32, 63)
   else m ← ^{64}0
   RA ← r & m

The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)_{58:63}. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA_{32:63}. RA_{0:31} are set to zero. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
   CR0 (if Rc=1)


Shift Right Algebraic Doubleword Immediate XS-form

sradi RA,RS,SH (Rc=0)
sradi. RA,RS,SH (Rc=1)

Field:      31  RS  RA  sh  413  sh  Rc
Start bit:  0   6   11  16  21   30  31

   n ← sh_{5} || sh_{0:4}
   r ← ROTL_{64}((RS), 64−n)
   m ← MASK(n, 63)
   s ← (RS)_{0}
   RA ← r&m | (^{64}s)&¬m
   CA ← s & ((r&¬m)≠0)

The contents of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0.

Special Registers Altered:
   CA
   CR0 (if Rc=1)

Shift Right Algebraic Word Immediate X-form

srawi RA,RS,SH (Rc=0)
srawi. RA,RS,SH (Rc=1)

[POWER mnemonics: srai, srai.]

Field:      31  RS  RA  SH  824  Rc
Start bit:  0   6   11  16  21   31

   n ← SH
   r ← ROTL_{32}((RS)_{32:63}, 64−n)
   m ← MASK(n+32, 63)
   s ← (RS)_{32}
   RA ← r&m | (^{64}s)&¬m
   CA ← s & ((r&¬m)_{32:63}≠0)

The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA_{32:63}. Bit 32 of RS is replicated to fill RA_{0:31}. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)_{32:63}), and CA to be set to 0.

Special Registers Altered:
   CA
   CR0 (if Rc=1)
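The CA rule is what makes the "shift right algebraic, then addze" idiom from the Programming Note in Section 3.3.12.2 divide by 2^{n} with rounding toward zero. The Python sketch below models srawi at the level of its prose description, returning the result as a signed Python integer for readability; the function names and the signed representation are assumptions of this sketch.

```python
def srawi(rs, sh):
    """Return (result, CA) for a 32-bit algebraic right shift by sh (0..31)."""
    word = rs & 0xFFFFFFFF
    # Interpret the low-order 32 bits as a signed number.
    value = word - (1 << 32) if word & 0x80000000 else word
    shifted_out = word & ((1 << sh) - 1)
    # CA = 1 only if the operand is negative and any 1-bits were shifted out.
    ca = 1 if value < 0 and shifted_out != 0 else 0
    return value >> sh, ca           # Python's >> on a negative int is algebraic


def div_pow2(x, n):
    """Model of 'srawi RA,RS,n' followed by 'addze RA,RA'."""
    result, ca = srawi(x, n)
    return result + ca               # addze adds the carry to the shifted value
```

For −7 shifted right by 1, the shift alone gives −4 (rounding toward minus infinity) with CA=1; adding the carry yields −3, the quotient rounded toward zero.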


Shift Right Algebraic Doubleword X-form

srad RA,RS,RB (Rc=0)
srad. RA,RS,RB (Rc=1)

Field:      31  RS  RA  RB  794  Rc
Start bit:  0   6   11  16  21   31

   n ← (RB)_{58:63}
   r ← ROTL_{64}((RS), 64−n)
   if (RB)_{57} = 0 then
      m ← MASK(n, 63)
   else m ← ^{64}0
   s ← (RS)_{0}
   RA ← r&m | (^{64}s)&¬m
   CA ← s & ((r&¬m)≠0)

The contents of register RS are shifted right the number of bits specified by (RB)_{57:63}. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0. Shift amounts from 64 to 127 give a result of 64 sign bits in RA, and cause CA to receive the sign bit of (RS).

Special Registers Altered:
   CA
   CR0 (if Rc=1)

Shift Right Algebraic Word X-form

sraw RA,RS,RB (Rc=0)
sraw. RA,RS,RB (Rc=1)

[POWER mnemonics: sra, sra.]

Field:      31  RS  RA  RB  792  Rc
Start bit:  0   6   11  16  21   31

   n ← (RB)_{59:63}
   r ← ROTL_{32}((RS)_{32:63}, 64−n)
   if (RB)_{58} = 0 then
      m ← MASK(n+32, 63)
   else m ← ^{64}0
   s ← (RS)_{32}
   RA ← r&m | (^{64}s)&¬m
   CA ← s & ((r&¬m)_{32:63}≠0)

The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)_{58:63}. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA_{32:63}. Bit 32 of RS is replicated to fill RA_{0:31}. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)_{32:63}), and CA to be set to 0. Shift amounts from 32 to 63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RS)_{32:63}.

Special Registers Altered:
   CA
   CR0 (if Rc=1)


3.3.13 Move To/From System Register Instructions

The Move To Condition Register Fields instruction has a preferred form; see Section 1.9.1, “Preferred Instruction Forms” on page 13. In the preferred form, the FXM field satisfies the following rule.

■ Exactly one bit of the FXM field is set to 1.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand. An extended mnemonic is provided for the mtcrf instruction for compatibility with old software (written for a version of the architecture that precedes Version 2.00) that uses it to set the entire Condition Register. Some of these extended mnemonics are shown as examples with the relevant instructions. See Appendix B, “Assembler Extended Mnemonics” on page 143 for additional extended mnemonics.

Move To Special Purpose Register XFX-form

mtspr SPR,RS

Field:      31  RS  spr  467  /
Start bit:  0   6   11   21   31

   n ← spr_{5:9} || spr_{0:4}
   if length(SPREG(n)) = 64 then
      SPREG(n) ← (RS)
   else
      SPREG(n) ← (RS)_{32:63}

The SPR field denotes a Special Purpose Register, encoded as shown in the table below. The contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

   decimal   SPR*                      Register Name
             spr_{5:9}   spr_{0:4}
   1         00000       00001         XER
   8         00000       01000         LR
   9         00000       01001         CTR

   * Note that the order of the two 5-bit halves of the SPR number is reversed.

If the SPR field contains any value other than one of the values shown above then one of the following occurs.

■ The system illegal instruction error handler is invoked.
■ The system privileged instruction error handler is invoked.
■ The results are boundedly undefined.

A complete description of this instruction can be found in Book III, PowerPC Operating Environment Architecture.

Special Registers Altered:
   See above

Extended Mnemonics:

Examples of extended mnemonics for Move To Special Purpose Register:

   Extended:      Equivalent to:
   mtxer Rx       mtspr 1,Rx
   mtlr Rx        mtspr 8,Rx
   mtctr Rx       mtspr 9,Rx

Compiler and Assembler Note
For the mtspr and mfspr instructions, the SPR number coded in assembler language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15. This maintains compatibility with POWER SPR encodings, in which these two instructions have only a 5-bit SPR field occupying bits 11:15.

Compatibility Note
For a discussion of POWER compatibility with respect to SPR numbers not shown in the instruction descriptions for mtspr and mfspr, see Appendix E, “Incompatibilities with the POWER Architecture” on page 163.
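The reversal of the two 5-bit halves described in the Compiler and Assembler Note can be illustrated by assembling an mtspr instruction word by hand. The Python sketch below is a hypothetical encoder written for this illustration (the function names are not from any real assembler); it uses the field layout given above: primary opcode 31 in bits 0:5, RS in bits 6:10, spr in bits 11:20 and extended opcode 467 in bits 21:30.

```python
def encode_spr_field(spr):
    """Return the 10-bit spr field (instruction bits 11:20) for SPR number spr."""
    high5 = (spr >> 5) & 0x1F
    low5 = spr & 0x1F
    # The low-order half goes first (bits 11:15), the high-order half second (16:20).
    return (low5 << 5) | high5


def encode_mtspr(spr, rs):
    """Assemble 'mtspr spr,rs' into a 32-bit instruction word."""
    return (31 << 26) | (rs << 21) | (encode_spr_field(spr) << 11) | (467 << 1)


mtlr_r0 = encode_mtspr(8, 0)     # mtlr r0
mtctr_r3 = encode_mtspr(9, 3)    # mtctr r3
```

For the three SPRs listed in the table the high-order half is zero, so the swap simply moves the SPR number into bits 11:15, which is where the 5-bit POWER field was located.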


Move From Special Purpose Register XFX-form

mfspr RT,SPR

Field:      31  RT  spr  339  /
Start bit:  0   6   11   21   31

   n ← spr_{5:9} || spr_{0:4}
   if length(SPREG(n)) = 64 then
      RT ← SPREG(n)
   else
      RT ← ^{32}0 || SPREG(n)

The SPR field denotes a Special Purpose Register, encoded as shown in the table below. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

   decimal   SPR*                      Register Name
             spr_{5:9}   spr_{0:4}
   1         00000       00001         XER
   8         00000       01000         LR
   9         00000       01001         CTR

   * Note that the order of the two 5-bit halves of the SPR number is reversed.

If the SPR field contains any value other than one of the values shown above then one of the following occurs.

■ The system illegal instruction error handler is invoked.
■ The system privileged instruction error handler is invoked.
■ The results are boundedly undefined.

A complete description of this instruction can be found in Book III, PowerPC Operating Environment Architecture.

Special Registers Altered:
   None

Extended Mnemonics:

Examples of extended mnemonics for Move From Special Purpose Register:

   Extended:      Equivalent to:
   mfxer Rx       mfspr Rx,1
   mflr Rx        mfspr Rx,8
   mfctr Rx       mfspr Rx,9

Note
See the Notes that appear with mtspr.

Move To Condition Register Fields XFX-form

mtcrf FXM,RS

Field:      31  RS  0   FXM  /   144  /
Start bit:  0   6   11  12   20  21   31

   mask ← ^{4}(FXM_{0}) || ^{4}(FXM_{1}) || ... || ^{4}(FXM_{7})
   CR ← ((RS)_{32:63} & mask) | (CR & ¬mask)

The contents of bits 32:63 of register RS are placed into the Condition Register under control of the field mask specified by FXM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0-7. If FXM_{i}=1 then CR field i (CR bits 4×i:4×i+3) is set to the contents of the corresponding field of the low-order 32 bits of RS.

Special Registers Altered:
   CR fields selected by mask

Extended Mnemonics:

Example of extended mnemonics for Move To Condition Register Fields:

   Extended:      Equivalent to:
   mtcr Rx        mtcrf 0xFF,Rx

Programming Note
In the preferred form of this instruction (see the introduction to Section 3.3.13), only one Condition Register field is updated.

Move From Condition Register XFX-form

mfcr RT

Field:      31  RT  0   ///  19  /
Start bit:  0   6   11  12   21  31

   RT ← ^{32}0 || CR

The contents of the Condition Register are placed into RT_{32:63}. RT_{0:31} are set to 0.

Special Registers Altered:
   None
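The expansion of the 8-bit FXM field into a 32-bit mask, one FXM bit per 4-bit CR field, can be sketched in Python as follows. The sketch assumes the Condition Register is modeled as an unsigned 32-bit integer whose leftmost four bits are CR field 0; the function names are illustrative.

```python
def fxm_to_mask(fxm):
    """Replicate each FXM bit four times to form the 32-bit CR field mask."""
    mask = 0
    for i in range(8):
        if (fxm >> (7 - i)) & 1:              # FXM bit i (bit 0 is the leftmost)
            mask |= 0xF << (4 * (7 - i))      # CR field i occupies CR bits 4i:4i+3
    return mask


def mtcrf(cr, fxm, rs):
    """Return the new CR value after 'mtcrf fxm,rs'."""
    m = fxm_to_mask(fxm)
    return ((rs & 0xFFFFFFFF) & m) | (cr & ~m & 0xFFFFFFFF)
```

With FXM=0xFF every field is selected, which is the behaviour of the mtcr extended mnemonic; a preferred-form FXM such as 0x01 updates only CR field 7.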


Chapter 4. Floating-Point Processor

4.1 Floating-Point Processor Overview
4.2 Floating-Point Processor Registers
  4.2.1 Floating-Point Registers
  4.2.2 Floating-Point Status and Control Register
4.3 Floating-Point Data
  4.3.1 Data Format
  4.3.2 Value Representation
  4.3.3 Sign of Result
  4.3.4 Normalization and Denormalization
  4.3.5 Data Handling and Precision
  4.3.6 Rounding
4.4 Floating-Point Exceptions
  4.4.1 Invalid Operation Exception
    4.4.1.1 Definition
    4.4.1.2 Action
  4.4.2 Zero Divide Exception
    4.4.2.1 Definition
    4.4.2.2 Action
  4.4.3 Overflow Exception
    4.4.3.1 Definition
    4.4.3.2 Action
  4.4.4 Underflow Exception
    4.4.4.1 Definition
    4.4.4.2 Action
  4.4.5 Inexact Exception
    4.4.5.1 Definition
    4.4.5.2 Action
4.5 Floating-Point Execution Models
  4.5.1 Execution Model for IEEE Operations
  4.5.2 Execution Model for Multiply-Add Type Instructions
4.6 Floating-Point Processor Instructions
  4.6.1 Floating-Point Storage Access Instructions
    4.6.1.1 Storage Access Exceptions
  4.6.2 Floating-Point Load Instructions
  4.6.3 Floating-Point Store Instructions
  4.6.4 Floating-Point Move Instructions
  4.6.5 Floating-Point Arithmetic Instructions
    4.6.5.1 Floating-Point Elementary Arithmetic Instructions
    4.6.5.2 Floating-Point Multiply-Add Instructions
  4.6.6 Floating-Point Rounding and Conversion Instructions
  4.6.7 Floating-Point Compare Instructions
  4.6.8 Floating-Point Status and Control Register Instructions

4.1 Floating-Point Processor Overview

This chapter describes the registers and instructions that make up the Floating-Point Processor facility. Section 4.2, “Floating-Point Processor Registers” on page 82 describes the registers associated with the Floating-Point Processor. Section 4.6, “Floating-Point Processor Instructions” on page 97 describes the instructions associated with the Floating-Point Processor.

This architecture specifies that the processor implement a floating-point system as defined in ANSI/IEEE Standard 754-1985, “IEEE Standard for Binary Floating-Point Arithmetic” (hereafter referred to as “the IEEE standard”), but requires software support in order to conform fully with that standard. That standard defines certain required “operations” (addition, subtraction, etc.); the term “floating-point operation” is used in this chapter to refer to one of these required operations, or to the operation performed by one of the Multiply-Add or Reciprocal Estimate instructions. All floating-point operations conform to that standard, except if software sets the Floating-Point Non-IEEE Mode (NI) bit in the Floating-Point Status and Control Register to 1 (see page 84), in which case floating-point operations do not necessarily conform to that standard.

Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in floating-point registers; to move floating-point data between storage and these registers; and to manipulate the Floating-Point Status and Control Register explicitly.

These instructions are divided into two categories.

■ computational instructions
  The computational instructions are those that perform addition, subtraction, multiplication, division, extracting the square root, rounding, conversion, comparison, and combinations of these operations. These instructions provide the floating-point operations. They place status information into the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.5 through 4.6.7 and Section 5.2.1.

■ non-computational instructions
  The non-computational instructions are those that perform loads and stores, move the contents of a floating-point register to another floating-point register possibly altering the sign, manipulate the Floating-Point Status and Control Register explicitly, and select the value from one of two floating-point registers based on the value in a third floating-point register. The operations performed by these instructions are not considered floating-point operations. With the exception of the instructions that manipulate the Floating-Point Status and Control Register explicitly, they do not alter the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.2 through 4.6.4, 4.6.8, and 5.2.2.

A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by this number is the product of the significand and the number $2^{\mathrm{exponent}}$. Encodings are provided in the data format to represent finite numeric values, ±Infinity, and values that are “Not a Number” (NaN). Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. Their encoding permits a variable diagnostic information field. They may be used to indicate such things as uninitialized variables and can be produced by certain invalid operations.

There is one class of exceptional events that occur during instruction execution that is unique to the Floating-Point Processor: the Floating-Point Exception. Floating-point exceptions are signaled with bits set in the Floating-Point Status and Control Register (FPSCR). They can cause the system floating-point enabled exception error handler to be invoked, precisely or imprecisely, if the proper control bits are set.

Floating-Point Exceptions

The following floating-point exceptions are detected by the processor:

■ Invalid Operation Exception (VX)
    SNaN (VXSNAN)
    Infinity − Infinity (VXISI)
    Infinity ÷ Infinity (VXIDI)
    Zero ÷ Zero (VXZDZ)
    Infinity × Zero (VXIMZ)
    Invalid Compare (VXVC)
    Software Request (VXSOFT)
    Invalid Square Root (VXSQRT)
    Invalid Integer Convert (VXCVI)
■ Zero Divide Exception (ZX)
■ Overflow Exception (OX)
■ Underflow Exception (UX)
■ Inexact Exception (XX)

Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. See Section 4.2.2, “Floating-Point Status and Control Register” on page 83 for a description of these exception and enable bits, and Section 4.4, “Floating-Point Exceptions” on page 89 for a detailed discussion of floating-point exceptions, including the effects of the enable bits.

4.2 Floating-Point Processor Registers

4.2.1 Floating-Point Registers

Implementations of this architecture provide 32 floating-point registers (FPRs). The floating-point instruction formats provide 5-bit fields for specifying the FPRs to be used in the execution of the instruction. The FPRs are numbered 0-31. See Figure 26 on page 83.

Each FPR contains 64 bits that support the floating-point double format. Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.

The computational instructions, and the Move and Select instructions, operate on data located in FPRs and, with the exception of the Compare instructions, place the result value into an FPR and optionally place status information into the Condition Register.

Load Double and Store Double instructions are provided that transfer 64 bits of data between storage and the FPRs with no conversion. Load Single instructions are provided to transfer and convert floating-point values in floating-point single format from storage to the same value in floating-point double format in the FPRs. Store Single instructions are provided to transfer and convert floating-point values in floating-point double format from the FPRs to the same value in floating-point single format in storage.

Instructions are provided that manipulate the Floating-Point Status and Control Register and the Condition Register explicitly. Some of these instructions copy data from an FPR to the Floating-Point Status and Control Register or vice versa.

The computational instructions and the Select instruction accept values from the FPRs in double format. For single-precision arithmetic instructions, all input values must be representable in single format; if they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR and in the Condition Register (if Rc=1), are undefined.

    FPR 0
    FPR 1
    ...
    ...
    FPR 30
    FPR 31
    0                63

Figure 26. Floating-Point Registers

4.2.2 Floating-Point Status and Control Register

The Floating-Point Status and Control Register (FPSCR) controls the handling of floating-point exceptions and records status resulting from the floating-point operations. Bits 0:23 are status bits. Bits 24:31 are control bits.

The exception bits in the FPSCR (bits 3:12, 21:23) are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mcrfs, mtfsfi, mtfsf, or mtfsb0 instruction. The exception summary bits in the FPSCR (FX, FEX, and VX, which are bits 0:2) are not considered to be “exception bits”, and only FX is sticky.

FEX and VX are simply the ORs of other FPSCR bits. Therefore these two bits are not listed among the FPSCR bits affected by the various instructions.

    FPSCR
    0                31

Figure 27. Floating-Point Status and Control Register

The bit definitions for the FPSCR are as follows.

Bit(s)  Description

0       Floating-Point Exception Summary (FX)
        Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FPSCR_FX to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FPSCR_FX explicitly.

1       Floating-Point Enabled Exception Summary (FEX)
        This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR_FEX explicitly.

2       Floating-Point Invalid Operation Exception Summary (VX)
        This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR_VX explicitly.

3       Floating-Point Overflow Exception (OX)
        See Section 4.4.3, “Overflow Exception” on page 93.

4       Floating-Point Underflow Exception (UX)
        See Section 4.4.4, “Underflow Exception” on page 93.

5       Floating-Point Zero Divide Exception (ZX)
        See Section 4.4.2, “Zero Divide Exception” on page 92.

6       Floating-Point Inexact Exception (XX)
        See Section 4.4.5, “Inexact Exception” on page 94.
        FPSCR_XX is a sticky version of FPSCR_FI (see below). Thus the following rules completely describe how FPSCR_XX is set by a given instruction.
        ■ If the instruction affects FPSCR_FI, the new value of FPSCR_XX is obtained by ORing the old value of FPSCR_XX with the new value of FPSCR_FI.
        ■ If the instruction does not affect FPSCR_FI, the value of FPSCR_XX is unchanged.

7       Floating-Point Invalid Operation Exception (SNaN) (VXSNAN)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

8       Floating-Point Invalid Operation Exception (∞ − ∞) (VXISI)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

9       Floating-Point Invalid Operation Exception (∞ ÷ ∞) (VXIDI)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

10      Floating-Point Invalid Operation Exception (0 ÷ 0) (VXZDZ)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

11      Floating-Point Invalid Operation Exception (∞ × 0) (VXIMZ)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

12      Floating-Point Invalid Operation Exception (Invalid Compare) (VXVC)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

13      Floating-Point Fraction Rounded (FR)
        The last Arithmetic or Rounding and Conversion instruction incremented the fraction during rounding. See Section 4.3.6, “Rounding” on page 89. This bit is not sticky.

14      Floating-Point Fraction Inexact (FI)
        The last Arithmetic or Rounding and Conversion instruction either produced an inexact result during rounding or caused a disabled Overflow Exception. See Section 4.3.6, “Rounding” on page 89. This bit is not sticky.
        See the definition of FPSCR_XX, above, regarding the relationship between FPSCR_FI and FPSCR_XX.

15:19   Floating-Point Result Flags (FPRF)
        This field is set as described below. For arithmetic, rounding, and conversion instructions, the field is set based on the result placed into the target register, except that if any portion of the result is undefined then the value placed into FPRF is undefined.

        15      Floating-Point Result Class Descriptor (C)
                Arithmetic, rounding, and conversion instructions may set this bit with the FPCC bits, to indicate the class of the result as shown in Figure 28 on page 85.

        16:19   Floating-Point Condition Code (FPCC)
                Floating-point Compare instructions set one of the FPCC bits to 1 and the other three FPCC bits to 0. Arithmetic, rounding, and conversion instructions may set the FPCC bits with the C bit, to indicate the class of the result as shown in Figure 28 on page 85. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.

                16      Floating-Point Less Than or Negative (FL or <)
                17      Floating-Point Greater Than or Positive (FG or >)
                18      Floating-Point Equal or Zero (FE or =)
                19      Floating-Point Unordered or NaN (FU or ?)

20      Reserved

21      Floating-Point Invalid Operation Exception (Software Request) (VXSOFT)
        This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 4.4.1, “Invalid Operation Exception” on page 91.

22      Floating-Point Invalid Operation Exception (Invalid Square Root) (VXSQRT)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

        Programming Note
        If the implementation does not support the optional Floating Square Root or Floating Reciprocal Square Root Estimate instruction, software can simulate the instruction and set this bit to reflect the exception.

23      Floating-Point Invalid Operation Exception (Invalid Integer Convert) (VXCVI)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

24      Floating-Point Invalid Operation Exception Enable (VE)
        See Section 4.4.1, “Invalid Operation Exception” on page 91.

25      Floating-Point Overflow Exception Enable (OE)
        See Section 4.4.3, “Overflow Exception” on page 93.

26      Floating-Point Underflow Exception Enable (UE)
        See Section 4.4.4, “Underflow Exception” on page 93.

27      Floating-Point Zero Divide Exception Enable (ZE)
        See Section 4.4.2, “Zero Divide Exception” on page 92.

28      Floating-Point Inexact Exception Enable (XE)
        See Section 4.4.5, “Inexact Exception” on page 94.

29      Floating-Point Non-IEEE Mode (NI)
        Floating-point non-IEEE mode is optional. If floating-point non-IEEE mode is not implemented, this bit is treated as reserved, and the remainder of the definition of this bit does not apply.

        If floating-point non-IEEE mode is implemented, this bit has the following meaning.

        0       The processor is not in floating-point non-IEEE mode (i.e., all floating-point operations conform to the IEEE standard).
        1       The processor is in floating-point non-IEEE mode.

        When the processor is in floating-point non-IEEE mode, the remaining FPSCR bits may have meanings different from those given in this document, and floating-point operations need not conform to the IEEE standard. The effects of running with FPSCR_NI = 1, and any additional requirements for using non-IEEE mode, are described in the Book IV, PowerPC Implementation Features for the implementation, and may differ between implementations.

        Programming Note
        When the processor is in floating-point non-IEEE mode, the results of floating-point operations may be approximate, and performance for these operations may be better, more predictable, or less data-dependent than when the processor is not in non-IEEE mode. For example, in non-IEEE mode an implementation may return 0 instead of a denormalized number, and may return a large number instead of an infinity.

30:31   Floating-Point Rounding Control (RN)
        See Section 4.3.6, “Rounding” on page 89.

        00      Round to Nearest
        01      Round toward Zero
        10      Round toward +Infinity
        11      Round toward −Infinity

    Result Flags
    C  <  >  =  ?     Result Value Class
    1  0  0  0  1     Quiet NaN
    0  1  0  0  1     −Infinity
    0  1  0  0  0     −Normalized Number
    1  1  0  0  0     −Denormalized Number
    1  0  0  1  0     −Zero
    0  0  0  1  0     +Zero
    1  0  1  0  0     +Denormalized Number
    0  0  1  0  0     +Normalized Number
    0  0  1  0  1     +Infinity

Figure 28. Floating-Point Result Flags

4.3 Floating-Point Data

4.3.1 Data Format

This architecture defines the representation of a floating-point value in two different binary fixed-length formats. The format may be a 32-bit single format for a single-precision value or a 64-bit double format for a double-precision value. The single format may be used for data in storage. The double format may be used for data in storage and for data in floating-point registers.

The lengths of the exponent and the fraction fields differ between these two formats. The structure of the single and double formats is shown below.

    S | EXP | FRACTION
    0   1     9              31

Figure 29. Floating-point single format

    S | EXP | FRACTION
    0   1     12             63

Figure 30. Floating-point double format

Values in floating-point format are composed of three fields:

    S          sign bit
    EXP        exponent+bias
    FRACTION   fraction

Representation of numeric values in the floating-point formats consists of a sign bit (S), a biased exponent (EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied bit concatenated on the right with the FRACTION. This leading implied bit is 1 for normalized numbers and 0 for denormalized numbers and is located in the unit bit position (i.e., the first bit to the left of the binary point). Values representable within the two floating-point formats can be specified by the parameters listed in Figure 31.

                          Format
                          Single    Double
    Exponent Bias         +127      +1023
    Maximum Exponent      +127      +1023
    Minimum Exponent      −126      −1022
    Widths (bits)
      Format              32        64
      Sign                1         1
      Exponent            8         11
      Fraction            23        52
      Significand         24        53

Figure 31. IEEE floating-point fields

The architecture requires that the FPRs of the Floating-Point Processor support the floating-point double format only.
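The field layout of the single and double formats can be illustrated with a short Python sketch. It is not part of the architecture: the names `FORMATS`, `split_fields`, `single_bits`, and `double_bits` are hypothetical, and the standard `struct` module is used here only to obtain the bit pattern of a host floating-point value, on the assumption that the host uses the same IEEE formats.

```python
import struct

# Field widths and exponent bias for the two formats described above.
FORMATS = {
    "single": {"width": 32, "exp_bits": 8, "frac_bits": 23, "bias": 127},
    "double": {"width": 64, "exp_bits": 11, "frac_bits": 52, "bias": 1023},
}


def split_fields(bits: int, fmt: str):
    """Split a raw bit pattern into (S, EXP, FRACTION)."""
    p = FORMATS[fmt]
    fraction = bits & ((1 << p["frac_bits"]) - 1)
    exp = (bits >> p["frac_bits"]) & ((1 << p["exp_bits"]) - 1)
    sign = (bits >> (p["width"] - 1)) & 1
    return sign, exp, fraction


def single_bits(x: float) -> int:
    """Bit pattern of x after conversion to the 32-bit single format."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def double_bits(x: float) -> int:
    """Bit pattern of x in the 64-bit double format."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


if __name__ == "__main__":
    s, exp, frac = split_fields(double_bits(1.0), "double")
    # EXP holds exponent + bias, so subtract the bias to get the exponent.
    print(s, exp - FORMATS["double"]["bias"], hex(frac))
```

The biased EXP field is kept as stored; subtracting the bias from Figure 31 gives the unbiased exponent of a normalized number.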

4.3.2 Value Representation

This architecture defines numeric and non-numeric values representable within each of the two supported formats. The numeric values are approximations to the real numbers and include the normalized numbers, denormalized numbers, and zero values. The non-numeric values representable are the infinities and the Not a Numbers (NaNs). The infinities are adjoined to the real numbers, but are not numbers themselves, and the standard rules of arithmetic do not hold when they are used in an operation. They are related to the real numbers by order alone. It is possible however to define restricted operations among numbers and infinities as defined below. The relative location on the real number line for each of the defined entities is shown in Figure 32.

      -INF |  -NOR  | -DEN | -0 | +0 | +DEN |  +NOR  | +INF
    |------+--------+------+----+----+------+--------+------|

Figure 32. Approximation to real numbers

The NaNs are not related to the numeric values or infinities by order or value but are encodings used to convey diagnostic information such as the representation of uninitialized variables.

The following is a description of the different floating-point values defined in the architecture:

Binary floating-point numbers
Machine representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values.

Normalized numbers (± NOR)
These are values that have a biased exponent value in the range:

    1 to 254 in single format
    1 to 2046 in double format

They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows:

$\mathrm{NOR} = (-1)^{s} \times 2^{E} \times (1.\mathrm{fraction})$

where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part.

The ranges covered by the magnitude (M) of a normalized floating-point number are approximately equal to:

Single Format:

$1.2 \times 10^{-38} \le M \le 3.4 \times 10^{38}$

Double Format:

$2.2 \times 10^{-308} \le M \le 1.8 \times 10^{308}$

Zero values (± 0)
These are values that have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. The sign of zero is ignored by comparison operations (i.e., comparison regards +0 as equal to −0).

Denormalized numbers (± DEN)
These are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows:

$\mathrm{DEN} = (-1)^{s} \times 2^{\mathrm{Emin}} \times (0.\mathrm{fraction})$

where Emin is the minimum representable exponent value (−126 for single-precision, −1022 for double-precision).

Infinities (± ∞)
These are values that have the maximum biased exponent value:

    255 in single format
    2047 in double format

and a zero fraction value. They are used to approximate values greater in magnitude than the maximum normalized value.

Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among numbers and infinities. Infinities and the real numbers can be related by ordering in the affine sense:

$-\infty < \text{every finite number} < +\infty$

Arithmetic on infinities is always exact and does not signal any exception, except when an exception occurs due to the invalid operations as described in Section 4.4.1, “Invalid Operation Exception” on page 91.

Not a Numbers (NaNs)
These are values that have the maximum biased exponent value and a nonzero fraction value. The sign bit is ignored (i.e., NaNs are neither positive nor negative). If the high-order bit of the fraction field is 0 then the NaN is a Signaling NaN; otherwise it is a Quiet NaN.

Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.

Quiet NaNs are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when Invalid Operation Exception is disabled (FPSCR_VE = 0). Quiet NaNs propagate through all floating-point operations except ordered comparison, Floating Round to Single-
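The value classes of Section 4.3.2 can be summarised, for the double format only, in a small Python sketch. The function names `classify_double` and `double_value` are hypothetical, and the class labels simply reuse the abbreviations from Figure 32. Raw 64-bit patterns are taken as input so that Signaling NaN encodings can be examined without depending on how a host language treats them.

```python
EXP_MAX = 0x7FF      # maximum biased exponent in double format (2047)
FRAC_BITS = 52
BIAS = 1023
EMIN = -1022


def _fields(bits: int):
    """Return (S, EXP, FRACTION) of a 64-bit double-format pattern."""
    sign = (bits >> 63) & 1
    exp = (bits >> FRAC_BITS) & EXP_MAX
    fraction = bits & ((1 << FRAC_BITS) - 1)
    return sign, exp, fraction


def classify_double(bits: int) -> str:
    """Name the value class encoded by a double-format bit pattern."""
    sign, exp, fraction = _fields(bits)
    prefix = "-" if sign else "+"
    if exp == EXP_MAX:
        if fraction == 0:
            return prefix + "INF"
        # High-order fraction bit: 0 means Signaling NaN, 1 means Quiet NaN.
        return "QNaN" if (fraction >> (FRAC_BITS - 1)) & 1 else "SNaN"
    if exp == 0:
        return prefix + ("0" if fraction == 0 else "DEN")
    return prefix + "NOR"


def double_value(bits: int) -> float:
    """Numeric value of a zero, denormalized, or normalized pattern."""
    sign, exp, fraction = _fields(bits)
    if exp == EXP_MAX:
        raise ValueError("infinities and NaNs have no finite value")
    frac = fraction / 2**FRAC_BITS
    if exp == 0:
        # (-1)^s x 2^Emin x (0.fraction)
        return (-1) ** sign * 2.0**EMIN * frac
    # (-1)^s x 2^E x (1.fraction), with E = EXP - bias
    return (-1) ** sign * 2.0 ** (exp - BIAS) * (1 + frac)
```

The same structure would apply to the single format with an 8-bit exponent, a 23-bit fraction, a bias of 127, and Emin of −126.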

DETAILS
Degree course: Degree course in computer engineering
SSD:
Academic year: 2007-2008

The contents of this page are personal reworkings by the Publisher Sara F of information learned by attending the lectures of Architetture Sistemi Elaborazione and through independent study of any reference books in preparation for the final exam or thesis. They are not to be taken as official material of the university Napoli Federico II - Unina or of Prof. Fadini Bruno.
