Appunti in inglese di Architetture Sistemi Elaborazione del prof. Mazzocca sulla tesi sui divisori: a high-performance data-dependent hardware integer divider, Preliminaries, Division Arithmetic, Improved Division, Performance Analysis, Obtained Results, Conclusions, Source Code.
CAS CAS CAS CAS CASq a1 6CAS CAS CAS CAS CASq a2 7CAS CAS CAS CAS CASq a3 8CAS CAS CAS CAS CASq 4 r r r r r0 1 2 3 4Figure 3.5: Circuit diagram of 4-by-4 restoring (top) and 5-by-5 non-restoring (bottom)cellular array divider.values, i.e., the lowest index of a and d is 1, rather than 0 which is reserved for the sign bit.i iCompared to conventional radix-2 dividers, in the cellular scheme the dividend remainsfixed and the divisor is shifted. More precisely, the partial remainder R is transferred toithe next row and the divisor is shifted right by one position, indicated by the diagonallines. Similarly to the iterative restoring scheme, where the quotient bits are generated 333.5. FAST ARRAY DIVIDERSthrough the adder’s carry-out signal, they are now obtained from the left of each row.That is, the quotient bits are given byq = a + c ,i+1 i+1 outwhere a is the bit shifted out in the ith iteration.i+1Negative operands can be supported through implementing non-restoring division inIn a similar way. Let A = a .a a · · · a denote the dividend, D = d .d d · · · d the divisor, n0 1 2 2n 0 1 2Q = q .q q · · · q the quotient, and R = r .r r · · · r the remainder. In case of non-restoring division, the basic element is a controlled-add/subtract (CAS) cell like shown in Figure 3.4 (right), which performs either a - d if P = 0 or a + d if P = 1. It contains an XOR gate for controlling the divisor input to a full adder (FA) that generates s = a ⊕ (d ⊕ P) ⊕ c, inc = (a + c)(d ⊕ P) + ac. out in in
In order to realize n-bit non-restoring division, (n + 1) × (n + 1) cells are required due to the sign-magnitude extension. Figure 3.5 (bottom) shows a 5-by-5 arrangement of CAS cells for performing 4-bit radix-2 non-restoring division. Note that operands and results all contain an additional sign bit, indicated by index 0. Compared to the previous array divider,
The operational sequence of the non-restoring divider is basically the same, except that the final remainder might be incorrect. In other words, a corrective restoration of the previous partial remainder can be necessary.
Regarding practical implementations, for both the restoring and non-restoring array divider, the execution time is of order O(n), due to the slow carry-propagation. However, in the latter case, the quotient bits depend only on the sign bits obtained from the left-most cells. Therefore, it suffices to compute only these bits with a fast carry-lookahead circuit, while the other bits of the partial remainder can be generated using carry-save adders. Again, carry-save addition is the key to faster execution time of order O(n log n).
Clearly, the high speed comes at an investment of required hardware that is immense, e.g., 264 = 4096 cells for 64-bit restoring division. Consequently, array dividers are reserved for large general-purpose processors, since they find rarely place in
micro architectures.
Finally, we present formulas for the gate count and divide time of the three proposed dividers in Table 3.2, where ∆ denotes the unit gate delay.
Table 3.2: Specific data of different array dividers.
Divider Type
Gate Count
Divide Time
2 2restoring
14n + 4n (2n + 6n + 2)∆ G
2
2 2non-restoring
15(n + 1) (2n + 6n + 4)∆ G
2non-restoring
CLA
18n + 27n + 10 (10n + 11)∆ G
34
CHAPTER 3. DIVISION ARITHMETIC
3.6 A Different Approach
And now for something completely different. Monty Python’s Flying Circus
All division methods and their implementations we have examined so far belong to the category of radix-β dividers. Their common property is that, except for initial and final normalization, a fixed number of positions is shifted at each step, i.e., with β = 2 , mbits are shifted. In case of low-radix division, the sequential process is generally inefficient due to an immense number of iterations required. In contrast, high-radix division is
veryefficient—without question. However, practical implementations usually result in muchtoo large circuits that do not meet the area constraints of small processors architectures.Consequently, there is a rapidly growing demand for small dividers which are also fast,at least on the average, such that higher throughput is achieved. The most promisingapproach addresses data-dependent dividers that execute in variable time. Because mostof small architectures, e.g., signal processors, are not fully pipelined, there is no desperateneed of designing synchronous circuits that offer constant execution time.
One of the obvious drawbacks of radix-β dividers is that the initial remainder is notentirely “visible” as long as some of its low-order bits remain in the shift register’s LW.Recall from Section 3.4.3 that radix-2 SRT division processes in data-dependent time, i.e.,it starts off with normalizing both operands and shifts the partial remainder an arbitrarynumber
of positions at each iteration in order to keep it bounded. To this end, a means for determining the number of leading zeros of the divisor is needed for performing the initial normalization. Clearly, the same means is needed for the partial remainder—otherwise it would not be possible to determine the optimal number of positions to be shifted. Thus, logical-shift registers have to be implemented. Furthermore, the final remainder must be shifted right in the last step by the number of initially shifted positions, which implies that a bidirectional logical-shift register is required. However, due to this hardware investment, the straightforward approach to achieve higher throughput at some lower cost of hardware is to consider simpler methods.
Referring to the RT divider proposed in Section 3.3, all we need to do is to rearrange some of its components, such that the new design looks like Figure 3.6. The first thing pointing out is that the roles of the remainder and the divisor are exchanged,
i.e., the divisor is shifted instead of the remainder, whereas the remainder is located in the static register. Because the remainder is entirely "visible" now, we can align the divisor to meet the initial remainder's data size before the first subtraction is performed. Due to this initial alignment, a quantity representing the divisor multiplied by 2, where m is the number of left-shifts, is subtracted from the initial remainder in one step. Hence, the average number of iterations needed is reduced to some m, which depends primarily on the data difference between the operands. (For explanations of the terms data size and data difference see Definition 11 and 12, respectively.) The second thing pointing out is that the quotient bit is now stored as the MSB of the bidirectional SR's LW. Obviously, after the initial alignment is finished, the direction changes and the divisor is shifted right.
Schematic of data-dependent divider.during the usual subtract-and-shift process. Therefore, the quotient bits need to growfrom left to right, which also implies that the final quotient is given in reversed order.Due to the data dependency, the recursion evaluates to m−iR = R − 2 × D, i = 0, 1, . . . , m, m ∈ {0, 1, . . . , n − 1},i+1 iand the quotient bit is determined by the rule m&minus-i½ 0 if R < 2 Diq =i+1 m&minus-i1 if R ≥ 2 D.iRegarding an implementation of this method, the average performance might dominateover the RT divider. However, up to 2n iterations are needed in the worst case, becausea small divisor is shifted bit-per-bit a long way up and back down to its original position.Therefore, essential improvements must be applied in order to successfully implement apractical divider based on this approach.3.7 Self-Aligning DividerPerfection is achieved, not when there is nothing more to add, but when thereis nothing left to take.
away.Antoine de Saint-Exupéry, 1939
Wind, Sand, and Stars,
In continuation of the preceding section, we now consider several improvements of the
proposed method. Its major drawback is that the divisor is shifted only bit-wise when it
is aligned to the initial remainder. Clearly, if we want to keep the divider design as “lean”
as possible concerning hardware, we have to stick with aligning the divisor in this way.
36 CHAPTER 3. DIVISION ARITHMETIC
DIVISORQUOTIENTREMAINDER1CI+CO
Schematic of self-aligning divider.
Despite this circumstance, we can still speed up the aligning process and improve other
parts as well. The more advanced and also simpler architecture shown in Figure 3.7 is
known as self-aligning (SA) divider, which we examine in some detail now.
Divisor Aligning. The initial left-shifts can be overlapped entirely with comparing
the shifted divisor to the initial remainder. This is possible due to the fact that, once the
divisor is stored in the SR, it is never
rewritten by any other data. Thus, the left-shift operation can be carried out in parallel with the comparison, which is performed through the unoccupied adder.
Process Control. The SR is extended to 2n + 2 bits with an additional control bit to the left and right-hand side. The left-most bit serves as the stop bit for detecting a possible over-shift during the aligning phase, whereas the right-most bit serves as the done bit for indicating the last iteration. For any of the radix-β dividers proposed in the preceding sections, a counter is needed in order to keep track of the iterations. Such counter can be omitted in this design through a simple trick: the second low-order bit of the SR, i.e., SR[1], is initially set to 1. This temporary bit is shifted out in the last iteration, and thus, indicates the completion of the division process.
Divisor Multiplexor. The SR contains a (2n + 2)-bit 3-to-1 input multiplexor for writing and bidirectional shifting. It implements in a single step: "initialize".
“shift left”,and “shift right”. The three input ports are wired in the following order:shift right [00] : {0, feedback[2n+1 : n+2], q , feedback[n : 1]}i+1shift left [01] : {feedback[2n : 0], 0}initialize [10] : {0, divisor[n−1 : 0], {(n−1) × 0}, 1, 0}According to the obtained synthesis results, the delay of the 3-to-1 multiplexor is equalto the delay of the 4-to-1 multiplexor used in the RT divider. 373.7. SELF-ALIGNING DIVIDERRemainder Multiplexor. Recall that the RT divider uses the multiplexor subsequentto the adder in order to avoid the restoring step. However, for the SA divider it is notneeded anymore, because it is sufficient to provide a
SSDIngegneria industriale e dell'informazioneING-INF/05 Sistemi di elaborazione delle informazioni
I contenuti di questa pagina costituiscono rielaborazioni personali del Publisher valeria0186 di informazioni apprese con la frequenza delle lezioni di Architetture Sistemi Elaborazione e studio autonomo di eventuali libri di riferimento in preparazione dell'esame finale o della tesi. Non devono intendersi come materiale ufficiale dell'università Università degli studi di Napoli Federico II o del prof Mazzocca Nicola.
Ciao! Sono il tuo Tutor AI, il compagno ideale per uno studio interattivo. Utilizzo il metodo maieutico per affinare il tuo ragionamento e la comprensione. Insieme possiamo:
Risolvere un problema di matematica
Riassumere un testo
Tradurre una frase
E molto altro ancora...
Cosa vuoi imparare oggi?
Il Tutor AI di Skuola.net usa un modello AI di Chat GPT.
Per termini, condizioni e privacy, visita la relativa pagina.