
Dividers - thesis

Notes in English for the Architetture Sistemi Elaborazione course taught by Prof. Mazzocca, covering the thesis on dividers: a high-performance data-dependent hardware integer divider, Preliminaries, Division Arithmetic, Improved Division, Performance Analysis, Obtained Results, Conclusions, Source Code.

Exam: Architetture Sistemi Elaborazione; instructor: Prof. N. Mazzocca


6.2. SIMULATED PERFORMANCE

thoroughly in Section 5.3. The total number of occurrences regarding the n possible data

sizes, i.e., 1, . . . , n bits, is written to a report file upon completion of a test. Due to

this, we can visualize the distribution of data sizes and data differences that occurred

in a simulation. Figure 6.1 shows the distributions of data sizes (left column) and data

differences (right column) obtained from sixteen simulations with four different word sizes

and four different RNG configurations, as discussed previously. It is plain to see that

RNG 1 yields distributions of operands that are very close together regarding their data

sizes, whereas RNG 4—with the largest random right-shift—yields distributions where

the operands are spread out over the entire range. Consequently, the associated data

differences resulting from RNG 4 are very large compared to RNG 1, which contributes

to a wide range of varying input data for the simulated dividers.

6.2 Simulated Performance

Experience does not ever err. It is only your judgment that errs in promising

itself results which are not caused by your experiments.

Leonardo da Vinci (1452−1519)

Dividers are generally composed of many individual components, and therefore, a lot

more control signals are needed compared to other devices. For this reason, the controller

of a divider usually operates with an oscillator period that represents a fraction of the

processor’s cycle time, i.e., the period of the system clock. Recall from Section 5.1 that

our four controllers are implemented as Finite State Machines containing MS flip-flops,

which are assigned a delay of 0.958 ns. Consequently, this enables us to run the divider

modules with an oscillator frequency of 1 GHz, which further implies that the number of

clock cycles required is identical with the execution time in nano-seconds. This fact will

be of great importance when we conduct speedup comparisons based on execution time

to other dividers in the next section.

The minimum and maximum performance results are obtained through simulations

with analytically determined best and worst cases. Note that the worst case is not always

given by dividing the largest representable value by the smallest one. The more interesting

average-case results are obtained by running the MPI parallel program on the 17-node

cluster introduced in Section 5.2. Against our expectations, the average values converge

very fast, usually within a few million divisions. Nevertheless, a minimum of 35 billion

divisions is executed during each simulation in order to prove stability. A summary of all

performance values resulting from sixteen simulations is given in Table 6.1. The graphical

interpretation of these results is shown in Figure 6.2.

Radix-Two Divider. It turns out that the RT divider covers the smallest full spectrum

(light-gray) of the four dividers. Recall that its execution time is constant without the

applied modification of initially normalizing the dividend. Thus, the shifting-over-zeros

method implemented in this simple form causes only a slight improvement of the average

performance (gray), which is still very close to the maximum.

60 CHAPTER 6. OBTAINED RESULTS

Table 6.1: Performance of RT, SA, DA, and HA divider in number of clock cycles. Note

that due to the applied oscillator frequency of 1 GHz, the number of clock cycles required

is identical with the execution time in nano-seconds.

16-bit Performance [clock cycles]

Divider   Minimum   RNG 1   RNG 2   RNG 3   RNG 4   Maximum
RT             58     150     145     135     116       163
SA             23      57      64      81     116       270
DA             26      34      37      47      68       311
HA             26      30      33      39      52       161

32-bit Performance [clock cycles]

Divider   Minimum   RNG 1   RNG 2   RNG 3   RNG 4   Maximum
RT            106     305     295     275     238       323
SA             23      64      82     122     205       534
DA             26      37      47      71     118       615
HA             26      33      39      56      89       315

64-bit Performance [clock cycles]

Divider   Minimum   RNG 1   RNG 2   RNG 3   RNG 4   Maximum
RT            202     615     595     558     483       643
SA             23      82     122     209     382      1062
DA             26      47      71     120     219      1223
HA             26      39      56      92     163       635

128-bit Performance [clock cycles]

Divider   Minimum   RNG 1   RNG 2   RNG 3   RNG 4   Maximum
RT            394    1235    1198    1123     973      1283
SA             23     122     209     384     734      2118
DA             26      71     120     221     422      2439
HA             26      56      92     166     311      1275

[Plot area of Figure 6.2: four panels titled Radix-Two Performance, Self-Aligning Performance, Direct-Aligning Performance, and Hybrid-Aligning Performance; horizontal axis Word Size (bits) with values 16, 32, 64, 128; vertical axis Clock Cycles with ticks from 500 to 2500.]

Figure 6.2: Full (light-gray) and average (gray) spectrum of RT (left top), SA (right top),

DA (left bottom), and HA (right bottom) divider in number of clock cycles.

Self-Aligning Divider. The SA divider stands out with the widest average spectrum

(gray), since it is highly sensitive to strongly varying data differences. In other words, a

satisfactory average performance can only be achieved with operands that do not differ

too much in their data sizes. Moreover, it also covers a relatively wide overall spectrum

(light-gray), which demands further improvements to obtain an applicable divider.

Direct-Aligning Divider. It is plain to see that the DA divider spans a vast overall

spectrum (light-gray), primarily due to its very long critical path. This causes a worst-

case performance that exceeds the other dividers immensely regarding their full spectra.

Despite this fact, its average spectrum (gray) is sufficiently small and also much lower

compared to the SA divider. Nevertheless, to make the DA divider useful for practical

implementations, the worst-case behavior needs to be improved.

Hybrid-Aligning Divider. The HA divider clearly dominates over the other three with

the smallest and lowest average spectrum (gray), which implies that it is least sensitive

regarding strong variations of input data. Furthermore, it also stands out with the lowest

full spectrum (light-gray). In other words, with the help of an additional multiplexor

and two temporary registers, the worst-case performance could be improved to be even

slightly better than for the RT divider.


6.3 Speedup Comparisons

Speed gets you nowhere if you’re headed the wrong way.

American Proverb

The essential question that arises is: Where are we compared to state-of-the-art dividers?

So, to answer this question thoroughly, we will evaluate the performance of two specific

dividers: radix-4 SRT (SRT4) using carry-save addition and radix-2 non-restoring array

(ARR2) using carry-lookahead circuitry. Clearly, we will not implement these dividers,

since this would go well beyond the scope of the thesis. However, in order to answer the

posed question, it suffices to determine some approximate performance. To this end, we

only need to implement the sequential components of the SRT4 divider’s iterative part,

using the same technology as for the other four dividers, i.e., Verilog HDL. Because the

execution time of the SRT4 divider is constant, we can derive its performance directly from

the determined critical path delay (CPD) and the number of iterations to be performed,

i.e., n/2 in case of radix 4. Recall from Section 3.4 that practical high-radix SRT dividers

are implemented with both remainder truncation and carry-save addition for reducing

the CPD. Accordingly, the division process can be described as follows:

Step 1: Normalize initial remainder and divisor.

Step 2.1: Perform high-speed addition obtaining index bits.

Step 2.2: Convert index bits to two’s complement if negative.

Step 2.3: Select corresponding quotient bits from lookup table.

Step 2.4: Select appropriate divisor multiple through multiplexor.

Step 2.5: Perform carry-save addition obtaining carry and sum.

Step 2.6: Pass logical shifter and shift left by 2 positions.

Step 2.7: Store logically shifted quantities in registers.

Step 3: Correct final remainder and quotient.

Referring to Figure 3.3 that depicts the sequential components of such an SRT divider,

we can calculate the CPD from measurements conducted with Verilog implementations,

as demonstrated in Example 6.1. It turns out that the CPD of the SRT4 divider based

on FPGA technology evaluates to 8.162 ns (Verilog implementations of two lookup tables

used for synthesis can be found in Section A.1). So, if we omit both the normalization and

correction step, we obtain good estimates for the 8, 16, 32, and 64 iterations needed in

case of word sizes 16, 32, 64, and 128 bits, respectively. An overlap of the resulting SRT4

performance curve with the spectrum of the RT, SA, DA, and HA divider is presented

in Figure 6.3. Recall that the execution time of these four dividers is identical with

their numbers of clock cycles required, and therefore, we can compare them directly

to the SRT4 divider’s execution time in nano-seconds. The four corresponding SRT4

performance values can be found in Table 6.2.

[Plot area of Figure 6.3: four panels titled Radix-Two Performance, Self-Aligning Performance, Direct-Aligning Performance, and Hybrid-Aligning Performance; horizontal axis Word Size (bits) with values 16, 32, 64, 128; vertical axis Execution Time (ns) with ticks from 500 to 2500.]

Figure 6.3: Comparison of SRT4 divider (bold line) with RT (left top), SA (right top),

DA (left bottom), and HA (right bottom) divider in nano-seconds.

Example 6.1: Evaluating the SRT4 divider’s critical path delay.

Sequential Component                  Delay ∆SC [ns]
4-bit Carry-Lookahead Adder           2.568
4-to-3-bit Digit Converter            0.479
256 × 3-bit Lookup Table              1.687
n-bit 3-to-1 Divisor Multiplexor      0.793
n-bit Carry-Save Adder                0.479
n-bit Logical Left-Right-Shifter      1.198
n-bit Carry/Sum/Quo. Register         0.958
Critical Path Delay [ns]              8.162
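The arithmetic behind Example 6.1 and the SRT4 row of Table 6.2 can be retraced with a short Python sketch. It is not part of the thesis code; the names COMPONENT_DELAYS_NS, critical_path_delay_ns, and srt4_time_ns are chosen here for illustration only. The delays are copied from Example 6.1, and the normalization and correction steps are omitted as described in the text.

```python
# Component delays in ns, copied from Example 6.1.
COMPONENT_DELAYS_NS = {
    "4-bit carry-lookahead adder": 2.568,
    "4-to-3-bit digit converter": 0.479,
    "256 x 3-bit lookup table": 1.687,
    "n-bit 3-to-1 divisor multiplexor": 0.793,
    "n-bit carry-save adder": 0.479,
    "n-bit logical left-right shifter": 1.198,
    "n-bit carry/sum/quotient register": 0.958,
}


def critical_path_delay_ns():
    """Sum the delays of all components on the iterative path."""
    return sum(COMPONENT_DELAYS_NS.values())


def srt4_time_ns(word_size):
    """Estimate the SRT4 execution time: n/2 iterations of one CPD each."""
    iterations = word_size // 2
    return iterations * critical_path_delay_ns()


if __name__ == "__main__":
    print("CPD [ns]:", round(critical_path_delay_ns(), 3))
    for n in (16, 32, 64, 128):
        print(n, "bit:", round(srt4_time_ns(n)), "ns")
```

Rounded to whole nanoseconds, the four estimates agree with the SRT4 values listed in Table 6.2.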

In case of the ARR2 divider, we can use the formula $(10n + 11)\Delta_G$ from Table 3.2 for

calculating its execution time. In order to preserve the constraints applied similarly to all

implementations so far, i.e., FPGA technology, we only need to determine the unit gate

delay $\Delta_G$. It can be estimated based on the following considerations: all low-fan-in Verilog

modules with gate level 2, e.g., D-type latches and 2-to-1 multiplexors, yield a maximum


path delay of 0.479 ns. Surprisingly, all Verilog primitives, i.e., AND, OR, XOR, NAND,

NOR, XNOR, and NOT, with a fan-in of 2 also yield a delay of 0.479 ns. The reason is

that the target device is an FPGA, which consists of an AND-OR matrix for establishing

the desired connectivity for the compiled image. Due to this matrix, most level-2 net-lists

can be interconnected efficiently without large overhead, whereas representing primitives

by AND-OR circuits clearly causes some overhead (for more details see [11]). According

to these considerations, we can assume that each primitive is transposed to a net-list of

level 2. So, we may use the approximation $\Delta_G = 0.479/2 = 0.2395$ ns for our calculations,

yielding the four performance values listed in Table 6.2. The corresponding overlap of

the obtained ARR2 performance curve with the spectrum of the RT, SA, DA, and HA

divider is illustrated in Figure 6.4.
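As a cross-check of the ARR2 estimate, the formula (10n + 11) times the unit gate delay can be evaluated in a few lines of Python. This is an illustrative sketch, not thesis code; UNIT_GATE_DELAY_NS and arr2_time_ns are names assumed here.

```python
# Unit gate delay: the 0.479 ns level-2 delay split evenly over two gate levels.
UNIT_GATE_DELAY_NS = 0.479 / 2


def arr2_time_ns(word_size):
    """Evaluate (10n + 11) * unit gate delay for an n-bit array divider."""
    return (10 * word_size + 11) * UNIT_GATE_DELAY_NS


if __name__ == "__main__":
    for n in (16, 32, 64, 128):
        print(n, "bit:", round(arr2_time_ns(n)), "ns")
```

Rounded to whole nanoseconds, the results match the ARR2 row of Table 6.2.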

Table 6.2: Performance of SRT4 divider with CSA and ARR2 divider with CLA.

Performance [ns]

Divider   16 bit   32 bit   64 bit   128 bit
SRT4          65      131      261       522
ARR2          41       79      156       309

The posed question—where are we—can now be answered by determining the speedup

of each divider. This can be accomplished easily through comparisons against the RT

divider’s maximum, which is equivalent to its constant execution time without initially

normalizing the dividend. Table 6.3 summarizes the average speedups of all six dividers.

In case of the RT, SA, DA, and HA divider, the corresponding table entries are determined

through dividing the RT maximum by the average of the four obtained simulation values

RNG 1 to RNG 4 from Table 6.1.

Table 6.3: Average speedup of six dividers compared to maximum of RT divider.

Average Speedup

Divider   16 bit   32 bit   64 bit   128 bit
RT           1.2      1.2      1.1       1.1
SRT4         2.5      2.5      2.5       2.5
SA           2.1      2.7      3.2       3.5
ARR2         4.0      4.1      4.1       4.2
DA           3.5      4.7      5.6       6.2
HA           4.2      6.0      7.3       8.2
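The entries of Table 6.3 can be recomputed from Tables 6.1 and 6.2 with the following Python sketch. It is an illustration, not thesis code, and all identifiers are assumed names. For the data-dependent dividers, the RT maximum is divided by the mean of the four RNG averages. For SRT4 and ARR2, this sketch divides the RT maximum by the constant execution time from Table 6.2; the text does not spell out that step, but it reproduces the tabulated values.

```python
# RNG 1 to RNG 4 averages in clock cycles for 16, 32, 64, 128 bits (Table 6.1).
AVERAGE_CASES = {
    "RT": [(150, 145, 135, 116), (305, 295, 275, 238),
           (615, 595, 558, 483), (1235, 1198, 1123, 973)],
    "SA": [(57, 64, 81, 116), (64, 82, 122, 205),
           (82, 122, 209, 382), (122, 209, 384, 734)],
    "DA": [(34, 37, 47, 68), (37, 47, 71, 118),
           (47, 71, 120, 219), (71, 120, 221, 422)],
    "HA": [(30, 33, 39, 52), (33, 39, 56, 89),
           (39, 56, 92, 163), (56, 92, 166, 311)],
}
# Maximum of the RT divider per word size (Table 6.1).
RT_MAXIMUM = (163, 323, 643, 1283)
# Constant execution times in ns (Table 6.2); 1 clock cycle equals 1 ns.
CONSTANT_TIME_NS = {"SRT4": (65, 131, 261, 522), "ARR2": (41, 79, 156, 309)}


def average_speedups():
    """Return {divider: [speedup per word size]}, rounded to one decimal."""
    table = {}
    for name, rows in AVERAGE_CASES.items():
        table[name] = [round(rt_max / (sum(row) / len(row)), 1)
                       for rt_max, row in zip(RT_MAXIMUM, rows)]
    for name, times in CONSTANT_TIME_NS.items():
        table[name] = [round(rt_max / t, 1)
                       for rt_max, t in zip(RT_MAXIMUM, times)]
    return table


if __name__ == "__main__":
    for name, row in average_speedups().items():
        print(name, row)
```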

[Plot area of Figure 6.4: four panels titled Radix-Two Performance, Self-Aligning Performance, Direct-Aligning Performance, and Hybrid-Aligning Performance; horizontal axis Word Size (bits) with values 16, 32, 64, 128; vertical axis Execution Time (ns) with ticks from 500 to 2500.]

Figure 6.4: Comparison of ARR2 divider (bold line) with RT (left top), SA (right top),

DA (left bottom), and HA (right bottom) divider in nano-seconds.

Comparing all speedups reveals that the HA divider achieves the highest value in each

column, and thus, beats even the ARR2 divider. Clearly, these excellent HA speedups are

merely approximate average values, since the HA divider is data-dependent. Nonetheless,

it can also be interpreted in the following way: the HA divider performs at least as well as a

conventional radix-2 divider, achieves an average throughput similar to a radix-2 array

divider, and comes at an investment of hardware far less than for a radix-4 SRT divider.

As a concluding remark, note that using FPGA technology for prototyping neither

leads to infeasible practical implementations, nor does it mean the obtained results are

not plausible. Regarding speedup comparisons, the only essential restriction that needs to

be satisfied is that the same FPGA constraints apply to all conducted experiments. When

it comes to silicon—actually copper is the more appropriate word—performance results

might differ depending on the chosen target technology. For instance, array dividers

implemented with CMOS technology are still the fastest dividers known. However, bear

in mind that potential advantages resulting from advanced target technologies might

improve the performance of any divider proposed in this work.

Chapter 7

Conclusions

’Begin at the beginning,’ the King said gravely, ’and go on till you come to an

end; then stop.’

Lewis Carroll, Alice’s Adventures in Wonderland, 1865

In this chapter we draw conclusions, highlight the main contributions, and overview some

possible future research directions. In Section 7.1 we review the content of the thesis,

followed by a brief discussion of the main contributions given in Section 7.2. Lastly, in

Section 7.3 we outline some possible directions for further research in this field.

7.1 Review of the Thesis

Aiming to impart some basic knowledge, we have started with examining fundamental

devices, such as different adders and shifters, which are commonly used in dividers based

on the subtract-and-shift approach. We have begun our excursion into sequential division

by discussing restoring and non-restoring division, which serve as the essential principles

used in binary division. We have also discovered that there are ways for implementing

restoring division to work even faster than its non-restoring counterpart. To this end, we

have examined an advanced restoring design realized as the radix-two divider. In order

to increase efficiency and speed, we have considered variants of high-radix division and

investigated thoroughly the principles of SRT division. It has turned out that high-radix

SRT dividers—though unquestionably very efficient—demand a substantial investment

of hardware. Based on the approach of using pure combinational logic for performing

the sequential process, we have briefly outlined cellular array dividers that represent the

fastest solution known. Due to the immense hardware investment required for practical

implementations based on high-radix division, we have considered a different approach

addressing data-dependent dividers that execute in variable time. For this reason, the

self-aligning divider has been introduced, which represents a minimal set of components


needed for performing binary division. Although its average throughput is generally much

higher compared to low-radix dividers, we have discovered some drawbacks of this design.

In our effort to improve its insufficient behavior regarding an initial alignment of the

divisor, we have introduced a more sophisticated method. Based on this combinational-

aligning method, we have proposed a minimum-subtraction design named direct-aligning

divider and revealed a drawback concerning worst cases. This has inspired an improved

architecture, introduced as the hybrid-aligning divider. Thereafter, we have discussed the

difficulties arising from simulating data dependency and established a methodology for

analyzing the performance of four specific divider implementations. In order to determine

their average performance, we have used a parallel computer cluster for running several

simulations. Finally, we have discussed the obtained results and compared them with two

commonly used dividers in order to see where we stand.

7.2 Main Contributions

It is difficult to say what is impossible, for the dream of yesterday is the hope

of today and the reality of tomorrow.

Robert H. Goddard (1882−1945)

In our effort to improve the efficiency of data-dependent dividers, we have developed an

elaborate method that offers an average speedup of 600%. The underlying architecture is

fully scalable, and therefore, might become attractive compared to other divider designs

that can only be scaled to some extent.

Combinational Aligning. We have shown that permanently aligning the divisor to

the partial remainder can be done efficiently by using priority encoders combined with

logical shifters. Instead of using conventional decoders for controlling the logical shifters,

we have simply adapted priority encoders for this special purpose. The great advantage

of the resulting combinational unit is that it provides both the aligned divisor and the

associated partial quotient. That is, in addition to the aligned divisor, a full-width partial

quotient is generated, containing the ith quotient bit already in its final position.

Direct-Aligning Divider. In the theoretical sense, the DA divider represents an

optimal solution concerning data-dependent divider architectures. It reduces the process

of sequential division to the minimum number of subtractions needed for breaking down

a given dividend. Although the DA divider yields a worst-case performance that exceeds

the other simulated dividers, its average throughput is much higher compared to a radix-4

SRT divider. Regarding practical implementations, the DA divider does not require any

shift registers nor register widths larger than the word size it is adapted for.
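The idea of aligning the divisor to the partial remainder and subtracting only where a quotient bit is 1 can be illustrated in software. The Python sketch below is a behavioural stand-in written for this purpose, not the thesis hardware or its source code: the functions align and divide are hypothetical names, and int.bit_length plays the role that the priority encoder and logical shifter play in the combinational unit. It assumes unsigned operands, in line with the prototypes.

```python
def align(remainder, divisor):
    """Return (aligned_divisor, partial_quotient) for remainder >= divisor > 0.

    The divisor is shifted left as far as possible without exceeding the
    remainder; the partial quotient holds a single 1 at the matching position.
    """
    shift = remainder.bit_length() - divisor.bit_length()
    if (divisor << shift) > remainder:
        shift -= 1  # leading bits line up, but the shifted divisor is too large
    return divisor << shift, 1 << shift


def divide(dividend, divisor):
    """Unsigned division; returns (quotient, remainder, subtractions)."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    quotient, remainder, subtractions = 0, dividend, 0
    while remainder >= divisor:
        aligned, partial = align(remainder, divisor)
        remainder -= aligned      # one subtraction per quotient bit that is 1
        quotient |= partial       # quotient bit lands in its final position
        subtractions += 1
    return quotient, remainder, subtractions


if __name__ == "__main__":
    print(divide(100, 7))  # quotient 14, remainder 2, three subtractions
```

In this sketch the number of subtractions equals the number of 1 bits in the quotient, which is one way to read the "minimum number of subtractions" mentioned above; execution time therefore depends on the operands rather than on the word size alone.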

Hybrid-Aligning Divider. The more sophisticated HA divider dominates over all

simulated dividers in both worst-case and average performance. In this sense, it is not

sensitive regarding strong variations of input data. That means, it performs at least as well as

a conventional radix-2 divider, achieves an average throughput similar to a radix-2 array

divider, and comes at an investment of hardware far less than for a radix-4 SRT divider.

The HA divider contains only registers of widths equal to the supported word size and is

still small enough to meet general area constraints of micro architectures.

7.3 Future Directions

Even if you are on the right track, you’ll get run over if you just sit there.

Will Rogers (1879−1935)

We expect to investigate different and more advanced designs of the proposed aligning

method applied to both existing and new divider architectures. Further design issues

include the extension of the DA and HA divider to facilitate signed division, which is

not supported currently by the prototypes. Another important goal is to determine more

accurate performance data based on fully optimized implementations regarding specific

target technologies like ASICs.

If we can show that the obtained performance results and speedups hold independently

of the underlying target technology, then our proposed data-dependent division methods

might pave the way for possible practical implementations in the future.

Appendix A

Source Code

There is less in this than meets the eye.

Tallulah Bankhead, Remark to Alexander Woollcott, 1922

Even for small designs the amount of source code defining hardware can grow immensely.

It is not uncommon that combinational circuits consist of many thousands of gates and

interconnections, sometimes representing very elusive nets that are hard to understand.

Therefore, in order to maintain an overview and also enable an individual synthesis of each

component, a fine-grained design structure becomes unavoidable.

In this appendix we present the entire source code which was required to carry out this

work. In Section A.1 we list Verilog code that was developed to compose several dividers,

followed by C++ code describing the MPI parallel program given in Section A.2.

A.1 Verilog Files

latch.v

/*****************************************************************************/

/* */

/* Module N-Bit Latch */

/* */

/* Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level | 1 | 1 | 1 | 1 | 1 */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns) | 0.479 | 0.479 | 0.479 | 0.479 | 0.479 */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module latch( out, data, reset, enable );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] data;
  input              reset, enable;

  // Gate-level D-type latch: follows data while enable is high,
  // holds its value while enable is low, and is cleared while reset is high.
  wire [NBITS-1:0] out = ({NBITS{~reset}} & (data | {NBITS{~enable}})) &
                         ((data & {NBITS{enable}}) | out);

endmodule

x_latch.v

/*****************************************************************************/

/* */

/* Module N-Bit Latch */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_latch( out, data, reset, enable );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;
  parameter LOGIC = 0.479;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] data;
  input              reset, enable;

  reg [NBITS-1:0] out;

  // Behavioural simulation model: on a rising edge of reset or enable,
  // out is cleared if reset is high and otherwise loaded with data,
  // in both cases after a delay of LOGIC ns.
  always @( posedge reset or posedge enable )
  begin
    if( reset )
      out <= #LOGIC 0;
    else out <= #LOGIC data;
  end

endmodule
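The feedback expression used in latch.v can be checked in software. The Python sketch below is not part of the thesis sources; it mirrors the AND-OR equation on Python integers, with latch_next and its nbits parameter being names assumed for this illustration. One call evaluates the equation once for a given previous output.

```python
def latch_next(out, data, reset, enable, nbits=8):
    """Evaluate the latch equation of latch.v once on nbits-wide integers."""
    mask = (1 << nbits) - 1

    def rep(bit):
        # Replicate a single control bit across the word, like {NBITS{bit}}.
        return mask if bit else 0

    term1 = rep(not reset) & (data | rep(not enable))
    term2 = (data & rep(enable)) | out
    return term1 & term2 & mask


if __name__ == "__main__":
    held = latch_next(0b1010, 0b0110, reset=0, enable=0)
    loaded = latch_next(0b1010, 0b0110, reset=0, enable=1)
    print(bin(held), bin(loaded))
```

With enable high the output follows data, with enable low the previous output is kept, and reset forces the output to zero in either case.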

[Schematic of Figure A.1: eight cells, each with inputs data<i> and enable and output out<i>, for i = 7 down to 0; bus signals data(7:0) and out(7:0).]

Figure A.1: Wiring of 8-bit D-type latch.

ms_ff.v

/*****************************************************************************/

/* */

/* Module N-Bit Master-Slave Flip-Flop */

/* */

/* Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level | 2 | 2 | 2 | 2 | 2 */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns) | 1.058 | 1.058 | 1.058 | 1.058 | 1.058 */

/* ============+========+========+========+========+======== */

/* Used Delay | 0.958 | 0.958 | 0.958 | 0.958 | 0.958 */

/* */

/* Comments: */

/* */

/* Route delay (0.1ns) omitted to achieve 1.0ns clock period. */

/* */

/* Synthesis: */

/* */

/* Timing constraint: Default path analysis */

/* Delay: 1.058ns (Levels of Logic = 2) */

/* Source: reset (PAD) */

/* Destination: out<63> (PAD) */

/* */

/* Data Path: reset to out<63> */

/* Gate Net */

/* Cell:in->out fanout Delay Delay Logical Name (Net Name) */

/* ---------------------------------------- ------------ */

/* LUT4_L:I0->LO 2 0.479 0.100 w<2>1 (w<2>) */

/* LUT4_D:I3->O 1 0.479 0.000 out<2>1 (out<2>) */

/* ---------------------------------------- */

/* Total 1.058ns (0.958ns logic, 0.100ns route) */

/* (90.5% logic, 9.5% route) */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module ms_ff( out, data, reset, clock );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] data;
  input              reset, clock;

  wire [NBITS-1:0] out, w;

  // Master latch: follows data while clock is high, cleared by reset.
  assign w   = ({NBITS{~reset}} & (data | {NBITS{~clock}})) &
               ((data & {NBITS{clock}}) | w);

  // Slave latch: follows w while clock is low, cleared by reset.
  assign out = ({NBITS{~reset}} & (w | {NBITS{clock}})) &
               ((w & {NBITS{~clock}}) | out);

endmodule

x_ms_ff.v

/*****************************************************************************/

/* */

/* Module N-Bit Master-Slave Flip-Flop */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_ms_ff( out, data, reset, clock );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;
  parameter RESET = 0.479;
  parameter LOGIC = 0.958;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] data;
  input              reset, clock;

  reg [NBITS-1:0] out;

  // Behavioural simulation model: on a rising edge of reset or clock,
  // out is cleared after RESET ns if reset is high and otherwise loaded
  // with data after LOGIC ns.
  always @( posedge reset or posedge clock )
  begin
    if( reset )
      out <= #RESET 0;
    else out <= #LOGIC data;
  end

endmodule
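The two cross-coupled equations of the gate-level ms_ff.v module can be traced with a small Python model. This is an illustrative sketch rather than thesis code; ms_ff_step and the (w, out) state tuple are names assumed here. Each call evaluates the master equation first and then the slave equation with the updated master value.

```python
def ms_ff_step(state, data, reset, clock, nbits=8):
    """Evaluate the master and slave equations of ms_ff.v once.

    state is the pair (w, out) before the call; the updated pair is returned.
    """
    w, out = state
    mask = (1 << nbits) - 1

    def rep(bit):
        # Replicate a single control bit across the word, like {NBITS{bit}}.
        return mask if bit else 0

    # Master latch: follows data while clock is high.
    w = rep(not reset) & (data | rep(not clock)) & ((data & rep(clock)) | w)
    # Slave latch: follows w while clock is low.
    out = rep(not reset) & (w | rep(clock)) & ((w & rep(not clock)) | out)
    return w & mask, out & mask


if __name__ == "__main__":
    state = (0, 0)
    state = ms_ff_step(state, 0xA5, reset=0, clock=1)  # master captures data
    state = ms_ff_step(state, 0x3C, reset=0, clock=0)  # slave passes it on
    print([hex(v) for v in state])
```

In this model the value present while clock is high reaches the output once clock goes low, and a later change of data during the low phase has no effect.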

[Schematic of Figure A.2: cells consisting of a master stage (inputs data<i> and clock, output w<i>) followed by a slave stage (inputs w<i> and clock, output out<i>), for i = 15 down to 0; bus signals data(15:0) and out(15:0).]

Figure A.2: Wiring of 8-bit MS flip-flop.

mux_2_1.v

/*****************************************************************************/

/* */

/* Module N-Bit 2-to-1 Multiplexor */

/* */

/* Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level | 1 | 1 | 1 | 1 | 1 */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns) | 0.479 | 0.479 | 0.479 | 0.479 | 0.479 */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module mux_2_1( out, in0, in1, sel );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] in0, in1;
  input              sel;

  // AND-OR selection: in0 when sel is 0, in1 when sel is 1.
  wire [NBITS-1:0] out = (in0 & {NBITS{~sel}}) |
                         (in1 & {NBITS{ sel}});

endmodule

x_mux_2_1.v

/*****************************************************************************/

/* */

/* Module N-Bit 2-to-1 Multiplexor */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_mux_2_1( out, in0, in1, sel );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;
  parameter LOGIC = 0.479;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] in0, in1;
  input              sel;

  reg [NBITS-1:0] out;

  // Behavioural simulation model: the selected input appears at out
  // after a delay of LOGIC ns.
  always @( * )
  begin
    case( sel )
      1'b0: out <= #LOGIC in0;
      1'b1: out <= #LOGIC in1;
    endcase
  end

endmodule

mux_3_1.v

/*****************************************************************************/

/* */

/* Module N-Bit 3-to-1 Multiplexor */

/* */

/* Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level | 2 | 2 | 2 | 2 | 2 */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns) | 0.793 | 0.793 | 0.793 | 0.793 | 0.793 */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module mux_3_1( out, in0, in1, in2, sel );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] in0, in1, in2;
  input  [1:0]       sel;

  // AND-OR selection: sel = 00, 01, 10 pick in0, in1, in2;
  // sel = 11 also selects in0.
  wire [NBITS-1:0] out = (in0 & {NBITS{~sel[1] & ~sel[0]}}) |
                         (in1 & {NBITS{~sel[1] &  sel[0]}}) |
                         (in2 & {NBITS{ sel[1] & ~sel[0]}}) |
                         (in0 & {NBITS{ sel[1] &  sel[0]}});

endmodule
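The selection logic of mux_3_1.v, including its handling of the fourth select code, can be mirrored in Python for a quick check. The sketch below is an illustration and not part of the thesis sources; the function name mux_3_1 is reused for readability and the nbits parameter is an assumption of this sketch.

```python
def mux_3_1(in0, in1, in2, sel, nbits=8):
    """Mirror the AND-OR expression of mux_3_1.v on nbits-wide integers."""
    mask = (1 << nbits) - 1
    s1, s0 = (sel >> 1) & 1, sel & 1

    def rep(bit):
        # Replicate a single control bit across the word, like {NBITS{bit}}.
        return mask if bit else 0

    return ((in0 & rep((not s1) and (not s0))) |
            (in1 & rep((not s1) and s0)) |
            (in2 & rep(s1 and (not s0))) |
            (in0 & rep(s1 and s0)))


if __name__ == "__main__":
    for sel in range(4):
        print(sel, hex(mux_3_1(0x11, 0x22, 0x33, sel)))
```

The select code 3 returns in0 again, exactly as the fourth product term in the Verilog expression does.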

[Schematic of Figure A.3: eight cells, each with inputs in0<i>, in1<i>, and sel and output out<i>, for i = 7 down to 0; bus signals in0(7:0), in1(7:0), and out(7:0).]

Figure A.3: Wiring of 8-bit 2-to-1 multiplexor.

x_mux_3_1.v

/*****************************************************************************/

/* */

/* Module N-Bit 3-to-1 Multiplexor */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_mux_3_1( out, in0, in1, in2, sel );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;
  parameter LOGIC = 0.793;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] in0, in1, in2;
  input  [1:0]       sel;

  reg [NBITS-1:0] out;

  // Behavioural simulation model: the selected input appears at out
  // after a delay of LOGIC ns; sel = 11 selects in0, as in mux_3_1.
  always @( * )
  begin
    case( sel )
      2'b00: out <= #LOGIC in0;
      2'b01: out <= #LOGIC in1;
      2'b10: out <= #LOGIC in2;
      2'b11: out <= #LOGIC in0;
    endcase
  end

endmodule

mux_4_1.v

/*****************************************************************************/

/* */

/* Module N-Bit 4-to-1 Multiplexor */

/* */

/* Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level | 2 | 2 | 2 | 2 | 2 */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns) | 0.793 | 0.793 | 0.793 | 0.793 | 0.793 */

/* */

/*****************************************************************************/

module mux_4_1( out, in0, in1, in2, in3, sel );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] in0, in1, in2, in3;
  input  [1:0]       sel;

  // AND-OR selection: sel = 00, 01, 10, 11 pick in0, in1, in2, in3.
  wire [NBITS-1:0] out = (in0 & {NBITS{~sel[1] & ~sel[0]}}) |
                         (in1 & {NBITS{~sel[1] &  sel[0]}}) |
                         (in2 & {NBITS{ sel[1] & ~sel[0]}}) |
                         (in3 & {NBITS{ sel[1] &  sel[0]}});

endmodule

x_mux_4_1.v

/*****************************************************************************/

/* */

/* Module N-Bit 4-to-1 Multiplexor */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_mux_4_1( out, in0, in1, in2, in3, sel );

  parameter LOG_N = 6;
  parameter NBITS = 2**LOG_N;
  parameter LOGIC = 0.793;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] in0, in1, in2, in3;
  input  [1:0]       sel;

  reg [NBITS-1:0] out;

  // Behavioural simulation model: the selected input appears at out
  // after a delay of LOGIC ns.
  always @( * )
  begin
    case( sel )
      2'b00: out <= #LOGIC in0;
      2'b01: out <= #LOGIC in1;
      2'b10: out <= #LOGIC in2;
      2'b11: out <= #LOGIC in3;
    endcase
  end

endmodule

Figure A.4: Wiring of 8-bit 3-to-1 and 4-to-1 multiplexor.

adder.v

/*****************************************************************************/

/* */

/* Module N-Bit Carry-Lookahead Adder */

/* */

/* Size (bits) |   8    |   16   |   32   |   64   |  128   */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level |   4    |   5    |   6    |   7    |   11   */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns)  | 2.944  | 4.448  | 5.368  | 6.767  | 10.070 */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module adder( sum, cout, a, b, cin );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

output [NBITS-1:0] sum;

output cout;

input [NBITS-1:0] a, b;

input cin;

wire [NBITS-1:0] sum;

wire cout;

wire [NBITS-1:0] g, p, carry;

assign g = a & b;

assign p = a ^ b;

assign carry = g | (p & {carry[NBITS-2:0], cin});

assign sum = p ^ {carry[NBITS-2:0], cin};

assign cout = carry[NBITS-1];

endmodule

x_adder.v

/*****************************************************************************/

/* */

/* Module N-Bit Adder */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_adder( sum, cout, a, b, cin );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

parameter ADDER = 6.767;

output [NBITS-1:0] sum;

output cout;

input [NBITS-1:0] a, b;

input cin;

reg [NBITS-1:0] sum;

reg cout;

always @( * )

begin

{cout, sum} <= #ADDER a + b + cin;

end

endmodule

Figure A.5: Wiring of 8-bit carry-lookahead adder.

lt_comp.v

/*****************************************************************************/

/* */

/* Module N-Bit Carry-Lookahead Less-Than Comparator */

/* */

/* Size (bits) |   8    |   16   |   32   |   64   |  128   */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level |   4    |   5    |   6    |   7    |   8    */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns)  | 2.496  | 3.075  | 4.019  | 4.963  | 5.682  */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module lt_comp( less, a, b );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

output less;

input [NBITS-1:0] a, b;

wire less;

wire [NBITS-1:0] carry;

supply1 VCC;

assign carry = (a & ~b) | ((a ^ ~b) & {carry[NBITS-2:0], VCC});

assign less = ~carry[NBITS-1];

endmodule

x_lt_comp.v

/*****************************************************************************/

/* */

/* Module N-Bit Less-Than Comparator */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_lt_comp( less, a, b );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

parameter CARRY = 4.963;

output less;

input [NBITS-1:0] a, b;

reg less;

always @( * )

begin

less <= #CARRY a < b;

end

endmodule
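The structural comparator lt_comp obtains a < b from the carry chain of a + ~b + 1, that is, from the subtraction a - b: the generate term is a & ~b, the propagate term is a ^ ~b, and the carry-in is tied high through VCC. The final carry is set exactly when a >= b, so less is its complement. The sketch below checks this recurrence exhaustively for a 4-bit width against the behavioral a < b used in x_lt_comp. The module name tb_lt_chain_demo and the 4-bit width are assumptions for illustration, and the testbench assumes an event-driven simulator that re-evaluates the self-referencing continuous assignment until it settles, as the listing itself does.

```verilog
`timescale 1 ns / 1 ps

// Hypothetical testbench (not from the thesis): exhaustive 4-bit check of
// the carry-chain formulation of "a < b" used in lt_comp.
module tb_lt_chain_demo;

  parameter NBITS = 4;                 // small width so all pairs can be tried

  reg  [NBITS-1:0] a, b;
  wire [NBITS-1:0] carry;
  wire             less;
  integer          i, j, errors;

  // Carry chain of a + ~b + 1; bit k depends on bit k-1, carry-in is 1.
  assign carry = (a & ~b) | ((a ^ ~b) & {carry[NBITS-2:0], 1'b1});

  // The carry out of the top bit is 1 exactly when a >= b.
  assign less  = ~carry[NBITS-1];

  initial begin
    errors = 0;
    for (i = 0; i < (1 << NBITS); i = i + 1)
      for (j = 0; j < (1 << NBITS); j = j + 1) begin
        a = i;
        b = j;
        #1;                            // allow the chain to ripple and settle
        if (less !== (a < b)) begin
          errors = errors + 1;
          $display("MISMATCH a=%d b=%d less=%b", a, b, less);
        end
      end
    if (errors == 0) $display("PASS: carry chain matches a < b");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end

endmodule
```

For example, with a = 3 and b = 5 the sum 0011 + 1010 + 1 equals 1110 with no carry out, so less is 1; with a = 5 and b = 3 the sum 0101 + 1100 + 1 produces a carry out, so less is 0.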

pri_enc.v

/*****************************************************************************/

/* */

/* Module 8/.../128-Bit Priority Encoder */

/* */

/* Size (bits) |   8    |   16   |   32   |   64   |  128   */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level |   2    |   3    |   3    |   4    |   5    */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns)  | 1.198  | 2.454  | 2.902  | 4.218  | 5.323  */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

/*****************************************************************************/

/* */

/* Module 8-Bit Priority Encoder */

/* */

/*****************************************************************************/

module pri_enc_8( out, in );

output [7:0] out;

input [7:0] in;

wire [7:0] out, w;

supply1 VCC;

assign out = in & w;

assign w[7] = VCC; assign w[6] = ~in[7];

assign w[5] = ~|in[7:6]; assign w[4] = ~|in[7:5];

assign w[3] = ~|in[7:4]; assign w[2] = ~|in[7:3];

assign w[1] = ~|in[7:2]; assign w[0] = ~|in[7:1];

endmodule

/*****************************************************************************/

/* */

/* Module 16-Bit Priority Encoder */

/* */

/*****************************************************************************/

module pri_enc_16( out, in );

output [15:0] out;

input [15:0] in;

wire [15:0] out, w;

supply1 VCC;

assign out = in & w;

assign w[15] = VCC; assign w[14] = ~in[15];

assign w[13] = ~|in[15:14]; assign w[12] = ~|in[15:13];

assign w[11] = ~|in[15:12]; assign w[10] = ~|in[15:11];

assign w[09] = ~|in[15:10]; assign w[08] = ~|in[15:09];

assign w[07] = ~|in[15:08]; assign w[06] = ~|in[15:07];

assign w[05] = ~|in[15:06]; assign w[04] = ~|in[15:05];

assign w[03] = ~|in[15:04]; assign w[02] = ~|in[15:03];

assign w[01] = ~|in[15:02]; assign w[00] = ~|in[15:01];

endmodule

/*****************************************************************************/

/* */

/* Module 32-Bit Priority Encoder */

/* */

/*****************************************************************************/

module pri_enc_32( out, in );

output [31:0] out;

input [31:0] in;

wire [31:0] out, w;

supply1 VCC;

assign out = in & w;

assign w[31] = VCC; assign w[30] = ~in[31];

assign w[29] = ~|in[31:30]; assign w[28] = ~|in[31:29];

assign w[27] = ~|in[31:28]; assign w[26] = ~|in[31:27];

assign w[25] = ~|in[31:26]; assign w[24] = ~|in[31:25];

assign w[23] = ~|in[31:24]; assign w[22] = ~|in[31:23];

assign w[21] = ~|in[31:22]; assign w[20] = ~|in[31:21];

assign w[19] = ~|in[31:20]; assign w[18] = ~|in[31:19];

assign w[17] = ~|in[31:18]; assign w[16] = ~|in[31:17];

assign w[15] = ~|in[31:16]; assign w[14] = ~|in[31:15];

assign w[13] = ~|in[31:14]; assign w[12] = ~|in[31:13];

assign w[11] = ~|in[31:12]; assign w[10] = ~|in[31:11];

assign w[09] = ~|in[31:10]; assign w[08] = ~|in[31:09];

assign w[07] = ~|in[31:08]; assign w[06] = ~|in[31:07];

assign w[05] = ~|in[31:06]; assign w[04] = ~|in[31:05];

assign w[03] = ~|in[31:04]; assign w[02] = ~|in[31:03];

assign w[01] = ~|in[31:02]; assign w[00] = ~|in[31:01];

endmodule

/*****************************************************************************/

/* */

/* Module 64-Bit Priority Encoder */

/* */

/*****************************************************************************/

module pri_enc_64( out, in );

output [63:0] out;

input [63:0] in;

wire [63:0] out, w;

supply1 VCC;

assign out = in & w;

assign w[63] = VCC; assign w[62] = ~in[63];

assign w[61] = ~|in[63:62]; assign w[60] = ~|in[63:61];

assign w[59] = ~|in[63:60]; assign w[58] = ~|in[63:59];

assign w[57] = ~|in[63:58]; assign w[56] = ~|in[63:57];

assign w[55] = ~|in[63:56]; assign w[54] = ~|in[63:55];

assign w[53] = ~|in[63:54]; assign w[52] = ~|in[63:53];

assign w[51] = ~|in[63:52]; assign w[50] = ~|in[63:51];

assign w[49] = ~|in[63:50]; assign w[48] = ~|in[63:49];

assign w[47] = ~|in[63:48]; assign w[46] = ~|in[63:47];

assign w[45] = ~|in[63:46]; assign w[44] = ~|in[63:45];

assign w[43] = ~|in[63:44]; assign w[42] = ~|in[63:43];

assign w[41] = ~|in[63:42]; assign w[40] = ~|in[63:41];

assign w[39] = ~|in[63:40]; assign w[38] = ~|in[63:39];

assign w[37] = ~|in[63:38]; assign w[36] = ~|in[63:37];

assign w[35] = ~|in[63:36]; assign w[34] = ~|in[63:35];

assign w[33] = ~|in[63:34]; assign w[32] = ~|in[63:33];

assign w[31] = ~|in[63:32]; assign w[30] = ~|in[63:31];

assign w[29] = ~|in[63:30]; assign w[28] = ~|in[63:29];

assign w[27] = ~|in[63:28]; assign w[26] = ~|in[63:27];

assign w[25] = ~|in[63:26]; assign w[24] = ~|in[63:25];

assign w[23] = ~|in[63:24]; assign w[22] = ~|in[63:23];

assign w[21] = ~|in[63:22]; assign w[20] = ~|in[63:21];

assign w[19] = ~|in[63:20]; assign w[18] = ~|in[63:19];

assign w[17] = ~|in[63:18]; assign w[16] = ~|in[63:17];

assign w[15] = ~|in[63:16]; assign w[14] = ~|in[63:15];

assign w[13] = ~|in[63:14]; assign w[12] = ~|in[63:13];

assign w[11] = ~|in[63:12]; assign w[10] = ~|in[63:11];

assign w[09] = ~|in[63:10]; assign w[08] = ~|in[63:09];

assign w[07] = ~|in[63:08]; assign w[06] = ~|in[63:07];

assign w[05] = ~|in[63:06]; assign w[04] = ~|in[63:05];

assign w[03] = ~|in[63:04]; assign w[02] = ~|in[63:03];

assign w[01] = ~|in[63:02]; assign w[00] = ~|in[63:01];

endmodule

/*****************************************************************************/

/* */

/* Module 128-Bit Priority Encoder */

/* */

/*****************************************************************************/

module pri_enc_128( out, in );

output [127:0] out;

input [127:0] in;

wire [127:0] out, w;

supply1 VCC;

assign out = in & w;

assign w[127] = VCC; assign w[126] = ~in[127];

assign w[125] = ~|in[127:126]; assign w[124] = ~|in[127:125];

assign w[123] = ~|in[127:124]; assign w[122] = ~|in[127:123];

assign w[121] = ~|in[127:122]; assign w[120] = ~|in[127:121];

assign w[119] = ~|in[127:120]; assign w[118] = ~|in[127:119];

assign w[117] = ~|in[127:118]; assign w[116] = ~|in[127:117];

assign w[115] = ~|in[127:116]; assign w[114] = ~|in[127:115];

assign w[113] = ~|in[127:114]; assign w[112] = ~|in[127:113];

assign w[111] = ~|in[127:112]; assign w[110] = ~|in[127:111];

assign w[109] = ~|in[127:110]; assign w[108] = ~|in[127:109];

assign w[107] = ~|in[127:108]; assign w[106] = ~|in[127:107];

assign w[105] = ~|in[127:106]; assign w[104] = ~|in[127:105];

assign w[103] = ~|in[127:104]; assign w[102] = ~|in[127:103];

assign w[101] = ~|in[127:102]; assign w[100] = ~|in[127:101];

assign w[099] = ~|in[127:100]; assign w[098] = ~|in[127:099];

assign w[097] = ~|in[127:098]; assign w[096] = ~|in[127:097];

assign w[095] = ~|in[127:096]; assign w[094] = ~|in[127:095];

assign w[093] = ~|in[127:094]; assign w[092] = ~|in[127:093];

assign w[091] = ~|in[127:092]; assign w[090] = ~|in[127:091];

assign w[089] = ~|in[127:090]; assign w[088] = ~|in[127:089];

assign w[087] = ~|in[127:088]; assign w[086] = ~|in[127:087];

assign w[085] = ~|in[127:086]; assign w[084] = ~|in[127:085];

assign w[083] = ~|in[127:084]; assign w[082] = ~|in[127:083];

assign w[081] = ~|in[127:082]; assign w[080] = ~|in[127:081];

assign w[079] = ~|in[127:080]; assign w[078] = ~|in[127:079];

assign w[077] = ~|in[127:078]; assign w[076] = ~|in[127:077];

assign w[075] = ~|in[127:076]; assign w[074] = ~|in[127:075];

assign w[073] = ~|in[127:074]; assign w[072] = ~|in[127:073];

assign w[071] = ~|in[127:072]; assign w[070] = ~|in[127:071];

assign w[069] = ~|in[127:070]; assign w[068] = ~|in[127:069];

assign w[067] = ~|in[127:068]; assign w[066] = ~|in[127:067];

assign w[065] = ~|in[127:066]; assign w[064] = ~|in[127:065];

assign w[063] = ~|in[127:064]; assign w[062] = ~|in[127:063];

assign w[061] = ~|in[127:062]; assign w[060] = ~|in[127:061];

assign w[059] = ~|in[127:060]; assign w[058] = ~|in[127:059];

assign w[057] = ~|in[127:058]; assign w[056] = ~|in[127:057];

assign w[055] = ~|in[127:056]; assign w[054] = ~|in[127:055];

assign w[053] = ~|in[127:054]; assign w[052] = ~|in[127:053];

assign w[051] = ~|in[127:052]; assign w[050] = ~|in[127:051];

assign w[049] = ~|in[127:050]; assign w[048] = ~|in[127:049];

assign w[047] = ~|in[127:048]; assign w[046] = ~|in[127:047];

assign w[045] = ~|in[127:046]; assign w[044] = ~|in[127:045];

assign w[043] = ~|in[127:044]; assign w[042] = ~|in[127:043];

assign w[041] = ~|in[127:042]; assign w[040] = ~|in[127:041];

assign w[039] = ~|in[127:040]; assign w[038] = ~|in[127:039];

assign w[037] = ~|in[127:038]; assign w[036] = ~|in[127:037];

assign w[035] = ~|in[127:036]; assign w[034] = ~|in[127:035];

assign w[033] = ~|in[127:034]; assign w[032] = ~|in[127:033];

assign w[031] = ~|in[127:032]; assign w[030] = ~|in[127:031];

assign w[029] = ~|in[127:030]; assign w[028] = ~|in[127:029];

assign w[027] = ~|in[127:028]; assign w[026] = ~|in[127:027];

assign w[025] = ~|in[127:026]; assign w[024] = ~|in[127:025];

assign w[023] = ~|in[127:024]; assign w[022] = ~|in[127:023];

assign w[021] = ~|in[127:022]; assign w[020] = ~|in[127:021];

assign w[019] = ~|in[127:020]; assign w[018] = ~|in[127:019];

assign w[017] = ~|in[127:018]; assign w[016] = ~|in[127:017];

assign w[015] = ~|in[127:016]; assign w[014] = ~|in[127:015];

assign w[013] = ~|in[127:014]; assign w[012] = ~|in[127:013];

assign w[011] = ~|in[127:012]; assign w[010] = ~|in[127:011];

assign w[009] = ~|in[127:010]; assign w[008] = ~|in[127:009];

assign w[007] = ~|in[127:008]; assign w[006] = ~|in[127:007];

assign w[005] = ~|in[127:006]; assign w[004] = ~|in[127:005];

assign w[003] = ~|in[127:004]; assign w[002] = ~|in[127:003];

assign w[001] = ~|in[127:002]; assign w[000] = ~|in[127:001];

endmodule
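The five modules above differ only in width. In each, w[i] is the NOR reduction of all input bits above position i, so out = in & w keeps only the most significant set bit of in and is zero when in is zero. As a sketch, assuming a Verilog-2001 toolchain with generate support, the same structure can be written once and parameterized. The names pri_enc_n and tb_pri_enc_n are hypothetical, and whether synthesis of this form reproduces the logic levels and delays tabulated for the hand-unrolled modules is not established here.

```verilog
`timescale 1 ns / 1 ps

// Hypothetical parameterized variant (not from the thesis sources).
// Assumes NBITS >= 2.
module pri_enc_n( out, in );

  parameter NBITS = 8;

  output [NBITS-1:0] out;
  input  [NBITS-1:0] in;

  wire   [NBITS-1:0] w;
  genvar             i;

  // The top bit is never masked; bit i passes only if no higher bit is set.
  assign w[NBITS-1] = 1'b1;

  generate
    for (i = 0; i < NBITS-1; i = i + 1) begin : g_mask
      assign w[i] = ~|in[NBITS-1:i+1];
    end
  endgenerate

  assign out = in & w;

endmodule

// Exhaustive self-check of the 8-bit instance against an MSB-first scan.
module tb_pri_enc_n;

  reg  [7:0] in;
  wire [7:0] out;
  reg  [7:0] expected;
  integer    v, k, errors;

  pri_enc_n #(8) dut( out, in );

  initial begin
    errors = 0;
    for (v = 0; v < 256; v = v + 1) begin
      in = v;
      #1;                              // let the continuous assignments settle
      expected = 8'b0;
      for (k = 7; k >= 0; k = k - 1)   // reference: first set bit from the top
        if (expected == 8'b0 && in[k])
          expected[k] = 1'b1;
      if (out !== expected) begin
        errors = errors + 1;
        $display("MISMATCH in=%b out=%b expected=%b", in, out, expected);
      end
    end
    if (errors == 0) $display("PASS: leading one isolated for all inputs");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end

endmodule
```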

x_pri_enc.v

/*****************************************************************************/

/* */

/* Module N-Bit Priority Encoder */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_pri_enc( out, in );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

parameter LOGIC = 4.218;

output [NBITS-1:0] out;

input [NBITS-1:0] in;

reg [NBITS-1:0] out;

integer n, i;

always @( * )

begin

n = 0;

for( i = NBITS; i > 0; i = i - 1 )

begin

if( n == 0 && in[i-1] )

n = i;

end

if( n )

out <= #LOGIC 1 << (n - 1);

else out <= #LOGIC 0;

end

endmodule

Figure A.6: Wiring of 8-bit priority encoder.

log_shl.v

/*****************************************************************************/

/* */

/* Module 8/.../128-Bit Logical Left-Shifter ( NO Selection Decoder ) */

/* */

/* Size (bits) |   8    |   16   |   32   |   64   |  128   */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level |   2    |   3    |   4    |   4    |   5    */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns)  | 1.198  | 1.917  | 2.496  | 2.636  | 3.215  */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

/*****************************************************************************/

/* */

/* Module 8-Bit Logical Left-Shifter */

/* */

/*****************************************************************************/

module log_shl_8( out, in, shs );

output [7:0] out;

input [7:0] in, shs;

wire [7:0] out, w;

assign out[7] = |(in[7:0] & w[7:0]); assign out[6] = |(in[6:0] & w[7:1]);

assign out[5] = |(in[5:0] & w[7:2]); assign out[4] = |(in[4:0] & w[7:3]);

assign out[3] = |(in[3:0] & w[7:4]); assign out[2] = |(in[2:0] & w[7:5]);

assign out[1] = |(in[1:0] & w[7:6]); assign out[0] = in[0] & w[7];

assign w[7] = shs[0]; assign w[6] = shs[1]; assign w[5] = shs[2];

assign w[4] = shs[3]; assign w[3] = shs[4]; assign w[2] = shs[5];

assign w[1] = shs[6]; assign w[0] = shs[7];

endmodule
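As the header note "NO Selection Decoder" indicates, log_shl_8 takes a one-hot shift select shs rather than a binary shift amount. The wire w is shs with its bits reversed, so each output bit is out[k] = OR over j <= k of (in[j] & shs[k-j]). When exactly one bit of shs is set, at position s, this reduces to out = in << s, and when shs is zero the output is zero. The sketch below checks that reading for every 8-bit input and every one-hot select, using a loop form of the same AND-OR network. The module name tb_onehot_shl_demo is hypothetical, and the testbench only exercises selects with exactly one bit set.

```verilog
// Hypothetical testbench (not from the thesis): checks that the AND-OR
// network of log_shl_8 acts as a left shift when shs is one-hot.
module tb_onehot_shl_demo;

  reg  [7:0] in, shs, out, expected;
  integer    v, s, k, j, errors;

  initial begin
    errors = 0;
    for (v = 0; v < 256; v = v + 1)
      for (s = 0; s < 8; s = s + 1) begin
        in  = v;
        shs = 8'b1 << s;               // one-hot select: bit s requests "<< s"

        // Loop form of the listing: out[k] = |(in[k:0] & w[7:7-k]),
        // where w[j] = shs[7-j], so in[j] is paired with shs[k-j].
        for (k = 0; k < 8; k = k + 1) begin
          out[k] = 1'b0;
          for (j = 0; j <= k; j = j + 1)
            out[k] = out[k] | (in[j] & shs[k-j]);
        end

        expected = in << s;            // reference: ordinary logical shift
        if (out !== expected) begin
          errors = errors + 1;
          $display("MISMATCH in=%b s=%0d out=%b expected=%b",
                   in, s, out, expected);
        end
      end
    if (errors == 0) $display("PASS: one-hot select behaves as in << s");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end

endmodule
```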

/*****************************************************************************/

/* */

/* Module 16-Bit Logical Left-Shifter */

/* */

/*****************************************************************************/

module log_shl_16( out, in, shs );

output [15:0] out;

input [15:0] in, shs;

wire [15:0] out, w;

assign out[15] = |(in[15:00] & w[15:00]);

assign out[14] = |(in[14:00] & w[15:01]);

assign out[13] = |(in[13:00] & w[15:02]);

assign out[12] = |(in[12:00] & w[15:03]);

assign out[11] = |(in[11:00] & w[15:04]);

assign out[10] = |(in[10:00] & w[15:05]);

assign out[09] = |(in[09:00] & w[15:06]);

assign out[08] = |(in[08:00] & w[15:07]);

assign out[07] = |(in[07:00] & w[15:08]);

assign out[06] = |(in[06:00] & w[15:09]);

assign out[05] = |(in[05:00] & w[15:10]);

assign out[04] = |(in[04:00] & w[15:11]);

assign out[03] = |(in[03:00] & w[15:12]);

assign out[02] = |(in[02:00] & w[15:13]);

assign out[01] = |(in[01:00] & w[15:14]);

assign out[00] = in[00] & w[15];

assign w[15] = shs[00]; assign w[14] = shs[01]; assign w[13] = shs[02];

assign w[12] = shs[03]; assign w[11] = shs[04]; assign w[10] = shs[05];

assign w[09] = shs[06]; assign w[08] = shs[07]; assign w[07] = shs[08];

assign w[06] = shs[09]; assign w[05] = shs[10]; assign w[04] = shs[11];

assign w[03] = shs[12]; assign w[02] = shs[13]; assign w[01] = shs[14];

assign w[00] = shs[15];

endmodule

/*****************************************************************************/

/* */

/* Module 32-Bit Logical Left-Shifter */

/* */

/*****************************************************************************/

module log_shl_32( out, in, shs );

output [31:0] out;

input [31:0] in, shs;

wire [31:0] out, w;

assign out[31] = |(in[31:00] & w[31:00]);

assign out[30] = |(in[30:00] & w[31:01]);

assign out[29] = |(in[29:00] & w[31:02]);

assign out[28] = |(in[28:00] & w[31:03]);

assign out[27] = |(in[27:00] & w[31:04]);

assign out[26] = |(in[26:00] & w[31:05]);

assign out[25] = |(in[25:00] & w[31:06]);

assign out[24] = |(in[24:00] & w[31:07]);

assign out[23] = |(in[23:00] & w[31:08]);

assign out[22] = |(in[22:00] & w[31:09]);

assign out[21] = |(in[21:00] & w[31:10]);

assign out[20] = |(in[20:00] & w[31:11]);

assign out[19] = |(in[19:00] & w[31:12]);

assign out[18] = |(in[18:00] & w[31:13]);

assign out[17] = |(in[17:00] & w[31:14]);

assign out[16] = |(in[16:00] & w[31:15]);

assign out[15] = |(in[15:00] & w[31:16]);

assign out[14] = |(in[14:00] & w[31:17]);

assign out[13] = |(in[13:00] & w[31:18]);

assign out[12] = |(in[12:00] & w[31:19]);

assign out[11] = |(in[11:00] & w[31:20]);

assign out[10] = |(in[10:00] & w[31:21]);

assign out[09] = |(in[09:00] & w[31:22]);

assign out[08] = |(in[08:00] & w[31:23]);

assign out[07] = |(in[07:00] & w[31:24]);

assign out[06] = |(in[06:00] & w[31:25]);

assign out[05] = |(in[05:00] & w[31:26]);

assign out[04] = |(in[04:00] & w[31:27]);

assign out[03] = |(in[03:00] & w[31:28]);

assign out[02] = |(in[02:00] & w[31:29]);

assign out[01] = |(in[01:00] & w[31:30]);

assign out[00] = in[00] & w[31];

assign w[31] = shs[00]; assign w[30] = shs[01]; assign w[29] = shs[02];

assign w[28] = shs[03]; assign w[27] = shs[04]; assign w[26] = shs[05];

assign w[25] = shs[06]; assign w[24] = shs[07]; assign w[23] = shs[08];

assign w[22] = shs[09]; assign w[21] = shs[10]; assign w[20] = shs[11];

assign w[19] = shs[12]; assign w[18] = shs[13]; assign w[17] = shs[14];

assign w[16] = shs[15]; assign w[15] = shs[16]; assign w[14] = shs[17];

assign w[13] = shs[18]; assign w[12] = shs[19]; assign w[11] = shs[20];

assign w[10] = shs[21]; assign w[09] = shs[22]; assign w[08] = shs[23];

assign w[07] = shs[24]; assign w[06] = shs[25]; assign w[05] = shs[26];

assign w[04] = shs[27]; assign w[03] = shs[28]; assign w[02] = shs[29];

assign w[01] = shs[30]; assign w[00] = shs[31];

endmodule

/*****************************************************************************/

/* */

/* Module 64-Bit Logical Left-Shifter */

/* */

/*****************************************************************************/

module log_shl_64( out, in, shs );

output [63:0] out;

input [63:0] in, shs;

wire [63:0] out, w;

assign out[63] = |(in[63:00] & w[63:00]);

assign out[62] = |(in[62:00] & w[63:01]);

assign out[61] = |(in[61:00] & w[63:02]);

assign out[60] = |(in[60:00] & w[63:03]);

assign out[59] = |(in[59:00] & w[63:04]);

assign out[58] = |(in[58:00] & w[63:05]);

assign out[57] = |(in[57:00] & w[63:06]);

assign out[56] = |(in[56:00] & w[63:07]);

assign out[55] = |(in[55:00] & w[63:08]);

assign out[54] = |(in[54:00] & w[63:09]);

assign out[53] = |(in[53:00] & w[63:10]);

assign out[52] = |(in[52:00] & w[63:11]);

assign out[51] = |(in[51:00] & w[63:12]);

assign out[50] = |(in[50:00] & w[63:13]);

assign out[49] = |(in[49:00] & w[63:14]);

assign out[48] = |(in[48:00] & w[63:15]);

assign out[47] = |(in[47:00] & w[63:16]);

assign out[46] = |(in[46:00] & w[63:17]);

assign out[45] = |(in[45:00] & w[63:18]);

assign out[44] = |(in[44:00] & w[63:19]);

assign out[43] = |(in[43:00] & w[63:20]);

assign out[42] = |(in[42:00] & w[63:21]);

assign out[41] = |(in[41:00] & w[63:22]);

assign out[40] = |(in[40:00] & w[63:23]);

assign out[39] = |(in[39:00] & w[63:24]);

assign out[38] = |(in[38:00] & w[63:25]);

assign out[37] = |(in[37:00] & w[63:26]);

assign out[36] = |(in[36:00] & w[63:27]);

assign out[35] = |(in[35:00] & w[63:28]);

assign out[34] = |(in[34:00] & w[63:29]);

assign out[33] = |(in[33:00] & w[63:30]);

assign out[32] = |(in[32:00] & w[63:31]);

assign out[31] = |(in[31:00] & w[63:32]);

assign out[30] = |(in[30:00] & w[63:33]);

assign out[29] = |(in[29:00] & w[63:34]);

assign out[28] = |(in[28:00] & w[63:35]);

assign out[27] = |(in[27:00] & w[63:36]);

assign out[26] = |(in[26:00] & w[63:37]);

assign out[25] = |(in[25:00] & w[63:38]);

assign out[24] = |(in[24:00] & w[63:39]);

assign out[23] = |(in[23:00] & w[63:40]);

assign out[22] = |(in[22:00] & w[63:41]);

assign out[21] = |(in[21:00] & w[63:42]);

assign out[20] = |(in[20:00] & w[63:43]);

assign out[19] = |(in[19:00] & w[63:44]);

assign out[18] = |(in[18:00] & w[63:45]);

assign out[17] = |(in[17:00] & w[63:46]);

assign out[16] = |(in[16:00] & w[63:47]);

assign out[15] = |(in[15:00] & w[63:48]);

assign out[14] = |(in[14:00] & w[63:49]);

assign out[13] = |(in[13:00] & w[63:50]);

assign out[12] = |(in[12:00] & w[63:51]);

assign out[11] = |(in[11:00] & w[63:52]);

assign out[10] = |(in[10:00] & w[63:53]);

assign out[09] = |(in[09:00] & w[63:54]);

assign out[08] = |(in[08:00] & w[63:55]);

assign out[07] = |(in[07:00] & w[63:56]);

assign out[06] = |(in[06:00] & w[63:57]);

assign out[05] = |(in[05:00] & w[63:58]);

assign out[04] = |(in[04:00] & w[63:59]);

assign out[03] = |(in[03:00] & w[63:60]);

assign out[02] = |(in[02:00] & w[63:61]);

assign out[01] = |(in[01:00] & w[63:62]);

assign out[00] = in[00] & w[63];

assign w[63] = shs[00]; assign w[62] = shs[01]; assign w[61] = shs[02];

assign w[60] = shs[03]; assign w[59] = shs[04]; assign w[58] = shs[05];

assign w[57] = shs[06]; assign w[56] = shs[07]; assign w[55] = shs[08];

assign w[54] = shs[09]; assign w[53] = shs[10]; assign w[52] = shs[11];

assign w[51] = shs[12]; assign w[50] = shs[13]; assign w[49] = shs[14];

assign w[48] = shs[15]; assign w[47] = shs[16]; assign w[46] = shs[17];

assign w[45] = shs[18]; assign w[44] = shs[19]; assign w[43] = shs[20];

assign w[42] = shs[21]; assign w[41] = shs[22]; assign w[40] = shs[23];

assign w[39] = shs[24]; assign w[38] = shs[25]; assign w[37] = shs[26];

assign w[36] = shs[27]; assign w[35] = shs[28]; assign w[34] = shs[29];

assign w[33] = shs[30]; assign w[32] = shs[31]; assign w[31] = shs[32];

assign w[30] = shs[33]; assign w[29] = shs[34]; assign w[28] = shs[35];

assign w[27] = shs[36]; assign w[26] = shs[37]; assign w[25] = shs[38];

assign w[24] = shs[39]; assign w[23] = shs[40]; assign w[22] = shs[41];

assign w[21] = shs[42]; assign w[20] = shs[43]; assign w[19] = shs[44];

assign w[18] = shs[45]; assign w[17] = shs[46]; assign w[16] = shs[47];

assign w[15] = shs[48]; assign w[14] = shs[49]; assign w[13] = shs[50];

assign w[12] = shs[51]; assign w[11] = shs[52]; assign w[10] = shs[53];

assign w[09] = shs[54]; assign w[08] = shs[55]; assign w[07] = shs[56];

assign w[06] = shs[57]; assign w[05] = shs[58]; assign w[04] = shs[59];

assign w[03] = shs[60]; assign w[02] = shs[61]; assign w[01] = shs[62];

assign w[00] = shs[63];

endmodule

/*****************************************************************************/

/* */

/* Module 128-Bit Logical Left-Shifter */

/* */

/*****************************************************************************/

module log_shl_128( out, in, shs );

output [127:0] out;

input [127:0] in, shs;

wire [127:0] out, w;

assign out[127] = |(in[127:000] & w[127:000]);

assign out[126] = |(in[126:000] & w[127:001]);

assign out[125] = |(in[125:000] & w[127:002]);

assign out[124] = |(in[124:000] & w[127:003]);

assign out[123] = |(in[123:000] & w[127:004]);

assign out[122] = |(in[122:000] & w[127:005]);

assign out[121] = |(in[121:000] & w[127:006]);

assign out[120] = |(in[120:000] & w[127:007]);

assign out[119] = |(in[119:000] & w[127:008]);

assign out[118] = |(in[118:000] & w[127:009]);

assign out[117] = |(in[117:000] & w[127:010]);

assign out[116] = |(in[116:000] & w[127:011]);

assign out[115] = |(in[115:000] & w[127:012]);

assign out[114] = |(in[114:000] & w[127:013]);

assign out[113] = |(in[113:000] & w[127:014]);

assign out[112] = |(in[112:000] & w[127:015]);

assign out[111] = |(in[111:000] & w[127:016]);

assign out[110] = |(in[110:000] & w[127:017]);

assign out[109] = |(in[109:000] & w[127:018]);

assign out[108] = |(in[108:000] & w[127:019]);

assign out[107] = |(in[107:000] & w[127:020]);

assign out[106] = |(in[106:000] & w[127:021]);

assign out[105] = |(in[105:000] & w[127:022]);

assign out[104] = |(in[104:000] & w[127:023]);

assign out[103] = |(in[103:000] & w[127:024]);

assign out[102] = |(in[102:000] & w[127:025]);

assign out[101] = |(in[101:000] & w[127:026]);

assign out[100] = |(in[100:000] & w[127:027]);

assign out[099] = |(in[099:000] & w[127:028]);

assign out[098] = |(in[098:000] & w[127:029]);

assign out[097] = |(in[097:000] & w[127:030]);

assign out[096] = |(in[096:000] & w[127:031]);

assign out[095] = |(in[095:000] & w[127:032]);

assign out[094] = |(in[094:000] & w[127:033]);

assign out[093] = |(in[093:000] & w[127:034]);

assign out[092] = |(in[092:000] & w[127:035]);

assign out[091] = |(in[091:000] & w[127:036]);

assign out[090] = |(in[090:000] & w[127:037]);

assign out[089] = |(in[089:000] & w[127:038]);

assign out[088] = |(in[088:000] & w[127:039]);

assign out[087] = |(in[087:000] & w[127:040]);

assign out[086] = |(in[086:000] & w[127:041]);

assign out[085] = |(in[085:000] & w[127:042]);

assign out[084] = |(in[084:000] & w[127:043]);

assign out[083] = |(in[083:000] & w[127:044]);

assign out[082] = |(in[082:000] & w[127:045]);

assign out[081] = |(in[081:000] & w[127:046]);

assign out[080] = |(in[080:000] & w[127:047]);

assign out[079] = |(in[079:000] & w[127:048]);

assign out[078] = |(in[078:000] & w[127:049]);

assign out[077] = |(in[077:000] & w[127:050]);

assign out[076] = |(in[076:000] & w[127:051]);

assign out[075] = |(in[075:000] & w[127:052]);

assign out[074] = |(in[074:000] & w[127:053]);

assign out[073] = |(in[073:000] & w[127:054]);

assign out[072] = |(in[072:000] & w[127:055]);

assign out[071] = |(in[071:000] & w[127:056]);

assign out[070] = |(in[070:000] & w[127:057]);

assign out[069] = |(in[069:000] & w[127:058]);

assign out[068] = |(in[068:000] & w[127:059]);

assign out[067] = |(in[067:000] & w[127:060]);

assign out[066] = |(in[066:000] & w[127:061]);

assign out[065] = |(in[065:000] & w[127:062]);

assign out[064] = |(in[064:000] & w[127:063]);

assign out[063] = |(in[063:000] & w[127:064]);

assign out[062] = |(in[062:000] & w[127:065]);

assign out[061] = |(in[061:000] & w[127:066]);

assign out[060] = |(in[060:000] & w[127:067]);

assign out[059] = |(in[059:000] & w[127:068]);

assign out[058] = |(in[058:000] & w[127:069]);

assign out[057] = |(in[057:000] & w[127:070]);

assign out[056] = |(in[056:000] & w[127:071]);

assign out[055] = |(in[055:000] & w[127:072]);

assign out[054] = |(in[054:000] & w[127:073]);

assign out[053] = |(in[053:000] & w[127:074]);

assign out[052] = |(in[052:000] & w[127:075]);

assign out[051] = |(in[051:000] & w[127:076]);

assign out[050] = |(in[050:000] & w[127:077]);

assign out[049] = |(in[049:000] & w[127:078]);

assign out[048] = |(in[048:000] & w[127:079]);

assign out[047] = |(in[047:000] & w[127:080]);

assign out[046] = |(in[046:000] & w[127:081]);

assign out[045] = |(in[045:000] & w[127:082]);

assign out[044] = |(in[044:000] & w[127:083]);

assign out[043] = |(in[043:000] & w[127:084]);

assign out[042] = |(in[042:000] & w[127:085]);

assign out[041] = |(in[041:000] & w[127:086]);

assign out[040] = |(in[040:000] & w[127:087]);

assign out[039] = |(in[039:000] & w[127:088]);

assign out[038] = |(in[038:000] & w[127:089]);

assign out[037] = |(in[037:000] & w[127:090]);

assign out[036] = |(in[036:000] & w[127:091]);

assign out[035] = |(in[035:000] & w[127:092]);

assign out[034] = |(in[034:000] & w[127:093]);

assign out[033] = |(in[033:000] & w[127:094]);

assign out[032] = |(in[032:000] & w[127:095]);

assign out[031] = |(in[031:000] & w[127:096]);

assign out[030] = |(in[030:000] & w[127:097]);

assign out[029] = |(in[029:000] & w[127:098]);

assign out[028] = |(in[028:000] & w[127:099]);

assign out[027] = |(in[027:000] & w[127:100]);

assign out[026] = |(in[026:000] & w[127:101]);

assign out[025] = |(in[025:000] & w[127:102]);

assign out[024] = |(in[024:000] & w[127:103]);

assign out[023] = |(in[023:000] & w[127:104]);

assign out[022] = |(in[022:000] & w[127:105]);

assign out[021] = |(in[021:000] & w[127:106]);

assign out[020] = |(in[020:000] & w[127:107]);

assign out[019] = |(in[019:000] & w[127:108]);

assign out[018] = |(in[018:000] & w[127:109]);

assign out[017] = |(in[017:000] & w[127:110]);

assign out[016] = |(in[016:000] & w[127:111]);

assign out[015] = |(in[015:000] & w[127:112]);

assign out[014] = |(in[014:000] & w[127:113]);

assign out[013] = |(in[013:000] & w[127:114]);

assign out[012] = |(in[012:000] & w[127:115]);

assign out[011] = |(in[011:000] & w[127:116]);

assign out[010] = |(in[010:000] & w[127:117]);

assign out[009] = |(in[009:000] & w[127:118]);

assign out[008] = |(in[008:000] & w[127:119]);

assign out[007] = |(in[007:000] & w[127:120]);

assign out[006] = |(in[006:000] & w[127:121]);

assign out[005] = |(in[005:000] & w[127:122]);

assign out[004] = |(in[004:000] & w[127:123]);

assign out[003] = |(in[003:000] & w[127:124]);

assign out[002] = |(in[002:000] & w[127:125]);

assign out[001] = |(in[001:000] & w[127:126]);

assign out[000] = in[000] & w[127];

assign w[127] = shs[000]; assign w[126] = shs[001]; assign w[125] = shs[002];

assign w[124] = shs[003]; assign w[123] = shs[004]; assign w[122] = shs[005];

assign w[121] = shs[006]; assign w[120] = shs[007]; assign w[119] = shs[008];

assign w[118] = shs[009]; assign w[117] = shs[010]; assign w[116] = shs[011];

assign w[115] = shs[012]; assign w[114] = shs[013]; assign w[113] = shs[014];

assign w[112] = shs[015]; assign w[111] = shs[016]; assign w[110] = shs[017];

assign w[109] = shs[018]; assign w[108] = shs[019]; assign w[107] = shs[020];

assign w[106] = shs[021]; assign w[105] = shs[022]; assign w[104] = shs[023];

assign w[103] = shs[024]; assign w[102] = shs[025]; assign w[101] = shs[026];

assign w[100] = shs[027]; assign w[099] = shs[028]; assign w[098] = shs[029];

assign w[097] = shs[030]; assign w[096] = shs[031]; assign w[095] = shs[032];

assign w[094] = shs[033]; assign w[093] = shs[034]; assign w[092] = shs[035];

assign w[091] = shs[036]; assign w[090] = shs[037]; assign w[089] = shs[038];

assign w[088] = shs[039]; assign w[087] = shs[040]; assign w[086] = shs[041];

assign w[085] = shs[042]; assign w[084] = shs[043]; assign w[083] = shs[044];

assign w[082] = shs[045]; assign w[081] = shs[046]; assign w[080] = shs[047];

assign w[079] = shs[048]; assign w[078] = shs[049]; assign w[077] = shs[050];

assign w[076] = shs[051]; assign w[075] = shs[052]; assign w[074] = shs[053];

assign w[073] = shs[054]; assign w[072] = shs[055]; assign w[071] = shs[056];

assign w[070] = shs[057]; assign w[069] = shs[058]; assign w[068] = shs[059];

assign w[067] = shs[060]; assign w[066] = shs[061]; assign w[065] = shs[062];

assign w[064] = shs[063]; assign w[063] = shs[064]; assign w[062] = shs[065];

assign w[061] = shs[066]; assign w[060] = shs[067]; assign w[059] = shs[068];

assign w[058] = shs[069]; assign w[057] = shs[070]; assign w[056] = shs[071];

assign w[055] = shs[072]; assign w[054] = shs[073]; assign w[053] = shs[074];

assign w[052] = shs[075]; assign w[051] = shs[076]; assign w[050] = shs[077];

assign w[049] = shs[078]; assign w[048] = shs[079]; assign w[047] = shs[080];

assign w[046] = shs[081]; assign w[045] = shs[082]; assign w[044] = shs[083];

assign w[043] = shs[084]; assign w[042] = shs[085]; assign w[041] = shs[086];

assign w[040] = shs[087]; assign w[039] = shs[088]; assign w[038] = shs[089];

assign w[037] = shs[090]; assign w[036] = shs[091]; assign w[035] = shs[092];

assign w[034] = shs[093]; assign w[033] = shs[094]; assign w[032] = shs[095];

assign w[031] = shs[096]; assign w[030] = shs[097]; assign w[029] = shs[098];

assign w[028] = shs[099]; assign w[027] = shs[100]; assign w[026] = shs[101];

assign w[025] = shs[102]; assign w[024] = shs[103]; assign w[023] = shs[104];

assign w[022] = shs[105]; assign w[021] = shs[106]; assign w[020] = shs[107];

assign w[019] = shs[108]; assign w[018] = shs[109]; assign w[017] = shs[110];

assign w[016] = shs[111]; assign w[015] = shs[112]; assign w[014] = shs[113];

assign w[013] = shs[114]; assign w[012] = shs[115]; assign w[011] = shs[116];

assign w[010] = shs[117]; assign w[009] = shs[118]; assign w[008] = shs[119];

assign w[007] = shs[120]; assign w[006] = shs[121]; assign w[005] = shs[122];

assign w[004] = shs[123]; assign w[003] = shs[124]; assign w[002] = shs[125];

assign w[001] = shs[126]; assign w[000] = shs[127];

endmodule

x_log_shl.v

/*****************************************************************************/

/* */

/* Module N-Bit Logical Left-Shifter */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_log_shl( out, in, shs );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

parameter SHIFT = 2.636;

output [NBITS-1:0] out;

input [NBITS-1:0] in, shs;

reg [NBITS-1:0] out;

integer n, i;

always @( * )

begin

n = 0;

for( i = NBITS; i > 0; i = i - 1 )

begin

if( n == 0 && shs[i-1] )

n = i;

end

if( n )

out <= #SHIFT in << (n - 1);

else out <= #SHIFT 0;

end

endmodule
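As a cross-check during simulation, the behaviour of x_log_shl can be mirrored in software. The sketch below is a Python reference model and is not part of the thesis sources: the function name and the nbits argument are assumptions made for illustration, and the SHIFT delay is not modelled. It assumes, as the loop in the module implies, that the highest set bit of shs selects the shift distance and that an all-zero shs forces the output to 0.

```python
def x_log_shl(value: int, shs: int, nbits: int = 64) -> int:
    """Behavioural model: shift left by the index of the highest set bit of shs."""
    mask = (1 << nbits) - 1
    shs &= mask
    if shs == 0:
        return 0                       # no select bit set: output is forced to 0
    shift = shs.bit_length() - 1       # index of the most significant set bit
    return ((value & mask) << shift) & mask   # bits shifted past nbits are dropped


if __name__ == "__main__":
    # 0b00000101 shifted by 2 (select bit 2 set) gives 0b00010100
    print(x_log_shl(0b00000101, 0b00000100, nbits=8))  # 20
```

A model of this kind could be used to generate expected values for a testbench; it says nothing about timing.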

Figure A.7: Wiring of 8-bit logical left-shifter.

log_shr.v

/*****************************************************************************/

/* */

/* Module 8/.../128-Bit Logical Right-Shifter ( NO Selection Decoder ) */

/* */

/* Size (bits) |      8 |     16 |     32 |     64 |    128  */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level |      2 |      3 |      4 |      4 |      5  */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns)  |  1.198 |  1.917 |  2.496 |  2.636 |  3.215  */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

/*****************************************************************************/

/* */

/* Module 8-Bit Logical Right-Shifter */

/* */

/*****************************************************************************/

module log_shr_8( out, in, shs );

output [0:7] out;

input [0:7] in, shs;

wire [0:7] out;

assign out[7] = |(in[0:7] & shs[0:7]); assign out[6] = |(in[0:6] & shs[1:7]);

assign out[5] = |(in[0:5] & shs[2:7]); assign out[4] = |(in[0:4] & shs[3:7]);

assign out[3] = |(in[0:3] & shs[4:7]); assign out[2] = |(in[0:2] & shs[5:7]);

assign out[1] = |(in[0:1] & shs[6:7]); assign out[0] = in[0] & shs[7];

endmodule
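The AND-OR structure of log_shr_8 can be checked against an ordinary right shift. The following Python sketch is an illustrative model and not part of the thesis sources: the function name is hypothetical, and bits are indexed by weight (bit 0 is the least significant bit), whereas the Verilog ports are declared [0:7] with index 0 as the most significant bit. Under that reading, each output bit is the OR of input bit k+s ANDed with select bit s, so a one-hot select word with bit s set should yield a right shift by s.

```python
def log_shr_and_or(value: int, shs: int, nbits: int = 8) -> int:
    """Bit-level model of the AND-OR network: out[k] = OR over s of in[k+s] & shs[s]."""
    out = 0
    for k in range(nbits):                  # output bit position (weight 2**k)
        bit = 0
        for s in range(nbits - k):          # candidate shift distance
            bit |= ((value >> (k + s)) & 1) & ((shs >> s) & 1)
        out |= bit << k
    return out


if __name__ == "__main__":
    # One-hot select with bit 3 set behaves like a right shift by 3.
    print(log_shr_and_or(0b10110000, 0b00001000) == 0b10110000 >> 3)
```

If more than one select bit is set, the network ORs the corresponding shifted words together, which is why the select input is expected to be one-hot.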

/*****************************************************************************/

/* */

/* Module 16-Bit Logical Right-Shifter */

/* */

/*****************************************************************************/

module log_shr_16( out, in, shs );

output [0:15] out;

input [0:15] in, shs;

wire [0:15] out;

assign out[15] = |(in[00:15] & shs[00:15]);

assign out[14] = |(in[00:14] & shs[01:15]);

assign out[13] = |(in[00:13] & shs[02:15]);

assign out[12] = |(in[00:12] & shs[03:15]);

assign out[11] = |(in[00:11] & shs[04:15]);

assign out[10] = |(in[00:10] & shs[05:15]);

assign out[09] = |(in[00:09] & shs[06:15]);

assign out[08] = |(in[00:08] & shs[07:15]);

assign out[07] = |(in[00:07] & shs[08:15]);

assign out[06] = |(in[00:06] & shs[09:15]);

assign out[05] = |(in[00:05] & shs[10:15]);

assign out[04] = |(in[00:04] & shs[11:15]);

assign out[03] = |(in[00:03] & shs[12:15]);

assign out[02] = |(in[00:02] & shs[13:15]);

assign out[01] = |(in[00:01] & shs[14:15]);

assign out[00] = in[00] & shs[15];

endmodule

/*****************************************************************************/

/* */

/* Module 32-Bit Logical Right-Shifter */

/* */

/*****************************************************************************/

module log_shr_32( out, in, shs );

output [0:31] out;

input [0:31] in, shs;

wire [0:31] out;

assign out[31] = |(in[00:31] & shs[00:31]);

assign out[30] = |(in[00:30] & shs[01:31]);

assign out[29] = |(in[00:29] & shs[02:31]);

assign out[28] = |(in[00:28] & shs[03:31]);

assign out[27] = |(in[00:27] & shs[04:31]);

assign out[26] = |(in[00:26] & shs[05:31]);

assign out[25] = |(in[00:25] & shs[06:31]);

assign out[24] = |(in[00:24] & shs[07:31]);

assign out[23] = |(in[00:23] & shs[08:31]);

assign out[22] = |(in[00:22] & shs[09:31]);

assign out[21] = |(in[00:21] & shs[10:31]);

assign out[20] = |(in[00:20] & shs[11:31]);

assign out[19] = |(in[00:19] & shs[12:31]);

assign out[18] = |(in[00:18] & shs[13:31]);

assign out[17] = |(in[00:17] & shs[14:31]);

assign out[16] = |(in[00:16] & shs[15:31]);

assign out[15] = |(in[00:15] & shs[16:31]);

assign out[14] = |(in[00:14] & shs[17:31]);

assign out[13] = |(in[00:13] & shs[18:31]);

assign out[12] = |(in[00:12] & shs[19:31]);

assign out[11] = |(in[00:11] & shs[20:31]);

assign out[10] = |(in[00:10] & shs[21:31]);

assign out[09] = |(in[00:09] & shs[22:31]);

assign out[08] = |(in[00:08] & shs[23:31]);

assign out[07] = |(in[00:07] & shs[24:31]);

assign out[06] = |(in[00:06] & shs[25:31]);

assign out[05] = |(in[00:05] & shs[26:31]);

assign out[04] = |(in[00:04] & shs[27:31]);

assign out[03] = |(in[00:03] & shs[28:31]);

assign out[02] = |(in[00:02] & shs[29:31]);

assign out[01] = |(in[00:01] & shs[30:31]);

assign out[00] = in[00] & shs[31];

endmodule

/*****************************************************************************/

/* */

/* Module 64-Bit Logical Right-Shifter */

/* */

/*****************************************************************************/

module log_shr_64( out, in, shs );

output [0:63] out;

input [0:63] in, shs;

wire [0:63] out;

assign out[63] = |(in[00:63] & shs[00:63]);

assign out[62] = |(in[00:62] & shs[01:63]);

assign out[61] = |(in[00:61] & shs[02:63]);

assign out[60] = |(in[00:60] & shs[03:63]);

assign out[59] = |(in[00:59] & shs[04:63]);

assign out[58] = |(in[00:58] & shs[05:63]);

assign out[57] = |(in[00:57] & shs[06:63]);

assign out[56] = |(in[00:56] & shs[07:63]);

assign out[55] = |(in[00:55] & shs[08:63]);

assign out[54] = |(in[00:54] & shs[09:63]);

assign out[53] = |(in[00:53] & shs[10:63]);

assign out[52] = |(in[00:52] & shs[11:63]);

assign out[51] = |(in[00:51] & shs[12:63]);

assign out[50] = |(in[00:50] & shs[13:63]);

assign out[49] = |(in[00:49] & shs[14:63]);

assign out[48] = |(in[00:48] & shs[15:63]);

assign out[47] = |(in[00:47] & shs[16:63]);

assign out[46] = |(in[00:46] & shs[17:63]);

assign out[45] = |(in[00:45] & shs[18:63]);

assign out[44] = |(in[00:44] & shs[19:63]);

assign out[43] = |(in[00:43] & shs[20:63]);

assign out[42] = |(in[00:42] & shs[21:63]);

assign out[41] = |(in[00:41] & shs[22:63]);

assign out[40] = |(in[00:40] & shs[23:63]);

assign out[39] = |(in[00:39] & shs[24:63]);

assign out[38] = |(in[00:38] & shs[25:63]);

assign out[37] = |(in[00:37] & shs[26:63]);

assign out[36] = |(in[00:36] & shs[27:63]);

assign out[35] = |(in[00:35] & shs[28:63]);

assign out[34] = |(in[00:34] & shs[29:63]);

assign out[33] = |(in[00:33] & shs[30:63]);

assign out[32] = |(in[00:32] & shs[31:63]);

assign out[31] = |(in[00:31] & shs[32:63]);

assign out[30] = |(in[00:30] & shs[33:63]);

assign out[29] = |(in[00:29] & shs[34:63]);

assign out[28] = |(in[00:28] & shs[35:63]);

assign out[27] = |(in[00:27] & shs[36:63]);

assign out[26] = |(in[00:26] & shs[37:63]);

assign out[25] = |(in[00:25] & shs[38:63]);

assign out[24] = |(in[00:24] & shs[39:63]);

assign out[23] = |(in[00:23] & shs[40:63]);

assign out[22] = |(in[00:22] & shs[41:63]);

assign out[21] = |(in[00:21] & shs[42:63]);

assign out[20] = |(in[00:20] & shs[43:63]);

assign out[19] = |(in[00:19] & shs[44:63]);

assign out[18] = |(in[00:18] & shs[45:63]);

assign out[17] = |(in[00:17] & shs[46:63]);

assign out[16] = |(in[00:16] & shs[47:63]);

assign out[15] = |(in[00:15] & shs[48:63]);

assign out[14] = |(in[00:14] & shs[49:63]);

assign out[13] = |(in[00:13] & shs[50:63]);

assign out[12] = |(in[00:12] & shs[51:63]);

assign out[11] = |(in[00:11] & shs[52:63]);

assign out[10] = |(in[00:10] & shs[53:63]);

assign out[09] = |(in[00:09] & shs[54:63]);

assign out[08] = |(in[00:08] & shs[55:63]);

assign out[07] = |(in[00:07] & shs[56:63]);

assign out[06] = |(in[00:06] & shs[57:63]);

assign out[05] = |(in[00:05] & shs[58:63]);

assign out[04] = |(in[00:04] & shs[59:63]);

assign out[03] = |(in[00:03] & shs[60:63]);

assign out[02] = |(in[00:02] & shs[61:63]);

assign out[01] = |(in[00:01] & shs[62:63]);

assign out[00] = in[00] & shs[63];

endmodule

/*****************************************************************************/

/* */

/* Module 128-Bit Logical Right-Shifter */

/* */

/*****************************************************************************/

module log_shr_128( out, in, shs );

output [0:127] out;

input [0:127] in, shs;

wire [0:127] out;

assign out[127] = |(in[000:127] & shs[000:127]);

assign out[126] = |(in[000:126] & shs[001:127]);

assign out[125] = |(in[000:125] & shs[002:127]);

assign out[124] = |(in[000:124] & shs[003:127]);

assign out[123] = |(in[000:123] & shs[004:127]);

assign out[122] = |(in[000:122] & shs[005:127]);

assign out[121] = |(in[000:121] & shs[006:127]);

assign out[120] = |(in[000:120] & shs[007:127]);

assign out[119] = |(in[000:119] & shs[008:127]);

assign out[118] = |(in[000:118] & shs[009:127]);

assign out[117] = |(in[000:117] & shs[010:127]);

assign out[116] = |(in[000:116] & shs[011:127]);

assign out[115] = |(in[000:115] & shs[012:127]);

assign out[114] = |(in[000:114] & shs[013:127]);

assign out[113] = |(in[000:113] & shs[014:127]);

assign out[112] = |(in[000:112] & shs[015:127]);

assign out[111] = |(in[000:111] & shs[016:127]);

assign out[110] = |(in[000:110] & shs[017:127]);

assign out[109] = |(in[000:109] & shs[018:127]);

assign out[108] = |(in[000:108] & shs[019:127]);

assign out[107] = |(in[000:107] & shs[020:127]);

assign out[106] = |(in[000:106] & shs[021:127]);

assign out[105] = |(in[000:105] & shs[022:127]);

assign out[104] = |(in[000:104] & shs[023:127]);

assign out[103] = |(in[000:103] & shs[024:127]);

assign out[102] = |(in[000:102] & shs[025:127]);

assign out[101] = |(in[000:101] & shs[026:127]);

assign out[100] = |(in[000:100] & shs[027:127]);

assign out[099] = |(in[000:099] & shs[028:127]);

assign out[098] = |(in[000:098] & shs[029:127]);

assign out[097] = |(in[000:097] & shs[030:127]);

assign out[096] = |(in[000:096] & shs[031:127]);

assign out[095] = |(in[000:095] & shs[032:127]);

assign out[094] = |(in[000:094] & shs[033:127]);

assign out[093] = |(in[000:093] & shs[034:127]);

assign out[092] = |(in[000:092] & shs[035:127]);

assign out[091] = |(in[000:091] & shs[036:127]);

assign out[090] = |(in[000:090] & shs[037:127]);

assign out[089] = |(in[000:089] & shs[038:127]);

assign out[088] = |(in[000:088] & shs[039:127]);

assign out[087] = |(in[000:087] & shs[040:127]);

assign out[086] = |(in[000:086] & shs[041:127]);

assign out[085] = |(in[000:085] & shs[042:127]);

assign out[084] = |(in[000:084] & shs[043:127]);

assign out[083] = |(in[000:083] & shs[044:127]);

assign out[082] = |(in[000:082] & shs[045:127]);

assign out[081] = |(in[000:081] & shs[046:127]);

assign out[080] = |(in[000:080] & shs[047:127]);

assign out[079] = |(in[000:079] & shs[048:127]);

assign out[078] = |(in[000:078] & shs[049:127]);

assign out[077] = |(in[000:077] & shs[050:127]);

assign out[076] = |(in[000:076] & shs[051:127]);

assign out[075] = |(in[000:075] & shs[052:127]);

assign out[074] = |(in[000:074] & shs[053:127]);

assign out[073] = |(in[000:073] & shs[054:127]);

assign out[072] = |(in[000:072] & shs[055:127]);

assign out[071] = |(in[000:071] & shs[056:127]);

assign out[070] = |(in[000:070] & shs[057:127]);

assign out[069] = |(in[000:069] & shs[058:127]);

assign out[068] = |(in[000:068] & shs[059:127]);

assign out[067] = |(in[000:067] & shs[060:127]);

assign out[066] = |(in[000:066] & shs[061:127]);

assign out[065] = |(in[000:065] & shs[062:127]);

assign out[064] = |(in[000:064] & shs[063:127]);

assign out[063] = |(in[000:063] & shs[064:127]);

assign out[062] = |(in[000:062] & shs[065:127]);

assign out[061] = |(in[000:061] & shs[066:127]);

assign out[060] = |(in[000:060] & shs[067:127]);

assign out[059] = |(in[000:059] & shs[068:127]);

assign out[058] = |(in[000:058] & shs[069:127]);

assign out[057] = |(in[000:057] & shs[070:127]);

assign out[056] = |(in[000:056] & shs[071:127]);

assign out[055] = |(in[000:055] & shs[072:127]);

assign out[054] = |(in[000:054] & shs[073:127]);

assign out[053] = |(in[000:053] & shs[074:127]);

assign out[052] = |(in[000:052] & shs[075:127]);

assign out[051] = |(in[000:051] & shs[076:127]);

assign out[050] = |(in[000:050] & shs[077:127]);

assign out[049] = |(in[000:049] & shs[078:127]);

assign out[048] = |(in[000:048] & shs[079:127]);

assign out[047] = |(in[000:047] & shs[080:127]);

assign out[046] = |(in[000:046] & shs[081:127]);

assign out[045] = |(in[000:045] & shs[082:127]);

assign out[044] = |(in[000:044] & shs[083:127]);

assign out[043] = |(in[000:043] & shs[084:127]);

assign out[042] = |(in[000:042] & shs[085:127]);

assign out[041] = |(in[000:041] & shs[086:127]);

assign out[040] = |(in[000:040] & shs[087:127]);

assign out[039] = |(in[000:039] & shs[088:127]);

assign out[038] = |(in[000:038] & shs[089:127]);

assign out[037] = |(in[000:037] & shs[090:127]);

assign out[036] = |(in[000:036] & shs[091:127]);

assign out[035] = |(in[000:035] & shs[092:127]);

assign out[034] = |(in[000:034] & shs[093:127]);

assign out[033] = |(in[000:033] & shs[094:127]);

assign out[032] = |(in[000:032] & shs[095:127]);

assign out[031] = |(in[000:031] & shs[096:127]);

assign out[030] = |(in[000:030] & shs[097:127]);

assign out[029] = |(in[000:029] & shs[098:127]);

assign out[028] = |(in[000:028] & shs[099:127]);

assign out[027] = |(in[000:027] & shs[100:127]);

assign out[026] = |(in[000:026] & shs[101:127]);

assign out[025] = |(in[000:025] & shs[102:127]);

assign out[024] = |(in[000:024] & shs[103:127]);

assign out[023] = |(in[000:023] & shs[104:127]);

assign out[022] = |(in[000:022] & shs[105:127]);

assign out[021] = |(in[000:021] & shs[106:127]);

assign out[020] = |(in[000:020] & shs[107:127]);

assign out[019] = |(in[000:019] & shs[108:127]);

assign out[018] = |(in[000:018] & shs[109:127]);

assign out[017] = |(in[000:017] & shs[110:127]);

assign out[016] = |(in[000:016] & shs[111:127]);

assign out[015] = |(in[000:015] & shs[112:127]);

assign out[014] = |(in[000:014] & shs[113:127]);

assign out[013] = |(in[000:013] & shs[114:127]);

assign out[012] = |(in[000:012] & shs[115:127]);

assign out[011] = |(in[000:011] & shs[116:127]);

assign out[010] = |(in[000:010] & shs[117:127]);

assign out[009] = |(in[000:009] & shs[118:127]);

assign out[008] = |(in[000:008] & shs[119:127]);

assign out[007] = |(in[000:007] & shs[120:127]);

assign out[006] = |(in[000:006] & shs[121:127]);

assign out[005] = |(in[000:005] & shs[122:127]);

assign out[004] = |(in[000:004] & shs[123:127]);

assign out[003] = |(in[000:003] & shs[124:127]);

assign out[002] = |(in[000:002] & shs[125:127]);

assign out[001] = |(in[000:001] & shs[126:127]);

assign out[000] = in[000] & shs[127];

endmodule

x_log_shr.v

/*****************************************************************************/

/* */

/* Module N-Bit Logical Right-Shifter */

/* */

/*****************************************************************************/

module x_log_shr( out, in, shs );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

parameter SHIFT = 2.636;

output [NBITS-1:0] out;

input [NBITS-1:0] in, shs;

reg [NBITS-1:0] out;

integer n, i;

always @( * )

begin

n = 0;

for( i = NBITS; i > 0; i = i - 1 )

begin

if( n == 0 && shs[i-1] )

n = i;

end

if( n )

out <= #SHIFT in >> (n - 1);

else out <= #SHIFT 0;

end

endmodule

Figure A.8: Wiring of 8-bit logical right-shifter.

downcnt.v

/*****************************************************************************/

/* */

/* Module N-Bit Down-Counter */

/* */

/* Size (bits) |      8 |     16 |     32 |     64 |    128  */

/* ------------+--------+--------+--------+--------+-------- */

/* Logic Level |      1 |      1 |      2 |      7 |      8  */

/* ------------+--------+--------+--------+--------+-------- */

/* Delay (ns)  |  1.858 |  1.910 |  2.658 |  2.964 |  3.020  */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module downcnt( value, reset, clock );

parameter LOG_N = 6;

output [LOG_N-1:0] value;

input reset, clock;

reg [LOG_N-1:0] value;

always @( posedge reset or posedge clock )

begin

if( reset )

value <= 0;

else value <= value - 1;

end

endmodule

x_downcnt.v

/*****************************************************************************/

/* */

/* Module N-Bit Down-Counter */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

module x_downcnt( value, reset, clock );

parameter LOG_N = 6;

parameter RESET = 0.479;

parameter COUNT = 2.964;

output [LOG_N-1:0] value;

input reset, clock;

reg [LOG_N-1:0] value;

always @( posedge reset or posedge clock )

begin

if( reset )

value <= #RESET 0;

else value <= #COUNT value - 1;

end

endmodule

int_divider_test.v

/*****************************************************************************/

/* */

/* Test Module for all N-Bit Integer Dividers */

/* */

/*****************************************************************************/

`define PRINT_EACH_RESULT

`define VERIFY_EACH_RESULT

`define A_GREATER_EQUAL_B

`define ASCENDING_OPERANDS

`define DETERMINE_AVERAGE

`define MAX_DIVISION_COUNT 10**6

//`define STORE_EACH_RESULT

//`define FILE_NAME "c:/temp/radix_two_128.txt"

//`define FILE_NAME "c:/temp/self_aligning_128.txt"

//`define FILE_NAME "c:/temp/direct_aligning_128.txt"

//`define FILE_NAME "c:/temp/hybrid_aligning_128.txt"

module int_divider_test( );

parameter LOG_N = 3;

parameter NBITS = 2**LOG_N;

parameter PERIOD = 1.0;

reg [31:0] cycles = 0;

reg [31:0] clkcnt = 0;

reg [31:0] divcnt = 0;

reg [31:0] min = -1;

reg [31:0] max = 0;

reg [LOG_N-1:0] s = 0;

reg [LOG_N-1:0] t = 0;

reg [NBITS-1:0] a = 0;

reg [NBITS-1:0] b = 0;

reg reset = 1;

reg start = 0;

reg clock = 0;

wire [NBITS-1:0] q;

wire [NBITS-1:0] r;

wire ready;

wire error;

`ifdef STORE_EACH_RESULT

integer file;

`endif

`ifdef A_GREATER_EQUAL_B

int_divider #( LOG_N ) divider( q, r, ready, error, a, b, reset, start, clock );

`else

int_divider #( LOG_N ) divider( q, r, ready, error, b, a, reset, start, clock );

`endif

initial

begin

#(PERIOD / 2.0) forever

#(PERIOD / 2.0) clock = ~clock;

end

initial

begin

`ifdef PRINT_EACH_RESULT

$display( "Output: a b q r cycles\n" );

`endif

`ifdef STORE_EACH_RESULT

file = $fopen( `FILE_NAME );

if( file == 0 )

begin

$display( "\nFile Access Error!\n" );

$stop;

end

`endif

end

always @( posedge clock )

begin

if( ready )

begin

`ifdef ASCENDING_OPERANDS

`ifdef DETERMINE_AVERAGE

if( b == a )

begin

a = a + 1;

b = 1;

end else

b = b + 1;

`else

if( b == 0 )

begin

a = -1;

b = 1;

end else

if( a == 3 )

begin

a = -1;

a[NBITS-2] = 0;

b = 1;

end else

if( a == 5 )

begin

a = 1;

a[NBITS-1] = 1;

b = 1;

end else

if( b == a )

begin

a = a >> 1;

b = 1;

end else

if( b == 15 )

begin

if( a > 20 )

b = a - 5;

else b = a;

end else

b = b + 1;

`endif

`else

s = $random;

t = $random;

a = $random >> s;

b = $random >> t;

while( b == 0 || a < b )

begin

s = $random;

t = $random;

a = $random >> s;

b = $random >> t;

end

`endif

reset = 0;

start = 1;

cycles = 0;

end else

begin

start = 0;

cycles = cycles + 1;

end

end

always @( posedge ready )

begin

`ifdef PRINT_EACH_RESULT

`ifdef A_GREATER_EQUAL_B

$display( "%d %d %d %d %d", a, b, q, r, cycles );

`else

$display( "%d %d %d %d %d", b, a, q, r, cycles );

`endif

`endif

`ifdef STORE_EACH_RESULT

if( cycles != 0 )

$fdisplay( file, "%.0f", cycles );

`endif

`ifdef VERIFY_EACH_RESULT

`ifdef A_GREATER_EQUAL_B

if( q != a / b || r != a % b )

`else

if( q != b / a || r != b % a )

`endif

begin

$display( "\nFailure! >>> %d %d", a / b, a % b );

`ifdef STORE_EACH_RESULT

$fclose( file );

`endif

$stop;

end

`endif

if( b > 0 && a >= b )

begin

if( min > cycles )

min = cycles;

if( max < cycles )

max = cycles;

end

clkcnt = clkcnt + cycles;

divcnt = divcnt + 1;

`ifdef ASCENDING_OPERANDS

if( a == 0 && b == 1 )

`else

if( divcnt == `MAX_DIVISION_COUNT )

`endif

begin

$display( "\nMinimum Clock Cycles: %d\nMaximum Clock Cycles: %d\n", min, max );

`ifdef DETERMINE_AVERAGE

$display( "Average Clock Cycles: %d\n", clkcnt / divcnt + 1 );

`endif

`ifdef STORE_EACH_RESULT

$fclose( file );

`endif

$stop;

end

end

always @( posedge error )

begin

`ifdef PRINT_EACH_RESULT

$display( "Division by zero!" );

`endif

reset = 1;

end

endmodule

radix_two.v

/*****************************************************************************/

/* */

/* Company: University of Salzburg */

/* Engineer: Rainer Trummer */

/* */

/* Create Date: 09/21/2004 */

/* Design Name: N-Bit Radix-Two Divider */

/* Module Name: int_divider */

/* Target Device: Division Unit */

/* */

/* Comments: */

/* */

/* Controller currently adjusted to 64-bit delays. */

/* */

/* Test Results: */

/* */

/* Word Size (bits)  |    8 |   16 |   32 |   64 |  128  */

/* ------------------+------+------+------+------+------ */

/* Min. Clock Cycles |   34 |   58 |  106 |  202 |  394  */

/* Max. Clock Cycles |   83 |  163 |  323 |  643 | 1283  */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

/*****************************************************************************/

/* */

/* Module N-Bit Radix-Two Divider */

/* */

/*****************************************************************************/

module int_divider( q, r, ready, error, a, b, reset, start, clock );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

output [NBITS-1:0] q, r;

output ready, error;

input [NBITS-1:0] a, b;

input reset, start, clock;

wire [NBITS-1:0] q, r;

wire ready, error;

wire [1:0] rqsel;

wire write, c_rst, count;

wire div_0, rem_0, cout, done;

wire [2*NBITS-1:0] m_r_q, r_q;

wire [NBITS-1:0] sum;

wire [LOG_N-1:0] val;

supply0 GND;

supply1 VCC;

assign q = r_q[NBITS-1:0];

assign r = r_q[2*NBITS-1:NBITS];

assign done = ~|val[LOG_N-1:0];

x_mux_4_1 #( LOG_N+1 ) r_q_mux( m_r_q,

{{NBITS-1{GND}}, a, GND}, // 00: initialize

{sum, r_q[NBITS-1:1], VCC}, // 01: write only

{r_q[2*NBITS-2:0], GND}, // 10: shift only

{sum[NBITS-2:0], r_q[NBITS-1:1], // 11: write+shift

VCC, GND},

rqsel );

x_ms_ff #( LOG_N+1 ) r_q_reg( r_q, m_r_q, GND, write );

x_adder #( LOG_N ) cladder( sum, cout, r, ~b, VCC );

x_downcnt #( LOG_N ) counter( val, c_rst, count );

x_not_or #( LOG_N ) check_b( div_0, b );

x_not_or #( LOG_N ) check_a( rem_0, a );

x_ctrl #( LOG_N ) control( ready, error, rqsel, write, c_rst, count,

reset, start, clock, div_0, rem_0, ~r_q[NBITS],

cout, done );

endmodule
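The data path above keeps remainder and quotient in one combined register r_q, subtracts b from the remainder half through the adder (r + ~b + 1), and either writes the difference back, shifts the pair, or does both. A plain shift-and-subtract division in that textbook style can serve as a golden model for the q and r outputs. The Python sketch below is such a model under stated assumptions: the function name and the nbits argument are hypothetical, and it reproduces only the arithmetic result, not the controller states, the initial alignment, the delays, or the clock-cycle counts listed in the header.

```python
def radix_two_divide(a: int, b: int, nbits: int = 8):
    """Shift-subtract division on a combined {remainder, quotient} register."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << nbits) - 1
    wide = (1 << (2 * nbits)) - 1
    rq = a & mask                       # remainder half = 0, quotient half = a
    for _ in range(nbits):
        rq = (rq << 1) & wide           # shift the register pair left by one bit
        r = rq >> nbits                 # current partial remainder
        if r >= b:                      # same condition as carry-out of r + ~b + 1
            # write the difference back and set the new quotient bit
            rq = ((r - b) << nbits) | (rq & mask) | 1
    return rq & mask, rq >> nbits       # (quotient, remainder)


if __name__ == "__main__":
    print(radix_two_divide(200, 7))     # (28, 4)
```

The returned pair can be compared with the q and r values printed by the test module, which itself verifies each result against a / b and a % b.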

/*****************************************************************************/

/* */

/* Module N-Bit Control Unit */

/* */

/*****************************************************************************/

module x_ctrl( ready, error, rqsel, write, c_rst, count,

reset, start, clock, div_0, rem_0, shift, cout, done );

parameter LOG_N = 6;

parameter LATCH = 0.479;

parameter MS_FF = 0.958;

output ready, error, write, c_rst, count;

output [1:0] rqsel;

input reset, start, clock, div_0, rem_0, shift, cout, done;

reg ready, error, write, c_rst, count;

reg [1:0] rqsel;

reg [3:0] state;

always @( posedge reset )

begin

state <= #MS_FF 0;

ready <= #LATCH 1;

error <= #LATCH 0;

rqsel <= #LATCH 0;

write <= #LATCH 0;

c_rst <= #LATCH 1;

count <= #LATCH 0;

end

always @( posedge start or posedge clock )

begin

case( state )

0:

begin

state <= #MS_FF start ? 1 : 0;

ready <= #LATCH ~start;

rqsel <= #LATCH 0;

write <= #LATCH 0;

c_rst <= #LATCH 0;

count <= #LATCH 0;

@( negedge clock )

begin

write <= #LATCH start;

end

end

1:

begin

state <= #MS_FF 2;

write <= #LATCH 0;

end

2:

begin

state <= #MS_FF 3;

rqsel <= #LATCH 2;

write <= #LATCH 1;

count <= #LATCH 1;

@( negedge clock )

begin

write <= #LATCH 0;

end

end

3:

begin

state <= #MS_FF div_0 | rem_0 ? 0 : 4;

ready <= #LATCH div_0 | rem_0;

error <= #LATCH div_0;

@( negedge clock )

begin

count <= #LATCH 0;

end

end

4:

begin

state <= #MS_FF shift ? 2 : 13;

end

8:

begin

state <= #MS_FF 9;

@( negedge clock )

begin

rqsel <= #LATCH {~done, cout};

end

end

9:

begin

state <= #MS_FF 10;

@( negedge clock )

begin

write <= #LATCH ~done | cout;

count <= #LATCH ~done;

end

end

10:

begin

state <= #MS_FF 11;

write <= #LATCH 0;

end

11:

begin

state <= #MS_FF done ? 0 : 12;

ready <= #LATCH done;

count <= #LATCH 0;

end

14:

begin

state <= #MS_FF 5;

end

default:

state <= #MS_FF state + 1;

endcase

end

endmodule

Figure A.9: Wiring of 8-bit radix-two divider. (Schematic omitted; it connects the blocks r_q_mux, r_q_reg, cladder, counter, check_a, check_b, done_imp, and control.)
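
As a reading aid, the shift-and-subtract scheme behind the radix-two datapath can be modelled in software. In the listing, cladder computes r + ~b + 1, so its carry-out appears to signal that the subtraction did not go negative. The C++ sketch below is only an illustration under that reading, fixed to 32-bit operands; it is not the thesis's radix_two routine, and the names DivResult and restoring_divide are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: quotient and remainder of an unsigned division.
struct DivResult {
    std::uint32_t q;
    std::uint32_t r;
};

// Bit-serial restoring division producing one quotient bit per iteration.
// Precondition: b != 0 (the hardware raises `error` for a zero divisor).
DivResult restoring_divide(std::uint32_t a, std::uint32_t b) {
    assert(b != 0);
    std::uint64_t r = 0;  // partial remainder, widened so the shift cannot overflow
    std::uint32_t q = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((a >> i) & 1u);  // shift in the next dividend bit
        q <<= 1;
        if (r >= b) {  // same condition as a carry-out of r + ~b + 1
            r -= b;
            q |= 1u;
        }
    }
    return {q, static_cast<std::uint32_t>(r)};
}
```

The loop always runs once per bit of the word, which is the data-independent behaviour that the aligning dividers in the following files try to avoid.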

self_aligning.v

/*****************************************************************************/

/* */

/* Company: University of Salzburg */

/* Engineer: Rainer Trummer */

/* */

/* Create Date: 09/24/2004 */

/* Design Name: N-Bit Self-Aligning Divider */

/* Module Name: int_divider */

/* Target Device: Division Unit */

/* */

/* Comments: */

/* */

/* Controller currently adjusted to 64-bit delays. */

/* */

/* Test Results: */

/* */

/* Word Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------------+------+------+------+------+------ */

/* Min. Clock Cycles | 23 | 23 | 23 | 23 | 23 */

/* Max. Clock Cycles | 138 | 270 | 534 | 1062 | 2118 */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

/*****************************************************************************/

/* */

/* Module N-Bit Self-Aligning Divider */

/* */

/*****************************************************************************/

module int_divider( q, r, ready, error, a, b, reset, start, clock );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

output [NBITS-1:0] q, r;

output ready, error;

input [NBITS-1:0] a, b;

input reset, start, clock;

wire [NBITS-1:0] q, r;

wire ready, error;

wire dqrst, r_sel, q_bit, shift, write;

wire [1:0] dqsel;

wire div_0, cout;

wire [2*NBITS+1:0] m_d_q, d_q;

wire [NBITS-1:0] m_r, sum;

supply0 GND;

supply1 VCC;

x_mux_3_1 #( LOG_N,

2*NBITS+2 ) d_q_mux( m_d_q,

{GND, d_q[2*NBITS+1:NBITS+2],

q_bit, d_q[NBITS:1]}, // 00: shift right

{d_q[2*NBITS:0], GND}, // 01: shift left

{GND, b, {NBITS-1{GND}}, // 10: initialize

VCC, GND}, // 11: same as 00

dqsel );

x_ms_ff #( LOG_N,

2*NBITS+2 ) d_q_reg( d_q, m_d_q, dqrst, shift );

x_w_cross #( LOG_N ) quo_out( q, d_q[NBITS:1] );

x_mux_2_1 #( LOG_N ) rem_mux( m_r, sum, a, r_sel );

x_ms_ff #( LOG_N ) rem_reg( r, m_r, GND, write );

x_adder #( LOG_N ) cladder( sum, cout, r, ~d_q[2*NBITS:NBITS+1], VCC );

x_not_or #( LOG_N ) check_b( div_0, b );

x_ctrl #( LOG_N ) control( ready, error, dqrst, dqsel, r_sel, q_bit, shift,

write, reset, start, clock, div_0, cout,

d_q[2*NBITS+1], d_q[NBITS+1], d_q[0] );

endmodule

/*****************************************************************************/

/* */

/* Module N-Bit Control Unit */

/* */

/*****************************************************************************/

module x_ctrl( ready, error, dqrst, dqsel, r_sel, q_bit, shift, write,

reset, start, clock, div_0, cout, d_msb, d_lsb, done );

parameter LOG_N = 6;

parameter LATCH = 0.479;

parameter MS_FF = 0.958;

output ready, error, dqrst, r_sel, q_bit, shift, write;

output [1:0] dqsel;

input reset, start, clock, div_0, cout, d_msb, d_lsb, done;

reg ready, error, dqrst, r_sel, q_bit, shift, write, d_sel, delay;

reg [4:0] state;

wire [1:0] dqsel = {r_sel, d_sel};

wire carry = cout & ~d_msb;

always @( posedge reset )

begin

state <= #MS_FF 0;

ready <= #LATCH 1;

error <= #LATCH 0;

dqrst <= #LATCH 1;

d_sel <= #LATCH 0;

r_sel <= #LATCH 1;

q_bit <= #LATCH 0;

shift <= #LATCH 0;

write <= #LATCH 0;

delay <= #LATCH 1;

end

always @( posedge start or posedge clock )

begin

case( state )

0:

begin

state <= #MS_FF start ? 1 : 0;

ready <= #LATCH ~start;

dqrst <= #LATCH 0;

d_sel <= #LATCH 0;

r_sel <= #LATCH 1;

q_bit <= #LATCH 0;

shift <= #LATCH 0;

write <= #LATCH 0;

delay <= #LATCH 1;

end

1:

begin

state <= #MS_FF 2;

shift <= #LATCH 1;

write <= #LATCH 1;

@( negedge clock )

begin

shift <= #LATCH 0;

write <= #LATCH 0;

end

end

3:

begin

state <= #MS_FF div_0 ? 0 : 4;

ready <= #LATCH div_0;

error <= #LATCH div_0;

end

6:

begin

state <= #MS_FF 7;

d_sel <= #LATCH 1;

r_sel <= #LATCH 0;

end

7:

begin

state <= #MS_FF 8;

@( negedge clock )

begin

shift <= #LATCH 1;

end

end

8:

begin

state <= #MS_FF 9;

shift <= #LATCH 0;

end

10:

begin

state <= #MS_FF ~cout ? 0 : (~d_msb ? 11 : 27);

dqrst <= #LATCH ~cout;

d_sel <= #LATCH carry;

end

14:

begin

state <= #MS_FF delay ? 16 : 15;

shift <= #LATCH delay;

@( negedge clock )

begin

shift <= #LATCH ~delay;

delay <= #LATCH ~delay;

end

end

15:

begin

state <= #MS_FF 16;

shift <= #LATCH 0;

end

17:

begin

state <= #MS_FF carry ? 11 : 27;

d_sel <= #LATCH carry;

q_bit <= #LATCH carry | (d_msb & d_lsb);

end

19:

begin

state <= #MS_FF done ? 0 : 20;

ready <= #LATCH done;

end

26:

begin

state <= #MS_FF 27;

q_bit <= #LATCH carry | (d_msb & d_lsb);

end

27:

begin

state <= #MS_FF 18;

shift <= #LATCH 1;

write <= #LATCH carry;

@( negedge clock )

begin

shift <= #LATCH 0;

write <= #LATCH 0;

end

end

default:

state <= #MS_FF state + 1;

endcase

end

endmodule

/*****************************************************************************/

/* */

/* Module N-Bit Wire Cross-Coupling */

/* */

/*****************************************************************************/

module x_w_cross( out, in );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

output [0:NBITS-1] out;

input [NBITS-1:0] in;

reg [0:NBITS-1] out;

integer i;

always @( in )

begin

for( i = 0; i < NBITS; i = i + 1 )

begin

out[i] = in[i];

end

end

endmodule
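
In x_w_cross, out is declared with the ascending range [0:NBITS-1] while in uses the descending range [NBITS-1:0]. Connected positionally, as in the quo_out instance, this appears to reverse the bit order of the quotient field. A minimal C++ sketch of that reversal follows, assuming a 64-bit word (LOG_N = 6); the function name reverse_bits is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 64-bit word:
// bit i of the input becomes bit (63 - i) of the output.
std::uint64_t reverse_bits(std::uint64_t in) {
    std::uint64_t out = 0;
    for (int i = 0; i < 64; ++i) {
        out = (out << 1) | (in & 1u);  // append the current lowest input bit
        in >>= 1;
    }
    return out;
}
```

In hardware the reversal costs nothing, since it is only a permutation of wires; the loop above exists purely to describe the mapping.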

Figure A.10: Wiring of 8-bit self-aligning divider. (Schematic omitted; it connects the blocks rem_mux, rem_reg, cladder, d_q_mux, d_q_reg, check_b, and control.)

direct_aligning.v

/*****************************************************************************/

/* */

/* Company: University of Salzburg */

/* Engineer: Rainer Trummer */

/* */

/* Create Date: 10/01/2004 */

/* Design Name: N-Bit Direct-Aligning Divider */

/* Module Name: int_divider */

/* Target Device: Division Unit */

/* */

/* Comments: */

/* */

/* Controller currently adjusted to 64-bit delays. */

/* */

/* Test Results: */

/* */

/* Word Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------------+------+------+------+------+------ */

/* Min. Clock Cycles | 26 | 26 | 26 | 26 | 26 */

/* Max. Clock Cycles | 159 | 311 | 615 | 1223 | 2439 */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

/*****************************************************************************/

/* */

/* Module N-Bit Direct-Aligning Divider */

/* */

/*****************************************************************************/

module int_divider( q, r, ready, error, a, b, reset, start, clock );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

output [NBITS-1:0] q, r;

output ready, error;

input [NBITS-1:0] a, b;

input reset, start, clock;

wire [NBITS-1:0] q, r;

wire ready, error;

wire q_rst, q_sel, write;

wire [1:0] r_sel;

wire div_0, cout, unused, done;

wire [NBITS-1:0] m_q, m_r, p_r, p_d;

wire [NBITS-1:0] quo, div, sum1, sum2;

supply0 GND;

supply1 VCC;

x_mux_2_1 #( LOG_N ) quo_mux( m_q,

quo, // 00: write only

{GND, quo[NBITS-1:1]}, // 01: write+shift

q_sel );

x_ms_ff #( LOG_N ) quo_reg( q, q | m_q, q_rst, write );

x_mux_3_1 #( LOG_N ) rem_mux( m_r,

sum1, // 00: write 1st sum

sum2, // 01: write 2nd sum

a, // 10: initialize

r_sel ); // 11: same as 00

x_latch #( LOG_N ) rem_reg( r, m_r, GND, write );

x_pri_enc #( LOG_N ) rem_pri( p_r, r );

x_pri_enc #( LOG_N ) div_pri( p_d, b );

x_log_shr #( LOG_N ) quo_shr( quo, p_r, p_d );

x_log_shl #( LOG_N ) div_shl( div, b, quo );

x_adder #( LOG_N ) adder_1( sum1, cout, r, ~div, VCC );

x_adder #( LOG_N ) adder_2( sum2, unused, r, {VCC, ~div[NBITS-1:1]}, VCC );

x_not_or #( LOG_N ) check_b( div_0, b );

x_lt_comp #( LOG_N ) lt_comp( done, r, b );

x_ctrl #( LOG_N ) control( ready, error, q_rst, q_sel, r_sel, write,

reset, start, clock, div_0, cout, done );

endmodule

/*****************************************************************************/

/* */

/* Module N-Bit Control Unit */

/* */

/*****************************************************************************/

module x_ctrl( ready, error, q_rst, q_sel, r_sel, write,

reset, start, clock, div_0, cout, done );

parameter LOG_N = 6;

parameter LATCH = 0.479;

parameter MS_FF = 0.958;

output ready, error, q_rst, q_sel, write;

output [1:0] r_sel;

input reset, start, clock, div_0, cout, done;

reg ready, error, q_rst, q_sel, write;

reg [4:0] state;

wire [1:0] r_sel = {q_rst, q_sel};

always @( posedge reset )

begin

state <= #MS_FF 0;

ready <= #LATCH 1;

error <= #LATCH 0;

q_rst <= #LATCH 1;

q_sel <= #LATCH 0;

write <= #LATCH 0;

end

always @( posedge start or posedge clock )

begin

case( state )

0:

begin

state <= #MS_FF start ? 1 : 0;

ready <= #LATCH ~start;

q_rst <= #LATCH start;

q_sel <= #LATCH 0;

write <= #LATCH 0;

end

1:

begin

state <= #MS_FF 2;

write <= #LATCH 1;

@( negedge clock )

begin

write <= #LATCH 0;

end

end

3:

begin

state <= #MS_FF div_0 ? 0 : 4;

ready <= #LATCH div_0;

error <= #LATCH div_0;

end

7:

begin

state <= #MS_FF done ? 0 : 8;

ready <= #LATCH done;

end

10:

begin

state <= #MS_FF 11;

q_rst <= #LATCH 0;

end

18:

begin

state <= #MS_FF 19;

@( negedge clock )

begin

q_sel <= #LATCH ~cout;

end

end

19:

begin

state <= #MS_FF 20;

@( negedge clock )

begin

write <= #LATCH 1;

end

end

20:

begin

state <= #MS_FF 2;

write <= #LATCH 0;

end

default:

state <= #MS_FF state + 1;

endcase

end

endmodule

Figure A.11: Wiring of 8-bit direct-aligning divider. (Schematic omitted; it connects the blocks rem_mux, rem_reg, rem_pri, div_pri, quo_shr, div_shl, adder_1, adder_2, quo_mux, quo_reg, check_b, lt_comp, and control.)
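
The direct-aligning datapath in this file can be read as follows: the priority encoders rem_pri and div_pri locate the leading ones of r and b, the shifters quo_shr and div_shl align the divisor with the remainder, adder_1 subtracts the aligned divisor, and adder_2 subtracts the same divisor shifted right by one in case the first difference is negative. The C++ sketch below models that reading for 32-bit operands. It is an interpretation of the listing, not the thesis's direct_aligning routine, and the names AlignedDiv, msb_index and aligned_divide are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; `steps` counts the subtract iterations.
struct AlignedDiv {
    std::uint32_t q;
    std::uint32_t r;
    int steps;
};

// Index of the most significant set bit (x must be non-zero).
// Plays the role of the priority encoders in this sketch.
int msb_index(std::uint32_t x) {
    int i = 0;
    while (x >>= 1) ++i;
    return i;
}

AlignedDiv aligned_divide(std::uint32_t a, std::uint32_t b) {
    assert(b != 0);
    std::uint32_t q = 0;
    std::uint32_t r = a;
    int steps = 0;
    while (r >= b) {                          // the comparator's `done` is r < b
        int k = msb_index(r) - msb_index(b);  // alignment shift, k >= 0 here
        std::uint32_t div = b << k;           // divisor aligned to the remainder's MSB
        if (r < div) {                        // first subtraction would go negative:
            --k;                              // use the half-aligned divisor instead
            div >>= 1;
        }
        r -= div;
        q |= (std::uint32_t{1} << k);         // OR the quotient bit into place
        ++steps;
    }
    return {q, r, steps};
}
```

For 100 / 7 the sketch needs three iterations (subtracting 56, 28 and 14), which illustrates why the cycle count of such a divider depends on the operand values rather than on the word size alone.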

hybrid_aligning.v

/*****************************************************************************/

/* */

/* Company: University of Salzburg */

/* Engineer: Rainer Trummer */

/* */

/* Create Date: 10/04/2004 */

/* Design Name: N-Bit Hybrid-Aligning Divider */

/* Module Name: int_divider */

/* Target Device: Division Unit */

/* */

/* Comments: */

/* */

/* Controller currently adjusted to 64-bit delays. */

/* */

/* Test Results: */

/* */

/* Word Size (bits) | 8 | 16 | 32 | 64 | 128 */

/* ------------------+------+------+------+------+------ */

/* Min. Clock Cycles | 26 | 26 | 26 | 26 | 26 */

/* Max. Clock Cycles | 89 | 161 | 315 | 635 | 1275 */

/* */

/*****************************************************************************/

`timescale 1 ns / 1 ps

/*****************************************************************************/

/* */

/* Module N-Bit Hybrid-Aligning Divider */

/* */

/*****************************************************************************/

module int_divider( q, r, ready, error, a, b, reset, start, clock );

parameter LOG_N = 6;

parameter NBITS = 2**LOG_N;

output [NBITS-1:0] q, r;

output ready, error;

input [NBITS-1:0] a, b;

input reset, start, clock;

wire [NBITS-1:0] q, r;

wire ready, error;

wire q_rst, dqsel, d_shr, q_shr, q_wrt, r_wrt;

wire [1:0] r_sel;

wire div_0, cout1, cout2, done;

wire [NBITS-1:0] m_sd, sd, m_sq, sq, m_r, p_r, p_d, quo, div, sum1, sum2;

supply0 GND;

supply1 VCC;

x_mux_2_1 #( LOG_N ) shd_mux( m_sd,

{GND, sd[NBITS-1:1]}, // 00: shift right

div, // 01: initialize

dqsel );

x_ms_ff #( LOG_N ) shd_reg( sd, m_sd, GND, d_shr );

x_mux_2_1 #( LOG_N ) shq_mux( m_sq,

{GND, sq[NBITS-1:1]}, // 00: shift right

quo, // 01: initialize

dqsel );

x_ms_ff #( LOG_N ) shq_reg( sq, m_sq, GND, q_shr );

x_ms_ff #( LOG_N ) quo_reg( q, q | sq, q_rst, q_wrt );

x_mux_3_1 #( LOG_N ) rem_mux( m_r,

sum1, // 00: write 1st sum

sum2, // 01: write 2nd sum

a, // 10: initialize

r_sel ); // 11: same as 00

x_latch #( LOG_N ) rem_reg( r, m_r, GND, r_wrt );

x_lt_comp #( LOG_N ) lt_comp( done, m_r, b );

x_pri_enc #( LOG_N ) rem_pri( p_r, r );

x_pri_enc #( LOG_N ) div_pri( p_d, b );

x_log_shr #( LOG_N ) quo_shr( quo, p_r, p_d );

x_log_shl #( LOG_N ) div_shl( div, b, quo );

x_adder #( LOG_N ) adder_1( sum1, cout1, r, ~sd, VCC );

x_adder #( LOG_N ) adder_2( sum2, cout2, r, {VCC, ~sd[NBITS-1:1]}, VCC );

x_not_or #( LOG_N ) check_b( div_0, b );

x_ctrl #( LOG_N ) control( ready, error, q_rst, dqsel, r_sel, d_shr,

q_shr, q_wrt, r_wrt, reset, start, clock,

div_0, cout1, cout2, done );

endmodule

/*****************************************************************************/

/* */

/* Module N-Bit Control Unit */

/* */

/*****************************************************************************/

module x_ctrl( ready, error, q_rst, dqsel, r_sel, d_shr, q_shr, q_wrt, r_wrt,

reset, start, clock, div_0, cout1, cout2, done );

parameter LOG_N = 6;

parameter LATCH = 0.479;

parameter MS_FF = 0.958;

output ready, error, q_rst, dqsel, d_shr, q_shr, q_wrt, r_wrt;

output [1:0] r_sel;

input reset, start, clock, div_0, cout1, cout2, done;

reg ready, error, q_rst, dqsel, d_shr, q_shr, q_wrt, r_wrt;

reg [4:0] state;

wire valid = cout1 | cout2;

wire item2 = q_rst | cout1;

wire [1:0] r_sel = {q_rst, ~item2};

always @( posedge reset )

begin

state <= #MS_FF 0;

ready <= #LATCH 1;

error <= #LATCH 0;

q_rst <= #LATCH 1;

dqsel <= #LATCH 0;

d_shr <= #LATCH 0;

q_shr <= #LATCH 0;

q_wrt <= #LATCH 0;

r_wrt <= #LATCH 0;

end

always @( posedge start or posedge clock )

begin

case( state )

0:

begin

state <= #MS_FF start ? 1 : 0;

ready <= #LATCH ~start;

q_rst <= #LATCH start;

dqsel <= #LATCH 0;

d_shr <= #LATCH 0;

q_shr <= #LATCH 0;

q_wrt <= #LATCH 0;

r_wrt <= #LATCH 0;

end

1:

begin

state <= #MS_FF 2;

r_wrt <= #LATCH 1;

@( negedge clock )

begin

q_wrt <= #LATCH 1;

r_wrt <= #LATCH 0;

end

end

2:

begin

state <= #MS_FF 3;

q_wrt <= #LATCH 0;

end

3:

begin

state <= #MS_FF div_0 ? 0 : 4;

ready <= #LATCH div_0;

error <= #LATCH div_0;

end

7:

begin

state <= #MS_FF done ? 0 : 8;

ready <= #LATCH done;

end

11:

begin

state <= #MS_FF 12;

dqsel <= #LATCH 1;

@( negedge clock )

begin

d_shr <= #LATCH 1;

q_shr <= #LATCH 1;

end

end

12:

begin

state <= #MS_FF 13;

d_shr <= #LATCH 0;

q_shr <= #LATCH 0;

@( negedge clock )

begin

dqsel <= #LATCH 0;

end

end

19:

begin

state <= #MS_FF 20;

@( negedge clock )

begin

q_rst <= #LATCH 0;

d_shr <= #LATCH 1;

end

end

20:

begin

state <= #MS_FF 21;

d_shr <= #LATCH 0;

q_shr <= #LATCH ~item2;

@( negedge clock )

begin

q_shr <= #LATCH 0;

r_wrt <= #LATCH 1;

end

end

21:

begin

state <= #MS_FF 22;

q_wrt <= #LATCH 1;

r_wrt <= #LATCH 0;

@( negedge clock )

begin

q_wrt <= #LATCH 0;

end

end

25:

begin

state <= #MS_FF 26;

q_shr <= #LATCH item2;

@( negedge clock )

begin

q_shr <= #LATCH 0;

end

end

26:

begin

state <= #MS_FF done ? 0 : 27;

ready <= #LATCH done;

end

27:

begin

state <= #MS_FF 28;

@( negedge clock )

begin

d_shr <= #LATCH 1;

end

end

28:

begin

state <= #MS_FF 29;

d_shr <= #LATCH 0;

@( negedge clock )

begin

q_shr <= #LATCH ~item2;

end

end

29:

begin

state <= #MS_FF 30;

q_shr <= #LATCH 0;

@( negedge clock )

begin

q_wrt <= #LATCH valid;

r_wrt <= #LATCH valid;

end

end

30:

begin

state <= #MS_FF valid ? 22 : 11;

q_wrt <= #LATCH 0;

r_wrt <= #LATCH 0;

end

default:

state <= #MS_FF state + 1;

endcase

end

endmodule

Figure A.12: Wiring of 8-bit hybrid-aligning divider. (Schematic omitted; it connects the blocks rem_pri, div_pri, quo_shr, div_shl, shd_mux, shd_reg, shq_mux, shq_reg, adder_1, adder_2, rem_mux, rem_reg, quo_reg, check_b, lt_comp, and control.)

table_512.v

/*****************************************************************************/

/* */

/* Module 512-Entry Lookup Table (512 x 3-Bit ROM) */

/* */

/* Spartan 3 (Speed Grade -5) */

/* */

/* Size (bits) | 512 x 3 */

/* ------------+--------- */

/* Logic Level | 6 */

/* ------------+--------- */

/* Delay (ns) | 2.406 */

/* */

/*****************************************************************************/

module table_512( q, i );

output [2:0] q;

input [8:0] i;

reg [2:0] q;

always @( i )

begin

case( i )

// b = 8: r = -12,...,11

9'b000_110100: q <= 3'b110; 9'b000_110101: q <= 3'b110;

9'b000_110110: q <= 3'b110; 9'b000_110111: q <= 3'b110;

9'b000_111000: q <= 3'b110; 9'b000_111001: q <= 3'b110;

9'b000_111010: q <= 3'b111; 9'b000_111011: q <= 3'b111;

9'b000_111100: q <= 3'b111; 9'b000_111101: q <= 3'b111;

9'b000_111110: q <= 3'b000; 9'b000_111111: q <= 3'b000;

9'b000_000000: q <= 3'b000; 9'b000_000001: q <= 3'b000;

9'b000_000010: q <= 3'b001; 9'b000_000011: q <= 3'b001;

9'b000_000100: q <= 3'b001; 9'b000_000101: q <= 3'b001;

9'b000_000110: q <= 3'b010; 9'b000_000111: q <= 3'b010;

9'b000_001000: q <= 3'b010; 9'b000_001001: q <= 3'b010;

9'b000_001010: q <= 3'b010; 9'b000_001011: q <= 3'b010;

// b = 9: r = -14,...,13

9'b001_110010: q <= 3'b110; 9'b001_110011: q <= 3'b110;

9'b001_110100: q <= 3'b110; 9'b001_110101: q <= 3'b110;

9'b001_110110: q <= 3'b110; 9'b001_110111: q <= 3'b110;

9'b001_111000: q <= 3'b110; 9'b001_111001: q <= 3'b111;

9'b001_111010: q <= 3'b111; 9'b001_111011: q <= 3'b111;

9'b001_111100: q <= 3'b111; 9'b001_111101: q <= 3'b000;

9'b001_111110: q <= 3'b000; 9'b001_111111: q <= 3'b000;

9'b001_000000: q <= 3'b000; 9'b001_000001: q <= 3'b000;

9'b001_000010: q <= 3'b000; 9'b001_000011: q <= 3'b001;

9'b001_000100: q <= 3'b001; 9'b001_000101: q <= 3'b001;

9'b001_000110: q <= 3'b001; 9'b001_000111: q <= 3'b010;

9'b001_001000: q <= 3'b010; 9'b001_001001: q <= 3'b010;

9'b001_001010: q <= 3'b010; 9'b001_001011: q <= 3'b010;

9'b001_001100: q <= 3'b010; 9'b001_001101: q <= 3'b010;

// b = 10: r = -15,...,14

9'b010_110001: q <= 3'b110; 9'b010_110010: q <= 3'b110;

9'b010_110011: q <= 3'b110; 9'b010_110100: q <= 3'b110;

9'b010_110101: q <= 3'b110; 9'b010_110110: q <= 3'b110;

9'b010_110111: q <= 3'b110; 9'b010_111000: q <= 3'b111;

9'b010_111001: q <= 3'b111; 9'b010_111010: q <= 3'b111;

9'b010_111011: q <= 3'b111; 9'b010_111100: q <= 3'b111;

9'b010_111101: q <= 3'b000; 9'b010_111110: q <= 3'b000;

9'b010_111111: q <= 3'b000; 9'b010_000000: q <= 3'b000;

9'b010_000001: q <= 3'b000; 9'b010_000010: q <= 3'b000;

9'b010_000011: q <= 3'b001; 9'b010_000100: q <= 3'b001;

9'b010_000101: q <= 3'b001; 9'b010_000110: q <= 3'b001;

9'b010_000111: q <= 3'b001; 9'b010_001000: q <= 3'b010;

9'b010_001001: q <= 3'b010; 9'b010_001010: q <= 3'b010;

9'b010_001011: q <= 3'b010; 9'b010_001100: q <= 3'b010;

9'b010_001101: q <= 3'b010; 9'b010_001110: q <= 3'b010;

// b = 11: r = -16,...,15

9'b011_110000: q <= 3'b110; 9'b011_110001: q <= 3'b110;

9'b011_110010: q <= 3'b110; 9'b011_110011: q <= 3'b110;

9'b011_110100: q <= 3'b110; 9'b011_110101: q <= 3'b110;

9'b011_110110: q <= 3'b110; 9'b011_110111: q <= 3'b111;

9'b011_111000: q <= 3'b111; 9'b011_111001: q <= 3'b111;

9'b011_111010: q <= 3'b111; 9'b011_111011: q <= 3'b111;

9'b011_111100: q <= 3'b111; 9'b011_111101: q <= 3'b000;

9'b011_111110: q <= 3'b000; 9'b011_111111: q <= 3'b000;

9'b011_000000: q <= 3'b000; 9'b011_000001: q <= 3'b000;

9'b011_000010: q <= 3'b000; 9'b011_000011: q <= 3'b001;

9'b011_000100: q <= 3'b001; 9'b011_000101: q <= 3'b001;

9'b011_000110: q <= 3'b001; 9'b011_000111: q <= 3'b001;

9'b011_001000: q <= 3'b001; 9'b011_001001: q <= 3'b010;

9'b011_001010: q <= 3'b010; 9'b011_001011: q <= 3'b010;

9'b011_001100: q <= 3'b010; 9'b011_001101: q <= 3'b010;

9'b011_001110: q <= 3'b010; 9'b011_001111: q <= 3'b010;

// b = 12: r = -18,...,17

9'b100_101110: q <= 3'b110; 9'b100_101111: q <= 3'b110;

9'b100_110000: q <= 3'b110; 9'b100_110001: q <= 3'b110;

9'b100_110010: q <= 3'b110; 9'b100_110011: q <= 3'b110;

9'b100_110100: q <= 3'b110; 9'b100_110101: q <= 3'b110;

9'b100_110110: q <= 3'b111; 9'b100_110111: q <= 3'b111;

9'b100_111000: q <= 3'b111; 9'b100_111001: q <= 3'b111;

9'b100_111010: q <= 3'b111; 9'b100_111011: q <= 3'b111;

9'b100_111100: q <= 3'b000; 9'b100_111101: q <= 3'b000;

9'b100_111110: q <= 3'b000; 9'b100_111111: q <= 3'b000;

9'b100_000000: q <= 3'b000; 9'b100_000001: q <= 3'b000;

9'b100_000010: q <= 3'b000; 9'b100_000011: q <= 3'b000;

9'b100_000100: q <= 3'b001; 9'b100_000101: q <= 3'b001;

9'b100_000110: q <= 3'b001; 9'b100_000111: q <= 3'b001;

9'b100_001000: q <= 3'b001; 9'b100_001001: q <= 3'b001;

9'b100_001010: q <= 3'b010; 9'b100_001011: q <= 3'b010;

9'b100_001100: q <= 3'b010; 9'b100_001101: q <= 3'b010;

9'b100_001110: q <= 3'b010; 9'b100_001111: q <= 3'b010;

9'b100_010000: q <= 3'b010; 9'b100_010001: q <= 3'b010;

// b = 13: r = -19,...,18

9'b101_101101: q <= 3'b110; 9'b101_101110: q <= 3'b110;

9'b101_101111: q <= 3'b110; 9'b101_110000: q <= 3'b110;

9'b101_110001: q <= 3'b110; 9'b101_110010: q <= 3'b110;

9'b101_110011: q <= 3'b110; 9'b101_110100: q <= 3'b110;

9'b101_110101: q <= 3'b110; 9'b101_110110: q <= 3'b111;

9'b101_110111: q <= 3'b111; 9'b101_111000: q <= 3'b111;

9'b101_111001: q <= 3'b111; 9'b101_111010: q <= 3'b111;

9'b101_111011: q <= 3'b111; 9'b101_111100: q <= 3'b000;

9'b101_111101: q <= 3'b000; 9'b101_111110: q <= 3'b000;

9'b101_111111: q <= 3'b000; 9'b101_000000: q <= 3'b000;

9'b101_000001: q <= 3'b000; 9'b101_000010: q <= 3'b000;

9'b101_000011: q <= 3'b000; 9'b101_000100: q <= 3'b001;

9'b101_000101: q <= 3'b001; 9'b101_000110: q <= 3'b001;

9'b101_000111: q <= 3'b001; 9'b101_001000: q <= 3'b001;

9'b101_001001: q <= 3'b001; 9'b101_001010: q <= 3'b010;

9'b101_001011: q <= 3'b010; 9'b101_001100: q <= 3'b010;

9'b101_001101: q <= 3'b010; 9'b101_001110: q <= 3'b010;

9'b101_001111: q <= 3'b010; 9'b101_010000: q <= 3'b010;

9'b101_010001: q <= 3'b010; 9'b101_010010: q <= 3'b010;

// b = 14: r = -20,...,19

9'b110_101100: q <= 3'b110; 9'b110_101101: q <= 3'b110;

9'b110_101110: q <= 3'b110; 9'b110_101111: q <= 3'b110;

9'b110_110000: q <= 3'b110; 9'b110_110001: q <= 3'b110;

9'b110_110010: q <= 3'b110; 9'b110_110011: q <= 3'b110;

9'b110_110100: q <= 3'b110; 9'b110_110101: q <= 3'b111;

9'b110_110110: q <= 3'b111; 9'b110_110111: q <= 3'b111;

9'b110_111000: q <= 3'b111; 9'b110_111001: q <= 3'b111;

9'b110_111010: q <= 3'b111; 9'b110_111011: q <= 3'b111;

9'b110_111100: q <= 3'b000; 9'b110_111101: q <= 3'b000;

9'b110_111110: q <= 3'b000; 9'b110_111111: q <= 3'b000;

9'b110_000000: q <= 3'b000; 9'b110_000001: q <= 3'b000;

9'b110_000010: q <= 3'b000; 9'b110_000011: q <= 3'b000;

9'b110_000100: q <= 3'b001; 9'b110_000101: q <= 3'b001;

9'b110_000110: q <= 3'b001; 9'b110_000111: q <= 3'b001;

9'b110_001000: q <= 3'b001; 9'b110_001001: q <= 3'b001;

9'b110_001010: q <= 3'b001; 9'b110_001011: q <= 3'b010;

9'b110_001100: q <= 3'b010; 9'b110_001101: q <= 3'b010;

9'b110_001110: q <= 3'b010; 9'b110_001111: q <= 3'b010;

9'b110_010000: q <= 3'b010; 9'b110_010001: q <= 3'b010;

9'b110_010010: q <= 3'b010; 9'b110_010011: q <= 3'b010;

// b = 15: r = -22,...,21

9'b111_101010: q <= 3'b110; 9'b111_101011: q <= 3'b110;

9'b111_101100: q <= 3'b110; 9'b111_101101: q <= 3'b110;

9'b111_101110: q <= 3'b110; 9'b111_101111: q <= 3'b110;

9'b111_110000: q <= 3'b110; 9'b111_110001: q <= 3'b110;

9'b111_110010: q <= 3'b110; 9'b111_110011: q <= 3'b110;

9'b111_110100: q <= 3'b111; 9'b111_110101: q <= 3'b111;

9'b111_110110: q <= 3'b111; 9'b111_110111: q <= 3'b111;

9'b111_111000: q <= 3'b111; 9'b111_111001: q <= 3'b111;

9'b111_111010: q <= 3'b111; 9'b111_111011: q <= 3'b000;

9'b111_111100: q <= 3'b000; 9'b111_111101: q <= 3'b000;

9'b111_111110: q <= 3'b000; 9'b111_111111: q <= 3'b000;

9'b111_000000: q <= 3'b000; 9'b111_000001: q <= 3'b000;

9'b111_000010: q <= 3'b000; 9'b111_000011: q <= 3'b000;

9'b111_000100: q <= 3'b000; 9'b111_000101: q <= 3'b001;

9'b111_000110: q <= 3'b001; 9'b111_000111: q <= 3'b001;

9'b111_001000: q <= 3'b001; 9'b111_001001: q <= 3'b001;

9'b111_001010: q <= 3'b001; 9'b111_001011: q <= 3'b001;

9'b111_001100: q <= 3'b010; 9'b111_001101: q <= 3'b010;

9'b111_001110: q <= 3'b010; 9'b111_001111: q <= 3'b010;

9'b111_010000: q <= 3'b010; 9'b111_010001: q <= 3'b010;

9'b111_010010: q <= 3'b010; 9'b111_010011: q <= 3'b010;

9'b111_010100: q <= 3'b010; 9'b111_010101: q <= 3'b010;

default: q <= 3'b100;

endcase

end

endmodule
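
Each case label in table_512 packs two fields. Going by the section comments, the upper three bits select the divisor class (b = 8, ..., 15) and the lower six bits hold the partial remainder in two's complement, so that 110100 stands for -12. The 3-bit output reads as a two's-complement quotient digit between -2 and 2, with 100 as the default for unlisted inputs. The C++ helpers below sketch this decoding; the interpretation is inferred from the listing's comments rather than stated by it, and the names TableIndex, decode_index and decode_digit are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a 9-bit table index.
struct TableIndex {
    int b;  // divisor class, 8..15
    int r;  // signed partial-remainder field, -32..31
};

TableIndex decode_index(std::uint16_t i) {
    int b = 8 + ((i >> 6) & 0x7);  // upper three bits, offset by 8
    int r = i & 0x3F;              // lower six bits
    if (r & 0x20) r -= 64;         // sign-extend 6-bit two's complement
    return {b, r};
}

int decode_digit(std::uint8_t q) {
    int d = q & 0x7;
    if (d & 0x4) d -= 8;  // sign-extend 3-bit two's complement
    return d;             // the default 3'b100 decodes to -4, outside -2..2
}
```

For example, the first entry of the table, 9'b000_110100 with output 3'b110, decodes to b = 8, r = -12 and the digit -2.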

table_256.v

/*****************************************************************************/

/* */

/* Module 256-Entry Lookup Table (256 x 3-Bit ROM) */

/* */

/* Spartan 3 (Speed Grade -5) */

/* */

/* Size (bits) | 256 x 3 */

/* ------------+--------- */

/* Logic Level | 5 */

/* ------------+--------- */

/* Delay (ns) | 1.687 */

/* */

/*****************************************************************************/

module table_256( q, i );

output [2:0] q;

input [7:0] i;

reg [2:0] q;

always @( i )

begin

case( i )

// b = 8: r = 0,...,11

8'b000_00000: q <= 3'b000; 8'b000_00001: q <= 3'b000;

8'b000_00010: q <= 3'b001; 8'b000_00011: q <= 3'b001;

8'b000_00100: q <= 3'b001; 8'b000_00101: q <= 3'b001;

8'b000_00110: q <= 3'b010; 8'b000_00111: q <= 3'b010;

8'b000_01000: q <= 3'b010; 8'b000_01001: q <= 3'b010;

8'b000_01010: q <= 3'b010; 8'b000_01011: q <= 3'b010;

// b = 9: r = 0,...,13

8'b001_00000: q <= 3'b000; 8'b001_00001: q <= 3'b000;

8'b001_00010: q <= 3'b000; 8'b001_00011: q <= 3'b001;

8'b001_00100: q <= 3'b001; 8'b001_00101: q <= 3'b001;

8'b001_00110: q <= 3'b001; 8'b001_00111: q <= 3'b010;

8'b001_01000: q <= 3'b010; 8'b001_01001: q <= 3'b010;

8'b001_01010: q <= 3'b010; 8'b001_01011: q <= 3'b010;

8'b001_01100: q <= 3'b010; 8'b001_01101: q <= 3'b010;

// b = 10: r = 0,...,14

8'b010_00000: q <= 3'b000; 8'b010_00001: q <= 3'b000;

8'b010_00010: q <= 3'b000; 8'b010_00011: q <= 3'b001;

8'b010_00100: q <= 3'b001; 8'b010_00101: q <= 3'b001;

8'b010_00110: q <= 3'b001; 8'b010_00111: q <= 3'b001;

8'b010_01000: q <= 3'b010; 8'b010_01001: q <= 3'b010;

8'b010_01010: q <= 3'b010; 8'b010_01011: q <= 3'b010;

8'b010_01100: q <= 3'b010; 8'b010_01101: q <= 3'b010;

8'b010_01110: q <= 3'b010;

// b = 11: r = 0,...,15

8'b011_00000: q <= 3'b000; 8'b011_00001: q <= 3'b000;

8'b011_00010: q <= 3'b000; 8'b011_00011: q <= 3'b001;

8'b011_00100: q <= 3'b001; 8'b011_00101: q <= 3'b001;

8'b011_00110: q <= 3'b001; 8'b011_00111: q <= 3'b001;

8'b011_01000: q <= 3'b001; 8'b011_01001: q <= 3'b010;

8'b011_01010: q <= 3'b010; 8'b011_01011: q <= 3'b010;

8'b011_01100: q <= 3'b010; 8'b011_01101: q <= 3'b010;

8'b011_01110: q <= 3'b010; 8'b011_01111: q <= 3'b010;

// b = 12: r = 0,...,17

8'b100_00000: q <= 3'b000; 8'b100_00001: q <= 3'b000;

8'b100_00010: q <= 3'b000; 8'b100_00011: q <= 3'b000;

8'b100_00100: q <= 3'b001; 8'b100_00101: q <= 3'b001;

8'b100_00110: q <= 3'b001; 8'b100_00111: q <= 3'b001;

8'b100_01000: q <= 3'b001; 8'b100_01001: q <= 3'b001;

8'b100_01010: q <= 3'b010; 8'b100_01011: q <= 3'b010;

8'b100_01100: q <= 3'b010; 8'b100_01101: q <= 3'b010;

8'b100_01110: q <= 3'b010; 8'b100_01111: q <= 3'b010;

8'b100_10000: q <= 3'b010; 8'b100_10001: q <= 3'b010;

// b = 13: r = 0,...,18

8'b101_00000: q <= 3'b000; 8'b101_00001: q <= 3'b000;

8'b101_00010: q <= 3'b000; 8'b101_00011: q <= 3'b000;

8'b101_00100: q <= 3'b001; 8'b101_00101: q <= 3'b001;

8'b101_00110: q <= 3'b001; 8'b101_00111: q <= 3'b001;

8'b101_01000: q <= 3'b001; 8'b101_01001: q <= 3'b001;

8'b101_01010: q <= 3'b010; 8'b101_01011: q <= 3'b010;

8'b101_01100: q <= 3'b010; 8'b101_01101: q <= 3'b010;

8'b101_01110: q <= 3'b010; 8'b101_01111: q <= 3'b010;

8'b101_10000: q <= 3'b010; 8'b101_10001: q <= 3'b010;

8'b101_10010: q <= 3'b010;

// b = 14: r = 0,...,19

8'b110_00000: q <= 3'b000; 8'b110_00001: q <= 3'b000;

8'b110_00010: q <= 3'b000; 8'b110_00011: q <= 3'b000;

8'b110_00100: q <= 3'b001; 8'b110_00101: q <= 3'b001;

8'b110_00110: q <= 3'b001; 8'b110_00111: q <= 3'b001;

8'b110_01000: q <= 3'b001; 8'b110_01001: q <= 3'b001;

8'b110_01010: q <= 3'b001; 8'b110_01011: q <= 3'b010;

8'b110_01100: q <= 3'b010; 8'b110_01101: q <= 3'b010;

8'b110_01110: q <= 3'b010; 8'b110_01111: q <= 3'b010;

8'b110_10000: q <= 3'b010; 8'b110_10001: q <= 3'b010;

8'b110_10010: q <= 3'b010; 8'b110_10011: q <= 3'b010;

// b = 15: r = 0,...,21

8'b111_00000: q <= 3'b000; 8'b111_00001: q <= 3'b000;

8'b111_00010: q <= 3'b000; 8'b111_00011: q <= 3'b000;

8'b111_00100: q <= 3'b000; 8'b111_00101: q <= 3'b001;

8'b111_00110: q <= 3'b001; 8'b111_00111: q <= 3'b001;

8'b111_01000: q <= 3'b001; 8'b111_01001: q <= 3'b001;

8'b111_01010: q <= 3'b001; 8'b111_01011: q <= 3'b001;

8'b111_01100: q <= 3'b010; 8'b111_01101: q <= 3'b010;

8'b111_01110: q <= 3'b010; 8'b111_01111: q <= 3'b010;

8'b111_10000: q <= 3'b010; 8'b111_10001: q <= 3'b010;

8'b111_10010: q <= 3'b010; 8'b111_10011: q <= 3'b010;

8'b111_10100: q <= 3'b010; 8'b111_10101: q <= 3'b010;

default: q <= 3'b100;

endcase

end

endmodule

A.2 C++ Files

global.h

/*****************************************************************************/

/* */

/* Project: MPI Program For Testing All Xilinx Dividers */

/* Copyright: Copyright (c) 2004. All Rights Reserved */

/* Compiler: Microsoft Visual C++ .NET - Target Win32 */

/* Company: Dept of ComSci, University of Salzburg */

/* Author: Rainer Trummer <rtrummer@cosy.sbg.ac.at> */

/* Date: October 28, 2004 */

/* */

/*****************************************************************************/

#ifndef GLOBAL_H

#define GLOBAL_H

#include <stdlib.h>

#include <stdio.h>

// Word length associated with dividers

//

#define WORD_LEN 16

// Enables different random functions

//

//#define RAND_FUNC_1

//#define RAND_FUNC_2

//#define RAND_FUNC_3

#define RAND_FUNC_4

// Control of verification routines

//

//#define VERIFY

//#define RANDOM_FUNCTION

//#define RADIX_TWO

//#define SELF_ALIGNING

//#define DIRECT_ALIGNING

//#define HYBRID_ALIGNING

// CPU size associated with data types

//

#if RAND_MAX > 32767

#define CPU_SIZE 64

#else

#define CPU_SIZE 32

#endif

typedef unsigned char byte;

typedef unsigned short hword;

typedef unsigned int uint;

typedef unsigned long word;

#ifndef _WIN32

typedef unsigned long long dword;

#else

typedef unsigned __int64 dword;

#endif

#if WORD_LEN == 16

typedef hword intx;

#elif WORD_LEN == 32

typedef word intx;

#elif WORD_LEN == 64

typedef dword intx;

#elif WORD_LEN == 128

#include "util.h"

typedef ui128 intx;

#endif

// Number of defined performance functions

//

#define NUM_FUNC 4

extern int verification ( void );

extern intx randomize ( void );

extern int num_bits ( intx );

extern int radix_two ( intx, intx );

extern int self_aligning ( intx, intx );

extern int direct_aligning ( intx, intx );

extern int hybrid_aligning ( intx, intx );

extern char * to_str ( dword );

#endif // !GLOBAL_H
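
The typedefs in global.h rely on the Win32 target named in the file header, where unsigned long is 32 bits wide; on platforms where unsigned long is 64 bits, the WORD_LEN == 32 case would select a wider type. The sketch below is not from the thesis. It shows one possible way to make the word-length selection independent of the platform with the fixed-width types from <cstdint>. The names intx_fixed and num_bits_sketch are hypothetical, the 128-bit case (ui128 from util.h) is left out, and the helper assumes that num_bits returns the number of significant bits of its argument.

```cpp
#include <climits>
#include <cstdint>

#ifndef WORD_LEN
#define WORD_LEN 16  // same setting as in global.h
#endif

// Select an operand type whose width matches WORD_LEN on any platform.
#if WORD_LEN == 16
typedef std::uint16_t intx_fixed;
#elif WORD_LEN == 32
typedef std::uint32_t intx_fixed;
#elif WORD_LEN == 64
typedef std::uint64_t intx_fixed;
#else
#error "this sketch covers WORD_LEN 16, 32 and 64 only"
#endif

// Compile-time check that the chosen type really has WORD_LEN bits.
static_assert(sizeof(intx_fixed) * CHAR_BIT == WORD_LEN,
              "operand type does not match WORD_LEN");

// Hypothetical stand-in for num_bits: counts the significant bits of x,
// i.e. the position of the highest set bit plus one (0 for x == 0).
inline int num_bits_sketch(intx_fixed x) {
    int n = 0;
    while (x != 0) {
        ++n;
        x = static_cast<intx_fixed>(x >> 1);
    }
    return n;
}
```

With WORD_LEN set to 16, num_bits_sketch(255) gives 8 and num_bits_sketch(0x8000) gives 16.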


Pages: 213

Size: 899.45 KB

Published: more than 1 year ago

Details
Degree course: Corso di laurea in ingegneria informatica (degree course in computer engineering)
SSD:
Academic year: 2013-2014

The contents of this page are personal reworkings by the Publisher valeria0186 of information learned by attending the lectures of Architetture Sistemi Elaborazione and through independent study of any reference books in preparation for the final exam or the thesis. They are not to be understood as official material of the university Napoli Federico II - Unina or of Prof. Mazzocca Nicola.
