### **BIT-PARALLEL ARITHMETIC**

#### **Addition and Subtraction**

Two binary numbers,

 $\mathbf{x} = (x_{0\bullet}x_1x_2...x_{Wd-1})_2$  and  $\mathbf{y} = (y_{0\bullet}y_1y_2...y_{Wd-1})_2$ ,

can be added using bit-parallel operation in  $O(\log(W_d))$  steps

## ■ Ripple-Carry Adder/Subtractor



Adder x + y

Subtractor x - y

larsw@isy.liu.se

larsw@isv.liu.se

http://www.es.isv.liu.s

http://www.es.isy.liu.se

1

3

| DSP Integrated Circuits | Department of Electrical Engineering |
|-------------------------|--------------------------------------|
| Lars Wanhammar          | Linköping University                 |

Department of Electrical Engineering

Linköping Universit

Carry-Look-Ahead Adder

- Brent–Kung's Adder
- Carry-Save Adder
- Carry-Select Adder
- Carry-Skip Adder

DSP Integrated Circuits

Lars Wanham

Conditional–Sum Adder

Addition time is determined by the carry path in the full-adders.

Notice that even though the adder is said to be bit-parallel, computation of the output bits is done sequentially.

In fact, at each time instant only one of the full-adders performs useful work. The others have already completed their computations or may not yet have received a correct carry input, hence they may perform erroneous computations.

Because of these wasted switch operations, power consumption is higher than necessary.



# **Bit-Parallel Multiplication**

In order to multiply two two's-complement numbers, a and x, we form the partial bit-products  $a_i \cdot x_k$ , as shown below.



High-speed, bit-parallel multipliers can be divided in three classes

## ■ Add–and–shift multipliers

The partial bit-products are generated sequentially and accumulate them successively as they are generated. This type are therefore the slowest multiplier, but the required chip area is low.

DSP Integrated Circuits Department of Electrical Engineering Lars Wanhammar Linköping University

larsw@isy.liu.se http://www.es.isy.liu.s



2

4

#### Parallel multipliers

All bit-products are generated in parallel and uses a multi-operand adder (i.e., an adder tree) for their accumulation.

A parallel multiplier structure can be partitioned into three parts: partial product generation, carry-free addition, and carry-propagation addition.

These three parts can be implemented using different schemes.

For example, the partial product generation can be implemented by AND gates or by using the Booth's algorithm.

The carry-free addition part is often implemented by using a Wallace or redundant binary addition tree.

#### Array multipliers

An array of almost identical cells for generation of the bit-products and accumulation.

This type of multipliers requires the least amount of area, but it is also the slower



# **Modified Booth's Algorithm**

By considering three bits at a time from the input operand, x, is used to recode the other input operand, y, into partial products (-2y, -y, 0, y, 2y) that are added (subtracted) to form the result.

Hence, the number of recoded partial products is reduced.

Booth's algorithm reduces the number of additions/subtractions cycles to  $W_d/2 = 8$ , which is only half that required in the ordinary shift–and–add algorithm.

Notice that the reduction of the number of cycles directly corresponds to a reduction in the power consumption.

Booth's algorithm is also suitable for bit-serial or digit-serial implementation.

# Add–And–Shift Multiplication

Consider the multiplication of two two's-complement numbers,  $y = a \cdot x$ .

$$\mathbf{y} = \mathbf{a} \begin{pmatrix} W_d - 1 \\ -x_0 + \sum_{i=1}^{W} x_i 2^{-i} \end{pmatrix} = -\mathbf{a} x_0 + \sum_{i=1}^{W_d - 1} \mathbf{a} x_i 2^{-i}$$

The multiplication can be performed by generating the partial products, for example row wise.



# ar Linköping University http://www.ex.isy.liu.se/

# **Tree–Based Multipliers**

Short-hand version of the bit-products



In a *tree-based multiplier* the bit-products are often added using full-adders.

A full-adder can be considered as a counter, denoted (3, 2)-counter, that adds three inputs and forms a two bit result.

larsw@isv.liu.se

http://www.es.isv.liu.se

The output equals the number of 1s in the inputs – counter.

A half-adder is denoted (2, 2)-counter.



larsw@isy.liu.se http://www.es.isy.liu.s



5

7

In 1964 Wallace showed that a tree structure (Wallace tree) of such counters is an efficient method,  $O(\log(W_d))$ , to add the bit-products.



In the fourth stage, a RCA or a CLA can be used to obtain the product.

Several alternative addition schemes are possible.

For example, Dadda has derived a scheme where all bit-products with the same weight are collected and added using a Wallace tree with a minimum number of counters and minimal critical paths.

However, the multiplier is irregular and difficult to implement efficiently since a large wiring area is required.

Wallace tree multipliers should therefore only be used for large word length and where the performance is critical.

Another more regular structure is the Overturn-Stair Multiplier



9

## **Array Multipliers**

Many array multiplier schemes with varying degrees of pipelining have been proposed.

A typical array multiplier—the Baugh-Wooley's multiplier with a multiplication time proportional to  $2W_d$ .

x2.y3  $\mathbf{x}_1$  $\mathbf{x}_2$ x<sub>3</sub> x<sub>3</sub>.y<sub>3</sub> x<sub>0</sub> y<sub>2</sub> x3.y2 У<sub>0</sub> У1 У<sub>3</sub>  $\overline{x_0 \cdot y_3}$ 1  $x_1 \cdot y_3$  $x_2 \cdot y_3$ x<sub>3</sub>.y<sub>3</sub>  $\overline{x_0 \cdot y_2}$  $\overline{x_0 \cdot y_2}$  $x_1 \cdot y_2$  $x_2 \cdot y_2$  $x_3 \cdot y_2$  $\overline{x_0 \cdot y_1}$  $x_1 \cdot y_1$  $x_2 \cdot y_1$ x<sub>3</sub>·y<sub>1</sub>  $\overline{x_1 \cdot y_0}$  $\overline{\mathbf{x}_2 \cdot \mathbf{y}_0}$  $\overline{x_3 \cdot y_0}$ x<sub>0</sub>.y  $x_0 \cdot y_0$  $P_{-1}$ Po •  $p_1$  $\mathbf{p}_2$  $p_3$  $p_4$ p<sub>5</sub> P<sub>6</sub> X1·Yc FA FA P3 ♥ P4 • P5 don't care



http://www.es.isv.liu.s





12

Notice that some slight variations in the way the partial products are formed occur in the literature.

All bit-products are generated in parallel and collected through an array of full-adders and a final RCA.