..

ALU

If the central processing unit (CPU) is the brain of a computer, then the arithmetic logic unit (ALU) can be (kinda) thought of as its cerebral cortex, responsible for fundamental computations.

In this chapter we aim to build basic addition capabilities, starting from adding two 1-bit values, to adding two 8-bit values. We then combine addition with gates we’ve built in the previous chapter, to form an ALU capable of performing

  1. addition
  2. subtraction
  3. comparison
  4. sign conversion

These fundamental operations will form the building blocks for more complex ones such as multiplication and division.

half adder

The first step is to add two 1-bit numbers. The largest possible result is 2 (1+1). However, computers only recognize the digits 0 and 1. To handle this, we follow a process similar to how we count in decimal. For example, when counting, we go from 1, 2, 3, …, 9, and then to 10, adding a 1 to the tens place and resetting the ones place to 0 because we’ve run out of unique digits. In binary, the same principle applies. We count 0, 1, 10.

[homepage/content/comp_arch/alu/images/half_adder.png]

We’ll refer to the result in the ones place as the sum. Since we’re working in binary, the carry represents the twos place instead of the tens place, as it would be in decimal.

a b sum carry
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1

By focusing on the sum and carry columns in isolation, we immediately identify previously built logic gates that correspond to these results. sum can be represented with XOR, and carry can be represented with AND.

[homepage/content/comp_arch/alu/images/half_adder_impl.png]

full adder

The full adder takes three 1-bit inputs.

[homepage/content/comp_arch/alu/images/full_adder.png]

The maximum result is now 1 + 1 + 1=3 (11 in binary)

a b c sum carry
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

A full adder can be constructed using two half adders. Here’s how:

  1. half adder 1
    • Use a half adder to add two inputs, a and b.
    • The output will give us two values:
      • sum: Represents the ones place.
      • carry: Represents the twos place.
  2. half adder 2 - Next, we add the third input, c, to the sum obtained from the first half adder (sum + c).
    • We add these together because they belong to the same place value (ones place).

[homepage/content/comp_arch/alu/images/full_adder_impl_1.png]

Identifying the correct carry

the above logic leaves us with two carry outputs—one from each half adder. Question is: Which carry should we use?

consider the following cases:

case 1: carry1 = 1

  • If the first half adder’s carry output (carry1) is 1, it means the initial sum (sum1) must be 0 (refer to half adder’s truth table).
  • When the second half adder processes sum1 + c, it must be 0 + c. This combination can never produce a binary result of 10, so carry2 must be 0.
  • In summary: If carry1 = 1, then carry2 = 0.

case 2: carry1 = 0

  • If carry1 is 0, the intermediate sum (sum1) could be either 0 or 1.
  • Consequently, the second half adder (sum1 + c) might produce a carry (carry2) of 0 or 1.

The takeaway is that carry1 and carry2 will never both be 1 at the same time (verify with a truth table!). Therefore, the final carry output for the full adder can be determined by taking the OR of these two values.

[homepage/content/posts/ALU/images/full_adder_impl.png]

adder

Now we wish to extend our addition capabilities from 1-bit values to n-bit values. [homepage/content/comp_arch/alu/images/adder.png]

Let’s try 8-bits.

but how?

Examine how we would add two 8-bit values by hand.

  01010101
+ 00001111
----------

The first step to addition would be to add the digits in the ones-place, giving us

        1
  01010101
+ 00001111
----------
         0

The two 1s add together to give a sum of 0 and carry of 1. We then repeat these steps again for “twos” place

       11
  01010101
+ 00001111
----------
        00

Resulting in another sum=0 and carry=1. But notice how this addition involved the carry from the previous addition, totaling to 3 inputs (full adder) while the first addition only needed 2 inputs (half adder).

It follows that every addition other than the first can be done with full adders, and these additions can be computed sequentially.

[homepage/content/posts/ALU/images/adder_impl.png]

how long will it take?

This setup allows us to add two 8-bit numbers, and we could extend this (1 half adder, n-1 full adders) strategy to add any n-bit numbers. However, one notable limitation is that the result must be generated sequentially. i.e. sum1 has to wait for carry0, sum2 for carry1, and so forth.

What does waiting for the previous carry entail?

In reality, voltage are the inputs to these circuits. For example, the 0 and 1 input values above could be 0V and 1.2V respectively.

To give a rough idea about how modern circuits work: CMOS

When voltage is applied to logic gate inputs, picture water flowing through a narrow pipe into a water tank. When this water tank is sufficiently full, a switch is flipped, and the output of the gate will change.

Similarly, when no charge is applied, water flows out of the tank through the narrow pipe. When the water level is sufficiently low, the switch flips and the output changes again.

Water Tank: Capacitor Water Flow: Electrical Charge Narrow Pipe: Resistance in the circuit “Sufficiently filled”: Logic Threshold Voltage

Switching the output of one CMOS gate may take up to 120 picoseconds. One Full Adder that consists of 2 Half Adders and 1 OR gate requires 360 picoseconds.

Our 8-bit Adder needs: 120 + 360 x 7 = 2640 picoseconds = 2.64 nanoseconds

going faster: carry-lookahead adder

The bottleneck here lies in waiting for the correct carry value to propagate through the full adders. What if we could cut down on this propagation time?

Re-examine the truth table for a full adder. But this time, we’ll pretend the value of c (carry-in) isn’t available to us yet (It’s still propagating). See how this leaves many of the values for sum and carry ambiguous (try to work this out yourself).

a b c sum carry
0 0 ? ? 0
0 0 ? ? 0
0 1 ? ? ?
0 1 ? ? ?
1 0 ? ? ?
1 0 ? ? ?
==1== ==1== ==?== ==?== ==1==
==1== ==1== ==?== ==?== ==1==

Crucially, we see that when a and b are both true, the value of carry must be true, regardless of c. This means that under this condition, the carry can be determined immediately, and there is no need to wait for the propagation. This mechanism is called carry generate.

[homepage/content/comp_arch/alu/images/carry_lookahead_adder_impl.png]

ALU

Finally we arrive at the arithmetic logic unit.

An ALU typically takes in two data inputs, also known as operands.

What to do with the operands is determined by a third input, called the control bits. Each control bit signifies its own unique operation, and the number of control bits which is up to the chip designer.

For our simple example, we will be implementing the ALU from the Hack Computer in nand2tetris.

The ALU will take two 16-bit operands, x and y. It also features 6 control bits.

Hack Computer ALU specification

// This file is part of www.nand2tetris.org
// and the book "The Elements of Computing Systems"
// by Nisan and Schocken, MIT Press.
// File name: projects/2/ALU.hdl
/**
 * ALU (Arithmetic Logic Unit):
 * Computes out = one of the following functions:
 *                0, 1, -1,
 *                x, y, !x, !y, -x, -y,
 *                x + 1, y + 1, x - 1, y - 1,
 *                x + y, x - y, y - x,
 *                x & y, x | y
 * on the 16-bit inputs x, y,
 * according to the input bits zx, nx, zy, ny, f, no.
 * In addition, computes the two output bits:
 * if (out == 0) zr = 1, else zr = 0
 * if (out < 0)  ng = 1, else ng = 0
 */
// Implementation: Manipulates the x and y inputs
// and operates on the resulting values, as follows:
// if (zx == 1) sets x = 0        // 16-bit constant
// if (nx == 1) sets x = !x       // bitwise not
// if (zy == 1) sets y = 0        // 16-bit constant
// if (ny == 1) sets y = !y       // bitwise not
// if (f == 1)  sets out = x + y  // integer 2's complement addition
// if (f == 0)  sets out = x & y  // bitwise and
// if (no == 1) sets out = !out   // bitwise not

Note that 6 control bits does not equate to 6 possible results, but rather 26 possible combinations of these single operations. These actions are executed sequentially. For instance, if both zx and nx are applied, we will zero the value of x before negating it’s value.

The idea behind it’s implementation is relatively simple. No matter the value of the control bits, we prepare both values in advance, e.g. x and -x for nx. We then apply Multiplexers to choose between them, passing the control bit as the sel input.

Below is the implementation of the ALU in Hardware Description Language (hdl). Its syntax should be intuitive, and all of the gates utilized have been built in this or the previous chapter, except Add16, which is just an extension of our 8-bit adder.

alu.hdl

CHIP ALU {
    IN  
        x[16], y[16],  // 16-bit inputs        
        zx, // zero the x input?
        nx, // negate the x input?
        zy, // zero the y input?
        ny, // negate the y input?
        f,  // compute (out = x + y) or (out = x & y)?
        no; // negate the out output?
    OUT
        out[16], // 16-bit output
        zr,      // if (out == 0) equals 1, else 0
        ng;      // if (out < 0)  equals 1, else 0

    PARTS:
    // if zx == 1: x = 0
    Mux16(a=x , b[0..15]=false , sel=zx , out=x1);

    // if zy == 1: y = 0
    Mux16(a=y , b[0..15]=false , sel=zy , out=y1);

    // if nx == 1: x = -x
    Not16(in=x1 , out=notx1);
    Mux16(a=x1 , b=notx1 , sel=nx , out=x2);

    // if ny == 1: y = -y
    Not16(in=y1 , out=noty1);
    Mux16(a=y1 , b=noty1 , sel=ny , out=y2);

    // x + y
    Add16(a =x2 , b =y2 , out =xplusy);

    // x & y
    And16(a=x2 , b=y2 , out=xandy);

    // if f == 1: out=x+y; else out=x&y
    Mux16(a=xandy , b=xplusy , sel=f , out=out1);

    // if no: out=-out
    Not16(in=out1 , out=notout1);
    Mux16(a=out1 , b=notout1 , sel=no , out=out);

    // if out < 0: ng = 1
    // negative numbers in twos-complement
    // have a 1 in the most significant bit
    Mux16(a=out1 , b=notout1 , sel=no , out[15]=ng);

    // if out == 0, zr = 0
    Mux16(a=out1 , b=notout1 , sel=no , out[0..7]=outa,
    out[8..15]=outb);
    Or8Way(in=outa , out=onePresentOutA);
    Or8Way(in=outb , out=onePresentOutB);
    Or(a=onePresentOutA , b=onePresentOutB , out=onePresent);
    Not(in=onePresent , out=zr );
}

[homepage/content/comp_arch/alu/images/alu_impl.png] zr and ng output bit logic not pictured….

With this, we have all the capabilities promised at the start of the page.

Subtraction is simply x + (-y), and we can get -y from applying the ny control bit.

Comparison can be done with x + (-y) too, and checking if the zr output bit is set.

Sign conversion is trivial.

next steps

We now have the ability to perform computation, but nowhere to record our results. This will be fixed in the next chapter: memory

© 2025 Xavier Goh