ALU
If the central processing unit (CPU) is the brain of a computer, then the arithmetic logic unit (ALU) can be (kinda) thought of as its cerebral cortex, responsible for fundamental computations.
In this chapter we aim to build basic addition capabilities, starting from adding two 1-bit values, to adding two 8-bit values. We then combine addition with gates we’ve built in the previous chapter, to form an ALU capable of performing
- addition
- subtraction
- comparison
- sign conversion
These fundamental operations will form the building blocks for more complex ones such as multiplication and division.
half adder
The first step is to add two 1-bit numbers. The largest possible result is 2 (1+1). However, computers only recognize the digits 0 and 1. To handle this, we follow a process similar to how we count in decimal. For example, when counting, we go from 1, 2, 3, …, 9, and then to 10, adding a 1 to the tens place and resetting the ones place to 0 because we’ve run out of unique digits. In binary, the same principle applies. We count 0, 1, 10.
We’ll refer to the result in the ones place as the sum. Since we’re working in binary, the carry represents the twos place instead of the tens place, as it would be in decimal.
a | b | sum | carry |
---|---|---|---|
0 | 0 | 0 | 0 |
0 | 1 | 1 | 0 |
1 | 0 | 1 | 0 |
1 | 1 | 0 | 1 |
By focusing on the sum and carry columns in isolation, we immediately identify previously built logic gates that correspond to these results. sum can be represented with XOR, and carry can be represented with AND.
full adder
The full adder takes three 1-bit inputs.
The maximum result is now 1 + 1 + 1=3 (11 in binary)
a | b | c | sum | carry |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 1 | 0 |
0 | 1 | 0 | 1 | 0 |
0 | 1 | 1 | 0 | 1 |
1 | 0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 | 1 |
1 | 1 | 0 | 0 | 1 |
1 | 1 | 1 | 1 | 1 |
A full adder can be constructed using two half adders. Here’s how:
- half adder 1
- Use a half adder to add two inputs, a and b.
- The output will give us two values:
- sum: Represents the ones place.
- carry: Represents the twos place.
- half adder 2
- Next, we add the third input, c, to the sum obtained from the first half adder (sum + c).
- We add these together because they belong to the same place value (ones place).
Identifying the correct carry
the above logic leaves us with two carry outputs—one from each half adder. Question is: Which carry should we use?
consider the following cases:
case 1: carry1 = 1
- If the first half adder’s carry output (carry1) is 1, it means the initial sum (sum1) must be 0 (refer to half adder’s truth table).
- When the second half adder processes sum1 + c, it must be 0 + c. This combination can never produce a binary result of
10
, so carry2 must be 0. - In summary: If carry1 = 1, then carry2 = 0.
case 2: carry1 = 0
- If carry1 is 0, the intermediate sum (sum1) could be either 0 or 1.
- Consequently, the second half adder (sum1 + c) might produce a carry (carry2) of 0 or 1.
The takeaway is that carry1 and carry2 will never both be 1 at the same time (verify with a truth table!). Therefore, the final carry output for the full adder can be determined by taking the OR of these two values.
adder
Now we wish to extend our addition capabilities from 1-bit values to n-bit values.
Let’s try 8-bits.
but how?
Examine how we would add two 8-bit values by hand.
01010101
+ 00001111
----------
The first step to addition would be to add the digits in the ones-place, giving us
1
01010101
+ 00001111
----------
0
The two 1s add together to give a sum of 0 and carry of 1. We then repeat these steps again for “twos” place
11
01010101
+ 00001111
----------
00
Resulting in another sum=0 and carry=1. But notice how this addition involved the carry from the previous addition, totaling to 3 inputs (full adder) while the first addition only needed 2 inputs (half adder).
It follows that every addition other than the first can be done with full adders, and these additions can be computed sequentially.
how long will it take?
This setup allows us to add two 8-bit numbers, and we could extend this (1 half adder, n-1 full adders) strategy to add any n-bit numbers. However, one notable limitation is that the result must be generated sequentially. i.e. sum1 has to wait for carry0, sum2 for carry1, and so forth.
What does waiting for the previous carry entail?
In reality, voltage are the inputs to these circuits. For example, the 0 and 1 input values above could be 0V and 1.2V respectively.
To give a rough idea about how modern circuits work: CMOS
When voltage is applied to logic gate inputs, picture water flowing through a narrow pipe into a water tank. When this water tank is sufficiently full, a switch is flipped, and the output of the gate will change.
Similarly, when no charge is applied, water flows out of the tank through the narrow pipe. When the water level is sufficiently low, the switch flips and the output changes again.
Water Tank: Capacitor Water Flow: Electrical Charge Narrow Pipe: Resistance in the circuit “Sufficiently filled”: Logic Threshold Voltage
Switching the output of one CMOS gate may take up to 120 picoseconds. One Full Adder that consists of 2 Half Adders and 1 OR gate requires 360 picoseconds.
Our 8-bit Adder needs: 120 + 360 x 7 = 2640 picoseconds = 2.64 nanoseconds
going faster: carry-lookahead adder
The bottleneck here lies in waiting for the correct carry value to propagate through the full adders. What if we could cut down on this propagation time?
Re-examine the truth table for a full adder. But this time, we’ll pretend the value of c (carry-in) isn’t available to us yet (It’s still propagating). See how this leaves many of the values for sum and carry ambiguous (try to work this out yourself).
a | b | c | sum | carry |
---|---|---|---|---|
0 | 0 | ? | ? | 0 |
0 | 0 | ? | ? | 0 |
0 | 1 | ? | ? | ? |
0 | 1 | ? | ? | ? |
1 | 0 | ? | ? | ? |
1 | 0 | ? | ? | ? |
==1== | ==1== | ==?== | ==?== | ==1== |
==1== | ==1== | ==?== | ==?== | ==1== |
Crucially, we see that when a and b are both true, the value of carry must be true, regardless of c. This means that under this condition, the carry can be determined immediately, and there is no need to wait for the propagation. This mechanism is called carry generate.
ALU
Finally we arrive at the arithmetic logic unit.
An ALU typically takes in two data inputs, also known as operands.
What to do with the operands is determined by a third input, called the control bits. Each control bit signifies its own unique operation, and the number of control bits which is up to the chip designer.
For our simple example, we will be implementing the ALU from the Hack Computer in nand2tetris.
The ALU will take two 16-bit operands, x and y. It also features 6 control bits.
Hack Computer ALU specification
// This file is part of www.nand2tetris.org
// and the book "The Elements of Computing Systems"
// by Nisan and Schocken, MIT Press.
// File name: projects/2/ALU.hdl
/**
* ALU (Arithmetic Logic Unit):
* Computes out = one of the following functions:
* 0, 1, -1,
* x, y, !x, !y, -x, -y,
* x + 1, y + 1, x - 1, y - 1,
* x + y, x - y, y - x,
* x & y, x | y
* on the 16-bit inputs x, y,
* according to the input bits zx, nx, zy, ny, f, no.
* In addition, computes the two output bits:
* if (out == 0) zr = 1, else zr = 0
* if (out < 0) ng = 1, else ng = 0
*/
// Implementation: Manipulates the x and y inputs
// and operates on the resulting values, as follows:
// if (zx == 1) sets x = 0 // 16-bit constant
// if (nx == 1) sets x = !x // bitwise not
// if (zy == 1) sets y = 0 // 16-bit constant
// if (ny == 1) sets y = !y // bitwise not
// if (f == 1) sets out = x + y // integer 2's complement addition
// if (f == 0) sets out = x & y // bitwise and
// if (no == 1) sets out = !out // bitwise not
Note that 6 control bits does not equate to 6 possible results, but rather 26 possible combinations of these single operations. These actions are executed sequentially. For instance, if both zx and nx are applied, we will zero the value of x before negating it’s value.
The idea behind it’s implementation is relatively simple. No matter the value of the control bits, we prepare both values in advance, e.g. x and -x for nx. We then apply Multiplexers to choose between them, passing the control bit as the sel input.
Below is the implementation of the ALU in Hardware Description Language (hdl). Its syntax should be intuitive, and all of the gates utilized have been built in this or the previous chapter, except Add16, which is just an extension of our 8-bit adder.
alu.hdl
CHIP ALU {
IN
x[16], y[16], // 16-bit inputs
zx, // zero the x input?
nx, // negate the x input?
zy, // zero the y input?
ny, // negate the y input?
f, // compute (out = x + y) or (out = x & y)?
no; // negate the out output?
OUT
out[16], // 16-bit output
zr, // if (out == 0) equals 1, else 0
ng; // if (out < 0) equals 1, else 0
PARTS:
// if zx == 1: x = 0
Mux16(a=x , b[0..15]=false , sel=zx , out=x1);
// if zy == 1: y = 0
Mux16(a=y , b[0..15]=false , sel=zy , out=y1);
// if nx == 1: x = -x
Not16(in=x1 , out=notx1);
Mux16(a=x1 , b=notx1 , sel=nx , out=x2);
// if ny == 1: y = -y
Not16(in=y1 , out=noty1);
Mux16(a=y1 , b=noty1 , sel=ny , out=y2);
// x + y
Add16(a =x2 , b =y2 , out =xplusy);
// x & y
And16(a=x2 , b=y2 , out=xandy);
// if f == 1: out=x+y; else out=x&y
Mux16(a=xandy , b=xplusy , sel=f , out=out1);
// if no: out=-out
Not16(in=out1 , out=notout1);
Mux16(a=out1 , b=notout1 , sel=no , out=out);
// if out < 0: ng = 1
// negative numbers in twos-complement
// have a 1 in the most significant bit
Mux16(a=out1 , b=notout1 , sel=no , out[15]=ng);
// if out == 0, zr = 0
Mux16(a=out1 , b=notout1 , sel=no , out[0..7]=outa,
out[8..15]=outb);
Or8Way(in=outa , out=onePresentOutA);
Or8Way(in=outb , out=onePresentOutB);
Or(a=onePresentOutA , b=onePresentOutB , out=onePresent);
Not(in=onePresent , out=zr );
}
zr and ng output bit logic not pictured….
With this, we have all the capabilities promised at the start of the page.
Subtraction is simply x + (-y), and we can get -y from applying the ny control bit.
Comparison can be done with x + (-y) too, and checking if the zr output bit is set.
Sign conversion is trivial.
next steps
We now have the ability to perform computation, but nowhere to record our results. This will be fixed in the next chapter: memory