CmpSci 535 Notes from Lecture 5

Logic Design

Brief biographical sketch of George Boole and his relationship to Augustus DeMorgan, Charles Babbage, and Ada Lovelace. How Claude Shannon showed that Boolean Algebra could be used in the design of binary digital computers.

From both programming and prior work with assembly, everyone should be familiar with the Boolean AND, OR, and NOT operations. Since we design computers by physically connecting gates together, we use diagrams where symbols represent the gates and lines represent the wires that connect them.

[drawings of AND, OR, NOT gates]

[drawing of (A AND NOT B) AND NOT (C OR D)]

We could, of course, write the same logical expression using programming notation:

(A AND NOT B) AND NOT (C OR D)

And there is also an algebraic notation that is often used:

[algebraic expression of same using ^, v, - notation, and +, ., - notation]

The NOT operation is easiest to build from transistors, and it is simple to extend the NOT to form either NOT-AND (NAND) or NOT-OR (NOR).

[diagrams of transistor arrangements for these]

[drawings of their symbols]

To build AND or OR we would follow one of these with a NOT. Thus, we can see that it is more efficient to design circuits that are built with negative logic gates -- that is, it takes fewer transistors to build a NAND than an AND.

The NAND and NOR functions are universal in that we can construct any other Boolean function using either of them. Let's think about how we can show this for NAND.

How can we build NOT from NAND?

If we wire together the inputs to the NAND, we get NOT.

Now, how can we generate AND? (This one is easy!)

If we feed the output of a NAND through one of these NOT operations, we get AND.

What about OR? This one is not so easy.

DeMorgan's law (named after Augustus) says that -(A.B) = -A + -B.

How can we use this to construct the OR?

If we invert A and B at the input to a NAND gate, the output is the OR of A and B.

Now we know that we can build anything once we have AND, OR, and NOT.
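The three constructions above can be checked exhaustively. The following is a minimal Python sketch (the function names are ours, chosen for illustration) that builds NOT, AND, and OR from nothing but a two-input NAND and compares each against the ordinary Boolean operation.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)

def not_(a):
    # Wire both NAND inputs together.
    return nand(a, a)

def and_(a, b):
    # A NAND followed by the NAND-based NOT.
    return not_(nand(a, b))

def or_(a, b):
    # DeMorgan: -(-A . -B) = A + B, so invert both inputs to a NAND.
    return nand(not_(a), not_(b))

# Check every row of each truth table.
for a in (0, 1):
    assert not_(a) == 1 - a
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
```

Note that OR costs three NAND gates here, which is one reason designs are often expressed directly in NAND or NOR form rather than translated gate by gate.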

One other gate that is useful is the XOR (exclusive OR), which can be built from (A OR B) AND NOT (A AND B). It computes the one-bit sum of its two inputs. When A AND B is true, we get a carry out. The combination of the XOR for the sum and the AND for the carry is called a half adder.

Of course, in order to build a circuit that adds multiple bits, we need to provide this adder with the ability to incorporate a carry in from a prior stage. All we have to do is use a pair of half adders: the first adds the two input bits, and the second adds the sum output of the first to the carry in. The sum output of the second half adder is the sum for the stage. The carry outputs of the two half adders are ORed together to give the carry out, and we have a full adder. We can then chain the carry out to the carry in between stages, and create a multi-bit adder known as a ripple-carry adder.
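As a rough illustration, the half adder, full adder, and ripple-carry chain can be modeled in a few lines of Python. This is a sketch of the logic only, not of gate timing; the function names and the `width` parameter are our own assumptions.

```python
def half_adder(a, b):
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Two half adders, with their carry outputs ORed together."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers one bit at a time.

    Returns (sum modulo 2**width, final carry out).
    """
    carry = 0
    total = 0
    for i in range(width):
        # The carry out of stage i becomes the carry in of stage i + 1.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_carry_add(5, 3)` gives a sum of 8 with no carry out. The loop makes the weakness visible: stage i cannot finish until stage i - 1 has produced its carry.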

The trouble with the ripple-carry adder is two-fold: the time it takes to add is proportional to the number of bits (and given a few nanoseconds delay for every gate, we can see that a 32-bit add would take hundreds of nanoseconds), and while the carries ripple through it can produce a series of transitions on its outputs that are incorrect results and have to be ignored until the final state is reached.

An alternative is called a carry-lookahead adder. This circuit notes that the carry at each stage can be computed by a two-level Boolean function of all of the inputs to the prior stages. Thus, we can build an adder that determines the carry input for every stage in a constant number of gate delays, and feeds this into the three-input adder for each stage. The adder operates in constant time, but it takes a large number of gates to do so. Thus, it is very costly. This is just another example of the speed versus space tradeoff that we are familiar with from our other experiences with algorithms.
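To make the two-level carry function concrete, here is a hedged Python sketch. It uses the conventional "generate" and "propagate" signals (g and p), which are standard terminology but are not named in these notes. Each carry is written as an OR of AND terms taken directly from the inputs, with no carry chained from a neighboring stage. The Python loops run sequentially, so this models only the logic, not the constant gate delay.

```python
def lookahead_carries(a_bits, b_bits, c0=0):
    """Carry into each stage (index 0 = least significant bit).

    Returns a list one longer than the inputs; the last entry is the
    carry out of the whole adder.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]  # stage creates a carry
    p = [a | b for a, b in zip(a_bits, b_bits)]  # stage passes a carry on
    carries = [c0]
    for i in range(len(a_bits)):
        terms = []
        # One AND term for each earlier stage j that could have generated
        # a carry which every stage from j + 1 through i then propagates.
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            terms.append(term)
        # One more AND term for the external carry in.
        term = c0
        for k in range(i + 1):
            term &= p[k]
        terms.append(term)
        carries.append(int(any(terms)))  # the single OR level
    return carries

def lookahead_add(x, y, width=4):
    """Add using the lookahead carries; returns (sum, carry out)."""
    a = [(x >> i) & 1 for i in range(width)]
    b = [(y >> i) & 1 for i in range(width)]
    c = lookahead_carries(a, b)
    total = sum((a[i] ^ b[i] ^ c[i]) << i for i in range(width))
    return total, c[width]
```

The number of AND terms grows with the stage number, and the widest term needs an input from every lower stage, which is where the large gate count comes from.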

There are many different designs for arithmetic circuits -- whole books and journals are devoted to them. The point to be kept in mind is simply that an architect has lots of options to choose from in designing an arithmetic unit for a computer, and these options are meant to address a wide range of different requirements that a design may be subject to.

For example, in digital signal processing, it is very important to be able to perform a multiply and an add in one instruction cycle, so very expensive circuitry is employed to accomplish this. In a calculator, however, slower circuitry that conserves power is used.

Early microprocessors didn't have multiply or divide circuitry because of its expense. Now they almost all do, but it may vary in the speed with which it operates. Nor is it necessarily the case that, for example, a slow divider will get faster in newer versions of an architecture. From Amdahl's law, we know that increasing the speed of a little-used operation has a very small effect on overall performance. An architect may instead decide that it is better to have two adders and two multipliers than to have a faster divider -- divide accounts for only a tiny percentage of the operations executed in most programs.
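The Amdahl's law argument can be put into numbers. The sketch below uses the usual form of the law; the workload fractions are invented purely for illustration, and treating a second adder and multiplier as a clean 2x speedup of that fraction is an optimistic assumption.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of run time becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Hypothetical workload: divide is 1% of run time, add/multiply is 40%.
faster_divider = amdahl_speedup(0.01, 10)  # divider made 10x faster
doubled_units = amdahl_speedup(0.40, 2)    # add/multiply throughput doubled
```

Under these assumed numbers the tenfold faster divider improves overall performance by less than 1%, while doubling the add/multiply throughput yields a 1.25x speedup.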

PLAs

Time-critical circuits are still designed by linking together gates. However, large Boolean functions, such as those found in an instruction decoder, are usually built with a more regular structure called a programmable logic array (PLA). To understand why this works, you must know that any Boolean function can be reduced to just two levels of gates (either AND-OR, or OR-AND) as long as the individual gates can have an arbitrary number of inputs.

A typical PLA consists of an AND plane in which each input crosses a set of wires in both its true and complemented form. Depending on whether the input is true or complemented in the function, the appropriate wire is connected to a transistor's gate input in the AND plane. The wires in the AND plane emerge perpendicularly to the inputs and are crossed by another set of wires that make up the OR plane. Wherever a term would be fed into the OR for a given output of the function, the wires are connected (a wired-OR).
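One way to picture the two planes is as a pair of connection tables. The Python sketch below is an illustrative model only: the data layout and the names `pla`, `AND_PLANE`, and `OR_PLANE` are our assumptions. "Programming" the PLA then amounts to filling in the tables, and the evaluation code never changes.

```python
def pla(inputs, and_plane, or_plane):
    """Evaluate a two-level AND-OR array.

    inputs:    list of 0/1 values.
    and_plane: one dict per product term, mapping input index to the value
               it must have (1 = true line connected, 0 = complemented
               line connected). Unlisted inputs are not connected.
    or_plane:  one list per output, naming the product terms wired to it.
    """
    products = [int(all(inputs[i] == v for i, v in term.items()))
                for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]

# Example programming: a half adder with inputs [A, B].
AND_PLANE = [{0: 1, 1: 0},   # A AND NOT B
             {0: 0, 1: 1},   # NOT A AND B
             {0: 1, 1: 1}]   # A AND B
OR_PLANE = [[0, 1],          # sum   = term 0 OR term 1
            [2]]             # carry = term 2
```

Calling `pla([1, 0], AND_PLANE, OR_PLANE)` returns `[1, 0]`, a sum of 1 with no carry.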

The PLA has the advantage that an automated tool can lay out all of the transistors and wires in both planes, for the particular number of inputs and outputs. Then, it is simply a matter of making a few connections to implement any Boolean function of that form. It can be more compact than a complex Boolean function because it doesn't implement full gates. Another advantage is that, if the function changes, but the number of I/O lines remains the same, the PLA stays the same size -- unlike a set of custom-wired gates. Given that a change in size may require the rest of a chip to be rearranged to provide enough space, this can be a significant savings in design effort.

Its disadvantage is that it may be slower than a set of gates with direct wiring because the switched-AND and wired-OR introduce large resistances that increase the time to charge the outputs.

ROMs as Boolean Function Blocks

A memory can be considered as equivalent to a block of Boolean logic circuitry if we treat the inputs as address lines and the outputs as the value fetched from the memory. As was mentioned, the IBM 1620 used a table lookup memory instead of an arithmetic unit. In that case, the memory was a writable store. In most cases where memory is used to implement a logic function, it is read-only.

A read-only memory is constructed by wiring the bit-cells to either 1 or 0 permanently in every location. (In the VLSI process, this involves changing a single mask.) It has a decoder that uses the address to select a particular bit location, which is output as the result of the function input. The ROM must be programmed with a value for every possible input -- it does not have the option of combining inputs which have the same output. For example, if we wanted to construct a 32-input AND gate with a ROM, all 4G (2^32) addresses except one would have 0 stored in them. Obviously, a 4G-bit ROM is an inefficient way to implement an AND gate.
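The ROM-as-logic idea is easy to model as a lookup table. The following Python sketch builds a small AND-gate ROM; the names are ours, and the 4-input size is chosen only to keep the table short.

```python
def build_rom(func, n_inputs):
    """Tabulate func at every one of the 2**n_inputs addresses."""
    return [func(addr) for addr in range(2 ** n_inputs)]

N_INPUTS = 4

def and_all(addr):
    # Output 1 only when every address bit is 1.
    return int(addr == 2 ** N_INPUTS - 1)

ROM = build_rom(and_all, N_INPUTS)  # 16 one-bit entries

def rom_and(a, b, c, d):
    """Evaluate the function by forming an address and reading the table."""
    return ROM[(a << 3) | (b << 2) | (c << 1) | d]
```

Fifteen of the sixteen entries hold 0, and the table doubles in size with every added input, which is the inefficiency described above.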

Gate Arrays and PLDs

An approach that falls between PLAs and random logic is the gate array. As its name implies, it is an array of gates with sets of wires that can be connected to link them together. The gate array has the advantage that it can be used to build functions with multiple levels of gates, which may be more efficient than the two-level formulation. Thus, although the individual gates are larger, the overall function can be smaller. Also, the gates tend to be faster, so one can build faster two-level circuits or deeper circuits at the same speed as the PLA. It still retains the advantage of fixed-size because there are often gates left unused (about 30% is typical) and automated tools exist for converting a function to the appropriate set of connections.

Gate arrays can be programmed by designing a mask layer for the VLSI process. There are also field-programmable gate arrays (FPGAs) in which the links are created by running a higher than normal voltage through a circuit to cause wires to fuse together.

Programmable logic devices are another approach. In these, a memory is used to control switches that link the gates (and even registers) together in a standard block of logic. This has the advantage of being very flexible, but it is also rather large and loading the memory at start-up is time consuming. However, it is often used in prototyping specialized systems because it is cheaper than building custom VLSI. PLDs are usually found on separate chips. Gate arrays and ROMs are found either on separate chips or in custom designs. PLAs are usually found only in custom designs.

Systems today are more and more being built from a microprocessor, a few gate array chips, and some memory. Custom chips or discrete logic chips are being found less and less. The exceptions usually have to do with I/O such as high-power bus buffers, analog to digital converters, and so on.

The circuits we have seen so far are called combinational, because their outputs depend only on the combination of input values fed into them. A second kind of circuit remembers its previous state, and its output depends on both its current inputs and its prior state -- it is thus called a sequential circuit.

Sequential Circuits

We can build a minimal sequential circuit by cross-coupling the outputs of two gates to each other's inputs. This forms a feedback loop that snaps into one state or another given particular inputs, and holds this state after the inputs are removed. The circuit is a single-bit memory known as a flip-flop.

[drawing of flip-flop from cross-coupled NAND gates]

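The simple flip-flop of this form is called an S-R (set-reset) flip-flop because its output is set to 1 when one of its inputs is 1 and the other is 0, and when the opposite combination of inputs is applied the output becomes 0. (Go through process of figuring out which is which.) Note that one input combination is not allowed because it forces both outputs to the same value: 0-0 for the bare cross-coupled NAND form, whose inputs are active low, or 1-1 once the inputs are inverted to make them active high.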
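The "figure out which is which" exercise can be done by simulation. The Python sketch below (function names are ours) models the bare cross-coupled NAND pair from the drawing, so its inputs are active low: pulling one input to 0 sets or resets the latch, 1-1 holds the stored value, and 0-0 is the combination that drives both outputs high. With an inverter on each input the roles of 0 and 1 swap, giving the active-high convention.

```python
def nand(a, b):
    return 1 - (a & b)

def sr_latch(s_n, r_n, q, qb):
    """Settle a cross-coupled NAND latch.

    s_n, r_n: active-low set and reset inputs.
    q, qb:    the outputs before the inputs were applied.
    Returns the settled (q, qb).
    """
    for _ in range(4):  # a few passes around the loop are enough to settle
        new_q = nand(s_n, qb)
        new_qb = nand(r_n, new_q)
        if (new_q, new_qb) == (q, qb):
            break
        q, qb = new_q, new_qb
    return q, qb

state = sr_latch(0, 1, 0, 1)     # pull set low: latch stores 1
state = sr_latch(1, 1, *state)   # release the inputs: value is held
```

The second call shows the memory behavior: with both inputs released, the feedback loop alone keeps the outputs at (1, 0).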

We can gang up a bunch of flip-flops to form a register. However, we want to be able to load all of the bits of the register together using multiple input lines but a single control line. How might we do this?

Feed the control line to AND gates at the inputs of the flip-flops. When the control is high, the inputs are applied to the flip-flops. When it is low, they see 0. If we feed one of the inputs through an inverter to the other we get a flip-flop that stores its input value when the control signal is applied -- also called a latch or D flip-flop.

The D flip-flop is just what we need to build a register, except that it has the nasty property that whenever the control line is high, changes in the input all go through to the output. This could lead subsequent circuitry to produce temporary erroneous results. The solution is to gang two flip-flops in series with the control signal going to the second of them inverted. The result is that when the control signal goes high, the second flip-flop holds its value while the first one tracks the input. Then, when the control signal goes low again, the first flip-flop locks its value down and the second flip-flop takes this final value as its input. Thus, only the one change is seen to occur at the output.
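The two-latch arrangement can be sketched behaviorally in Python. The class names below are ours, and the model ignores gate delays; it only shows which latch is transparent in each phase of the control signal.

```python
class DLatch:
    """Level-sensitive latch: follows d while enable is 1, holds otherwise."""
    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q

class MasterSlave:
    """Two latches in series; the second sees the inverted control signal."""
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clock):
        m = self.master.update(d, clock)         # tracks d while clock is 1
        return self.slave.update(m, 1 - clock)   # copies master while clock is 0

ff = MasterSlave()
# The input wiggles while the clock is high, then the clock falls.
outputs = [ff.update(d, clk) for d, clk in [(1, 1), (0, 1), (1, 1), (1, 0), (0, 0)]]
```

Here `outputs` is `[0, 0, 0, 1, 1]`: the wiggles on the input while the clock is high never reach the output, and the final value appears once, when the clock goes low.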

The control signal is usually called the clock. Flip-flops like this are called master-slave. They can be ganged in series to form a shift register, or in parallel to form a normal register. A block of N registers can be assembled to form a register file, and a decoder circuit can be provided that takes a (log2 N)-bit register number as input and outputs a signal to select the given register (selection is simply the ANDing of this signal with the clock input to the particular register).
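The decoder-plus-AND selection scheme might be sketched as follows. This is an illustrative Python model with names of our own choosing; it abstracts each register to a single stored value.

```python
class RegisterFile:
    """N registers with a decoder that gates the clock to one of them."""
    def __init__(self, n):
        self.regs = [0] * n

    def decode(self, number):
        """One select line per register; exactly one of them is 1."""
        return [int(i == number) for i in range(len(self.regs))]

    def clock(self, number, value, clk):
        # A register loads only when its select line AND the clock are both 1.
        for i, sel in enumerate(self.decode(number)):
            if sel & clk:
                self.regs[i] = value

rf = RegisterFile(4)
rf.clock(2, 99, 1)   # register 2 is selected and clocked: it loads 99
rf.clock(1, 5, 0)    # clock is low: nothing loads
```

After these two calls `rf.regs` is `[0, 0, 99, 0]`.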

Memory

Static random access memory (SRAM) is built much like a register in that it uses a feedback loop to hold a value. However it is not master-slave. The feedback loop usually consists of six transistors or sometimes four. An SRAM retains its contents as long as power is applied -- it continuously consumes power to maintain its contents. It is slower than a register because it is built with small transistors that can't carry as much current as those in a gate.

Dynamic RAM (DRAM) uses fewer transistors (three or even just one) per bit. This explains why DRAMs are larger in capacity than SRAMs (usually by a factor of 4). It stores a charge in a capacitor and then reads it back out using a special amplification circuit. The charge can dissipate gradually over time (perhaps a millisecond) and so the contents of memory must be read out and rewritten periodically to avoid losing them. Fortunately, the memory chips themselves provide ways to do this internally as long as they are set to the proper mode and clocked a certain number of times. DRAM consumes power only when data is being read, written or refreshed, making it a cooler technology. However, DRAM is also slower than SRAM because the special amplification needed to sense the stored charge is time consuming.

During a refresh operation, the data is unavailable, but refresh takes only about 1 to 2% of the available time because the internal refresh circuitry reads out an entire row of bits to a buffer register and then writes them all back. Thus, the refresh time is proportional to the number of rows of bits in the RAM rather than to the total number of bits.

Video RAM (VRAM) is a special type of DRAM in which the refresh buffer is duplicated and connected to the outside world as a second, serial I/O port. A line of data can be fetched from the memory in parallel, just as is done in a refresh, and then be read out serially. If the RAM contains pixel values, then this sequence nicely corresponds to a scan line on a monitor and facilitates display generation. At the same time, the computer can access the RAM normally with almost no interference.

Error Detection and Correction

Given that a modern system contains typically 32 MB of memory, there are 256M bits present. These are accessed at a rate of perhaps 320 million per second. Given these large numbers, the probability that an error will occur is fairly high. By adding redundant information to the stored data, we can detect errors and in some cases also correct them.

The key to this capability is the Hamming distance between combinations of code words, where a code word is a data value combined with the redundant information.

Take for example, the two words

0001 and 1000

These differ in two bits, and thus they have a Hamming distance of 2. With this distance, we can detect any single bit error in either of these two words. Flipping any one of their bits produces a code that is not one of them. Flipping two bits may also produce erroneous codes, but if the right two are flipped then the one code becomes the other and no error can be detected.

We can construct codes with many more values so that all of the valid values are at least two bits apart. This is most directly done by taking an existing data word and adding an extra bit that is always set or cleared so as to make the number of bits in the code odd (or even). This type of code is called a parity code, and can detect a single bit error.
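A parity code takes only a few lines to sketch. The Python below uses even parity and appends the parity bit at the low end of the word; both choices, and the function names, are ours.

```python
def add_even_parity(data):
    """Append one bit so the total number of 1 bits is even."""
    parity = bin(data).count("1") & 1
    return (data << 1) | parity

def parity_ok(code):
    """True when the code word still has an even number of 1 bits."""
    return bin(code).count("1") % 2 == 0

code = add_even_parity(0b1011)   # three 1 bits, so the parity bit is 1
one_flip = code ^ 0b00100        # a single-bit error
two_flips = code ^ 0b00110       # a double-bit error
```

`parity_ok(one_flip)` is false, so the single error is detected, while `parity_ok(two_flips)` is true: the double error turns one valid code word into another and goes unnoticed.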

Whenever we have a Hamming distance of D for every combination of code words, we can detect D - 1 error bits. You can determine the Hamming distance for a set of code words by comparing every pair of codes and counting the number of different bits. The smallest number of different bits among all of the pairings is the Hamming distance for the entire code (the pair with the least number of different bits is the weak link in the code, since it takes only that many bits to confuse one for the other).
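The pairwise comparison described above is mechanical enough to write down directly. A small Python sketch, with function names of our own choosing:

```python
from itertools import combinations

def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")

def code_distance(words):
    """Smallest distance over every pair of code words."""
    return min(hamming(a, b) for a, b in combinations(words, 2))

# Two data bits plus an even parity bit: every pair differs in 2 bits.
PARITY_CODE = [0b000, 0b011, 0b101, 0b110]
```

Here `hamming(0b0001, 0b1000)` is 2, and `code_distance(PARITY_CODE)` is also 2, so this parity code detects D - 1 = 1 error bit.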

If the Hamming distance for a set of code words is at least 3, we can correct for any single bit error. A single bit error is just one bit away from a valid code word, and since every other code word is at least 2 bits away from the erroneous word, we correct the error by going back to the only word that is just one bit away.
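Correction by choosing the nearest code word can be sketched as below. The four-word code is a made-up example with a minimum distance of 3, and real hardware computes the correction with logic rather than by searching; this Python version only illustrates the principle.

```python
def correct(received, codewords):
    """Return the valid code word that differs from `received` in the fewest bits."""
    return min(codewords, key=lambda w: bin(w ^ received).count("1"))

# Hypothetical 5-bit code; every pair of words differs in at least 3 bits.
CODE = [0b00000, 0b00111, 0b11100, 0b11011]

received = 0b00111 ^ 0b01000        # code word 00111 with one bit flipped
repaired = correct(received, CODE)  # recovers 0b00111
```

Because every other code word is at least 2 bits from `received`, the minimum is unique and the original word is recovered.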

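Of course, if multiple bits flip, we may mistakenly correct to the wrong code word. With a distance of exactly 3, a two-bit error can land just one bit away from a different code word and be miscorrected. With a distance of 4, a two-bit error is at least two bits away from every code word, so we can tell that two bits were flipped, and it would take a pretty big (3-bit) error to fool the correction system. This kind of code is called SECDED (Single Error Correction, Double Error Detection).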

Given a number of errors that we would like to correct, what Hamming distance is required for the code?

This is easy when we think in terms of distances geometrically. If you want to be able to go at least E distance away from a point in space yet always have it be the closest point to you, what distance has to separate all of the points?

2E + 1 = D

So, to correct 5-bit errors, we would have to use a code with a Hamming distance of 11.

As an aside, suppose you are transmitting a lot of data over phone lines and you know that they can have bursts of errors due to noise, but never more than 500 bits worth. How could you use a SECDED code to correct for this?

Arrange the data in blocks of 500 words. Send all of their first bits, then all of their second bits, etc. If an error burst occurs it will damage at most a single bit in each word. You can then run through the words at the other end, using the SECDED method to correct each of their 1-bit errors.
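The interleaving trick can be sketched in Python at a reduced scale. The example uses 4 words of 3 bits instead of 500 words so the effect is easy to see; the function names and sizes are ours, and the error-correcting code itself is left out.

```python
def interleave(words, width):
    """Send bit 0 of every word, then bit 1 of every word, and so on."""
    return [(w >> bit) & 1 for bit in range(width) for w in words]

def deinterleave(bits, n_words, width):
    """Rebuild the words from the interleaved bit stream."""
    words = [0] * n_words
    for pos, b in enumerate(bits):
        bit, idx = divmod(pos, n_words)
        words[idx] |= b << bit
    return words

words = [0b101, 0b010, 0b111, 0b000]
stream = interleave(words, 3)
for pos in range(3, 7):          # a burst that corrupts 4 consecutive bits
    stream[pos] ^= 1
damaged = deinterleave(stream, 4, 3)
errors_per_word = [bin(a ^ b).count("1") for a, b in zip(words, damaged)]
```

A burst no longer than the number of words in the block touches each word at most once, so `errors_per_word` contains no value above 1 and each word is within reach of single-error correction.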

Finite State Machines

FSMs are used to control the steps that a computer goes through in executing an instruction or any other multistep operation such as a multicycle arithmetic computation. They consist of Boolean function logic and memory (usually a set of registers). Depending on the current state, as represented by memory and input values, a next state is selected and an output value is generated.

For example, a computer's control unit would have as inputs the current instruction code, the contents of the status register, and a set of registers called the major and minor states of the machine. Major states are steps like Fetch, Execute, Interrupt. Minor states are substeps such as output the address to the memory, set the control lines to read, latch the output from the memory into the instruction register. When each substep is completed, the next-state function causes the status of the state registers and status register to change so that the computer goes to the next substep.

[drawing of a control unit FSM]

The outputs of this FSM are the control signals that are distributed across the machine. So, for example, the memory read control signal might be

(Major State = Fetch AND Minor State = Read) OR (Major State = Execute AND Minor State = Operand Fetch AND Opcode = Read)

Basically, to design a control unit, one looks at every circuit in the computer that needs a control input and then determines all of the conditions that can cause each circuit to activate. These are then given individual functions in terms of the available input and status information.
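A toy version of such a control unit can be written as a next-state table plus one output function per control signal. In the Python sketch below, the memory read function is the one given above; the "Address" and "Latch" minor-state names, the particular state sequence, and the "Add" opcode are hypothetical, invented only to make the example run.

```python
# Next-state function as a table: (major, minor) -> (major, minor).
# The state sequence is a made-up example.
NEXT_STATE = {
    ("Fetch", "Address"): ("Fetch", "Read"),
    ("Fetch", "Read"): ("Fetch", "Latch"),
    ("Fetch", "Latch"): ("Execute", "Operand Fetch"),
    ("Execute", "Operand Fetch"): ("Fetch", "Address"),
}

def memory_read(major, minor, opcode):
    """Output function for the memory read control line."""
    return int((major == "Fetch" and minor == "Read") or
               (major == "Execute" and minor == "Operand Fetch"
                and opcode == "Read"))

def run(opcode, steps=4, state=("Fetch", "Address")):
    """Step the machine, recording the memory read signal in each state."""
    trace = []
    for _ in range(steps):
        trace.append((state, memory_read(*state, opcode)))
        state = NEXT_STATE[state]
    return trace
```

With the opcode "Read" the signal is raised in both the fetch and the operand-fetch steps; with any other opcode it is raised only during instruction fetch. A full control unit would have one such output function for every control line in the machine.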


© Copyright 1995, 1996 Charles C. Weems Jr. All rights reserved.