Lecture 22: Some more buses, and memory organization (by Trek Palmer)
=====================================================================

Synchronous buses
-----------------

In a synchronous bus, everything is oriented around the clock signal. All
the logic in the attached devices runs off of the global clock, which
simplifies the detection of bus signals. Here's an example of a data
transfer on a synchronous bus:

    Clock     +-------------------------+                          +--------
           ___|                         |__________________________|

           __  ____________________________________________________  ________
    Addr/    \/                                                    \/
    Cmd    __/\____________________________________________________/\________

    Data                                ___________________________
           ____________________________/                           \_________
                                       \___________________________/

With synchronous buses, the clock period must be long enough to allow
signals to propagate to the farthest devices on the bus, and to give those
devices time to decode and react to control signals. In this case, the data
is sent over the bus some time after the master asserts the address and
command lines, because there's a delay between when the master sends the
signal and when the slave receives and interprets it. Getting the timing of
the clock correct is perhaps the biggest headache in synchronous bus design.

Asynchronous Buses
-------------------

In an asynchronous bus, there is no centralized clock. This means that the
bus designers can be much more relaxed about timing, but it also means that
the bus protocol becomes much more complicated. As in networking, devices
now have to "handshake" in order to communicate with one another. A
handshake is an exchange between two devices to establish a communication
channel. On a bus it boils down to a question and a response:

    Master: "Are you ready? 'Cause I'm ready."
    Slave:  "Now I'm ready. Let's get started."
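
The question-and-response exchange above can be sketched in a few lines of
Python. This is only an illustration: the signal names (master_ready,
slave_ready), the function handshake_transfer, and the list standing in for
the slave's storage are all made up for this sketch and do not correspond to
any particular bus standard.

```python
def handshake_transfer(address, memory):
    """Trace one read transaction on a toy asynchronous bus.

    All names here are hypothetical; 'memory' is a plain list that stands
    in for whatever the slave device actually stores.
    """
    trace = []
    bus = {"addr": None, "master_ready": False,
           "slave_ready": False, "data": None}

    # Master: put the address/command on the bus, then raise its ready line.
    bus["addr"] = address
    bus["master_ready"] = True
    trace.append("master: are you ready? (master_ready=1)")

    # Slave: sees master_ready, decodes the address, drives the data lines,
    # and raises its own ready line.
    if bus["master_ready"]:
        bus["data"] = memory[bus["addr"]]
        bus["slave_ready"] = True
        trace.append("slave: now I'm ready (slave_ready=1, data valid)")

    # Master: sees slave_ready, latches the data, and drops its ready line.
    value = None
    if bus["slave_ready"]:
        value = bus["data"]
        bus["master_ready"] = False
        trace.append("master: data latched (master_ready=0)")

    # Slave: sees master_ready fall, releases the data lines, drops ready.
    if not bus["master_ready"]:
        bus["data"] = None
        bus["slave_ready"] = False
        trace.append("slave: transaction complete (slave_ready=0)")

    return value, trace


value, trace = handshake_transfer(2, [10, 20, 30, 40])
for step in trace:
    print(step)
print("value read:", value)
```

Note that no step depends on a clock: each side only moves on when it sees
the other side's signal change, which is the whole point of the handshake.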

In a timing diagram, the handshake looks like this:

           __  ____________________________________  __________________________
    Addr/    \/                                    \/
    Cmd    __/\____________________________________/\__________________________

    Master              _______________________
    Ready  ____________|                       |________________________________

    Slave                                _______________________________
    Ready  _____________________________|                              |________

                                         ____________________________
    Data   _____________________________/                            \________
                                        \____________________________/

So here, the master asserts its ready signal until the slave is ready, and
then it waits for the transfer. In general, asynchronous buses operate by
the master signalling that it's ready, the slave signalling that it's ready
in turn, and the slave lowering its ready signal when this particular
transaction is complete.

Memory Organization
-------------------

The basic, high-level view of memory organization is that the CPU asks the
MMU (Memory Management Unit) for the data at an address, and the MMU takes
care of the details of accessing the physical memory hardware.

    CPU ----- MMU ------ RAM
         MAR
         MDR

The MAR and MDR are a little anachronistic, but the abstraction is useful.
MAR stands for memory address register, and it stores the address that the
CPU is currently interested in. If the MAR is k bits wide, then the system
can be said to be k-bit addressable (or to have k-bit addresses), which
gives it 2^k distinct addresses. Most modern systems have 32-bit or 64-bit
addresses (sometimes both). MDR stands for memory data register, and it
holds the value returned by the memory system. If the MDR is n bits wide,
then the system can deal with values of at most n bits at a time. RAM stands
for random-access memory, which means that any address can be accessed in a
fixed amount of time.

RAM memories
-------------

Modern RAM is implemented with semiconductors, but is somewhat different
from the flip-flop based memories we've already seen. Putting that aside for
a moment, assume that we have some way of storing a bit. Suppose we organize
these bit-buckets into a grid.
Each row of the grid stores the bits of an addressable word. If a decoder
is attached to the grid, it can be used to activate a given row with a
given input address. When a row is activated, its bits are dumped down the
wires running vertically down the grid.

           decoder
             ___
            /   |-+-Bitn-1 -+- Bitn-2  ...  -+- Bit1 -+- Bit0   Word 0
    A0-----|    | |         |                |        |
    A1-----|    |-+-Bitn-1 -+- Bitn-2  ...  -+- Bit1 -+- Bit0   Word 1
     .     |    | |         |                |        |
     .     |    | |         |                |        |
     .     |    | |         |                |        |
    Ak-1---|    |-+-Bitn-1 -+- Bitn-2  ...  -+- Bit1 -+- Bit0   Word (2^k)-1
            \___| |         |                |        |
                  |         |                |        |
                             output lines

In this simple case, an address comes into the decoder and activates a word
line. In the case of a write, the input value comes in on the output lines
and is written to the bit cells. For a read, the bit cells dump their
values out onto the output lines.

Decoders are expensive in terms of transistor count, so actual memories use
a more complex organization. Instead of having one word per line, they have
many words per line, all of which are selected by part of the address. The
rest of the address drives a multiplexor attached to the output lines to
select the actual word.

               decoder
                 ___
                /   |-+-Bitn-1 -+- Bitn-2  ...  -+- Bit1 -+- Bit0   Words 0 .. 2^(k/2)-1
    A0---------|    | |         |                |        |
    A1---------|    |-+-Bitn-1 -+- Bitn-2  ...  -+- Bit1 -+- Bit0   Words 2^(k/2) .. 2^(k/2+1)-1
     .         |    | |         |                |        |
     .         |    | |         |                |        |
     .         |    | |         |                |        |
    A(k/2)-1---|    |-+-Bitn-1 -+- Bitn-2  ...  -+- Bit1 -+- Bit0   Words .. (2^k)-1
                \___| |         |                |        |
                      |         |                |        |
                               output lines
                     _|_________|________________|________|_
    Ak/2------------|              Multiplexor              |
    Ak/2+1----------|                                       |
     ...            |                                       |
    Ak-1------------\                                       /
                     \_____________________________________/
                      |         |                |        |
                   bitn-1    bitn-2     ...     bit1     bit0

This greatly reduces the transistor count, but complicates memory access a
tad. Now you select an entire group of words, and you have to filter out
the ones you're not interested in.

Static memories
----------------

A static memory is one that maintains its values as long as power is
supplied.
The flip-flops we examined earlier are an instance of static memories. A
static memory cell may look something like:

      |                                             |
      |                  |\                         |
      |               +--| o-------+                |
      |               |  |/        |                |
      +-----|____|----+            +-----|____|-----+
      |     ---+--    |      /|    |     ---+--     |
      |        |      +-----o |----+        |       |
      |        |             \|             |       |
      |        |                            |       |
    --|--------+----------------------------+-------|--  Word line
      |                                             |
      B                                            ~B

Now as long as power is provided, anything you put into this memory cell
will be remembered. Static RAMs aren't what's actually used in the RAM
chips in your PC, however. The implementation above requires 6 transistors,
which makes each cell bulkier and more expensive than one built from
simpler components. So, if your goal is to squeeze as much RAM into a fixed
size as possible (this, by the way, is the goal of the RAM manufacturers),
then a smaller memory cell gets you more bang for your buck.

Dynamic memory
--------------

If you're willing to complicate the control circuitry, you can dramatically
reduce the size of your memory cells. In a dynamic memory system, the
memory must be periodically refreshed. You have to refresh it because
instead of cross-coupled inverters, you're using a capacitor to store the
value.

         |
    -----|--------+------------------------------------ Word line
         |        |
         |      __|__
         |      _____
         |      |   |
         +------|   |-----| |-------||
         |
         |

A capacitor is an electronic component that can hold a charge. Capacitors
are smaller than transistors, so you can have at least 3 capacitor-based
memory cells in the same amount of space as a flip-flop based one. The
downside is that capacitors (particularly at these feature sizes) 'bleed'
charge. So, every so often, you need to re-inject charge into the
capacitor. What this means is that you need refresh circuitry in addition
to the vanilla read/write circuitry. This refresh logic basically reads out
the old values and writes them back in. This can be accomplished by
attaching a latch to each output line that remembers the last value read
out.
This may seem a little stupid, but it means that you can have oodles and
oodles of bits for next to nothing. Dynamic RAM is often abbreviated as
DRAM, and until recently, most of the memory people had in their PCs was
DRAM not much more complicated than this.
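
To make the read-out-and-write-back idea concrete, here is a minimal sketch
of one row of leaky cells. It is a toy model under loud assumptions: the
class DramRow, the function run, and the constants FULL, THRESHOLD, and LEAK
are all invented for illustration, and the leak rate has nothing to do with
real device physics.

```python
class DramRow:
    """One row of leaky one-bit cells (toy model, hypothetical numbers)."""

    FULL = 1.0        # charge stored for a 1
    THRESHOLD = 0.5   # sense level: more charge than this reads as 1
    LEAK = 0.8        # fraction of charge left after one time step (assumed)

    def __init__(self, bits):
        self.charge = [self.FULL if b else 0.0 for b in bits]

    def tick(self):
        # Every capacitor bleeds a little charge each time step.
        self.charge = [c * self.LEAK for c in self.charge]

    def read(self):
        # Sense each cell against the threshold.
        return [1 if c > self.THRESHOLD else 0 for c in self.charge]

    def refresh(self):
        # Read the row into the output latches, then write it straight back.
        latched = self.read()
        self.charge = [self.FULL if b else 0.0 for b in latched]


def run(bits, steps, refresh_every=None):
    """Let a row leak for 'steps' time steps, optionally refreshing it."""
    row = DramRow(bits)
    for t in range(1, steps + 1):
        row.tick()
        if refresh_every and t % refresh_every == 0:
            row.refresh()
    return row.read()


print(run([1, 0, 1, 1], 10))                   # [0, 0, 0, 0]  (data lost)
print(run([1, 0, 1, 1], 10, refresh_every=2))  # [1, 0, 1, 1]  (data kept)
```

With these made-up numbers a stored 1 decays below the threshold after four
steps, so any refresh interval of three steps or fewer keeps the data alive.
The refresh itself is nothing more than a read followed by a write of the
latched value, which is why the extra control logic is cheap compared with
the space saved per cell.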