Lecture 12: Buses

Buses

Buses are designed to connect more than two devices together in a system. Devices may include the CPU, main memory, and I/O devices. A bus consists of three types of signals:

Data

Address

Control

In addition to these signals, there may be utility lines defined as part of the bus to supply power and ground to the devices connected to it.

General purpose buses include PCI, VME, FutureBus, Multibus, USB, and SCSI. They are designed to facilitate connecting a wide range of devices to a machine with high bandwidth and low latency, relative to cost. The versatility of these buses limits their performance to some extent, however.

Most buses are backplane devices -- that is, the physical implementation is a circuit board with parallel connectors into which other boards are plugged perpendicularly.

Some buses, such as USB and SCSI, are cable based -- devices in separate chassis are connected together by cables that carry the bus signals.

Another variation is the front-plane bus, in which ribbon cable connects circuit boards mounted in a chassis (usually in addition to a backplane bus) -- examples are the MaxBus digital video bus and the CSPI QuickRing multiprocessor interconnect. The SGI Origin multiprocessors also use a front-plane bus to interconnect racks.

Buses are generally divided by control mechanism into two types: synchronous and asynchronous.

Most general purpose bus designs are asynchronous because asynchrony is more flexible in accommodating a wider range of devices. Asynchronous bus designs also adapt more readily to technological advances.

A synchronous bus design assumes a basic clock rate, and increasing that rate can cause older devices to fail to operate properly. In an asynchronous design, older devices may simply reduce performance.

However, an asynchronous protocol is usually more complex, requiring both more hardware and more overhead for each transaction. Synchronous buses can operate with lower latency and higher bandwidth for a given number of signals.

Thus, synchronous buses are usually developed for applications in which maximum performance is required and the usage is constrained to devices with a narrow range of timing characteristics.

From this we can deduce that each type of bus is best suited to different applications.

Modern systems often employ both types of buses. The CPU connects to one or more synchronous buses that link it with external cache, main memory, the video driver, and/or a bus adapter. In some cases, the CPU and cache are connected by a dedicated synchronous link, and the cache controller connects to the memory bus. The CPU may also have a direct connection to the bus that bypasses the cache.

The advantage of the combined approach is that the path from the CPU to memory can have the optimum performance, and this is the path that most affects overall performance. The bus adapter provides a connection to an asynchronous bus that matches the characteristics of the synchronous bus on one side and handles the asynchronous protocol on the other side.

Usually, in such architectures, the synchronous bus is a custom design that is found only on the CPU board itself, and possibly on memory expansion daughter cards, although systems have been designed in the past with such buses on custom backplanes to facilitate memory expansion. In many such applications, the synchronous control scheme is even further simplified by the fact that there are only two potential bus masters.

Arbitration

On a typical I/O bus, however, there may be multiple potential masters and there is a need to arbitrate between simultaneous requests to use the bus. The arbitration can be either central or distributed.

In the central scheme, it is assumed that there is just one device (usually the CPU card) that has the arbitration hardware. The central arbiter can determine priorities and can force termination of a transaction if necessary. Central arbitration is simpler and lower in cost for a uniprocessor system. It does not work as well for a symmetric multiprocessor design unless the arbiter is independent of the CPUs.
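As a rough illustration of the central scheme, the sketch below models an arbiter that holds a priority table, grants the bus to the highest-priority requester, and can force the current owner off the bus. The `CentralArbiter` class, the device names, and the priority values are hypothetical; the lecture does not specify a particular selection rule.

```python
class CentralArbiter:
    """Toy central arbiter: a single unit decides who gets the bus."""

    def __init__(self, priorities):
        # priorities: dict mapping device name -> priority (larger wins).
        # The names and values are illustrative assumptions.
        self.priorities = dict(priorities)
        self.owner = None  # current bus master, if any

    def arbitrate(self, requests):
        """Grant the bus to the highest-priority requester, or None if idle."""
        candidates = [d for d in requests if d in self.priorities]
        if not candidates:
            self.owner = None
            return None
        self.owner = max(candidates, key=lambda d: self.priorities[d])
        return self.owner

    def preempt(self, requester):
        """Terminate the current transaction if the requester outranks the owner."""
        if self.owner is None or self.priorities[requester] > self.priorities[self.owner]:
            self.owner = requester
        return self.owner


arbiter = CentralArbiter({"cpu": 3, "disk": 2, "network": 1})
print(arbiter.arbitrate({"disk", "network"}))  # disk
print(arbiter.preempt("cpu"))                  # cpu
```

Because all of the decision logic sits in one place, the other devices need only a request line and a grant line, which is where the cost advantage for a uniprocessor comes from.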

In the distributed scheme, every potential master carries some hardware for arbitration and all potential masters compete equally for the bus. The arbitration scheme is often based on some preassigned priorities for the devices, but these can be changed.
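One common way to realize distributed arbitration, assumed here for illustration and not described in the lecture, is self-selection over shared wired-OR lines: each competing master drives its priority number onto the lines, and a master withdraws when it sees a 1 on a line where it drove a 0. The function name `wired_or_arbitrate` and the 4-line width are hypothetical choices.

```python
def wired_or_arbitrate(priorities, width=4):
    """Return the priority number that wins a bitwise wired-OR arbitration.

    priorities: iterable of distinct non-negative ints, one per requesting master.
    width: number of arbitration lines (an assumed parameter).
    """
    competing = set(priorities)
    if not competing:
        return None
    if any(p < 0 or p >= (1 << width) for p in competing):
        raise ValueError("priority does not fit on the arbitration lines")
    for bit in reversed(range(width)):  # most significant line first
        mask = 1 << bit
        # The shared line reads 1 if any competing master drives a 1.
        line_is_high = any(p & mask for p in competing)
        if line_is_high:
            # Masters that drove 0 on this line see the 1 and withdraw.
            competing = {p for p in competing if p & mask}
    # Exactly one master remains because the priorities are distinct.
    (winner,) = competing
    return winner


print(wired_or_arbitrate([5, 9, 3]))  # 9
```

No master has a special role here: every one runs the same comparison, and reassigning the priority numbers changes the outcome without touching the wiring.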

Daisychaining is a hybrid of central and distributed arbitration. A device requests the bus by passing a signal to its neighbor that is closer to the central arbiter. If a closer device also requests the bus, then the request from the more distant device is blocked. The central arbiter then just has to grant the bus to the closest device requesting it. The priority scheme is fixed by each device's physical position on the bus, and cannot be changed in software. Daisychaining is also susceptible to faults: if a device fails, all devices beyond it in the chain are unable to request the bus.
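The position-based priority and the fault behavior can be sketched in a few lines. This is a simplified model, not a description of any particular bus: the function `daisy_chain_grant` and its `failed` parameter are hypothetical, and a failed device is assumed to neither request the bus nor pass requests along.

```python
def daisy_chain_grant(requests, failed=None):
    """Return the index of the device that receives the grant, or None.

    requests: list of booleans ordered from the device closest to the
              arbiter (index 0) to the most distant one.
    failed:   optional index of a failed device (assumed to be silent and
              to block every request from farther down the chain).
    """
    for position, wants_bus in enumerate(requests):
        if failed is not None and position >= failed:
            # Requests at or beyond the failed device never reach the arbiter.
            return None
        if wants_bus:
            # The closest requester blocks requests from devices farther away.
            return position
    return None


print(daisy_chain_grant([False, True, True]))             # 1
print(daisy_chain_grant([False, False, True], failed=1))  # None
```

In the second call the device at index 2 is requesting the bus, but its request is lost because the device at index 1 has failed.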

Sometimes, multiple request and grant lines are used with daisychaining to enable requests from devices to bypass a closer device, and thereby implement a restricted software-controllable priority scheme.

Transfer modes

Buses may also support multiple transfer modes. For example, a block transfer that begins with the start address and then sends a sequence of data values can occur faster because the arbitration and address portion of the protocol take place just once for the block.
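A small cycle count shows why the block transfer is faster. The cycle costs below (two cycles for arbitration, one for the address, one per data word) are made-up values for illustration, and `transfer_cycles` is a hypothetical helper.

```python
def transfer_cycles(words, arbitration=2, address=1, data=1, block=False):
    """Bus cycles needed to move `words` data words.

    The default cycle costs are illustrative assumptions, not figures
    from any real bus.
    """
    overhead = arbitration + address
    if block:
        # Arbitration and address happen once for the whole block.
        return overhead + words * data
    # Every single-word transfer pays the full overhead again.
    return words * (overhead + data)


print(transfer_cycles(8))              # 32
print(transfer_cycles(8, block=True))  # 11
```

Under these assumptions, moving eight words takes about a third as many cycles as a block, and the advantage grows with the block length.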

Some bus designs also multiplex the address and data lines. There are two instances where this is useful -- reducing cost, and increasing bandwidth.

A very low cost bus might always transfer an address and then the data over the same lines. Thus the two transfers cannot be overlapped in time, and buffering of the address is required at each end, but the bus itself can have half the number of wires. To reduce cost even further, the width of the bus can be reduced so that the address and data are transmitted in pieces. This can be taken to the limit of a single signal path with the control protocol embedded (a traditional serial data link).

On the other hand, increased bandwidth can be the goal in multiplexing address and data. For example, in a block transfer, following the initial address, the address and data lines might both be employed to transfer twice as much data per cycle.

One problem with multiplexing the address lines with data values is that the signals must then pass through an extra set of steering logic gates that direct them to the appropriate places at each end of the transmission. This increases both the bus latency and the cost of the bus interface.
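The two uses of multiplexing can be compared with a simple cycle count for a block transfer. The model is a sketch under stated assumptions: address and data paths are the same width, one word moves per path per cycle, and arbitration and steering-logic delay are ignored. The function `block_cycles` and the scheme names are hypothetical.

```python
import math


def block_cycles(words, scheme):
    """Cycles to transfer a block of `words` words under a simplified model.

    scheme:
      "separate"  - dedicated address and data lines; the address travels
                    alongside the first data word.
      "low_cost"  - one shared set of lines; address first, then the data.
      "wide_data" - separate lines, but after the first cycle the address
                    lines carry data too (two words per cycle).
    """
    if words < 1:
        raise ValueError("a block holds at least one word")
    if scheme == "separate":
        return words
    if scheme == "low_cost":
        return 1 + words
    if scheme == "wide_data":
        return 1 + math.ceil((words - 1) / 2)
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("separate", "low_cost", "wide_data"):
    print(scheme, block_cycles(8, scheme))
```

For an eight-word block this gives 8, 9, and 5 cycles respectively: the low-cost bus gives up one cycle to halve the wire count, while the bandwidth-oriented bus approaches twice the throughput on the same wires.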

Another transfer mode that is found on high performance buses is a split transfer, which is usually associated with a read. Because memory latency may be high, the bus can be tied up for several cycles while waiting for the data to return from a read request. In a split cycle, the read is requested and then the bus is released. Other transactions can then use the bus. When the value has been fetched, a memory controller becomes a bus master and transfers the data to the requester.

Often in a split transfer system, the memory controller is designed to buffer multiple requests, so that reads may be pipelined over the bus. The split transfer is especially useful in multiprocessor systems. With reads fully pipelined, the overall result is to increase the effective bandwidth of the bus by a factor of up to L/T, where L is the memory latency and T is the bus cycle time.
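The L/T factor follows from comparing how long each read occupies the bus. The sketch below assumes that a non-split read holds the bus for the full memory latency L, and that a fully pipelined split read occupies the bus for a single cycle T per word; the numbers and the helper `read_bandwidth` are hypothetical.

```python
def read_bandwidth(word_bytes, latency_ns, cycle_ns, split):
    """Effective read bandwidth in MB/s (10**6 bytes per second).

    Assumes a non-split read ties up the bus for the whole memory latency,
    while a pipelined split read ties it up for one bus cycle per word.
    """
    occupancy_ns = cycle_ns if split else latency_ns
    return word_bytes / occupancy_ns * 1000.0  # bytes per ns -> MB/s


held = read_bandwidth(4, latency_ns=150, cycle_ns=30, split=False)
split = read_bandwidth(4, latency_ns=150, cycle_ns=30, split=True)
print(held, split, split / held)  # the ratio equals L/T = 150/30
```

A real bus would fall short of this ratio, since the request and the reply are separate bus transactions and each may need its own arbitration.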

Here are some examples of older workstation buses:

| | VME | FutureBus | Multibus II | S bus |
| --- | --- | --- | --- | --- |
| Bus width | 128 | 96 | 96 | |
| Address/data multiplexed | No | Yes | Yes | |
| Data width (bits) | 16/32/64 | 32/64/128 | 32 | 32 |
| Transfer | Single/Block | Single/Block | Single/Block | Single/Block |
| Masters | Multiple | Multiple | Multiple | Multiple |
| Split transfer | No | Optional | Optional | |
| Clocking | Asynch. | Asynch. | Synch. | 16-25 MHz |
| Bandwidth, single word, no latency | 25 MB/s | 37 MB/s | 20 MB/s | 33 MB/s |
| Bandwidth, single word, 150 ns latency | 12.9 MB/s | 15.5 MB/s | 10 MB/s | |
| Bandwidth, infinite block, no latency | 27.9 MB/s | 95.2 MB/s | 40 MB/s | 89 MB/s |
| Bandwidth, infinite block, 150 ns latency | 13.6 MB/s | 20.8 MB/s | 13.3 MB/s | |
| Maximum devices | 21 | 20 | 21 | |
| Maximum length | 0.5 m | 0.5 m | 0.5 m | |
| Standard number | IEEE 1014 | IEEE 896.1 | ANSI/IEEE 1296 | |
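The 150 ns latency figures listed above for VME, FutureBus, and Multibus II appear to follow from the no-latency figures by adding the memory latency to the time each word spends on the bus. The sketch below reproduces them under two assumptions that the lecture does not state: a word is 4 bytes, and the synchronous Multibus II rounds the latency up to a whole number of 100 ns clock periods. The helper `bandwidth_with_latency` is hypothetical.

```python
import math


def bandwidth_with_latency(no_latency_mb_s, latency_ns=150.0,
                           word_bytes=4, clock_ns=None):
    """Estimate bandwidth (MB/s) once memory latency is added to each word.

    clock_ns: for a synchronous bus, the latency is rounded up to a whole
              number of clock periods (an assumption, as is the 4-byte word).
    """
    transfer_ns = word_bytes / no_latency_mb_s * 1000.0  # bus time per word
    wait_ns = latency_ns
    if clock_ns is not None:
        wait_ns = math.ceil(latency_ns / clock_ns) * clock_ns
    return word_bytes / (transfer_ns + wait_ns) * 1000.0


print(round(bandwidth_with_latency(25), 1))                # VME, single word
print(round(bandwidth_with_latency(37), 1))                # FutureBus, single word
print(round(bandwidth_with_latency(20, clock_ns=100), 1))  # Multibus II, single word
```

The same function applied to the infinite-block figures (27.9, 95.2, and 40 MB/s) gives the corresponding latency values as well, which suggests the latency is charged once per word even within a block.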

 


Here are some peripheral buses:

| | SCSI | SCSI-2 | IDE | IPI |
| --- | --- | --- | --- | --- |
| Bus width | 50 | Varies | 40 | |
| Address/data multiplexed | Yes | Yes | N/A | |
| Data width (bits) | 8 | 8/16/32 | 16 | 16 |
| Transfer | Single/Block | Single/Block | Block | |
| Masters | Multiple | Multiple | Single | Single |
| Split transfer | Optional | Optional | No | |
| Clocking | Both | Both | Asynch. | Asynch. |
| Bandwidth, single word, no latency | 5 MB/s synch., 1.5 MB/s asynch. | 5-40 MB/s synch. | | 25 MB/s |
| Bandwidth, single word, 150 ns latency | 5 MB/s synch., 1.5 MB/s asynch. | | | |
| Bandwidth, infinite block, no latency | 5 MB/s synch., 1.5 MB/s asynch. | | | 25 MB/s |
| Bandwidth, infinite block, 150 ns latency | 5 MB/s synch., 1.5 MB/s asynch. | | | |
| Maximum devices | 7 | 7 | 2 (disk only) | |
| Maximum length | 25 m | 25 m | 18 in. | |
| Standard number | ANSI X3.131-1986 | ANSI X3.131-199X | ANSI X3T9.2/90-143 | ANSI X3.129 |

Here are some PC-class buses:

| | Micro Channel | PCI | Nubus |
| --- | --- | --- | --- |
| Bus width | 140/182 | 124/188 | 96 |
| Address/data multiplexed | 16/32/no/24 | 32/64//yes | Yes |
| Data width (bits) | 16/32 | 32/64 | 32 |
| Transfer | Single/Block | Single/Block | Single/Block |
| Masters | Multiple | Multiple | Multiple (restricted) |
| Split transfer | | No | No |
| Clocking | Asynch. | 33/66 MHz | 10 MHz |
| Bandwidth, single word, no latency | 20 MB/s | 33/66 MB/s | |
| Bandwidth, single word, 150 ns latency | | | |
| Bandwidth, infinite block, no latency | 75 MB/s | 111/222 MB/s | |
| Bandwidth, infinite block, 150 ns latency | | | |
| Maximum devices | | | |
| Maximum length | | | |
| Standard number | | | ANSI 1196-1987 |

Here are some multiprocessor server bus examples:

| | HP Summit | SGI Challenge | Sun XDBus |
| --- | --- | --- | --- |
| Bus width | | | |
| Address/data multiplexed | | | |
| Data width (bits) | 128 | 256 | 144 |
| Transfer | Single/Block | Single/Block | Single/Block |
| Masters | Multiple | Multiple | Multiple |
| Split transfer | | | |
| Clocking | 60 MHz | 48 MHz | 66 MHz |
| Bandwidth, single word, no latency | | | |
| Bandwidth, single word, 150 ns latency | | | |
| Bandwidth, infinite block, no latency | 960 MB/s | 1200 MB/s | 1056 MB/s |
| Bandwidth, infinite block, 150 ns latency | | | |
| Maximum devices | | | |
| Maximum length | | | |
| Standard number | | | |

Here, for comparison, are the older IBM-PC clone buses

 

 

ISA-8

ISA-16

EISA

VESA

Bus width

62

98

98/100

116

Address/Data multiplexed

20/no

24/no

24/32/no

 

Data width

8

16

16/32

32

Transfer

Single/Block

Single/Block