Buses
A bus is designed to connect more than two devices
together in a system, such as the CPU, main memory, and I/O devices. A bus
consists of three types of signals:
Data
Address
Control
In addition to these signals, there may be utility
lines defined as part of the bus to supply power and ground to the devices
connected to it.
General-purpose buses include PCI, VME, FutureBus,
Multibus, USB, and SCSI. They are designed to connect a wide range of devices
to a machine with high bandwidth and low latency, relative to cost. The
versatility of these buses limits their performance to some extent, however.
Most buses are backplane devices -- that is, the
physical implementation is a circuit board with parallel connectors into which
other boards are plugged perpendicularly.
Some buses, such as USB and SCSI, are cable based --
devices in separate chassis are connected together by cables that carry the bus
signals.
Another variation is the front-plane bus, in which ribbon
cable connects circuit boards mounted in a chassis (usually in addition to a
backplane bus). Examples are the MaxBus digital video bus and the CSPI
QuickRing multiprocessor interconnect. The SGI Origin multiprocessors also use
a front-plane bus, but to interconnect racks.
Buses are generally classified by their control
mechanism into two types: synchronous and asynchronous.
Most general purpose bus designs are asynchronous
because asynchrony is more flexible in accommodating a wider range of devices.
Asynchronous bus designs also adapt more readily to technological advances.
A synchronous bus design assumes a basic clock rate,
and increasing that rate can cause older devices to fail to operate properly.
In an asynchronous design, older devices may simply reduce performance.
However, an asynchronous protocol is usually more
complex, requiring both more hardware and more overhead for each transaction.
Synchronous buses can operate with lower latency and higher bandwidth for a
given number of signals.
Thus, synchronous buses are usually developed for
applications in which maximum performance is required and the usage is
constrained to devices with a narrow range of timing characteristics.
It follows that each of the two types of bus is best
suited to a different set of applications.
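The trade-off described above can be illustrated with a toy timing model. This
is only a sketch: the function names, the 30 ns handshake overhead, and the
device response times are assumptions chosen for illustration, not figures for
any real bus. The synchronous model assumes that every device must respond
within one clock period.

```python
def sync_transfer_ns(device_ns, clock_ns):
    """One transfer on a clocked bus (assumed: device must answer in one cycle)."""
    if device_ns > clock_ns:
        return None  # the device misses the clock edge, so the transfer fails
    return clock_ns


def async_transfer_ns(device_ns, handshake_ns):
    """One transfer with a request/acknowledge handshake."""
    # The bus waits for the device, then pays the handshake overhead.
    return device_ns + handshake_ns


OLD_DEVICE_NS = 100  # hypothetical slow, older device
NEW_DEVICE_NS = 40   # hypothetical faster device
HANDSHAKE_NS = 30    # hypothetical per-transfer handshake overhead

for clock_ns in (100, 50):
    print("sync, clock", clock_ns, "ns:",
          sync_transfer_ns(OLD_DEVICE_NS, clock_ns),
          sync_transfer_ns(NEW_DEVICE_NS, clock_ns))
print("async:",
      async_transfer_ns(OLD_DEVICE_NS, HANDSHAKE_NS),
      async_transfer_ns(NEW_DEVICE_NS, HANDSHAKE_NS))
```

With a 100 ns clock both devices work on the synchronous bus. Halving the clock
period to 50 ns serves the fast device sooner than the asynchronous bus does
(50 ns against 70 ns), but the old device stops working, whereas the
asynchronous bus still serves it in 130 ns.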

Modern systems often employ both types of buses. The
CPU connects to one or more synchronous buses that link it with external
cache, main memory, video driver, and/or a bus adapter. In some cases, the CPU
and cache are connected by a dedicated synchronous link, and the cache
controller connects to the memory bus. The CPU may also have a direct
connection to the bus that bypasses the cache.

The advantage of the combined approach is that the
path from the CPU to memory can have the optimum performance, and this is the
path that most affects overall performance. The bus adapter connects the
synchronous bus to an asynchronous bus: it matches the characteristics of the
synchronous bus on one side and handles the asynchronous protocol on the other
side.
Usually, in such architectures, the synchronous bus is
a custom design that is found only on the CPU board itself, and possibly on
memory expansion daughter cards, although systems have been designed in the
past with such buses on custom backplanes to facilitate memory expansion. In
many such applications, the synchronous control scheme is even further
simplified by the fact that there are only two potential bus masters.
Arbitration
On a typical I/O bus, however, there may be multiple
potential masters and there is a need to arbitrate between simultaneous
requests to use the bus. The arbitration can be either central or distributed.
In the central scheme, it is assumed that there is
just one device (usually the CPU card) that has the arbitration hardware. The
central arbiter can determine priorities and can force termination of a
transaction if necessary. Central arbitration is simpler and lower in cost for
a uniprocessor system. It does not work as well for a symmetric multiprocessor
design unless the arbiter is independent of the CPUs.
In the distributed scheme, every potential master
carries some hardware for arbitration and all potential masters compete equally
for the bus. The arbitration scheme is often based on some preassigned
priorities for the devices, but these can be changed.
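A minimal sketch of the distributed scheme follows. It assumes that every
master can see all pending requests on shared lines and holds a copy of the
priority table. The function name, the device names, and the "higher number
wins" convention are illustrative assumptions, not part of any particular bus
standard.

```python
def distributed_winner(requests, priority):
    """Return the master that selects itself as bus owner, or None.

    requests: set of master names currently requesting the bus.
    priority: dict of master name -> priority (higher wins); may be reassigned.
    """
    winners = []
    for me in requests:
        # Each master runs the same local check: do I outrank every competitor?
        if all(priority[me] > priority[other]
               for other in requests if other != me):
            winners.append(me)
    # With distinct priorities exactly one requester selects itself.
    return winners[0] if winners else None


priority = {"cpu": 3, "disk": 2, "net": 1}
first = distributed_winner({"disk", "net"}, priority)

priority["net"] = 5  # priorities are preassigned but can be changed
second = distributed_winner({"disk", "net"}, priority)
print(first, second)
```

No single device makes the decision here. Each requester reaches the same
conclusion independently, which is why every potential master must carry
arbitration hardware.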
Daisychaining is a hybrid of central and distributed
arbitration. Devices request the bus by passing a signal to their neighbors who
are closer to the central arbiter. If a closer device also requests the bus,
then the request from the more distant device is blocked. The central arbiter
then just has to grant the bus to the closest device requesting it. The
priority scheme is fixed by the device's physical position on the bus, and
cannot be changed in software. Daisychaining is also susceptible to faults. If
a device fails, all devices beyond it in the chain are unable to request the
bus.
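The daisychain behavior described above can be sketched as a walk along the
chain from the farthest device toward the arbiter. The function name and the
use of integer positions (0 is closest to the arbiter) are assumptions made for
illustration.

```python
def daisy_chain_grant(n_devices, requesting, failed=()):
    """Return the position of the device granted the bus, or None.

    n_devices:  number of devices on the chain; position 0 is nearest the arbiter.
    requesting: positions of devices that are requesting the bus.
    failed:     positions of broken devices that no longer pass requests along.
    """
    winner = None
    # Follow the request signal from the far end of the chain inward.
    for position in range(n_devices - 1, -1, -1):
        if position in failed:
            winner = None      # requests from beyond a failed device are lost
        elif position in requesting:
            winner = position  # a closer request blocks the more distant one
        # otherwise the incoming request is passed along unchanged
    return winner


print(daisy_chain_grant(4, {1, 3}))            # closest requester wins
print(daisy_chain_grant(4, {3}, failed={2}))   # device 3 is cut off
```

The priority is entirely a function of position, so there is nothing for
software to reconfigure, and a single failed device silences every device
beyond it.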
Sometimes, multiple request and grant lines are used
with daisychaining to enable requests from devices to bypass a closer device,
and thereby implement a restricted software-controllable priority scheme.
Transfer modes
Buses may also support multiple transfer modes. For
example, a block transfer that begins with the start address and then sends a
sequence of data values can occur faster because the arbitration and address
portion of the protocol take place just once for the block.
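The saving from a block transfer can be seen by counting bus cycles. The sketch
below assumes, purely for illustration, that arbitration, the address phase,
and each data word take one cycle each. Real buses differ, and the function
names are hypothetical.

```python
def single_transfer_cycles(n_words, arb=1, addr=1, data=1):
    """Cycles to move n_words as separate single-word transactions."""
    # Every word pays for arbitration and its own address phase.
    return n_words * (arb + addr + data)


def block_transfer_cycles(n_words, arb=1, addr=1, data=1):
    """Cycles to move n_words as one block transaction."""
    # Arbitration and the start address are paid once for the whole block.
    return arb + addr + n_words * data


print(single_transfer_cycles(8), block_transfer_cycles(8))
```

Under these assumptions an 8-word block takes 10 cycles instead of 24, and the
advantage grows with the block length.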
Some bus designs also multiplex the address and data
lines. There are two instances where this is useful -- reducing cost, and
increasing bandwidth.
A very low cost bus might always transfer an address
and then the data over the same lines. Thus the two transfers cannot be
overlapped in time, and buffering of the address is required at each end, but
the bus itself can have half the number of wires. To reduce cost even further,
the width of the bus can be reduced so that the address and data are
transmitted in pieces. This can be taken to the limit of a single signal path
with the control protocol embedded (a traditional serial data link).
On the other hand, increased bandwidth can be the goal
in multiplexing address and data. For example, in a block transfer, following
the initial address, the address and data lines might both be employed to
transfer twice as much data per cycle.
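Both uses of multiplexing can be sketched with simple cycle counts. The model
below assumes a 32-bit address and 32-bit data words, one bus cycle per piece
sent, and a block transfer that spends one cycle on the address before any data
moves. These values and the function names are illustrative assumptions only.

```python
import math


def muxed_single_cycles(addr_bits, data_bits, wires):
    """Cycles for one word when address and data share the same wires."""
    # The address goes first, then the data, each split into wire-sized pieces.
    return math.ceil(addr_bits / wires) + math.ceil(data_bits / wires)


def block_cycles(n_words, word_bits, addr_wires, data_wires, reuse_address_lines):
    """Cycles for a block transfer on a bus with separate address and data lines."""
    wires = data_wires + (addr_wires if reuse_address_lines else 0)
    words_per_cycle = wires // word_bits
    return 1 + math.ceil(n_words / words_per_cycle)  # one address cycle first


# Cost reduction: fewer wires, more cycles per word.
print(muxed_single_cycles(32, 32, wires=32))  # shared 32-wire bus
print(muxed_single_cycles(32, 32, wires=8))   # narrower bus, sent in pieces
print(muxed_single_cycles(32, 32, wires=1))   # the serial-link limit

# Bandwidth increase: idle address lines carry data during a block.
print(block_cycles(16, 32, 32, 32, reuse_address_lines=False))
print(block_cycles(16, 32, 32, 32, reuse_address_lines=True))
```

The first group shows wires being traded for time (2, 8, and 64 cycles per
word). The second shows a 16-word block dropping from 17 cycles to 9 when the
address lines are reused for data.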
One problem with multiplexing the address lines with
data values is that they must then pass through an extra set of steering logic
gates that direct the signals to the appropriate places at each end of the
transmission. This increases both the bus latency and the cost of the bus
interface.
Another transfer mode that is found on high
performance buses is a split transfer, which is usually associated with a read.
Because memory latency may be high, the bus can be tied up for several cycles
while waiting for the data to return from a read request. In a split transfer,
the read is requested and then the bus is released. Other transactions can then
use the bus. When the value has been fetched, the memory controller becomes a
bus master and transfers the data to the requester.
Often in a split transfer system, the memory
controller is designed to buffer multiple requests, so that reads may be
pipelined over the bus. The split transfer is especially useful in
multiprocessor systems. The overall result is to increase the effective
bandwidth of the bus by up to a factor of L/T, where L is the memory latency
and T is the bus cycle time.
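The L/T factor can be checked with a small calculation. The sketch assumes that
an unsplit read holds the bus for the full memory latency L, that a split read
holds it for only a fixed number of bus cycles, and that enough requests are
pipelined to keep the bus busy. The 150 ns and 25 ns values and the function
names are illustrative assumptions.

```python
def unsplit_bus_time_ns(n_reads, latency_ns):
    """Bus occupancy when each read holds the bus until its data returns."""
    return n_reads * latency_ns


def split_bus_time_ns(n_reads, cycle_ns, cycles_per_read=1):
    """Bus occupancy when the bus is released while memory is busy."""
    return n_reads * cycles_per_read * cycle_ns


def split_gain(latency_ns, cycle_ns, cycles_per_read=1):
    """Ratio of unsplit to split bus occupancy for the same set of reads."""
    return latency_ns / (cycles_per_read * cycle_ns)


L_NS = 150  # assumed memory latency
T_NS = 25   # assumed bus cycle time

print(unsplit_bus_time_ns(100, L_NS), split_bus_time_ns(100, T_NS))
print(split_gain(L_NS, T_NS))                     # the L/T factor
print(split_gain(L_NS, T_NS, cycles_per_read=2))  # request and reply counted separately
```

With these numbers the gain is 6. If the request and the reply are each charged
one bus cycle the gain falls to 3, which is why L/T is better read as an upper
bound.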
Here are some examples of older workstation buses:

| | VME | FutureBus | Multibus II | S bus |
|---|---|---|---|---|
| Bus width | 128 | 96 | 96 | |
| Address/Data multiplexed | No | Yes | Yes | |
| Data width | 16/32/64 | 32/64/128 | 32 | 32 |
| Transfer | Single/Block | Single/Block | Single/Block | Single/Block |
| Masters | Multiple | Multiple | Multiple | Multiple |
| Split transfer | No | Optional | Optional | |
| Clocking | Asynch. | Asynch. | Synch. | 16-25 MHz |
| Bandwidth, single word, no latency | 25 MB/s | 37 MB/s | 20 MB/s | 33 MB/s |
| Bandwidth, single word, 150 ns latency | 12.9 MB/s | 15.5 MB/s | 10 MB/s | |
| Bandwidth, infinite block, no latency | 27.9 MB/s | 95.2 MB/s | 40 MB/s | 89 MB/s |
| Bandwidth, infinite block, 150 ns latency | 13.6 MB/s | 20.8 MB/s | 13.3 MB/s | |
| Maximum devices | 21 | 20 | 21 | |
| Maximum length | 0.5 m | 0.5 m | 0.5 m | |
| Standard number | IEEE 1014 | IEEE 896.1 | ANSI/IEEE 1296 | |
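The relationship between the "no latency" and "150 ns latency" rows of the
workstation-bus table can be reproduced with a simple model: the latency is
added to the time taken by every word. The sketch below assumes a 4-byte word
and, for the synchronous Multibus II, a 100 ns clock with the wait rounded up
to whole cycles. Neither assumption is stated in the table; they are simply the
values under which this model matches the listed figures.

```python
import math

WORD_BYTES = 4  # assumed size of one transfer


def with_latency(mb_per_s, latency_ns=150, clock_ns=None):
    """Bandwidth in MB/s after adding a fixed latency to every word."""
    base_ns = WORD_BYTES / mb_per_s * 1000  # ns per word at the no-latency rate
    if clock_ns is not None:
        # On a clocked bus the wait is rounded up to whole clock cycles.
        latency_ns = math.ceil(latency_ns / clock_ns) * clock_ns
    return WORD_BYTES / (base_ns + latency_ns) * 1000


# Single-word rows: VME, FutureBus, Multibus II
print(round(with_latency(25), 1),
      round(with_latency(37), 1),
      round(with_latency(20, clock_ns=100), 1))

# Infinite-block rows: VME, FutureBus, Multibus II
print(round(with_latency(27.9), 1),
      round(with_latency(95.2), 1),
      round(with_latency(40, clock_ns=100), 1))
```

Under these assumptions the model gives 12.9, 15.5, and 10.0 MB/s for single
words and 13.6, 20.8, and 13.3 MB/s for blocks, in agreement with the table. It
also shows why a fast bus loses proportionally more: once the latency dominates
the time per word, the raw transfer rate matters much less.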
Here are some peripheral buses:

| | SCSI | SCSI-2 | IDE | IPI |
|---|---|---|---|---|
| Bus width | 50 | Varies | 40 | |
| Address/Data multiplexed | Yes | Yes | N/A | |
| Data width | 8 | 8/16/32 | 16 | 16 |
| Transfer | Single/Block | Single/Block | Block | |
| Masters | Multiple | Multiple | Single | Single |
| Split transfer | Optional | Optional | No | |
| Clocking | Both | Both | Asynch. | Asynch. |
| Bandwidth, single word, no latency | 5 MB/s synch., 1.5 MB/s asynch. | 5-40 MB/s synch. | | 25 MB/s |
| Bandwidth, single word, 150 ns latency | 5 MB/s synch., 1.5 MB/s asynch. | | | |
| Bandwidth, infinite block, no latency | 5 MB/s synch., 1.5 MB/s asynch. | | | 25 MB/s |
| Bandwidth, infinite block, 150 ns latency | 5 MB/s synch., 1.5 MB/s asynch. | | | |
| Maximum devices | 7 | 7 | 2 (disk only) | |
| Maximum length | 25 m | 25 m | 18 in. | |
| Standard number | ANSI X3.131-1986 | ANSI X3.131-199X | ANSI X3T9.2/90-143 | ANSI X3.129 |
Here are some PC-class buses:

| | Micro Channel | PCI | Nubus |
|---|---|---|---|
| Bus width | 140/182 | 124/188 | 96 |
| Address/Data multiplexed | 16/32/no/24 32/64//yes | Yes | Yes |
| Data width | 16/32 | 32/64 | 32 |
| Transfer | Single/Block | Single/Block | Single/Block |
| Masters | Multiple | Multiple | Multiple (restricted) |
| Split transfer | | No | No |
| Clocking | Asynch. | 33/66 MHz | 10 MHz |
| Bandwidth, 1 word, 0 latency | 20 MB/s | 33/66 MB/s | |
| BW, 1 word, 150 ns latency | | | |
| BW, infinite block, 0 latency | 75 MB/s | 111/222 MB/s | |
| BW, infinite block, 150 ns latency | | | |
| Maximum devices | | | |
| Maximum length | | | |
| Standard number | | | ANSI 1196-1987 |
Here are some multiprocessor server bus examples:

| | HP Summit | SGI Challenge | Sun XDBus |
|---|---|---|---|
| Bus width | | | |
| Address/Data multiplexed | | | |
| Data width | 128 | 256 | 144 |
| Transfer | Single/Block | Single/Block | Single/Block |
| Masters | Multiple | Multiple | Multiple |
| Split transfer | | | |
| Clocking | 60 MHz | 48 MHz | 66 MHz |
| Bandwidth, single word, no latency | | | |
| Bandwidth, single word, 150 ns latency | | | |
| Bandwidth, infinite block, no latency | 960 MB/s | 1200 MB/s | 1056 MB/s |
| Bandwidth, infinite block, 150 ns latency | | | |
| Maximum devices | | | |
| Maximum length | | | |
| Standard number | | | |
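For a clocked bus, a first estimate of the block bandwidth is the clock rate
multiplied by the data width. The sketch below applies that estimate to the
clock and data-width figures in the server-bus table. The function name is
hypothetical, and the one-data-transfer-per-clock assumption is a
simplification that is not taken from the table.

```python
def peak_mb_per_s(clock_mhz, data_bits):
    """Peak bandwidth in MB/s, assuming one full-width transfer per clock."""
    return clock_mhz * data_bits / 8


print(peak_mb_per_s(60, 128))  # HP Summit
print(peak_mb_per_s(48, 256))  # SGI Challenge
print(peak_mb_per_s(66, 144))  # Sun XDBus, all 144 bits counted
print(peak_mb_per_s(66, 128))  # Sun XDBus, only 128 bits counted
```

The estimate gives 960 MB/s for the HP Summit, which matches the table. For the
other two buses it comes out higher than the listed value (1536 against 1200
MB/s, and 1188 against 1056 MB/s), so the listed figures evidently account for
something this simple product ignores. The XDBus figure equals the estimate for
128 bits, which would be consistent with 16 of the 144 lines not carrying
payload data, although the table does not say so.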
Here, for comparison, are the older IBM-PC clone buses:

| | ISA-8 | ISA-16 | EISA | VESA |
|---|---|---|---|---|
| Bus width | 62 | 98 | 98/100 | 116 |
| Address/Data multiplexed | 20/no | 24/no | 24/32/no | |
| Data width | 8 | 16 | 16/32 | 32 |
|
Transfer |
Single/Block |
Single/Block |