At this point we have considered the three central components of the
basic five-block diagram of the computer that was introduced at the start
of the class: the arithmetic logic unit, the control unit, and the memory.
We have examined these three in depth while virtually ignoring the other
two: input and output. This follows a long tradition of neglecting I/O in
the architecture community.
Part of the neglect stems from the fact that advances in I/O technology
are mostly beyond the control of architects. I/O devices tend to be complex
electromechanical systems with properties that differ greatly from those of the
purely electronic devices that are used in the construction of the processing
components. If an architect wishes to improve a processor, it is simply
a matter of rearranging some gates with a CAD tool or changing the type
of memory that is employed. All of the technology is essentially the same,
and its steady advance is relatively predictable.
In contrast, the architect can do little to improve an I/O device, other
than suggest improvements to the manufacturers. For example, when the standards
were being developed for the audio compact disk, the incorporation of a
standard for computer data was largely an afterthought. The display devices
used in computers are derived from standard television technology. Keyboards
originated with typewriters, and their layout follows a standard that is
designed to reduce typing speed and increase effort. (Early typewriters
tended to jam if the typist pressed the keys too quickly.) The designs of
many forms of I/O devices are driven by concerns other than optimal performance.
There are basically two major classes of I/O devices used with computers: those that interact with people, and those that do not. The ones that interact with people are largely concerned with the translation between human-readable and machine-readable forms. The ones that do not interact with people avoid this translation.
We can further divide the human-interactive devices into two categories:
direct and indirect. Direct I/O devices must respond to human action and
display information in real time at a rate that complements the capabilities
of people. Indirect I/O devices accept input or produce output where the
human is not directly involved. Examples would be a scanner or a printer
-- these devices perform the human-machine translation, but they do not
need to react directly to a human in real-time.
Direct I/O devices include the keyboard, mouse, trackball, screen, joystick,
drawing tablet, musical instrument interface, speaker(s) and microphone.
Indirect I/O devices include the printer, scanner, video camera, and analog
video tape. (Digital video and audio are already machine readable.) Video
cameras are currently indirect I/O because little is done with the information,
other than to store it or to process it in a noninteractive manner. Eventually,
computers may interpret video and humans will be able to use it as an interactive
I/O device in which their gestures are translated into commands.
The keyboard is an assembly of switches logically arranged in a matrix.
When a switch closes (a key is pressed) a row and column of wires are energized
and the combination specifies a particular key. The electronics of the keyboard
have two functions to perform. First, they must "debounce" the
keypress -- whenever a switch closes, it has a tendency to bounce, or produce
a noisy set of pulses in the instant before it solidly makes contact. Circuitry
in the keyboard screens out this noise and presents only the solid switch
transitions. Second, they must translate the row and column into a standard
code and then send this as a serial train of pulses to the CPU. The translation
is done through a table lookup, and the serialization is done through a
circuit called a UART (Universal Asynchronous Receiver/Transmitter). It
is essentially a parallel-load shift register that adds some control bits
and possibly a parity bit to the character code. Some keyboards also provide
circuitry that detects when a key is held down for a longer than normal
period and they automatically start to repeat the transmission of the corresponding
character code. In some systems, the keyboard may also have special keys
that bypass the usual translation step and connect directly to the computer,
providing a means of getting the computer's attention (generating a high-priority
interrupt). A typical keyboard produces bursts of character codes at a rate
of up to 10 per second (120 words per minute) but the average data rate
is even lower.
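The two jobs of the keyboard electronics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-by-3 `KEYMAP`, the three-sample debounce threshold, and the frame layout (one start bit, seven data bits sent least significant bit first, even parity, one stop bit) are all hypothetical choices, not the behavior of any particular keyboard.

```python
# Hypothetical 2x3 key matrix; a real keyboard has many more rows and columns.
KEYMAP = {
    (0, 0): "a", (0, 1): "b", (0, 2): "c",
    (1, 0): "d", (1, 1): "e", (1, 2): "f",
}


def debounce(samples, stable_count=3):
    """Return the debounced state after each raw sample (0 = open, 1 = closed).

    The reported state changes only after the raw input has disagreed with it
    for `stable_count` consecutive samples (an assumed threshold).
    """
    state = 0   # assume the key starts released
    run = 0     # consecutive samples that disagree with `state`
    history = []
    for sample in samples:
        if sample != state:
            run += 1
            if run >= stable_count:
                state, run = sample, 0
        else:
            run = 0
        history.append(state)
    return history


def uart_frame(code):
    """Frame a 7-bit code: start bit, data bits LSB first, even parity, stop bit."""
    data = [(code >> i) & 1 for i in range(7)]
    parity = sum(data) % 2          # makes the total number of 1s even
    return [0] + data + [parity] + [1]


def encode_key(row, col):
    """Table lookup from matrix position to character code, then serialization."""
    return uart_frame(ord(KEYMAP[(row, col)]))
```

With these assumptions, a noisy closure such as `[0, 1, 0, 1, 1, 1, 1]` is reported as closed only on the sixth sample, and each key press becomes a 10-bit serial frame.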
A mouse can be either mechanical or optical in nature. With a mechanical
mouse, a ball protrudes under the housing and as it is rolled across a surface
it turns a perpendicular pair of shafts inside the housing. The shafts drive
encoders that consist of a clear plastic wheel with radial lines printed
on it. An LED shines through this wheel onto a phototransistor, and as the
lines pass between them the variation in the light reaching the phototransistor
causes it to generate pulses. The pulse train is either counted in the mouse
or it is sent to the computer to be counted. (A pair of phototransistors
is actually used, which enables the circuitry of the mouse to determine which
direction the shaft is rotating.)
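One common way to obtain direction from a pair of sensors is quadrature decoding, in which the two sensors are offset so that their pulses overlap by a quarter cycle. The notes do not name the technique, so the sketch below is an assumption about how such a counter might work; the state ordering in `_FORWARD` and the function name `count_steps` are hypothetical.

```python
# Each state is the 2-bit reading (A, B) of the two phototransistors.
# Turning "forward" is assumed to produce 00 -> 01 -> 11 -> 10 -> 00;
# the reverse order means the shaft is turning backward.
_FORWARD = {(0b00, 0b01), (0b01, 0b11), (0b11, 0b10), (0b10, 0b00)}


def count_steps(states):
    """Return the net number of steps moved, given successive sensor states."""
    position = 0
    for prev, cur in zip(states, states[1:]):
        if prev == cur:
            continue                    # no movement between samples
        if (prev, cur) in _FORWARD:
            position += 1
        elif (cur, prev) in _FORWARD:
            position -= 1
        # Otherwise both bits changed at once: a missed sample, ignored here.
    return position
```

One such counter per shaft yields the X and Y counts that the mouse either reports itself or leaves for the computer to accumulate.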
In an optical mouse, a pair of LEDs shine on a special reflective pad that
is printed with a grid of lines that have two different colors (usually
blue lines run horizontally and black lines run vertically). There are two
phototransistors that sense the reflected light and can determine, as the
mouse is moved across the pad, which direction it is moving, because each
phototransistor is sensitive to just one of the colors and is elongated
in the particular direction. Again, the mouse either sends the raw pulses
resulting from the reflections off the lines to the computer, or counts
the pulses itself and sends the counts to the computer.
The mouse is one of the few I/O devices that originated solely with the
computer industry. It has a burst data rate of 20 bytes per second (when
sending counts) and slightly higher when sending raw pulses, but its average
data rate is much lower. Typically the counts are sent to the processor
in the same serial manner as the keyboard, and in some systems the mouse
even shares the same I/O lines with the keyboard.
A trackball is very much like an upside-down mechanical mouse, but the ball
is typically larger, and the user rolls it with his or her fingers or hand.
A joystick, as employed in many computer games, is typically a resistive
device. The stick turns the shafts of two potentiometers, one for X and one for Y
(just like the volume knob on a radio), and the voltage resulting from the
potentiometer's resistance at that particular position is converted into
a corresponding number by an analog to digital converter (ADC). The output
of the ADC is serialized and sent to the computer much like keyboard or
mouse data.
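The conversion from stick position to a number can be illustrated as follows. This is a rough sketch: the 5 V reference, the 8-bit converter, and the idea of describing each potentiometer by its wiper position as a fraction from 0.0 to 1.0 are assumptions made for the example.

```python
def adc_read(voltage, v_ref=5.0, bits=8):
    """Quantize a voltage in [0, v_ref] to an unsigned `bits`-bit code."""
    full_scale = (1 << bits) - 1
    clamped = min(max(voltage, 0.0), v_ref)
    return round(clamped / v_ref * full_scale)


def joystick_sample(x_fraction, y_fraction, v_ref=5.0, bits=8):
    """Treat each potentiometer as a voltage divider set by the wiper position."""
    return (adc_read(x_fraction * v_ref, v_ref, bits),
            adc_read(y_fraction * v_ref, v_ref, bits))
```

Under these assumptions a stick pushed fully to one corner reads as `(0, 255)`, and the pair of codes is what gets serialized and sent to the computer.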
Screen: Based on television technology. An electron beam is excited (essentially
a small linear accelerator) and as it heads toward the front of the screen,
it passes between orthogonal pairs of plates that are appropriately charged
to deflect it. The plates are set up to oscillate in a basic pattern in
which the horizontal deflection is completed for each step in vertical deflection.
The result is much like the way that a nested loop accesses a two dimensional
array in row-major order. As the scan occurs, the beam is varied in intensity
to produce variations in brightness on the screen. When the electrons reach
the front of the tube, they strike a phosphorescent material that turns
their energy into light. The material continues to emit light for a brief
period after it is struck by the electron beam (called the decay period),
so that the screen appears to remain evenly illuminated. The color is achieved
by a mask dot pattern, with triads of phosphor dots that each emit light
in one of the primary colors. The basic steering signal to the plates is
modulated with a color (or "chroma") signal that induces minor
steering deflections in the beam so that it illuminates the appropriate
combination of each triad to produce the desired color. Sharp distinctions
in bright color thus require the beam to be steered sharply from triad to
triad, which requires a high-frequency, high-energy signal. This in turn
requires electronics to be very carefully designed and constructed. Of course,
this is one area in which cost/performance tradeoffs are often made. A monitor
that is designed to handle sharp boundaries between fine-grained fields
of very different colors will invariably be more expensive than one that
tends to blur color boundaries.
Another aspect of the design that is often traded off is magnetic shielding.
The drive electronics include some fairly large electromagnets, and their
magnetic fields can extend beyond the case of the monitor if not properly
shielded. These fields can add noise to nearby speakers, and especially
in situations where multiple monitors are used, they can cause interference
between adjacent monitors.
The screen's parameters include resolution (horizontal and vertical, ranging
from 320 by 200 to 1600 by 1200 in typical monitors, with 640 x 480, 832
x 624, 1024 by 768, 1152 x 870, and 1360 x 1024 being popular intermediates),
colors (depth -- 8 to 24 bits), pitch (dots per inch -- typically 72 to
100), and size (measured diagonally including a portion of the tube that
can't be seen, and typically 12 to 21 inches). Given that each color pixel
must occupy three distinct places on the screen with a wide enough separation
that the electron beam can be steered among them, the resolution will be
lower than that of a monochrome screen. Monitors can also vary in what is
called the refresh rate -- the number of times the beam scans the screen each
second. In a TV set, the screen is scanned at a rate of 30 frames per second.
However, a frame is divided into two fields that are interlaced -- that
is, the lines of the two fields alternate on the screen. The fields are
scanned in sequence (each takes 1/60th of a second) and the decay time of
the phosphor keeps their contents visible. Computer monitors are scanned
at a slightly faster rate (typically 67 to 75 Hz) and may or may not be
interlaced.
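The scan pattern, and the difference that interlacing makes to it, can be written out directly. The sketch below is illustrative only; the function names are hypothetical, and drawing the even-numbered lines as the first field is an assumption.

```python
def progressive_scan(rows, cols):
    """Yield (row, col) in the order a noninterlaced beam visits the screen."""
    for row in range(rows):          # one step of vertical deflection...
        for col in range(cols):      # ...per complete horizontal sweep
            yield (row, col)


def interlaced_line_order(rows):
    """Return line numbers in the order an interlaced display draws them:
    one field of alternate lines, then the field of the remaining lines."""
    return list(range(0, rows, 2)) + list(range(1, rows, 2))
```

For a six-line screen, `interlaced_line_order(6)` gives `[0, 2, 4, 1, 3, 5]`, so each field covers half of the lines and two fields make up one frame.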
One of the problems in displaying information to the screen is the amount
of data that has to be continually fetched. Given an 1152 by 870 screen
with a noninterlaced refresh rate of 75 Hz and 24 bits of color, we must
transfer 1152 x 870 x 75 x 3 = 225.5 MB/s. An interlaced display would be
half of that rate. Roughly 3 MB of memory is needed to hold the data that
represents those pixels, and every byte is accessed 75 times per second. The
memory is called a frame buffer, and it is typically made of video RAM (VRAM),
which has two Read/Write ports called direct access and sequential access.
The sequential access port fetches a line of video data from the main block
of the RAM chip into a buffer that is a long shift register. It then proceeds
to stream the pixels out to the display. The direct access port appears
as normal DRAM to the CPU, which is prevented from writing to the
port only during the brief instant that a line is being copied into the serial
buffer.
Higher resolution displays are being developed. In particular,
current displays do not show all of the detail that a printer can generate,
and this means that graphical designers have difficulty previewing their
work on a screen. Higher resolution can be achieved with current technology
(High Definition Television -- HDTV -- for example, will provide 4000 x
2000 resolution in a wide-screen format). But many are looking to LCD and
LED display technology. Xerox, for example, has demonstrated an 8.5"
by 11" LCD display that has a 300 dots per inch pitch (2550 x 3300
resolution) with 24 bits of color. Unfortunately, such a screen requires
a 25 MB frame buffer, and at a 75 Hz refresh rate the data must be transferred
at 1.9 GB/s to the screen.
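The frame buffer sizes and transfer rates quoted for the 1152 by 870 screen and for the 300 dots per inch LCD follow from the same arithmetic, which the short sketch below reproduces. The function names are hypothetical, and the figures are in decimal megabytes and gigabytes, as in the text.

```python
def frame_buffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one full frame."""
    return width * height * bits_per_pixel // 8


def refresh_bandwidth(width, height, bits_per_pixel, refresh_hz):
    """Bytes per second that must stream to a noninterlaced display."""
    return frame_buffer_bytes(width, height, bits_per_pixel) * refresh_hz


crt = refresh_bandwidth(1152, 870, 24, 75)     # 225,504,000 B/s, about 225.5 MB/s
lcd = refresh_bandwidth(2550, 3300, 24, 75)    # 1,893,375,000 B/s, about 1.9 GB/s
```

The LCD needs roughly 8.4 times the bandwidth of the CRT at the same refresh rate and color depth, purely because it has that many more pixels.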
Drawing tablet. Uses a grid of resistive "wires" (printed circuit
lines). Pressure on the surface causes the grid lines to make contact with
an underlying conductive sheet. The voltage from the X and Y wires is converted
by analog to digital converters and sent to the computer.
MIDI Interface. A standard for controlling electronic musical instruments
or for inputting data from the same instruments.
Speaker and sound generation, text to speech synthesis. To produce sounds
the computer sends a numerical value to a digital to analog converter (DAC)
that translates it into a voltage which is amplified and sent to the speaker.
If the value is varied with sufficient rapidity it can produce any waveform
with a principal frequency of up to half of the rate of variation. The number
of bits that the DAC can distinguish is called the resolution (8 is typical,
4 can produce many distinct tones, 16 gives CD-quality sound, and 24 are
required to equal studio-quality). The rate at which the signal is varied
is called the sample rate and can range from as little as 1 kHz (to produce a 500
Hz tone) to 44.1 kHz, which produces a CD-quality frequency response. Speech
is synthesized using a lookup table of phonemes that are chained together
into words. Text can be transformed into phonemes through a set of a few
hundred rules augmented with dictionary lookup for special cases. The result
is understandable but not particularly natural sounding. More natural speech
is usually produced for a limited domain where each phrase is actually a
recorded voice that is highly compressed. CD-quality recorded sound takes
a significant amount of space due to its moderate but steady data rate (88
KB/s per channel), but can be significantly compressed without a loss of quality and
highly compressed with a loss of quality that takes advantage of certain
limitations of human hearing so that the loss is generally unnoticed by
casual listeners.
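The relationships among sample rate, resolution, and data rate can be checked with a small sketch. The function names are hypothetical, and the assumption that the DAC takes unsigned codes centered on mid-scale is made only for illustration.

```python
import math


def max_tone_hz(sample_rate_hz):
    """Highest principal frequency a given sample rate can reproduce."""
    return sample_rate_hz / 2


def pcm_data_rate(sample_rate_hz, bits, channels=1):
    """Uncompressed data rate in bytes per second."""
    return sample_rate_hz * bits // 8 * channels


def sine_samples(freq_hz, sample_rate_hz, bits, count):
    """Unsigned DAC codes for a sine wave (assumed mid-scale offset)."""
    full_scale = (1 << bits) - 1
    return [
        round((math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) + 1) / 2 * full_scale)
        for n in range(count)
    ]
```

For example, `pcm_data_rate(44100, 16)` gives 88,200 bytes per second for a single channel, and `max_tone_hz(1000)` gives the 500 Hz limit mentioned for a 1 kHz sample rate.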
Microphone and voice recognition. Going in the reverse direction, the microphone
turns acoustical pressure into a variation in voltage that is converted
at regular intervals (the sampling rate) into numerical values. This digitized
signal can simply be recorded, or it can be processed by voice recognition
software and turned into text. The recognition process requires a combination
of signal processing and artificial intelligence techniques to extract phonemes
and whole words. It is beyond the scope of this discussion, except to say
that if it is done well, it can easily consume all of the processing power
of a very powerful machine plus a dedicated signal processing computer.
Even then, it may be limited to a single person for whom it has been "trained"
or it may be limited to just a small number of words and phrases for multiple
speakers. Note that speech recognition is much easier than recognizing arbitrary
sounds, which is still well beyond the capabilities of today's computers.
Printer: There are many different types of printer: laser printer, ink jet,
impact dot matrix, dye sublimation. Each uses a different technology. In
an impact dot-matrix, wires are driven by electromagnets to strike an inked
ribbon against the paper. In an inkjet, the ink is squirted from a column
of holes by piezoelectric crystals or by thermal expansion. In a laser printer,
a selenium coated drum is charged with static electricity and then scanned
by a laser beam that rapidly pulses. Wherever the light strikes the selenium,
the charge is dissipated. The drum then passes over a cartridge with black
"toner" powder. Where the static charge remains, the toner is
drawn to the drum. The drum then applies the toner to the paper (and is
cleaned of any residue). The paper passes through a heater that fuses the
toner particles into it and then a static discharge brush before being ejected
from the printer. Typical printer data rates range from 120 B/s to 4 MB/s.
Scanner: A controlled linear light source is passed over a piece of paper
with a corresponding linear array of photosensors (CCD) to sense the light
reflected from the paper. The contents of the CCD are read out and digitized
and sent to the computer. Scanners typically can operate at resolutions
of 300 to 2400 dots per inch with up to 24 bits of color information. A
typical data rate would be 1 MB/s to 5 MB/s, and is often limited by the
disk to which the data is being written.
Video camera and tape -- Input via an ADC to a frame buffer. If fully digitized
without compression, a data rate of 28 MB/s can result. Compression with
some loss can reduce this to 80 KB/s, and with very little loss (broadcast
quality) the rate can still be reduced to 160 KB/s. The compression is done
between the ADC and the frame buffer, usually by a dedicated digital signal
processor.
Secondary storage vs. external I/O.
Floppy disk: originally encased in a flexible partially exposed envelope,
now rigid and self-sealing. Contact heads. Magnetic read-write.
Hard disk -- sealed, rigid, flying heads, importance of thinner films, smaller
heads, polished surfaces, perfect cleanliness. Tracks (500 to 2000), sectors
(32 to 128), platters (1 to 12), cylinders. Seek time (7 to 15 ms), rotation
time (6 to 8 ms), transfer rate (2 to 8 MB/s). Smaller = faster. Use of
memory buffers to minimize bus contention.
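The seek time, rotation time, and transfer rate combine into a rough access-time estimate. The sketch below uses the common simplifying model that the average rotational delay is half a revolution; that model, the function name, and the sample request size are assumptions rather than figures from these notes.

```python
def avg_access_ms(seek_ms, rotation_ms, transfer_mb_per_s, request_bytes):
    """Estimate the average time to read one request from a hard disk."""
    rotational_latency_ms = rotation_ms / 2    # on average, half a revolution
    transfer_ms = request_bytes / (transfer_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# A 4000-byte read with a 10 ms seek, 8 ms rotation, and 4 MB/s transfer:
example_ms = avg_access_ms(10, 8, 4, 4000)     # 10 + 4 + 1 = about 15 ms
```

With numbers in the ranges quoted above, the mechanical delays dominate small transfers, which is one reason buffering and larger requests pay off.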
Magnetic disks are another I/O device that developed specifically for the
computer industry, although they are now finding their way into entertainment
(recording studios and video producers use them for storing material to
be edited -- this has only become cost effective in the last few years).
Optical disk -- Optical plastic disk with aluminum backer. Holds 680 MB,
read-only in most cases (write-once with special media and drive -- a high-power
laser burns spots in a dye layer that turn dark). Pits in plastic are read
by a laser beam that reflects into a phototransistor. Due to variations
in the thickness of the disk, vibrations, etc. a focusing lens assembly
is used to image the pits onto the phototransistor -- needs to constantly
adjust focus at high speed. Access time of 200 to 350 ms, and transfer rates
of 150 KB/s to 600 KB/s. Disks can be mass produced for less than $1 each.
Writable CDs cost about $10 each. A new version of the CD is under development
that will store 6 to 10 GB of data, to be used in place of video tape or
video disk -- at that capacity, roughly 2 hours of video can be stored with
current (VHS tape) levels of quality.
Magneto-optical disk -- reading is similar to CD (in fact, some drives can
read CD and MO), but uses a layer of magnetic grains that are reoriented
by the magnetic write head so that they either block or allow light to reflect
off of the backer. Read-write media stored in a self-sealing rigid case
(similar to floppy disk). Seek time of 16 to 30 ms, transfer rate of 2 to
3 MB/s, capacities of 128 MB, 230 MB, 1.3 GB. Cost $27 to $100 each. It
is interesting to note that if the MO disk can achieve a 10x increase in
capacity and a 10x decrease in cost, it will become a viable alternative
to videotape.
Tape (linear and helical scan) -- DAT is 4 mm helical scan, used in audio
industry. Every tape pays a tax to the record companies. Can store 2 to
8 GB per tape. Transfer rate of 160 KB/s to 1 MB/s. Tapes cost from $6 to
$25 each. 8mm tape stores up to four times as much at the same cost level.
Sensors -- various types of sensors are used to input data to computers
-- chemical sensors, temperature sensors, magnetic field sensors, etc. These
are usually binary (on/off), multivalued (via ADC), and occasionally imaging
(e.g. ultrasound, CAT and MR scans, radar, sonar, IR, UV, radio, etc.). Data
rates range from bytes per hour to GB/s.
Actuators -- a wide range of control devices, such as switches, valves,
solenoids, motors, stepper motors, linear motors, lights, lasers, electron
beams, X-rays, hydraulic pumps, and so on are controllable by computers.
Data rates are typically in the B/s to KB/s range.
Networks -- perhaps one of the most interesting I/O devices. The computer
network was built on top of the existing telephone network. However, it
is another example of I/O designed specifically for the computer industry.
Current typical data rates are around 1 MB/s (Ethernet) but are expected to
increase by a factor of 10 or more very soon (such networks are currently
in use, but not widespread). Note that this data rate is rarely generated
or consumed by a single computer, but rather by the sum of a large number
of computers using the network.
Buses are designed to connect more than two devices together in a system.
Devices may include the CPU, main memory, and I/O devices. A bus consists
of three types of signals:
Data
Address
Control
In addition to these signals, there may be utility lines defined as part
of the bus to supply power and ground to the devices connected to it.
General purpose buses include the VME, FutureBus, Multibus, SCSI, etc. and
are designed to facilitate connecting a wide range of devices to a machine
with high bandwidth and low latency, relative to cost. The versatility of
these buses limits their performance to some extent, however.
Most buses are backplane devices -- that is, the physical implementation
is a circuit board with parallel connectors into which other boards are
plugged perpendicularly.
Some buses, such as SCSI, are cable based -- devices in separate chassis
are connected together by cables that carry the bus signals.
Another variation is the front-plane bus in which ribbon cable connects
circuit boards mounted in a chassis (usually in addition to a backplane
bus) -- an example is the MaxBus digital video bus, or the Apple/CSPI QuickRing
multiprocessor interconnect.
Buses are generally distinguished by two control mechanisms: synchronous
and asynchronous.
Most general purpose bus designs are asynchronous because asynchrony is
more flexible in accommodating a wider range of devices. Asynchronous bus
designs also adapt more readily to technological advances.
In a synchronous bus design, there is an assumption of a basic clock rate
and increasing that rate can cause older devices to fail to operate properly.
In an asynchronous design, older devices may simply reduce performance.
However, an asynchronous protocol is usually more complex, requiring both
more hardware and more overhead for each transaction. Synchronous buses
can operate with lower latency and higher bandwidth for a given number of
signals.
Thus, synchronous buses are usually developed for applications in which
maximum performance is required and the usage is constrained to devices
with a narrow range of timing characteristics.
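From this we can deduce that each type of bus has its own best application:
asynchronous buses for connecting a wide range of devices, and synchronous
buses for tightly constrained, performance-critical paths.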

Modern systems often employ both types of buses. The CPU connects to a synchronous
memory bus containing the cache, main memory, and a bus adapter. In some
cases, the CPU and Cache are connected by a dedicated synchronous link,
and the cache controller connects to the memory bus.

The advantage of this approach is that the path from the CPU to memory can
have the optimum performance, and this is the path that most affects overall
performance. The bus adapter provides a connection to an asynchronous bus
that is constrained to matching the characteristics of the synchronous bus.
Usually, in such architectures, the synchronous bus is a custom design that
is found only on the CPU board itself, and possibly on memory expansion
daughter cards, although systems have been designed in the past with such
buses on custom backplanes to facilitate memory expansion. In many such
applications, the synchronous control scheme is even further simplified
by the fact that there are only two potential bus masters.
On a typical I/O bus, however, there may be multiple potential masters
and there is a need to arbitrate between simultaneous requests to use the
bus. The arbitration can be either central or distributed.
In the central scheme, it is assumed that there is just one device (usually
the CPU card) that has the arbitration hardware. The central arbiter can
determine priorities and can force termination of a transaction if necessary.
Central arbitration is simpler and lower in cost for a uniprocessor system.
It does not work as well for a symmetric multiprocessor design unless the
arbiter is independent of the CPUs.
In the distributed scheme, every potential master carries some hardware
for arbitration and all potential masters compete equally for the bus. The
arbitration scheme is often based on some preassigned priorities for the
devices, but these can be changed.
Daisychaining is a hybrid of central and distributed arbitration. Devices
request the bus by passing a signal to their neighbors who are closer to
the central arbiter. If a closer device also requests the bus, then the
request from the more distant device is blocked. The central arbiter then
just has to grant the bus to the closest device requesting it. The priority
scheme is fixed by the device's physical position on the bus, and cannot
be changed in software. Daisychaining is also susceptible to faults.
Sometimes, multiple request and grant lines are used with daisychaining
to enable requests from devices to bypass a closer device, and thereby implement
a restricted software-controllable priority scheme.
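The basic single-line daisy chain amounts to "the closest requester wins," which the following sketch makes explicit. It is a simplified model: the list index stands in for physical position (index 0 is assumed to be nearest the central arbiter), and the function name is hypothetical.

```python
def daisy_chain_grant(requests):
    """Return the position of the device granted the bus, or None.

    requests[i] is True if the device at position i is requesting the bus.
    A requesting device blocks the requests of every device farther away.
    """
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            return position
    return None
```

Because position alone decides the winner, `daisy_chain_grant([False, True, True])` always returns 1, and the device at position 2 waits for as long as the closer device keeps requesting.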
Buses may also support multiple transfer modes. For example, a block
transfer that begins with the start address and then sends a sequence of
data values can occur faster because the arbitration and address portion
of the protocol take place just once for the block.
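The benefit of a block transfer can be estimated with a simple cycle count. In this sketch every parameter is hypothetical: `overhead_cycles` stands for the arbitration and address portion of the protocol, and each data word is assumed to take one bus cycle.

```python
def bus_bandwidth(words, overhead_cycles, bytes_per_word, cycle_ns):
    """Bytes per second when `words` share one arbitration and address phase."""
    total_cycles = overhead_cycles + words
    return words * bytes_per_word / (total_cycles * cycle_ns * 1e-9)


single = bus_bandwidth(1, 1, 4, 100)     # one word per transaction
block = bus_bandwidth(16, 1, 4, 100)     # sixteen words per transaction
```

With a 100 ns cycle and 4-byte words, the single-word case reaches about 20 MB/s while the 16-word block reaches about 37.6 MB/s, approaching the 40 MB/s limit as the block grows.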
Some bus designs also multiplex the address and data lines. There are two
instances where this is useful -- reducing cost, and increasing bandwidth.
A very low cost bus might always transfer an address and then the data over
the same lines. Thus the two transfers cannot be overlapped in time, and
buffering of the address is required at each end, but the bus itself can
have half the number of wires. To reduce cost even further, the width of
the bus can be reduced so that the address and data are transmitted in pieces.
This can be taken to the limit of a single signal path with the control
protocol embedded (a traditional serial data link).
On the other hand, increased bandwidth can be the goal in multiplexing address
and data. For example, in a block transfer, following the initial address,
the address and data lines might both be employed to transfer twice as much
data per cycle.
One problem with multiplexing the address lines with data values is that
they must then pass through an extra set of steering logic gates that direct
the signals to the appropriate places at each end of the transmission. This
increases both the bus latency and the cost of the bus interface.
Another transfer mode that is found on high performance buses is a split
transfer, which is usually associated with a read. Because memory latency
may be high, the bus can be tied up for several cycles while waiting for
the data to return from a read request. In a split cycle, the read is requested
and then the bus is released. Other transactions can then use the bus. When
the value has been fetched, a memory controller becomes a bus master and
transfers the data to the requester.
Often in a split transfer system, the memory controller is designed to buffer
multiple requests, so that reads may be pipelined over the bus. The split
transfer is especially useful in multiprocessor systems. The overall result
is to increase the effective bandwidth of the bus by the factor L/T where
L is the memory latency and T is the bus cycle time.
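The L/T factor can be seen by comparing how long the bus is occupied per read in each scheme. The sketch below is an idealized model: it assumes fully pipelined reads, charges each split read a single bus cycle, and ignores the separate request and reply phases. The function names are hypothetical.

```python
def bus_busy_ns(n_reads, latency_ns, cycle_ns, split):
    """Total time the bus is occupied by n_reads read transactions."""
    per_read_ns = cycle_ns if split else latency_ns
    return n_reads * per_read_ns


def split_speedup(latency_ns, cycle_ns):
    """Ratio of bus time without split transfers to bus time with them."""
    return (bus_busy_ns(1, latency_ns, cycle_ns, split=False)
            / bus_busy_ns(1, latency_ns, cycle_ns, split=True))
```

With a memory latency of 150 ns and a 50 ns bus cycle, `split_speedup(150, 50)` is 3.0, matching L/T.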
| | VME | FutureBus | Multibus II | SCSI |
| Bus width | 128 | 96 | 96 | 8 |
| Address/Data multiplexed? | No | Yes | Yes | Serial |
| Data width | 16/32/64 | 32/64/128 | 32 | 8 |
| Transfer | Single/Block | Single/Block | Single/Block | Single/Block |
| Masters | Multiple | Multiple | Multiple | Multiple |
| Split transfer | No | Optional | Optional | Optional |
| Clocking | Asynch | Asynch | Synch | Both |
| Bandwidth, single word, no latency | 25 MB/s | 37 MB/s | 20 MB/s | 5 MB/s synch, 1.5 MB/s asynch |
| Bandwidth, single word, 150 ns latency | 12.9 MB/s | 15.5 MB/s | 10 MB/s | 5 MB/s synch, 1.5 MB/s asynch |
| Bandwidth, infinite block, no latency | 27.9 MB/s | 95.2 MB/s | 40 MB/s | 5 MB/s synch, 1.5 MB/s asynch |
| Bandwidth, infinite block, 150 ns latency | 13.6 MB/s | 20.8 MB/s | 13.3 MB/s | 5 MB/s synch, 1.5 MB/s asynch |
| Maximum devices | 21 | 20 | 21 | 7 |
| Maximum length | 0.5 m | 0.5 m | 0.5 m | 25 m |
| Standard number | IEEE 1014 | IEEE 896.1 | ANSI/IEEE 1296 | ANSI X3.131 |
In addition to the logical specification, each bus has a physical and
electrical specification that defines the types and possibly the positions
of the connectors, the signal levels (i.e. maximum, minimum, and rising
and falling transition voltages), required drive capacity of the devices,
impedance of the signal paths on the bus, signal timing (rise, fall, and
hold times of signals), allowable noise, and the power limitations of devices
that draw their power from the utility lines of the bus.
Futurebus is a defined standard, whereas the other buses are adopted from
working systems. It has not caught on in spite of a great deal of publicity,
largely because it obsoletes a very large installed base of VME compatible
devices and systems, while offering only a limited improvement in performance.
It also requires the development of new bus controller chips and new connectors,
both of which are more expensive than their counterparts in the well-established
VME market. In addition, the VME standard is continuing to evolve in an
upward-compatible manner that increases its performance. It is thus likely
to remain the most popular backplane bus for some time to come.
Interaction with the virtual memory system. Providing a translation table
to the I/O device. Limiting transfers to individual pages.
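Limiting transfers to individual pages means that a request spanning several pages must be broken up so that each piece can be translated separately. A minimal sketch of that splitting follows; the 4096-byte page size and the function name are assumptions made for the example.

```python
def split_at_pages(virtual_addr, length, page_size=4096):
    """Break a transfer into (address, length) chunks that never cross a page."""
    chunks = []
    while length > 0:
        room = page_size - (virtual_addr % page_size)   # bytes left in this page
        n = min(room, length)
        chunks.append((virtual_addr, n))
        virtual_addr += n
        length -= n
    return chunks
```

A 5000-byte transfer starting at address 4000 becomes three chunks: `(4000, 96)`, `(4096, 4096)`, and `(8192, 808)`. Each chunk lies within one page, so each needs only one entry from the translation table.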
Interaction with the cache. Have all I/O go through the cache (costly).
Have the OS flush the cache before I/O. Selectively flush the cache during
I/O. Note that a virtually addressed cache can be accessed if the I/O devices
have their own virtual translation tables that are maintained by the OS.