Now that we've seen a capsule summary of the history of computing technology,
let's see if we can extract some significant trends that might help to give
us a sense of where technology is going. It's dangerous to draw too strong
a set of conclusions from any of these, but they have general value in that
they provide us with some intuition.
Let's start by looking at computing technology a bit differently from the
standard list of "generations." What was the function of computing
technology through history, and how did it change? Where were the points
that its purpose changed radically?
The first computing technology, such as the abacus or written number, was
used as an aid to memory. The arithmetic was still done by the human operator,
but it required only simple operations on single digits. The benefit was
that the operator could concentrate on the operations instead of the numbers
themselves. This improved the accuracy of the computations and allowed people
to work with larger numbers. This era of memory aids lasted for several
thousand years (and such aids are still in use).
The next era of computing saw devices such as Pascal's box that also performed
arithmetic. Thus, the operator is freed of having to do this as well, and
can now concentrate on a series of steps to perform a more complex overall
computation. Accuracy is again improved, and algorithms gain in importance.
The period of mechanized arithmetic has been with us for several hundred
years.
As computations become more complex, human error creeps into the following
of algorithms and the input and output of data. Babbage begins to solve
this problem with the difference engine, as it can carry out a preset series
of steps and stamp its results directly onto printing plates. The operator
is freed from handling the numbers and can devise more complex algorithms.
Accuracy again improves.
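To give a feel for what a "preset series of steps" looks like, the sketch below tabulates a polynomial using nothing but repeated addition. This is the method of finite differences, which the difference engine mechanized. The function name and the example polynomial are illustrative choices, not taken from Babbage's designs.

```python
def tabulate_by_differences(initial_column, count):
    """Tabulate a polynomial using only addition.

    initial_column holds the starting value followed by its first,
    second, ... differences (the last one is constant for a polynomial).
    """
    column = list(initial_column)
    table = []
    for _ in range(count):
        table.append(column[0])
        # Add each difference into the entry above it, top to bottom,
        # so every update uses the not-yet-updated lower difference.
        for i in range(len(column) - 1):
            column[i] += column[i + 1]
    return table


# Hypothetical example: f(x) = x*x + x + 41 for x = 0, 1, 2, ...
# f(0) = 41, first difference f(1) - f(0) = 2, second difference = 2.
values = tabulate_by_differences([41, 2, 2], 6)
print(values)
```

Once the starting column is set, no multiplication and no human judgment is needed to produce further entries, which is why the operator can be removed from the arithmetic.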
Once numbers and computations are separated from each other, algorithms
can take a major leap in complexity by employing symbolic quantities. This
is the same as going from multiplication and addition to algebra. It is
no coincidence that the time of Babbage also saw a revolution in how mathematicians
approached algebra. This led quickly to the analytical engine. However,
it was never completed and this new era really only got underway as tabulating
machines (a la Hollerith) became more sophisticated. Once the technology
made an automatic calculator truly feasible (e.g. ENIAC) the designers again
quickly saw the need for a machine that could be programmed to operate on
symbolic quantities (EDVAC).
Symbolic computation replaced automated arithmetic in Babbage's work after
only about 30 years. When it was later rediscovered, it took just three
years for computers like EDSAC to enter the picture after calculators like
ENIAC were built. At that point, it became possible for a machine to record
and repeat formalized versions of the mental processes of its operator.
The human could be removed from the computation, and was freed to think
about how to better express and codify those mental processes.
This stage in the development of technology is crucial because we have reached
the maximum level of accuracy that can be obtained without transferring
the human creative capacity to the machine. That is, given a set of instructions
from a human, whatever they may be, the computer can carry them out with
complete accuracy. Any inaccuracy that remains in the computation is in
the instructions, and removing that can only be done by the human operator.
So, until computers can be made to solve problems and program themselves,
they can do no more to advance the goal of greater accuracy.
Yet we clearly see that the technology of computing has advanced.
How?
Why?
Computers have advanced by becoming faster and by becoming easier to program.
The first type of advance is partly the result of people having learned
to codify more and more complex algorithms, which solve bigger problems.
In order to obtain those solutions in a reasonable amount of time, it becomes
necessary for the machine to work faster than a human. Speed, rather than
accuracy, becomes a new goal and once it is established as a goal, it is
self-perpetuating through both demand and market-based competition. As computers
become faster and cheaper, people are willing to use computational techniques
that were previously unrealistic. In so doing, they usually try to push
beyond the limits of current technology and thus create a demand for greater
speed. In the market, of course, one of the important selling points is
that a product is better than its competition, and so the manufacturers
strive to build faster computers simply to be competitive.
Ease of programming is the modern form of the traditional goal of greater
accuracy. Because programming is now a major source of error, we use the
computer to store algorithms that assist us in developing programs. For
example, early computers were programmed directly in binary codes by storing
instructions through a bank of switches. This was an error-prone process,
as was the translation of written algorithms into binary codes. First assemblers
and then compilers were developed to help reduce these errors. Programming
languages have evolved to make it easier for us to express complex algorithms,
which in turn makes it more difficult to compile those algorithms, and thus
we again need faster computers.
How is it that compilers help us to avoid errors? By augmenting our memory,
performing computations for us, and automating sequences of steps -- i.e.,
by applying the capabilities of computing technology.
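As a rough illustration of how an assembler removes one class of error, here is a minimal sketch that translates mnemonics into binary words and rejects mistakes that hand coding through switches would let through. The instruction format (4-bit opcode, 4-bit operand) and the mnemonics are hypothetical and do not describe any particular early machine.

```python
# Hypothetical 8-bit instruction format: 4-bit opcode, 4-bit operand.
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3, "HALT": 0xF}


def assemble(source):
    """Translate mnemonic lines into 8-bit machine words."""
    words = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        line = line.split(";")[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            # The tool catches a mistake that hand coding would let through.
            raise ValueError(f"line {line_number}: unknown mnemonic {mnemonic!r}")
        operand = int(parts[1]) if len(parts) > 1 else 0
        if not 0 <= operand <= 0xF:
            raise ValueError(f"line {line_number}: operand {operand} out of range")
        words.append((OPCODES[mnemonic] << 4) | operand)
    return words


program = """
LOAD 5     ; fetch the value at address 5
ADD 6      ; add the value at address 6
STORE 7    ; save the sum at address 7
HALT
"""
print([format(word, "08b") for word in assemble(program)])
```

The table of opcodes plays the role of memory aid, the shifting and OR-ing is arithmetic done for us, and the loop automates the sequence of steps.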
If we draw a timeline of computation technology, we can see that the premodern
computing epoch had three main eras: memory aids, arithmetic aids, and automated
arithmetic. It was driven by the goal of increased accuracy. The modern
epoch began with the development of programmable symbolic computation, and
adds to the earlier goal a new goal of increased speed that contributes
partly to increased accuracy by aiding algorithm development. The first
era in the modern epoch is characterized by the augmentation of human creativity
in solving problems and turning those solutions into programs. Perhaps the
next era will see the development of techniques that enable computers to
be creative.
In our review of modern computation, we saw a list of generations of
technology:
0 Mechanical/Electromechanical
1 Vacuum tube
2 Transistor
3 Integrated circuit
4 Very Large Scale Integration/Microprocessor
5 Parallel
These generations, however, are not particularly indicative of anything
other than changes in construction techniques. The first two generations
represent the typical initial period in which existing technology is applied
to a novel invention. It is not unlike the early days of automobiles in
which some people tried to use steam engines for power. Just as the complicated
and dangerous steam mechanisms were inappropriate for a personal transportation
device, the high-maintenance and high-power-consumption mechanical and vacuum
tube technologies were inappropriate for a device that needed to employ
greater and greater numbers of components to achieve its goals.
Semiconductor electronics were developed for other applications in which
low power and high reliability were required, and were a natural match for
computing. It should not be inferred that this was somehow a coincidence
-- developments in science and engineering together with social pressures
resulted in a general progression to more complex mechanisms in many domains.
Computing happens to be one of the most visible and novel outgrowths of
having crossed this particular threshold of complexity.
Once semiconductor technology was introduced to computing, the next three
generations are really the same. The only difference between them is the
density with which the electronics are packaged. The second generation is
the point where each sealed package contains just one device. The third
generation is the transitional period when "many" devices are
in each package, but not enough to form a complete computer. The fourth
generation begins at the point that a whole computer fits into one package.
In terms of mainstream computing, we are still in the fourth generation.
Why?
Because the goal of placing an entire computer in one package has such significant
benefits that it was done prematurely -- at a point where the technology
could barely support a very minimal computer. The technology is only now
reaching the point that a single package can hold enough components to equal
the capabilities of the most powerful second and third generation machines
of 30 years ago.
Except for comparatively minor advancements, very little has changed in
mainstream computer architecture since second- and early third-generation
machines reached a peak in the mid-60's. In the meantime, there has been
one major digression in architectural approaches (complex instruction sets),
but we have lately returned to the practices of that earlier period (reduced
instruction set designs, memory hierarchy, multiple arithmetic units, pipelined
processing). Most of the advancements have involved either the development
of clever engineering tricks (to take advantage of features of the changing
technology) or responses to empirical data taken from studies of application
software.
Outside of the mainstream there has been a steady stream of research into
novel architectural approaches, few of which have succeeded because of the
cost of creating a viable base of software. There have been database machines,
dataflow processors, parallel processors of many varieties, logic machines,
neural network machines, fuzzy logic machines, and so on. Some of these
are in use in niche application markets. The more successful architectures
are digital signal processors (used in modems, VCRs, phones, CD players,
talking toys, etc.), fault tolerant processors (used in banks, aircraft,
medical equipment, etc.), and image processors (used in video games, TVs,
manufacturing processes, and medical equipment).
Many people believe we are on the verge of entering (or have recently
entered) a new generation in which the mainstream architecture changes significantly.
This generation will employ parallel processing. Currently, parallelism
is used quite conservatively -- mainstream processors now typically have
multiple arithmetic elements that can operate simultaneously. For example,
the Pentium can perform two arithmetic operations at once in many cases.
Parallel processing will expand to include specialized independent computers
in a single package. Intel, for example, has a corporate vision of a microprocessor
with several units for executing normal instruction streams, and specialized
units for generating graphical displays and processing input signals (such
as speech and video). This will be a heterogeneous parallel processor, in
contrast to the homogeneous parallel processors that have been built of
10s to 1000s of identical processors.
The viability of alternative architectures will probably start to grow once
the technology advances to the point that they cost little to add to mainstream
processors. Until then, they will continue in niche markets. A prime example
is the current generation of homogeneous massively parallel processors which
were developed for science and engineering. These were large, expensive
machines that attempted to push performance two or three orders of magnitude
beyond uniprocessors. The problem is that they have always been one generation
of technology behind (because of the time required to design a new implementation
for the latest microprocessor), and they have inherent inefficiencies. The
result is that they deliver only one or two orders of magnitude of increased performance
while costing proportionally more than a contemporary uniprocessor.
Given the added cost of building software for parallel machines, most potential
buyers have chosen to wait a few years until uniprocessors deliver similar
performance (with no software development cost), or to employ only limited
parallelism (manually spreading computation over several networked machines,
or using a small-scale multiprocessor), or to employ specialized accelerators
(vector, array, image, and signal processors). These are precisely the directions
that microprocessors are going, so it is likely that we will see specialty
manufacturers gradually fall aside and greater consolidation growing around
a few standard architectures. In many ways, this development of the computer
industry is a parallel to the automotive industry which saw a large number
of manufacturers dwindle to just a few as the manufacturing and development
infrastructure costs grew to support the complexity of the product.
Now let's turn to the basic structure of a processor's architecture.
Every mainstream computer architecture today has the same basic elements
as the EDVAC design presented at the famous Moore School summer lectures: Input,
Output, Memory, Control Unit (CU), and Arithmetic-Logic Unit (ALU). The
latter is sometimes called the "datapath" in current terminology,
usually by people who focus on the design of processor chips. This is because
the circuitry for the ALU is often arranged as a straight path of wiring
that passes through a series of stages -- the data thus flows from stage
to stage along this path as it is processed.
The CU and ALU are sometimes grouped together and called the Central Processing
Unit (CPU), and this is really what constituted the contents of early microprocessor
chips. In earlier generations the two were grouped together (e.g., on the
same circuit board or in the same cabinet), because they are interconnected
by many wires and because the signals that travel between them must proceed
with minimal delay.
Today's microprocessors also include a portion of the memory (the cache),
so the CPU's boundary is not always restricted to the ALU and CU. However,
it is likely that as more of the I/O functions also move onto the same chip,
the CPU will return to its traditional definition in order to retain the
functional distinction.
The control unit is the brain of the microprocessor in that it contains
the circuitry that coordinates the activities of all of the other circuitry
in the computer. It is usually the most complex portion of an implementation,
but not because it is especially sophisticated. Rather, the complexity stems
from having to handle all of the special cases that naturally arise in any
processor design. The basic concept behind its operation is quite simple.
It is essentially a finite state machine.
If the CU is the brain, the clock is the heart of the system. It is a very
simple circuit that sends out pulses at a regular interval. These pulses
are like the ticking of an old-fashioned mechanical clock -- they indicate
the beginning and end of a basic interval of time in the processor. Every
operation in the processor takes place during a clock period. Data moves
from one place to another, a basic part of a computation takes place, a
memory fetch is issued, all in response to a clock tick combined with signals
from the CU.
On each clock tick, the CU transitions from one state to another. Usually
there is a major state register that keeps track of what part of an instruction
is being performed. The instruction code and status registers form the other
inputs to the finite state machine of the CU. When a clock pulse is received,
the inputs are accepted and evaluated and the FSM goes to a new state. Each
state of the FSM causes a certain set of control signals to be sent out
over wires to the rest of the computer.
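The behavior just described can be sketched as a small state machine that advances once per clock tick. The state names, opcodes, and control-signal names below are hypothetical and chosen only to illustrate the idea; a real control unit has many more states and special cases.

```python
def next_state(state, opcode, zero_flag):
    """Return the next major state for the current inputs."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        # A conditional branch that is not taken needs no execute step.
        if opcode == "BRANCH_IF_ZERO" and not zero_flag:
            return "FETCH"
        return "EXECUTE"
    return "FETCH"    # EXECUTE always returns to FETCH


# Control signals asserted while the FSM is in each state.
CONTROL_SIGNALS = {
    "FETCH": {"memory_read", "load_instruction_register"},
    "DECODE": {"read_registers"},
    "EXECUTE": {"alu_enable", "write_register"},
}


def run_clock(opcode, zero_flag, ticks, state="FETCH"):
    """Advance the FSM one state per clock tick, recording each state and its signals."""
    trace = []
    for _ in range(ticks):
        trace.append((state, sorted(CONTROL_SIGNALS[state])))
        state = next_state(state, opcode, zero_flag)
    return trace


for state, signals in run_clock("ADD", zero_flag=False, ticks=4):
    print(state, signals)
```

Here the `state` variable stands in for the major state register, while `opcode` and `zero_flag` stand in for the instruction code and status register inputs.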
The FSM can be built in a variety of ways. One involves creating a very
simple computer that takes the inputs as a jump address into a subroutine
library. The simple computer is called a microengine and the instructions
it executes are called microcode. The microcode is stored in read only memory.
Another involves a read only memory in which the FSM inputs are the address
to the memory and the data stored in the location becomes the control signals
-- but this memory is very large except for a really simple computer. Yet
another approach is to construct a Boolean logic circuit that takes the
inputs and generates the outputs. For obvious reasons, this is called a
hard-wired CU.
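To see why the pure read only memory approach grows so quickly, the sketch below fills a ROM by evaluating the FSM logic at every possible address, and computes how many bits such a ROM needs. The bit widths and the toy sequencer are made-up figures for illustration, not those of any real processor.

```python
def rom_size_bits(state_bits, opcode_bits, status_bits, control_signals):
    """Size of a control ROM addressed by every FSM input bit.

    Each word holds the next state plus one bit per control signal.
    """
    address_bits = state_bits + opcode_bits + status_bits
    word_bits = state_bits + control_signals
    return (2 ** address_bits) * word_bits


def build_rom(state_bits, input_bits, logic):
    """Fill a ROM by evaluating the FSM logic at every possible address."""
    rom = []
    for address in range(2 ** (state_bits + input_bits)):
        state = address >> input_bits
        inputs = address & ((1 << input_bits) - 1)
        rom.append(logic(state, inputs))
    return rom


def toy_logic(state, advance):
    """Hypothetical 4-state sequencer: step when advance=1, signal in state 3."""
    next_state = (state + 1) % 4 if advance else state
    done_signal = 1 if state == 3 else 0
    return next_state, done_signal


rom = build_rom(state_bits=2, input_bits=1, logic=toy_logic)
state = 0
for advance in (1, 1, 0, 1):
    state, done = rom[(state << 1) | advance]   # one ROM read per clock tick
print(state, len(rom))

print(rom_size_bits(state_bits=2, opcode_bits=4, status_bits=2, control_signals=16))
print(rom_size_bits(state_bits=4, opcode_bits=8, status_bits=4, control_signals=64))
```

Every added input bit doubles the number of ROM words, whether or not the new combinations are ever used. A hard-wired circuit implements the same function as `toy_logic` directly in gates, and microcode avoids storing one word per input combination.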
The ALU performs the actual computations: arithmetic,
logical operations, comparisons, data movement and reformatting. It usually
consists of a collection of modular circuits (one for each major operation)
that are connected to shared sets of input and output wires called buses.
The control unit signals the different circuits when they are to accept
the data currently on the input bus, and when to put their results onto
the output bus.
The ALU also includes the highest level of the memory, called the data registers.
These are also on the input and output buses and the CU signals which of
them should output its contents to the input bus or input the value currently
on the output bus.
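A minimal sketch of this arrangement follows, with one function per modular circuit and plain variables standing in for the buses. The unit names, the four-register file, and the 8-bit word size are assumptions made for the example.

```python
import operator

# One functional unit per major operation; names are illustrative.
FUNCTIONAL_UNITS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "OR": operator.or_,
}


class Datapath:
    """Toy ALU with data registers; word size is a hypothetical 8 bits."""

    def __init__(self, register_count=4):
        self.registers = [0] * register_count

    def execute(self, unit, source_a, source_b, destination):
        """Apply the control signals issued by the CU for one operation."""
        # The selected registers drive the input buses.
        input_bus_a = self.registers[source_a]
        input_bus_b = self.registers[source_b]
        # Only the signalled unit accepts the inputs and drives the output bus.
        output_bus = FUNCTIONAL_UNITS[unit](input_bus_a, input_bus_b) & 0xFF
        # The selected destination register latches the output bus.
        self.registers[destination] = output_bus
        return output_bus


datapath = Datapath()
datapath.registers[0] = 200
datapath.registers[1] = 100
datapath.execute("ADD", 0, 1, 2)   # 300 does not fit in 8 bits and wraps around
datapath.execute("AND", 0, 2, 3)
print(datapath.registers)
```

The arguments to `execute` play the role of the control unit's signals: they choose which registers output to the input buses, which circuit operates, and which register accepts the result.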
In simple computers, the memory is divided into just two parts: registers
and main memory (sometimes called "core", especially by old-timers,
after magnetic core memory). Most machines, however, have a hierarchy of
memory types that vary in speed and cost:
Registers
Cache (1 to 3 levels)
Main memory
Secondary memory (disk)
Tertiary memory (tape)
We have more of the cheaper forms of memory in a system. This recognizes
that the processor is only accessing a small portion of all of the available
data at any one time. So rather than pay for all of the memory to be fast
and expensive, a small fast memory is used to hold the current "working
set" and cheaper, slower memory can hold everything else. Only when
you have to fetch a new value into the working set do you suffer the slowdown.
The registers and one or two levels of cache typically appear on the same
chip in today's microprocessors.
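The working-set argument can be made concrete with a small simulation. The sketch below places a small fast memory in front of a slow one and reports the average cost per access. The least-recently-used replacement policy and the timing figures are assumptions chosen for simplicity, not a description of any particular machine.

```python
from collections import OrderedDict


def average_access_time(addresses, fast_slots, fast_time, slow_time):
    """Average cost per access with a small fast memory in front of a slow one.

    The fast memory keeps the most recently used addresses (LRU replacement,
    chosen here for simplicity).
    """
    fast_memory = OrderedDict()
    total_time = 0
    for address in addresses:
        total_time += fast_time                   # always check the fast memory
        if address in fast_memory:
            fast_memory.move_to_end(address)      # mark as most recently used
        else:
            total_time += slow_time               # miss: fetch from slow memory
            fast_memory[address] = True
            if len(fast_memory) > fast_slots:
                fast_memory.popitem(last=False)   # evict least recently used
    return total_time / len(addresses)


# Hypothetical workload: a loop touching the same 4 addresses 25 times.
working_set = [0, 1, 2, 3] * 25
print(average_access_time(working_set, fast_slots=4, fast_time=1, slow_time=50))
print(average_access_time(working_set, fast_slots=2, fast_time=1, slow_time=50))
```

When the fast memory is large enough to hold the working set, only the first touch of each address pays the slow-memory cost. When it is too small, this particular access pattern misses every time and the fast memory provides no benefit.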
Why would you have more than one level of cache on a chip?
Because the speed of the cache depends on its size. As it gets larger, it
takes longer for signals to propagate through it. If you make it big, the
machine must slow down to match the rate at which the cache can supply instructions
and data. Thus, a designer will make the first level of cache as big as
it can be without slowing down the clock, and if there is room left over
on the chip, will add a second level of cache that is still much faster
than main memory but also bigger than the primary cache. That way, when
the primary cache runs out of room it can save values in the secondary cache
and later retrieve them more quickly than if it had to go to main memory.
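A back-of-the-envelope calculation shows the benefit of the second level. The sketch below computes the average access time as each level's hit time plus its miss rate times the cost of going one level further down. The cycle counts and miss rates are hypothetical round numbers, not measurements of any real chip.

```python
def average_memory_access_time(levels, memory_time):
    """Average access time for a chain of caches backed by main memory.

    levels is a list of (hit_time, miss_rate) pairs, fastest level first.
    Each miss rate is the fraction of accesses reaching that level that miss.
    """
    time = memory_time
    # Work backward from main memory: each level adds its own hit time
    # and pays the cost of everything below it only on a miss.
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical timings in clock cycles.
one_level = average_memory_access_time([(1, 0.10)], memory_time=100)
two_level = average_memory_access_time([(1, 0.10), (10, 0.20)], memory_time=100)
print(one_level, two_level)
```

With these assumed figures, the primary cache keeps its one-cycle hit time in both cases, and the secondary cache absorbs most of the misses that would otherwise go all the way to main memory.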
The main memory level and above are also characterized in today's technology
as being volatile (they forget their contents when power is lost), while
the lower levels are nonvolatile (they retain their contents without power).
In the past, main memory technology such as magnetic core was also nonvolatile.
There are many forms of I/O. In early computers, the ALU had I/O registers
and the data would be input and output under direct program control. Modern
systems mostly use direct memory access (DMA), in which a separate I/O processor
has access to the main memory of the system. The CPU stores either a set of
parameters or a simple program in registers in the I/O processor, which is
then released to perform the I/O operation independently of the
subsequent actions of the CPU.
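The division of labor in DMA can be sketched with a step-by-step simulation. The class below is a toy model: the register names, the one-word-per-cycle transfer rate, and the list standing in for a device are all assumptions made for illustration.

```python
class DmaController:
    """Toy I/O processor that copies device data into main memory."""

    def __init__(self, memory):
        self.memory = memory
        # Parameter registers written by the CPU before the transfer starts.
        self.destination = 0
        self.remaining = 0
        self.device_data = iter(())

    def start(self, device_data, destination, count):
        """CPU side: store the parameters, then release the controller."""
        self.device_data = iter(device_data)
        self.destination = destination
        self.remaining = count

    def tick(self):
        """Controller side: move one word per cycle without CPU involvement."""
        if self.remaining:
            self.memory[self.destination] = next(self.device_data)
            self.destination += 1
            self.remaining -= 1

    @property
    def busy(self):
        return self.remaining > 0


memory = [0] * 8
dma = DmaController(memory)
dma.start(device_data=[7, 8, 9], destination=4, count=3)

cpu_counter = 0
while dma.busy:
    cpu_counter += 1      # the CPU keeps doing unrelated work each cycle
    dma.tick()            # the transfer proceeds in the same cycle
print(memory, cpu_counter)
```

The CPU's only involvement is the single call to `start`. Under program-controlled I/O, by contrast, the CPU itself would execute an instruction for every word moved.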
As in most complex systems, computers are designed in layers of abstraction.
These layers vary from machine to machine. Some of them are software while
others are hardware. For example, on most machines a high level language
would be translated by software (a compiler) into machine language, but
in the Zilog Z8671, which carried a BASIC interpreter in on-chip ROM, BASIC
was effectively the machine language.
Some machines have an operating system and run-time software library that
provide an abstraction on top of the hardware, while others (such as digital
signal processors) may have little or no software at this level of abstraction.
Some machines are implemented with microcode, which is a form of software
that is fixed into the hardware (called firmware). Others are implemented
in logic circuits that are generated by Computer Aided Design (CAD) tools,
which model the hardware with various abstractions.
In spite of all of the variation in these abstractions, there is one fixed point
that almost every machine is built around: the instruction-set architecture
(ISA). This is a contract between the architect and the programmer that
defines a view of the machine that is guaranteed to operate in the same
manner across all implementations, with the exception that performance can
vary. The concept of the ISA arose with the IBM 360 family of computers,
where every machine had the same set of instructions, the same number of
user registers, and the same behavior. The family had numerous models at
various price/performance levels, but they were all object-code compatible
with one another. The advantage of having this particular abstraction as
a fixed point is that it supports transparent hardware upgrades and portability
of software.
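The idea of the ISA as a contract can be illustrated with a toy example in which one program runs on two "models" that differ only in timing. The three-instruction ISA, the register names, and the cycle costs below are invented for this sketch and bear no relation to the actual IBM 360 instruction set.

```python
# Hypothetical three-instruction ISA over registers named "r0".."r3".
PROGRAM = [
    ("LOADI", "r0", 6),          # r0 = 6
    ("LOADI", "r1", 7),          # r1 = 7
    ("MUL", "r2", "r0", "r1"),   # r2 = r0 * r1
    ("ADD", "r3", "r2", "r0"),   # r3 = r2 + r0
]


def run(program, cycle_costs):
    """Execute the program; cycle_costs models one implementation's timing."""
    registers = {"r0": 0, "r1": 0, "r2": 0, "r3": 0}
    cycles = 0
    for opcode, destination, *operands in program:
        if opcode == "LOADI":
            registers[destination] = operands[0]
        elif opcode == "ADD":
            registers[destination] = registers[operands[0]] + registers[operands[1]]
        elif opcode == "MUL":
            registers[destination] = registers[operands[0]] * registers[operands[1]]
        else:
            raise ValueError(f"not in the ISA: {opcode}")
        cycles += cycle_costs[opcode]
    return registers, cycles


# Two made-up models: same architected behavior, different speed.
budget_model = {"LOADI": 2, "ADD": 2, "MUL": 20}
premium_model = {"LOADI": 1, "ADD": 1, "MUL": 3}

budget_result, budget_cycles = run(PROGRAM, budget_model)
premium_result, premium_cycles = run(PROGRAM, premium_model)
print(budget_result == premium_result, budget_cycles, premium_cycles)
```

The same unmodified program yields identical register contents on both models. Only the cycle count differs, which is exactly the one exception the contract allows.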
The ISA specifies the instructions, their binary formats, the complete effect
of each operation including any side effects, the visible registers of the
machine, and any other aspects of the system that affect how it is programmed.
The ISA remains the same for all implementations. Unfortunately, it is not
sufficient to permit a compiler writer to do any more than generate suboptimal
object code. In order to get the best performance, a compiler must have additional
information about a particular implementation. For example, if the compiler
knows how much data is fetched at once from main memory into the cache (and
it is usually some fixed number of words), then it can rearrange the program's
pattern of access to the data to ensure that each block is operated on as
many times as possible after it is fetched into the cache. Thus, the ISA
is only moderately effective as an abstraction, although it does facilitate
portability if performance is ignored.
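The effect of knowing the cache block size can be shown by counting block fetches for two orderings of the same work. The sketch below uses a deliberately tiny cache model (fully associative, least-recently-used replacement, one block by default). The array dimensions and block size are illustrative assumptions.

```python
def block_fetches(access_order, words_per_block, cache_blocks=1):
    """Count block fetches from main memory for a sequence of word addresses.

    Models a tiny fully associative cache with LRU replacement; the
    parameters are illustrative, not taken from any real machine.
    """
    cached = []          # block numbers, least recently used first
    fetches = 0
    for address in access_order:
        block = address // words_per_block
        if block in cached:
            cached.remove(block)
        else:
            fetches += 1
            if len(cached) == cache_blocks:
                cached.pop(0)
        cached.append(block)
    return fetches


ROWS, COLUMNS, WORDS_PER_BLOCK = 8, 8, 4

# The array is stored row by row: element (r, c) lives at r * COLUMNS + c.
row_order = [r * COLUMNS + c for r in range(ROWS) for c in range(COLUMNS)]
column_order = [r * COLUMNS + c for c in range(COLUMNS) for r in range(ROWS)]

print(block_fetches(row_order, WORDS_PER_BLOCK))
print(block_fetches(column_order, WORDS_PER_BLOCK))
```

Both orderings touch the same 64 words, and the ISA treats them as equivalent. In this model, walking along rows uses every word of a block before moving on, while walking down columns fetches a new block on every access. A compiler can only choose between them if it knows the block size of the implementation.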