Instructor: Chip Weems
Office: CS-342
Phone: 545-3163
E-mail: weems@cs.umass.edu
Syllabus
0 - Mechanical / Electromechanical
1 - Vacuum tube
2 - Transistor
3 - Integrated circuit
4 - Very Large Scale Integration (VLSI) / Microprocessor
5 - Homogeneous parallel processors
Mechanical computers were built with trains of gears, much like clocks.
Typically, they used decimal arithmetic, and each gear or wheel had ten
positions. The hardest part of designing such a machine was to get the carry
to propagate cleanly from one digit to the next (so that there wouldn't
be any ambiguous, half-visible numbers showing in the display windows).
The other difficulty was that the sheer complexity of a large
calculator, together with the friction of all of the gears, made construction
very difficult prior to the advent of modern machining technology.
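The carry problem can be illustrated with a small sketch. It is a software model only, assuming a register of decimal wheels stored least significant digit first; the function name add_to_wheels is made up for this example and does not describe any particular machine.

```python
def add_to_wheels(wheels, amount):
    """Add a non-negative integer to a register of decimal wheels.

    wheels[0] is the units wheel and every wheel holds a digit 0-9.
    A carry out of the last wheel is dropped, as on a fixed-width register.
    """
    result = list(wheels)
    carry = 0
    for i in range(len(result)):
        digit = amount % 10              # digit to be added on this wheel
        amount //= 10
        total = result[i] + digit + carry
        result[i] = total % 10           # the wheel must settle on a whole digit
        carry = total // 10              # a full turn advances the next wheel
    return result


# 0999 + 1: a single step ripples a carry through three wheels in a row.
print(add_to_wheels([9, 9, 9, 0], 1))
```

The worst case for the mechanism is the one shown: every wheel has to move at once, driven by the wheel before it.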
Storage in a mechanical computer was by the position of the gears. In the
later electromechanical machines, relays were able to store some of the
machine's state. The program, however, was always stored in a separate medium,
typically a punched paper card or tape. Some analog mechanical computers
could be programmed by changing the gear train, but this was really just
equivalent to changing parameters to the program, since they generally just
computed one type of function (e.g. differential equations).
Relays work on the principle that a voltage is applied to a coil, driving
a magnetic rod (solenoid) outward so that a hinged or flexible electrical
contact is forced to touch a fixed contact, thus closing a circuit. We thus
have an electrically controlled switch.
The significance of the electrically controlled switch is that information
(the state of a switch in one place) can be transmitted over significant distances
without loss, and without interference. In a mechanical system, carrying
the state of one wheel to another at a distance involves long shafts and
often extra gears to allow the shafts to bypass other shafts.
Also of major significance is that the relay is more naturally used with
a binary number system (rather than decimal), because of the on/off nature
of circuits.
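As a rough illustration of why on/off switches suit binary logic, the sketch below models a relay as a coil-controlled contact. The Relay class and the series and parallel helpers are hypothetical names, and the wiring of contacts in series or in parallel is an assumption added here, not something these notes describe.

```python
class Relay:
    """A normally-open relay: the contact is closed while the coil is energized."""

    def __init__(self):
        self.energized = False

    def set_coil(self, on):
        self.energized = bool(on)

    def conducts(self):
        return self.energized


def series(a, b):
    """Two contacts in series pass current only if both are closed (AND)."""
    return a.conducts() and b.conducts()


def parallel(a, b):
    """Two contacts in parallel pass current if either one is closed (OR)."""
    return a.conducts() or b.conducts()


a, b = Relay(), Relay()
a.set_coil(True)
b.set_coil(False)
print(series(a, b), parallel(a, b))
```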
Gunter's scale, based on Napier's logarithms, was the first slide rule. Multiplication
and division could be done using sliding sticks inscribed with a logarithmic
scale.
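The principle behind the logarithmic scale can be sketched in a few lines: sliding one stick along the other adds two lengths, and adding logarithms multiplies the underlying numbers. The function names are illustrative only.

```python
import math


def slide_rule_multiply(a, b):
    """Multiply two positive numbers by adding their lengths on a log scale."""
    return 10 ** (math.log10(a) + math.log10(b))


def slide_rule_divide(a, b):
    """Divide two positive numbers by subtracting one length from the other."""
    return 10 ** (math.log10(a) - math.log10(b))


print(round(slide_rule_multiply(2, 3), 6))
print(round(slide_rule_divide(9, 3), 6))
```

A real slide rule is read by eye, so its answers are good to only a few significant digits; the rounding above stands in for that limited precision.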
Schickard's
calculator was destroyed in a fire and never rebuilt -- we know of it only
through a letter written to Johannes Kepler. If Schickard's claims are true,
it was much more sophisticated than Pascal's box.
Pascal's
box could add and subtract amounts of money. He invented it to aid his
father, a tax collector. Note that even today, taxes are still a major application
area for computers. Pascal's box was such a sensation that people even made
and sold non-working replicas as showpieces. Several copies still exist
in museums.
Leibniz's
calculator was much like Pascal's but could also multiply. Division
required a long sequence of steps. Interestingly, Leibniz's goal was to
reduce thought to a logical abstraction that could be performed automatically
-- although he didn't use the term, he was actually seeking to create artificial
intelligence.
Lepine (1725), Hillerin (1730), Pereire (1751), Earl Stanhope (1775) etc.
built calculators similar to those of Pascal and Leibniz, with minor improvements.
Jacquard's loom is programmable by feeding it a chain of cards with holes
punched in. The loom can weave any pattern, including a portrait of the
inventor. Although it doesn't calculate, it is the first programmable machine.
The Thomas Arithmometer, the first commercially manufactured mechanical
calculator, remained in production until 1926.
Babbage's
difference engine was an automated calculator for numerical tables.
The analytical engine was the first programmable computer, using punched
cards for storing instructions. Babbage got the idea from the Jacquard automatic
loom. Neither of Babbage's engines was ever completed. But, in 1853, a difference
engine based on Babbage's design was built by George and Edward Scheutz in
Sweden, and was sold to the Dudley Observatory in Albany, NY, for calculating
astronomical tables. Babbage is also associated with other important figures
in the history of computing -- see Boole, DeMorgan, and Lovelace.
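These notes do not spell out how a difference engine produces a table. The sketch below assumes the standard method of finite differences, from which the machine takes its name: once the starting value and its differences are set up, every further table entry needs only additions. The function name and the example polynomial x*x + x + 41 are choices made for this illustration.

```python
def difference_engine(initial, steps):
    """Tabulate a polynomial using additions only.

    initial = [value, first difference, second difference, ...] at the
    starting point. Returns the first `steps` table entries.
    """
    regs = list(initial)
    table = []
    for _ in range(steps):
        table.append(regs[0])
        # Add each difference into the register just below it, lowest first,
        # so every addition uses the not-yet-updated higher difference.
        for i in range(len(regs) - 1):
            regs[i] += regs[i + 1]
    return table


# p(x) = x*x + x + 41: p(0) = 41, first difference 2, second difference 2.
print(difference_engine([41, 2, 2], 5))
```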
Later on Herman Hollerith would use the same sort of cards as Babbage, but
for entering data into a tabulating machine he built for the 1890 census;
his company would eventually become IBM. The tabulator was also novel in
its use of electricity to carry the information from the cards to the calculator.
Bush's Differential Analyzer was an electromechanical analog computer for
computing differential equations. Programming was limited, and was accomplished
by replacing gears in the drive mechanism.
Zuse's electromechanical calculators used relays and were very similar in
concept to Babbage's analytical engine. They were the first working programmable
computers. Zuse also had a plan for an electronic computer using 1500 vacuum
tubes. Unfortunately, most of Zuse's early work was destroyed in World War
II, although he continued to build computers after the war. Later in
life he became a painter.
Howard Aiken's Mark I was a programmable calculator measuring 52 x 8 feet. It
used decimal arithmetic, and was built largely from parts used in commercial
tabulating machines. Addition took 0.3 seconds, multiplication took 6 seconds.
A vacuum tube is, reasonably enough, a sealed glass tube containing a
vacuum in which are present several electronic elements: the cathode, anode,
grid, and filament. When the cathode is heated by the filament and a voltage
is applied between the cathode and the anode (also called the plate), current
flows between them. If a grid is placed between them, the flow can be controlled
by switching the grid between a positive and a negative voltage.
The grid voltage can be quite small, and the plate voltage can be quite
high, thus providing an amplifying capability. More importantly for computers,
switching the grid voltage causes the tube to act as a switch with respect
to the plates. Thus, we have an electronically controlled switch that is
much faster than a relay.
A type of vacuum tube also served as a popular storage mechanism, the Cathode
Ray Tube (CRT). Other memory devices used during the period include mercury
or glass delay lines, and magnetic core memory.
Vacuum tubes, however, are large, require a lot of power, and produce a
lot of waste heat. In fact, for one rather large vacuum tube machine, it
was once estimated that if its four turbine-powered air conditioners were
to fail, the heat buildup in 15 minutes would be sufficient to melt the
concrete and steel building containing it (of course, it would simply catch
fire and stop working long before that). It has also been estimated that
if a modern computer were built with vacuum tubes, it would be the size
of the Empire State Building.
John Atanasoff developed
an electronic switch based on vacuum tubes, and used this in a special purpose
computer that had capacitors as memories (essentially the same principle
as modern dynamic RAM).
The COLOSSUS machines were developed by British Intelligence during WW II
to crack coded messages. They also used vacuum tubes as logic elements.
Much of their design remains secret.
ENIAC (Electronic
Numerical Integrator and Computer) was developed at the Moore School of
Engineering as a specialized programmable computer for computing ballistics
tables for the Army. It was programmed by changing wires in patch panels,
and flipping switches. John von Neumann became involved with ENIAC and saw
the need for storing the program in the machine itself, resulting in the
EDVAC
(Electronic Discrete Variable Automatic Computer) design.
The Manchester Automatic Digital Machine (MADM) was the first machine built
with a stored program, but it was really just for testing a new memory device.
EDSAC (Electronic Delay Storage Automatic Calculator) was based closely
on the EDVAC design and was the first true stored program computer to become
operational. It used an ultrasonic glass delay line for a memory.
UNIVAC, built
by Presper Eckert and John Mauchly, from the Moore School, became the first
commercially produced programmable digital computer. It used Cathode Ray
Tubes (the type used in oscilloscopes) as its memory. It was based on the
EDVAC design.
Von Neumann left the Moore School for the Institute for Advanced Study
at Princeton, where he became involved in another computer design (known
as the IAS machine), which was also based on EDVAC. The JOHNNIAC,
named in honor of von Neumann, was an IAS-like machine built by the Rand
Corporation in Santa Monica, CA. There were several other machines built
along this line, almost all a result of a summer course given by Eckert
and Mauchly at the end of the war, about their work with ENIAC and the
EDVAC design. These included the ILLIAC (not to be confused with the ILLIAC
IV parallel processor), MANIAC, WEIZAC, AVIDAC, ORACLE, ORDVAC.
The Whirlwind, built at MIT, is notable mostly for the development of magnetic
core memory, which would eventually replace CRT and delay line storage during
the 1960s.
The IBM 701 was their first commercial computer and grew out of their work
with Harvard on the last successor to the Mark I (the Mark IV). The 701
was developed to directly compete with the UNIVAC.
The IBM 709
was the last of the major vacuum tube computers. It was a faster 704, which
had 4K 36-bit words of core memory. During this period, IBM also sold a
model 650, which had a magnetic drum memory, but was low enough in cost
that many were sold to universities -- it was the basis for the first users'
community.
The transistor, invented in 1948, performed the same basic function as
the vacuum tube, but with much lower voltage and current, and very little
waste heat. It is interesting to note that many engineers at that time pronounced
it a useless device precisely because it couldn't handle the power of a
tube!
By using a material called a semiconductor, which conducts electricity when
a charge is applied to it and acts as an insulator when the charge
is removed, an electronic switch can be built. Current flows between the
collector and emitter when a charge is applied to the base.
Transistors are also much smaller than tubes. Rather than being an inch
in diameter and three inches long (the size of a typical tube), transistors
are about 1/4 inch in diameter and 3/16 inch long. They also require fewer
wires, since there are no filaments.
Computers of this era mostly used magnetic core memory, although registers
were built from transistor circuits -- eventually leading to modern solid-state
memories.
Still, a computer equivalent to a modern day microprocessor, built with
transistors, would have occupied several floors of the Empire State Building.
A typical machine of the period had 16K 32-bit words of core, and filled
a large room.
The TX-0 was the first computer built with transistors. It was an experimental
system developed at MIT in 1955. It was developed to test the concepts of
the TX-2, which was to be a major new computer, but the TX-2 wasn't completed.
One of the people who worked on the TX-0 was Ken Olsen, who went on to found
Digital Equipment Corp. (DEC). Initially, DEC built transistorized logic
modules called Flip Chips (printed circuit boards), which could be used
to build specialized control logic. They also formed the basis of the PDP-1,
the first computer built by DEC, and which was very similar to the TX-0.
The PDP-1 was an 18-bit machine that resulted in a family of similar machines
which culminated in the PDP-15. You can see a PDP-1, still playing Space
War (the world's first video game), at the Computer Museum in Boston.
DEC had two other lines of machines, one with 12-bit words and the other
with 36-bit words (in those days, the 8-bit ASCII code had not been standardized,
nor had numerical representations, so there wasn't as strong an incentive
to build machines with a word size as a multiple of 8 bits).
The 12-bit machines resulted in the PDP-8,
which in 1967 was sold for $8900 (plus teletype). It became the world's first
minicomputer, and was affordable even by high schools. The PDP-12 was a
variant combining the PDP-8 with a Laboratory INstrument Computer (LINC)
to form an affordable system for controlling experiments and gathering data
in the laboratory.
The 36-bit machines resulted in the PDP-10, a large time-sharing mainframe.
IBM switched to using transistors with the 7090, which had the same architecture
as the 709. In fact, it was sometimes referred to as the 709T (for transistor).
Like the PDP-6, it was a 36-bit machine. The 7094
came out a little later, and was especially oriented towards scientific
computing using floating point. The 7094 formed the basis of a venture to
build a huge new computer, called the Stretch. Before the Stretch project
was completed, it nearly bankrupted IBM.
IBM was also building a small machine for business (what the B stands for,
after all), the 1401,
which used BCD arithmetic (base 10, represented by 4-bit words) where operations
were carried out digit-serially on values of arbitrary length. This same
sort of scheme would later become the basis of most pocket calculators.
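A minimal sketch of digit-serial decimal addition follows. It is a software illustration under assumed conventions (digits stored least significant first, one decimal digit per 4-bit code), not a description of the 1401's actual circuitry; all function names are made up for the example.

```python
def to_digits(n):
    """Split a non-negative integer into decimal digits, units digit first."""
    return [int(ch) for ch in reversed(str(n))]


def from_digits(digits):
    """Inverse of to_digits."""
    return int("".join(str(d) for d in reversed(digits)))


def bcd_add(a_digits, b_digits):
    """Add two decimal numbers of any length, one digit at a time."""
    result = []
    carry = 0
    for i in range(max(len(a_digits), len(b_digits))):
        a = a_digits[i] if i < len(a_digits) else 0
        b = b_digits[i] if i < len(b_digits) else 0
        total = a + b + carry
        if total > 9:            # decimal adjust: keep the digit in 0-9
            total -= 10
            carry = 1
        else:
            carry = 0
        result.append(total)
    if carry:
        result.append(1)
    return result


x, y = 987654321987654321, 12345678
print(from_digits(bcd_add(to_digits(x), to_digits(y))) == x + y)
```

Because the loop handles one digit per step, the operand length is limited only by available storage, which is the property the text attributes to this style of arithmetic.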
Another machine that IBM built in this period was the 1620, which was even
simpler than the 1401. It earned the nickname "CADET" from its
users, which stood for Can't Add, Doesn't Even Try. The 1620 did not have
a normal ALU -- instead, it used table lookup, and the table was user-programmable.
Thus, you could program it to compute many different functions in place
of addition. Like the 1401, it used a BCD representation for numbers.
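The idea of arithmetic by table lookup can be sketched as follows. The table layout used here (a dictionary from a pair of digits to a sum digit and a carry) is an assumption chosen for clarity and is not the 1620's actual memory format; the function names are likewise hypothetical.

```python
def build_add_table():
    """Sum digit and carry for every pair of decimal digits."""
    return {(a, b): ((a + b) % 10, (a + b) // 10)
            for a in range(10) for b in range(10)}


def table_add(x, y, table):
    """Combine two non-negative integers digit by digit using only lookups."""
    xs = [int(c) for c in reversed(str(x))]
    ys = [int(c) for c in reversed(str(y))]
    digits = []
    carry = 0
    for i in range(max(len(xs), len(ys))):
        a = xs[i] if i < len(xs) else 0
        b = ys[i] if i < len(ys) else 0
        s, c1 = table[(a, b)]        # look up the two operand digits
        s, c2 = table[(s, carry)]    # look up again to fold in the carry
        digits.append(s)
        carry = c1 + c2
    if carry:
        digits.append(carry)
    return int("".join(str(d) for d in reversed(digits)))


add_table = build_add_table()
print(table_add(58, 67, add_table))

# "Reprogramming" the table changes what the same hardware computes:
# this table adds each digit column modulo 10 and never carries.
no_carry_table = {(a, b): ((a + b) % 10, 0)
                  for a in range(10) for b in range(10)}
print(table_add(58, 67, no_carry_table))
```

Nothing in table_add performs addition on the operand digits themselves, so swapping the table is enough to change the function being computed.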
Control Data Corporation got its start by taking a design for a machine
built for the military at another company, and turning it into a commercial
product. The 1604, built in 1958, was Seymour
Cray's first large design, and the earliest large transistorized computer
on the market. After the 1604, there were two major product lines, the 3000
series of 24-bit or 48-bit machines (depending on model) and the 6000 series
of 60-bit machines. The 6600 was Seymour Cray's first vector supercomputer
design (1964), and is considered to be one of the first modern supercomputers.
The Burroughs B5000 was the first in a series of unusual machines with many
features designed to simplify programming. It had many novel features such
as a hardware stack and tagged memory. Burroughs also pioneered virtual
memory, although they didn't coin the term.
The concept behind the integrated circuit is that transistors can be
formed by crossing two semiconducting materials on a silicon substrate.
Wherever a "wire" or "line" of polysilicon crosses a
line of silicon with ions diffused into it, a transistor is formed. If a
charge is applied to the polysilicon line, current flows through the junction.
If the charge is taken away, the junction becomes non-conductive. Thus, we
have another example of an electronically controlled switch.
The important point about the integrated circuit is that multiple transistors
can be formed on a single substrate. Thus, a logic circuit that occupied
a whole PC board can be reduced to fit on a single chip of silicon. Also,
because the transistors can be connected directly on the chip, they can
be smaller and need less power to communicate. Thus, ICs require less power,
and generate less waste heat.
Early ICs contained just a few transistors. These were later called SSI,
for Small Scale Integration. Later on, there were chips with a hundred or
so transistors that were called Medium Scale Integration (MSI). These might
contain a whole register, or part of an arithmetic unit. Then came Large
Scale Integration (LSI) in which as many as a thousand transistors could
be placed on a chip, so that fairly complicated building blocks fit into
one IC.
The other novel use of the IC was pioneered by Texas Instruments as part
of the Illiac IV parallel processor project: solid state memory. Just as
John Atanasoff had used small capacitors as storage devices, it was found
that lengths of IC "wire" also store charge, and could be used
as memory. Thus, the IC revolutionized computer design in two ways: by shrinking
the size of computers, and by making the memory technology compatible with
the processing technology.
IBM replaced the 1401 and 7090 with the System 360, an architectural family
with machines at many different levels of cost and performance. These ranged
from the 360/20 with an 8K-word core memory and no secondary storage to
the 360/91 with a large memory, high-speed floating point, and a vast array
of peripherals. The machines ran the same instruction set, and from the
360/40 up, the same operating system. The 360 series was succeeded by the
370 series, 4300, 3080, 3090, etc., all having basically the same architecture.
Another novelty of these machines is their virtualizability -- it is possible
for the operating system to emulate an empty machine for each user. Thus,
for example, one can even run the operating system as a task under another
copy of the operating system.
DEC moved into integrated circuits with the PDP-8I. (The PDP-15 and PDP-10
also used integrated circuits.) The PDP-8 was the most successful of DEC's
machines, but it was very limited in its capabilities. For example, special
operations were required to access more than 4K words, and even then the
limit was 32K words. There were several other versions of the 8, but all had
roughly the same performance, differing only in features or technology.
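The 4K and 32K limits follow from simple address arithmetic. The sketch below assumes a 12-bit address (matching the 12-bit word size mentioned earlier) and a 3-bit field number supplied by the special operations; the 3-bit width is inferred here from 32K / 4K = 8 and the names are illustrative.

```python
ADDRESS_BITS = 12   # a 12-bit address reaches 2**12 = 4096 words directly
FIELD_BITS = 3      # assumed width of the extra field number (32K / 4K = 8)


def physical_address(field, offset):
    """Combine a field number and a 12-bit offset into one word address."""
    if not 0 <= field < 2 ** FIELD_BITS:
        raise ValueError("field out of range")
    if not 0 <= offset < 2 ** ADDRESS_BITS:
        raise ValueError("offset out of range")
    return (field << ADDRESS_BITS) | offset


print(2 ** ADDRESS_BITS)                    # words reachable directly
print(2 ** (ADDRESS_BITS + FIELD_BITS))     # words reachable with a field
print(physical_address(7, 4095))            # the last word of the last field
```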
The "successor" to the PDP-8 and PDP-15 was the PDP-11,
a 16-bit machine that could directly access 64KB and indirectly up to 256KB.
Like the IBM 360, the PDP-11 was an architectural family ranging from the
11/05 (intended for embedded control applications) to the 11/70 (a super
minicomputer). Actually, the PDP-8 lived on long after the PDP-11 went into
production, eventually being sold as a single chip, the Intersil 6100 microprocessor.
The successor to the PDP-11 and PDP-10 (by then called the DECsystem 20)
was the VAX (Virtual Address eXtension), a 32-bit machine drawing on
the experience of both families. The first version, VAX-11/780,
could even emulate the PDP-11. The VAX was designed as a mini-mainframe,
and became DEC's main product line for several years. It appeared in versions
ranging from workstations to multiprocessors.
Texas Instruments' Advanced Scientific Computer (ASC) was a complex machine with
many interesting features. It was intended to be a scientific supercomputer,
and was used for applications such as analysis of seismic data for oil exploration.
The ASC used pipelining of operations and interleaving of memory to achieve
high processing speeds on long vectors of data. Another interesting feature
was its complete lack of interrupts, permitting a shorter basic instruction
cycle.
The CDC-Cyber was the integrated circuit version of the 6600
(and 7600)
supercomputers. CDC tried to make these scientific machines into business
data processors by adding instructions to support character string and BCD
processing, but the result was not very successful. CDC also developed another
supercomputer, the Star-100, but sold very few of these. It was about this
time that Seymour Cray left CDC to form his own supercomputer company.
The University of Illinois had continued to develop machines after the initial
ILLIAC. ILLIAC II was a successor to ILLIAC, but ILLIAC III was an entirely
different machine -- a parallel processor for analyzing bubble chamber photographs
for nuclear physics. It was destroyed in a fire before being completed,
but in 1963 was well ahead of its time. The last and most notable of the
ILLIAC series was the IV, a large parallel processor that broke a great
deal of new ground (although it was obsolete by the time it was completed).
Perhaps the greatest contribution was the development of solid-state memory
for the ILLIAC IV. The ability to use the same technology for memory as
for the processor resulted in the explosion of cheap, large memory machines
that we see today.
The STARAN was the first commercially built massively parallel processor.
In 1967 you could buy a system with up to 8192 processors and a sophisticated
communication network. Its programming model was based on the notion of
associative or content addressable processing. A modern version is still
used in military aircraft radars.
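Content-addressable processing means selecting data by what it contains rather than by where it is stored. The following sketch is a sequential stand-in for what an associative processor would do in every memory word at once; the function name, the 8-bit words, and the mask convention are assumptions made for the example.

```python
def associative_search(memory, pattern, mask):
    """Return the indices of all words that match `pattern` in the bit
    positions selected by `mask`. Hardware would compare every word in
    parallel; the loop here only imitates the result."""
    return [i for i, word in enumerate(memory)
            if ((word ^ pattern) & mask) == 0]


memory = [0b10100001, 0b10101111, 0b01010001, 0b10100001]

# Find every word whose upper four bits are 1010, ignoring the lower four.
print(associative_search(memory, 0b10100000, 0b11110000))
```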
Very Large Scale Integration (VLSI) was simply the next step beyond LSI,
to thousands of transistors on a chip. For a while, a few people tried calling
new levels of integration ULSI (Ultra), but the name never caught on.
Basically, the division between LSI and VLSI is the difference in design
approach between the two: With LSI, you think in terms of standard modules
that are wired together on a circuit board to build a computer or custom
logic system. With VLSI, an entire system can be placed on a chip, and the
design of chips is standard practice.
A test run of a small VLSI chip costs about $10,000, and something the size
of a microprocessor costs about $300,000, so it's no longer the case that
a prototype circuit can be hacked together and tweaked until it works, and
then sent out for production. The current generation of designers works
extensively with CAD tools and many levels of simulation before a design
is cast in silicon. Test chips are expected to be fully functional (and
actually are more than half of the time), and are typically used to analyze
performance and yield before some final circuit tweaking and full production.
Imagine how much differently you would program if it cost you $300,000 each
time you compiled your code and it took a month or more to get the object
code back from the compiler for testing.
Intel developed the first microprocessor, the 4-bit 4004, in 1971, as a
basis for a desktop calculator. In the next year, they produced the 8008,
an 8-bit microprocessor to control a terminal. Fortunately for Intel, the
terminal manufacturer chose not to use the 8008 -- thus forcing Intel to
look for other uses for the device. It was not easy to sell a microprocessor
in 1972 -- most engineers designed in MSI and LSI, and were leery of this
stuff called software. However, computer hobbyists caught on to the potential
and a new industry sprang up overnight.
Intel followed the 8008 with the 8080, a more powerful 8-bit system and
the 8085 which required fewer support chips.
Shortly after that, Zilog introduced the Z80, which was compatible with
the 8085 but had more registers and a more symmetrical instruction set.
Zilog tried to follow up with the Z8000, a much more powerful processor,
which was not commercially successful. They also marketed the Z8, which
in one version had a BASIC interpreter in on-chip ROM, but that version
was likewise unsuccessful. A different configuration of the Z8 architecture
was produced as a microcontroller, and in that form it has had a long life
in use within disk drive controllers, network interfaces, and other embedded
applications.
Intel was not the first to jump to 16 bits (National Semiconductor's IMP-16
was actually contemporary with the 8085), but they were the first to market
one successfully. It is widely agreed that the Intel architectures are poorly
conceived, but that their success was always due to the fact that Intel
has been the first to bring a usable product to market. The 80186 and 80286
are extensions of the 8086 16-bit processor. The 8088 is an 8086 with an
8-bit external data path. The 80386 is a 32-bit extension of the family
and the 80486 adds floating point and virtual memory support to the processor.
The Pentium (IA-32) family added cache memory and pipelined execution. Subsequent
generations of the Pentium have continued to increase on-chip cache and
pipeline depth (the P4 has a 20-stage branch pipe) as well as adding special
features such as multimedia instructions. It is interesting to note that
Intel's "family" of processors is not architecturally consistent;
there is upward compatibility from the 8086, but not downward compatibility.
Intel recognized that the legacy of the x86/IA32 would make it difficult
to extend the architecture to 64 bits, and so they set out to design an
entirely new processor family, called Itanium.
The IA-64 draws heavily on two prior architectures, the Cydrome Cydra series
and the Multiflow series of Very Long Instruction Word (VLIW) machines.
In a VLIW architecture, each instruction word carries multiple instructions
that can be executed in parallel. This is in some ways an extension of the
Cray designs of the 1960s which could pack as many as four instructions
into a word, although the Cray dispatched those instructions in a pipelined
fashion. Intel calls their variation of VLIW Explicitly Parallel Instruction
Computing (because of the commercial failure of earlier VLIW architectures,
due to a variety of market factors, use of the VLIW acronym carried negative
connotations that Intel did not want to associate with its new product line).
The Itanium packs three instructions in a 128-bit instruction word, and
operates on 64-bit data. It also provides far more registers (128 integer,
128 floating point) than previous designs. Although the instruction set
is more RISC-like, the architecture is very complex. It features techniques
called predication (to allow both outcomes of a branch to execute in parallel,
picking the correct one when the branch direction becomes known) and speculation
(to allow operations to proceed under the assumption that data is available,
and correct the result in cases where the data was delayed). These enhance
the throughput of the processor pipelines. As a result of the complexity,
the first generation of Itanium (Merced) was actually slower than the corresponding
IA-32 generation. The second generation is comparable in performance to
the contemporary P4, and by the third generation Intel expects Itanium to
pull ahead on computationally intensive applications, such as scientific
computing and servers, although probably not on desktop applications. Considering
that the Pentium line has generally delivered less performance than competing
RISC designs such as MIPS and PowerPC, the Itanium has a way to go before
it is sold on the basis of speed. However, Intel has the advantage that
it sells the processors to other companies to package and market, while
its competitors keep their designs in house.
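To make the "three instructions in a 128-bit word" idea concrete, here is a sketch of packing and unpacking such a bundle. The split into a 5-bit template plus three 41-bit slots (5 + 3 x 41 = 128) reflects the published IA-64 bundle format but is a detail not given in these notes; the function names and the slot contents are placeholders, not real instruction encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Recover the template and the three slots from a packed bundle."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
             for i in range(3)]
    return template, slots


bundle = pack_bundle(0b00001, [0x123, 0x456, 0x789])   # placeholder values
print(unpack_bundle(bundle))
print(bundle.bit_length() <= 128)
```

The template field is what lets the hardware know, without decoding each slot, which execution units the three instructions are meant for.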
About a year after Intel introduced the 8080, Motorola came out with a direct
competitor, the 8-bit 6800. It was generally acknowledged that the 6800
was a better architecture, but it was late getting into the market place
and had some flaws that made it difficult to use. Some of the 6800 team
left Motorola and formed MOS Technology to build an improved 6800, which
they called the 6502. That chip was moderately popular because it was quickly
picked up and sold as a single board computer (SBC) -- the KIM and SYM were
the most popular versions. This allowed many engineers (and students) to
learn and try out microprocessors for their applications. In particular,
Steven Jobs and Steven Wozniak started a company out of their garage by
building a 6502-based personal computer called the Apple II.
Motorola decided to leapfrog the competition by producing a 32-bit processor, the 68000. But although it was internally a 32-bit design, it could access only 16 bits at once. Motorola was also beaten to the punch by Intel, which delivered its 16-bit 8086 slightly earlier, and the 8086 could run most 8080 code whereas the 68000 was a complete break from the 6800. The 68010 fixed a couple of minor problems in the 68000 that made it impossible to write a virtual operating system for it. The 68020 added a 32-bit external data path, and virtual memory support through a coprocessor. The 68030 added on-chip virtual memory support and cache. The 68040 increased the performance of the 68030.
DEC was slow to jump into the microprocessor business, having been caught somewhat off guard by it. They licensed the PDP-8 architecture to Intersil to produce the 6100, and the PDP-11 architecture to Western Digital to produce a 3-chip implementation which DEC sold as the PDP-11/03.
Finally, DEC developed its own microprocessor, an implementation of the
VAX which became the MicroVAX and eventually the VAXstation product line.
There were several other microprocessors developed, most of which were rather
unremarkable. Texas Instruments, however, produced a novel design called
the 9900. In that machine, there was only one register -- the register pointer.
All other registers were actually just memory locations. This made the machine
somewhat slow at processing data, but made context switching (for subroutine
calls, interrupts, etc.) very fast because saving the machine state required
only that the register pointer be saved.
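The register-pointer scheme can be sketched in a few lines. This is a simplified model: word-addressed memory, 16 registers per workspace, and the class and method names are all assumptions made for illustration rather than details of the 9900 itself.

```python
REGS_PER_WORKSPACE = 16   # assumed size of one register block


class WorkspaceCPU:
    """Registers live in memory; the CPU keeps only a pointer to them."""

    def __init__(self, memory_words=64):
        self.memory = [0] * memory_words
        self.wp = 0                      # the register (workspace) pointer

    def read_reg(self, n):
        return self.memory[self.wp + n]  # every register access touches memory

    def write_reg(self, n, value):
        self.memory[self.wp + n] = value

    def switch_context(self, new_wp):
        """Switch register sets by changing one pointer; return the old one."""
        old_wp = self.wp
        self.wp = new_wp
        return old_wp


cpu = WorkspaceCPU()
cpu.write_reg(1, 42)                                 # caller's register 1
saved = cpu.switch_context(REGS_PER_WORKSPACE)       # enter a subroutine
cpu.write_reg(1, 7)                                  # callee's own register 1
cpu.switch_context(saved)                            # return to the caller
print(cpu.read_reg(1))
```

Both halves of the trade-off in the text show up here: read_reg and write_reg always go through memory, while switch_context copies nothing.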
All of the microprocessors to this point had been Complex Instruction Set
Computers (CISC). There was a movement in architectures toward simplifying
instruction sets and the corresponding control logic, so that the basic
machine cycle would be faster. Code would be larger, and more instructions
would be required for infrequently used operations, but overall speed would
increase. The first of these new microprocessors to be sold commercially
was the SPARC, developed at UC Berkeley. Shortly after that came the MIPS
R2000, developed at Stanford. IBM also had an entry, the RISC 6000, which
was operating in a laboratory well before the other two. There are a lot
of claims and counterclaims about who first developed the concept of Reduced
Instruction Set Computers (RISC). In reality, the first machines were mostly
RISC designs, and it is really a question of who rediscovered this and repopularized
the notion. The folks at Berkeley coined the popular terms RISC and CISC
in the 1980s to distinguish their approach from other machines of the day
(such as the VAX). But a close look at the Cray designs of the 1960s reveals
them to have many features in common with RISC architectures. At that time,
however, there was nothing else sufficiently complex for the Cray to be
considered a "reduced" architecture.
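The trade-off behind the RISC argument (more instructions, but a faster basic cycle) can be put into numbers with the usual product of instruction count, cycles per instruction, and cycle time. Every figure below is invented purely for illustration and does not describe any real processor; cycles per instruction is an extra factor not mentioned in the text above.

```python
def execution_time_ns(instructions, cycles_per_instruction, cycle_time_ns):
    """Total run time = instruction count x cycles per instruction x cycle time."""
    return instructions * cycles_per_instruction * cycle_time_ns


# Hypothetical program on a hypothetical complex-instruction machine.
cisc = execution_time_ns(1_000_000, 4.0, 50.0)

# Same program on a hypothetical reduced-instruction machine: 30% more
# instructions, but fewer cycles per instruction and a shorter cycle.
risc = execution_time_ns(1_300_000, 1.5, 25.0)

print(cisc, risc, risc < cisc)
```

With these made-up values the larger program still finishes sooner, which is the outcome the RISC designers were betting on.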
IBM's RS6000 became the Power family of architectures, which today are used in high-end workstations, parallel processors, and servers. In the early 1990s, IBM, Motorola, and Apple joined forces to revise the Power architecture into a 32-bit design which they called the PowerPC (first produced in 1993). It is interesting to note that in its first quarter of selling Power Macintosh computers, Apple sold more RISC-based desktop machines than had been previously sold by all other manufacturers combined. At the time, Apple accounted for 10% of the PC market, and other RISC systems were being sold as high-performance workstations. The PowerPC continues to outsell all other RISC designs. When these low volumes are considered in comparison to the cost of designing and producing a new microprocessor, one must conclude that either the business isn't viable for the smaller producers, or that Intel is making huge profits, or both. This is a major factor that leads analysts to believe that diversity in processor architectures is going to continue to decline.
The MIPS and SPARC architectures continue to be produced. However, MIPS
has said that it will switch to using the Itanium design once that processor
reaches performance levels comparable to its own products. Sun Microsystems
continues to use SPARC processors in its servers, where throughput is more
significant than raw computational power. A 32-bit version of the MIPS continues
to be popular in embedded applications, as are some versions of the SPARC.
DEC introduced yet another RISC machine, their 64-bit Alpha, which executed
at 200 Million Instructions Per Second (MIPS). At the same time, competitors
were struggling to reach 33 MHz and 50 MHz. The Alpha philosophy was to exploit
the simplicity of RISC to push the clock rate higher, and only later to
add sophistication to the processor to improve performance in other ways.
Increasing clock rate improves overall performance as long as memory and
I/O are able to keep up with the increase in CPU speed. The Alpha went through
three generations before DEC was sold to Compaq Computer Corp., which produced
a fourth generation of the Alpha before cancelling further development in
2001. The cost of maintaining a cutting edge fabrication facility was one
of the factors that contributed to DEC going out of business, and their
fabrication plants and many of their patents were sold to Intel when Compaq
bought the company.
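The point above, that a higher clock rate helps only as long as memory and
I/O keep up, can be illustrated with a small model. This is a minimal sketch
under stated assumptions, not a description of any real Alpha system: the
function names, the cycle counts, and the fixed 100 ns memory latency are all
hypothetical, and the model simply adds CPU time to memory wait time.

```python
def run_time_ns(cpu_cycles, clock_mhz, memory_accesses, memory_latency_ns):
    """Total time = CPU compute time + time spent waiting on memory.

    Assumption: memory latency is fixed in nanoseconds and does not
    shrink when the CPU clock is raised.
    """
    cycle_ns = 1000.0 / clock_mhz  # one clock period in nanoseconds
    return cpu_cycles * cycle_ns + memory_accesses * memory_latency_ns


def speedup(old_mhz, new_mhz, cpu_cycles, memory_accesses, memory_latency_ns):
    """Ratio of run time at the old clock rate to run time at the new one."""
    old = run_time_ns(cpu_cycles, old_mhz, memory_accesses, memory_latency_ns)
    new = run_time_ns(cpu_cycles, new_mhz, memory_accesses, memory_latency_ns)
    return old / new


# Hypothetical workload: one million CPU cycles, clock raised from 50 to 200 MHz.
ideal = speedup(50, 200, 1_000_000, 0, 100)          # no memory waits at all
stalled = speedup(50, 200, 1_000_000, 100_000, 100)  # 100,000 waits of 100 ns

print(ideal, stalled)
```

With these made-up numbers, the fourfold clock increase yields a fourfold
speedup only when there are no memory waits. Once the fixed memory time is
included, the same clock increase gives only about a twofold improvement.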
Intel's entry in the RISC market was the i860, which began as a back-door
project, obtained corporate support for a while, and then fell out of favor
again. It was especially popular for high-performance embedded applications
and graphics processing. At one point SGI had a graphics engine with 16 i860s
running in parallel. Intel and Hewlett-Packard signed an agreement to merge
some of the HP Precision Architecture into the Intel Itanium processor
family.
Other notable designs are the Intel 432, a very CISC-oriented machine that
had many OS kernel operations built into hardware (and nearly bankrupted
the company), and the many DSP (Digital Signal Processor) chips, such as
the Texas Instruments TMS320Cxx series, which are widely used in embedded
applications (such as modems, digital phones, etc.).
Staran was the first commercially successful massively parallel processor,
and is still being sold as the ASPRO.
MPP was a one-of-a-kind processor built for NASA to process the vast amount
of data being gathered by earth-resources satellites. It had 16,384 1-bit
processors arranged in a grid. Another similar design is the AMT/CPP DAP.
The Transputer was a parallel processor cell with communication channels
and the ability to be configured into different network topologies. The
first version was marginally successful. A second generation was repeatedly
set back by delays and never went into volume production.
The Connection Machine CM-2 was like the MPP with the addition of a second
communication network for routing data. A significant contribution of the
system was the software support for virtual processors, which simplified
programming of problems with large data sets. The CM-5 used up to 64K SPARC
processors (none was ever built with this many) arranged in a fat-binary-tree
topology with separate networks for data, control, and diagnostics. It operated
in a mode known as Single Program, Multiple Data (SPMD) in which a single
program is replicated across the nodes, but unlike SIMD, the processors
can take branches independently and communicate with each other asynchronously.
Neither machine is in production any longer.
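The SPMD mode described above can be sketched in a few lines. This is only
an illustration, not CM-5 software: Python threads stand in for the nodes,
queues stand in for the data network, and the names node_program and
run_spmd are hypothetical. Every node runs the same function, but each one
branches according to its own local data and sends its result to node 0
whenever it happens to finish.

```python
import queue
import threading


def node_program(rank, local_data, inboxes, results):
    """The single program that every node runs on its own data."""
    # Unlike SIMD, each node decides this branch on its own.
    if sum(local_data) % 2 == 0:
        partial = sum(local_data)
    else:
        partial = max(local_data)

    if rank == 0:
        # Node 0 collects the partial results in whatever order they arrive.
        total = partial
        for _ in range(len(inboxes) - 1):
            total += inboxes[0].get()
        results["total"] = total
    else:
        # Other nodes send asynchronously and then finish.
        inboxes[0].put(partial)


def run_spmd(chunks):
    """Replicate node_program across one thread per chunk of data."""
    inboxes = [queue.Queue() for _ in chunks]
    results = {}
    threads = [
        threading.Thread(target=node_program,
                         args=(rank, chunk, inboxes, results))
        for rank, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]


total = run_spmd([[1, 2, 3], [4, 5], [6, 7, 8], [9]])
print(total)
```

In this run, the first node takes the "sum" branch while the other three
take the "max" branch, which a lockstep SIMD machine could only do by
masking processors off and executing both paths in turn.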
Based on work at Caltech, Intel developed a multiprocessor consisting of
up to 128 of their microprocessors (80286, 80386, i860) connected via a
network. Based on work at Carnegie Mellon, they also developed a systolic
array processor, the iWarp. They then built a machine (Paragon) that combined
the communication facilities of iWarp with the processing power of the i860.
Subsequently, they built a one-of-a-kind parallel processor for Sandia National
Laboratories containing 9632 Pentium-Pro processors. This was the first
general-purpose machine to exceed 1 trillion floating point operations per
second of peak performance. It has since been exceeded by production machines
from IBM (SP series).
Since the early 1990s, parallel processing has largely turned to inexpensive clusters of PCs operating with standard or slightly enhanced networking hardware. Although clusters suffer severe performance penalties for some applications (sometimes running slower than a single processor), they are highly cost effective for applications in which computations are largely independent of one another. This phenomenon has undercut the supercomputing industry. It contributed to the sale of Cray Research, which had shifted to parallel processing with the T3D and T3E systems (based on the DEC Alpha and a high-performance custom interconnection network), first to SGI and later to Tera Computer Company (which had worked for many years to deliver a high-performance multithreaded parallel processor supercomputer).
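The claim that a cluster can run slower than a single processor on some
applications, yet be very effective on independent computations, can be
illustrated with a toy cost model. This is a rough sketch under stated
assumptions: the function names are hypothetical, the numbers are invented,
and the model assumes that compute time divides evenly across nodes while
communication time grows linearly with the number of other nodes.

```python
def cluster_time(work, nodes, comm_per_node):
    """Run time on a cluster: divided compute time plus communication cost.

    Assumption: each additional node adds a fixed communication cost.
    """
    return work / nodes + comm_per_node * (nodes - 1)


def cluster_speedup(work, nodes, comm_per_node):
    """Speedup relative to running the same work on one node."""
    single = cluster_time(work, 1, comm_per_node)
    return single / cluster_time(work, nodes, comm_per_node)


# Hypothetical workload of 1000 time units spread over 16 nodes.
independent = cluster_speedup(work=1000.0, nodes=16, comm_per_node=0.0)
chatty = cluster_speedup(work=1000.0, nodes=16, comm_per_node=100.0)

print(independent, chatty)
```

With no communication, the 16 nodes give a 16-fold speedup. With a heavy
communication cost per node, the speedup drops below 1, meaning the cluster
takes longer than a single processor would.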
A small segment of the user community still depends on the kind of processing
that can only be obtained with vector supercomputers like the Cray. Even
high performance parallel processors like the SP and Origin are not suitable
for their applications, which require an extremely high rate of communication
among the processors. But there is not enough of a market to justify the
huge investment of resources required to build a new supercomputer from
scratch. Remaining manufacturers of such machines besides Tera/Cray include
Hitachi, NEC, and Fujitsu.
Other formerly produced parallel systems include nCube, Kendall Square Research,
WaveTracer, MasPar, Butterfly, etc.