Lecture 2
Technology and Economics of High Performance Systems
The basic technology of computer architecture today is
the VLSI chip. To understand the capacity and the rate of progress in VLSI, it
helps to have an understanding of the process by which a chip is built. The
first step is to create the substrate for the chip, which is the platform upon
which all of the circuitry is constructed. Most chips today are built on
substrates formed of very pure crystalline silicon.
Silicon begins as sand (SiO2) that is purified into silicon of 99.9999999% purity. In
addition, very old sand is employed so that fewer radioactive isotopes of
silicon are present in the purified material. If a radioactive atom of silicon
decays after it has been built into a chip, the ejected decay products have
sufficient mass and energy to damage the circuitry of the chip.
The pure silicon is then turned into a crystal ingot
that is grown through a very carefully controlled deposition process. A typical
ingot used today is 12 inches in diameter (30 cm), and 12 to 18 inches long.
Ingot diameter has increased steadily from about 2 inches (5 cm) in the early 1970s, to 4
inches (10 cm) in the late 1970s, 6 inches (15 cm) in the mid 1980s, 8 inches
(20 cm) in the early 1990s, to the current size in the late 1990s. The ingot is
machined into a uniform cylinder and then a flat is ground into one side to
facilitate the manufacturing process.
The cylindrical ingot is then sliced by diamond saw
into thin wafers that are polished and coated with a protective layer of SiO2 insulation (about 0.05 µm thick).
A material called photoresist is then applied; it resists acid as long as it is not
exposed to ultraviolet light. The photoresist is then exposed to an ultraviolet
light source through a lens system and a 10X size mask made of quartz that has
lines drawn in chromium on its surface. Quartz is used because it has an
extremely low coefficient of thermal expansion, thereby reducing variations in
the positioning of the drawn lines. The lines themselves are drawn directly
from computer aided design files using very high precision positioning tables
and steered electron beams.
After the photoresistive layer is exposed to
ultraviolet light, the exposed areas are washed off with a solvent and then the
exposed SiO2 is etched away. A very
thin layer (<100 Å) of SiO2
is then applied, called the gate oxide. A conductive layer of polysilicon
(non-crystalline silicon) is then deposited. Another layer of photoresist is
added and a new pattern mask is used to expose it. The exposed resist is washed
away and the exposed polysilicon and gate oxide are etched off. The chip is
then doped with impurities that turn the exposed silicon into a semiconductor,
a process called diffusion. A typical transistor now has the following
geometry:

The photoresist is then removed and a new layer of SiO2 is added. Another photoresist/etch cycle cuts holes through
the SiO2 insulator and then aluminum
is deposited to form wires that contact the transistor through these holes.
(Copper is used in additional layers in some processes. The most modern
processes are actually built on a thin layer of insulator.)

In modern technology, there are actually many more
layers and steps than this. Early MOS processes used one type of doping to
create either positive or negative channel transistors. Complementary MOS
(CMOS) devices today use both positive and negative channel devices in
complementary configurations. While this requires more manufacturing steps and
more transistors, the advantage is that less power is required and so the devices
are smaller and faster. Let's consider the reason for this.

In an N-channel device such as the one illustrated,
the source and drain have been doped so that electrons carry the flow of
electricity, while the substrate carries current by the movement of
"holes" -- positive ionizations of its atoms. The difference between
the charge carriers prevents current from flowing. However, applying a positive
charge to the gate creates a field that drives holes out of the surface of the
substrate, creating a thin channel in which electrons are carriers. Thus, a
current of electrons can flow between the source and drain in either direction.
The device gets the name "field effect transistor" from this
behavior. The transistor thus acts as a switch that is controlled by applying
a voltage to the gate.
A P-channel device works similarly; however, it
requires that an N-well be formed in the P-substrate to contain the two
P-diffusion areas. In this case, a negative charge is applied to the gate to drive
the electron carriers away from the surface of the well, and create a thin
channel of holes.

The P-channel device is not as efficient as an
N-channel device because electrons have two to three times the mobility of
holes. However, we can compensate for this by increasing the size of the
transistor to decrease its resistance. Early MOS devices were P-channel because
it was easier to control the doping of the small wells than of an entire
substrate. Once it became possible to use the substrate as the channel, NMOS
devices were built that were more efficient. However, to create an inverter in
NMOS, one had to build a circuit in which a switch connects Ground to the
output and Vdd is connected to the
output via a resistor:

Thus, when the transistor is on, the output is
grounded and the resistor prevents a direct short between power and ground.
When the transistor is off, then Vdd
charges up the output via the resistor. Thus a 1 input produces a 0 output and
vice versa. More complex gates are based on this same sort of circuit. The
trouble is that the resistor dissipates a considerable amount of power when the
transistor is turned on, and when the transistor is turned off, it introduces a
considerable delay into the positive charging of the output (the charge time is
proportional to resistance times capacitance, so a high resistance lengthens
the charge time). Thus a negative-to-positive signal transition is slower than
the reverse.
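The tradeoff described above can be put into rough numbers. The following is a minimal sketch, assuming an ideal transistor with negligible on-resistance; the supply voltage, resistor values, and load capacitance are hypothetical and chosen only for illustration.

```python
def resistor_load_inverter(vdd, r_pullup, c_load):
    """Rough figures for an NMOS inverter with a resistive pull-up.

    vdd      -- supply voltage in volts
    r_pullup -- pull-up resistance in ohms
    c_load   -- output capacitance in farads
    """
    # Transistor on: the resistor sits between Vdd and ground and
    # dissipates V^2 / R continuously (on-resistance ignored).
    static_power = vdd ** 2 / r_pullup
    # Transistor off: the output charges through the resistor with
    # time constant R * C.
    time_constant = r_pullup * c_load
    return static_power, time_constant


# Hypothetical values: 5 V supply, 50 fF load, two candidate resistors.
for r in (10e3, 100e3):
    power, tau = resistor_load_inverter(5.0, r, 50e-15)
    print(f"R = {r:8.0f} ohm: power = {power * 1e3:.2f} mW, RC = {tau * 1e9:.2f} ns")
```

Raising the resistance by a factor of ten cuts the static power by ten but lengthens the charge time by the same factor, which is the dilemma the text describes.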
By combining the two types of transistor, however, we
get a circuit that has lower power dissipation and equal transition times in
both directions.

In this circuit, when the input is 1 the N-channel
device turns on and the P-channel device turns off. Thus, current flows between
ground and the output with little resistance and there is little power dissipated
through the very high resistance of the P-channel device. When the input is 0
then the configuration reverses and there is again very low power dissipation.
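A switch-level sketch of this behavior follows. It treats each transistor as an ideal on/off switch, which is a simplification, and the function name is made up for this illustration.

```python
def cmos_inverter(inp):
    """Switch-level model of a CMOS inverter.

    Returns (output, static_path), where static_path is True only if
    both transistors conduct at once and short Vdd to ground.
    """
    n_on = inp == 1               # N-channel device conducts when the input is 1
    p_on = inp == 0               # P-channel device conducts when the input is 0
    output = 0 if n_on else 1     # N pulls the output to ground, P pulls it to Vdd
    static_path = n_on and p_on   # never true for a steady 0 or 1 input
    return output, static_path


for value in (0, 1):
    print(value, cmos_inverter(value))
```

For either steady input exactly one device conducts, so there is no standing path from power to ground.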
Because of the reduced power dissipation, it is
possible to pack many more transistors onto a chip without burning it out.
Thus, both the density and the size of chips can grow significantly.
A technology that gained some popularity in the mid
1990s is called BiCMOS because it combines faster bipolar transistors (that
nest N, P, and N wells within each other, creating a larger channel) with CMOS
transistors on one substrate. Of course, it involves more manufacturing
steps. BiCMOS also dissipates more
power as heat (because it takes more power to charge the P well and create a
carrier path than to create the field of a MOSFET). Subsequent developments in
CMOS, such as Silicon on Insulator (SoI) and copper interconnect have made the
use of BiCMOS less attractive, except in certain circumstances, as similar
speeds have been obtained with less power dissipation. In 2001, Motorola
developed a process for combining CMOS with Gallium Arsenide (GaAs -- another
semiconductor). Their goal in doing this was to enable a single chip to support
signal processing and also contain the high frequency radio circuitry used in a
cellular phone. While this is a niche application, the other feature of GaAs is
that it is optically active, and can be used for semiconductor lasers. Thus,
this advance in technology may enable optical interconnect between chips or
even across chips.
Our example has followed the creation and operation of
a single transistor, but of course the process takes place in parallel for all
of the transistors on a chip and in fact for all of the chips on a wafer
(although in some cases the photo-exposure process is done chip-by-chip with a
precision stepping table, a technique called direct-step-on-wafer (DSW)). Thus,
there is an economic incentive to increase the size of the wafer so that more
chips can be created at once. This is balanced by the increased difficulty of
manufacturing a larger defect-free wafer and of maintaining consistency of
processing across a larger surface area.
A wafer fabrication process line is a carefully tuned
combination of machinery and chemistry that must be continuously monitored and
maintained in order to consistently yield good chips. If a line is shut down
for any length of time (say due to a natural disaster cutting off power), then
the line is said to "go sour" and must be fully cleaned and
restarted. It can take weeks for a line to reach a production level of thermal,
chemical, and cleanliness stability.
Even the slightest chemical contamination can ruin the
quality of product emerging from a fabrication line (or fab). The clean rooms
in which fabrication is done achieve levels of as few as 10-6 particles of micron size contaminants
per cubic foot. Workers are not allowed to wear makeup, must wear dust-free
bunny suits and face masks, and must go through an elaborate cleansing process
before entering the clean room. Inside the clean room, filtered air is pumped
in from above and sucked out through a gridwork in the floor so that any
particles are immediately drawn out of the room. In addition, the fabrication
facility is isolated from any external vibration that could disturb the
delicate machinery. A modern fab line can cost as much as $2B to construct.
Such an investment clearly requires a large profit margin for its products, as
the typical life of a line is about 6 years with only 3 of those years being
prime. Of course, continual upgrading of the line can hold off obsolescence,
but eventually the facility must be extensively refitted in order to keep up
with advances in cleaning technology. The cost to build a new fab line roughly
doubles with each generation, which has some staggering economic implications
if Moore’s law (doubling performance every eighteen months) is to
continue for much longer.
Once chips have been created on wafers, they go
through a testing process. Test structures on the wafer are probed to determine
the general quality of the circuitry on the wafer, and it is possible that an
entire wafer could be rejected. Accepted wafers are then probed chip-by-chip,
with very fine wire probes being used to contact special test ports on the
chips. The good chips are identified and then the wafer is "diced"
either with a diamond saw or by scribing and snapping. The bad chips are
discarded and the good chips are packaged.
There are a variety of packaging techniques. The
approach that was used for most of the 20th century involved gluing
the chip to an open carrier and then stringing fine gold wires between pads on
the chip and on the carrier. The wires are bonded through pressure welding --
essentially the end of the wire is slammed onto the chip (after being heated)
so that the energy of the impact causes the metals to fuse together. Toward the
end of the 1990s, a technique gained popularity in which solder is deposited
onto the chip’s connection pads, and the chip is then mounted face-down
onto a carrier with a matching pattern of pads and traces. This solder-ball
technique has the advantage that connections can be made virtually anywhere on
the chip, rather than only around its edges. Needless to say, there are
additional losses in the packaging process. Once the package has been sealed,
the chip is again tested and if it passes it is burned in (run at full power
for a short time to identify early failures due to thermal stress or other
causes).
Next the part is graded for speed and marked
appropriately. Even on the same wafer, variations in the processing can result
in chips with significantly different maximum speeds. The grading process
selects the best chips for higher pricing and slower chips are sold at standard
or discount prices.
It should be noted also that memory chips undergo
different processing than do processor chips. Because of the desire to obtain
the maximum density of memory devices, their extremely regular geometry, and
the fact that the primary purpose of a memory cell is to store charge, different
materials and deeper structures are used. These processes are incompatible with
the processes that are used to build chips containing mostly active devices.
Thus, the memory that one sees on a processor chip is less space efficient than
the memory on a RAM chip, and this is the reason that processor chips still
have relatively small on-chip memories. Even though a chip has a megabyte
cache, for example, it is serving a main memory that is potentially gigabytes
in size.
Progress of Technology
Transistor counts on processor chips have grown at a slightly declining
rate, from about 37% per year in the 1970's to about 28%
per year in the 1990's, doubling every 2 to 2.5 years. Where does this increase
come from? Two places -- transistors get smaller and chips get larger.
Transistors shrink in size because the ability to
control photolithography and process technology improves. We learn to etch
finer lines in the silicon. In 1982, typical line widths were 3 µm. By
1997, 0.3 µm was common. Thus, the width of a line decreased by a factor
of 10 in 15 years. Because we are considering area, the effective increase in
"real estate" is a factor of 100, which corresponds quite closely to
an increase in transistor count averaging 36% per year over that period.
However transistor counts have actually not kept up with this average, in spite
of the fact that chips also grew in size by a factor of about 11 over that
time. Given that we should see an increase of effective transistor counts of
roughly 1100, why did they actually only go up by a factor of about 110?
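The arithmetic behind these figures can be checked with a short calculation. This is a sketch only; the helper name is made up here, and the inputs are the line widths, die growth factor, and time span quoted above.

```python
def annual_rate(total_factor, years):
    """Compound annual growth rate that yields total_factor over years."""
    return total_factor ** (1.0 / years) - 1.0


years = 15                      # 1982 to 1997
area_gain = (3.0 / 0.3) ** 2    # line width shrank 10x, so area per device shrank 100x
die_gain = 11                   # chips grew by a factor of about 11

print(f"gain from shrink alone: {area_gain:.0f}x, "
      f"{annual_rate(area_gain, years):.1%} per year")
print(f"gain with larger dies:  {area_gain * die_gain:.0f}x, "
      f"{annual_rate(area_gain * die_gain, years):.1%} per year")
```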
Part of this discrepancy can be traced to the fact
that a chip is not entirely made up of transistors -- it also contains wires.
The wires are often wider than the transistors because aluminum is harder than
silicon to deposit and etch cleanly, and because the wires are on top of the
other layers where they must follow an uneven surface (extra width giving them
a greater ability to do so without breaking). Also, aluminum is subject to a
phenomenon known as metal migration in which an electric current actually draws
metal atoms along with the electron flow and the metal can separate at weak
points. As chips grow larger and device counts increase, the length of wires
and their number also increase. Modern chips are in fact dominated by wire
area. This has led to an increase in the number of layers of wires on a chip
to try to increase density. In 1982, chips had one layer of metal. In the mid
1990s, three layers were common and by the late 1990s four-metal processes were
in production. Some of those layers also switched to copper in the late 1990s,
as copper has lower resistance than aluminum. This may seem minor, but at high
frequencies it is a significant factor. Copper was not used originally because
it is harder to work with than aluminum. It tends to contaminate other parts of
the process, and being softer, it is also harder to control its deposition
precisely. IBM was the first to work out this process, and others followed soon
after.
Adding layers of metal sounds simple, but it is in
fact quite difficult. The metal, being the thickest layer, adds a great deal of
vertical relief to the surface of the chip. It is necessary to fill in the
valleys between these hills with an insulator that is then polished flat. Then
the insulator must be etched away to varying depths to create contact points
for the next metal layer. And the metal must be applied in such a way to fill
these deep wells and make contact when its natural tendency is to bridge the
holes.
As the sizes of transistors shrink, they also switch
more quickly, because the distance between the source and drain is reduced and
the capacitance of the gate (along with everything else) decreases. Thus, a
channel can be opened or closed more quickly. In addition, local runs of wires
are shorter, so that the time for a signal to propagate locally is reduced.
Processors have accelerated at nearly the same rate that transistor counts have
gone up.
However, long runs of wire actually have considerable
resistance and capacitance. The time to charge a wire is proportional to the
product of these, and it is the time for a wire to charge, rather than the
propagation time of electrons in a conductor, that determines signal
propagation time on a chip. Because resistance and capacitance increase as
device sizes shrink, the delay on long wires increases with a square factor. In
addition, because relative distance increases by a square factor as devices
shrink, the delay can be considered to have a 4th power factor in relationship
to feature size. Thus, some people predict that either die sizes will have to
shrink to accommodate delay as devices continue to shrink, or else chips will
have to be divided into distinct sections that communicate with each other
asynchronously. This is a concern even with the lower-resistance copper
interconnect, as clock rates have grown exponentially.
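The scaling argument above can be written out as a small calculation. The sketch below simply takes the text's two square factors at face value; it is not a physical wire model, and the feature sizes in the example are hypothetical.

```python
def long_wire_delay_factor(old_feature_um, new_feature_um):
    """Relative delay growth on a long wire after a process shrink,
    following the scaling argument in the text."""
    shrink = old_feature_um / new_feature_um
    rc_factor = shrink ** 2         # resistance and capacitance both rise
    distance_factor = shrink ** 2   # relative distance rises by a square factor
    return rc_factor * distance_factor


# Hypothetical shrink from 0.5 um to 0.25 um features.
print(long_wire_delay_factor(0.5, 0.25))
```

Under these assumptions, halving the feature size multiplies the relative delay of a long wire by sixteen.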
Another potential stumbling block in shrinking
transistors is something called the short-channel effect. When the gate region
of a transistor gets too small, electrons can tunnel through it whether it is
on or off. This requires adjustment of the doping to increase the
off-resistance. Eventually, however, the doping reaches a level at which the
field that can be generated by charging the gate can no longer create a
conductive channel. To put it another way, in order to build a tiny transistor
that can be turned off, we have to dope it to the point that it can’t be
turned on. The solution to this problem is to place gates on both sides of the
channel, and charge them together. The combined fields create a conductive
channel, and when they are removed, the transistor turns off. However, it
greatly complicates the processing to try to place a gate under the channel as
well as on top of it. IBM is once again the first to succeed in this, clearing
the path for at least a few more generations of shrinkage.
Memory technology does not suffer from the same
problems as processor chips. Because of its special processing and extreme
regularity, it increases in capacity at a rate of 60% per year, quadrupling in
three years. However, its speed has not increased at a comparable rate -- only
about 50% over 15 years. The reason is that DRAM uses a charge-bucket storage
cell. Each cell is effectively a tiny capacitor that can hold a very small
charge. The smaller the cell, the smaller the charge (although deepening the
cell keeps the decrease in charge capacity below expected amounts). In order to
read out this tiny charge, a wire is precharged and a switch is opened to the
bucket. If there is no charge in the bucket a tiny momentary drain is detected
on the wire. If the bucket contains a charge, then a tiny momentary surge is
seen on the line. A very sensitive differential amplifier and detector is used
to identify these momentary deviations in the charge on the wire. In order to
recognize the change, the detector must observe a steady state level on the
wire for a period of time so that it can calibrate itself. Memory design must
essentially trade off this time against minimizing the size of the charge
bucket. Market forces tend to demand greater density over greater speed, as we
shall see when we discuss the memory hierarchy. Thus, the size/speed tradeoff
has favored only a modest increase in speed.
Disk technology increases in density approximately 25%
per year, doubling about every three years. Like memory, the access speed only
increased by about 50% in the 15 years from about 20 ms in 1982 to about 10 ms
in 1997. However, costs have decreased dramatically. If
memory and disk technology continue to increase in density at their current rates,
memory will someday overtake disk in terms of density. However, the cost
differential between them will make disk a viable element of the memory
hierarchy for the foreseeable future. To see why this is so, consider that
making a wafer of DRAM takes many processing steps, while the creation of a
disk platter has far fewer steps. To work, the DRAM must have a precisely
patterned set of cells, while a disk surface is just a patternless bulk
material. In fact, the difficult aspect of making a disk platter is to avoid
giving it a “pattern,” that is, to make it perfectly free from
unwanted variations.
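The growth rates quoted for memory and disk can be turned into doubling times, and into a rough estimate of when memory density would catch up with disk. This is a sketch under stated assumptions: the rates are the ones given above, while the starting density ratio of 100 is purely hypothetical.

```python
import math


def doubling_time(annual_rate):
    """Years needed to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def years_to_overtake(density_ratio, fast_rate, slow_rate):
    """Years until the faster-growing technology closes a density gap.

    density_ratio -- how many times denser the slower-growing one is today
    """
    return math.log(density_ratio) / math.log((1 + fast_rate) / (1 + slow_rate))


print(f"DRAM doubles every {doubling_time(0.60):.1f} years")
print(f"disk doubles every {doubling_time(0.25):.1f} years")
# Hypothetical: disk is assumed to be 100 times denser than DRAM today.
print(f"crossover after about {years_to_overtake(100, 0.60, 0.25):.0f} years")
```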
Economic Model of Chip Manufacturing
This is extended from Hennessy and Patterson.
Creation of a chip involves a combination of recurring
and nonrecurring costs. The recurring costs are the costs of materials,
processing, testing, packaging, marketing, etc. The nonrecurring costs are the
design of the chip and creation of the mask set that is used for its
production.
Depending on the size of the chip and the complexity
of the process (i.e. how many steps are involved), the creation of a mask set
can cost from $10,000 to $250,000 (or more) today. For sample runs of chips it
is typical practice to place different chip designs on a single wafer, and so a
portion of the mask creation cost is split between the owners of the designs.
This nonrecurring cost must be amortized over the number of chips that are
eventually sold, so the cost of mask creation per chip is
mask cost / chips sold
For very long runs, the cost may be amortized over
some fixed initial number of chips, and then the remaining chips are no longer
burdened with this cost (or other nonrecurring costs). For smaller production
runs, it may be necessary to amortize the costs over all of the chips produced.
For simplicity, we apply the floor function to the per-chip nonrecurring cost, as we can
assume that once the cost falls below a penny per chip, it is no longer
applied.
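The amortization rule can be sketched as follows. The function name and the production volumes are made up for this example; the mask costs come from the range quoted above.

```python
import math


def mask_cost_per_chip(mask_cost_dollars, chips_sold):
    """Mask cost amortized per chip, floored to whole cents.

    Once the share falls below one cent it floors to zero,
    so the cost is no longer applied.
    """
    cents = math.floor(mask_cost_dollars * 100 / chips_sold)
    return cents / 100


print(mask_cost_per_chip(250_000, 100_000))      # modest production run
print(mask_cost_per_chip(250_000, 50_000_000))   # very long run
```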
For a new processor chip, the engineering cost typically dominates the
mask creation cost in the total of nonrecurring costs.
The high cost of mask creation is a disincentive to producing test-runs to
verify a design. An even greater disincentive to such a trial-and-error approach
is the cost of revalidating a design after a change has been made to correct an
error. It may be necessary to simulate large sections of the chip with great
precision (sometimes at the electron-by-electron level) to ensure that, for
example, a new run of wire does not induce stray signals into other circuits.
Thus, the design process is a huge effort, with each step, proceeding from a
logical design through the chip layout, being simulated extensively and
validated against a design model. After the chip layout is complete, other
tools are used to scan the patterns, and extract the circuits that they create.
This independent extraction results in a new logical circuit that is also
validated. The goal of this process is to produce an error-free chip that operates
at its target speed on the first run of chips. Consider what it would be like
if you could not compile and run a program until it had been so thoroughly
verified that you were certain it would execute correctly, and you get a sense
of the amount of engineering effort that goes into a chip design. And, of
course, bugs still manage to slip through.
Nonrecurring engineering cost varies dramatically,
depending on many factors. A team of 20 engineers in a startup may work long
hours at low pay using a few dozen computers to bring out a novel design, while
a major manufacturer could use a team of hundreds of well-paid engineers and a
farm of a thousand computers for simulation to bring out a new generation of an
established processor. The startup may bring out a processor for just a $20M
investment, while the next generation of a commodity microprocessor may cost
$400M. It’s easy to see why there are just a few processor architectures
remaining in production. To recover these costs, many more units must be sold,
which requires development of new markets.
The recurring cost can be summarized as
Cost of IC = (Cost of chip + Cost of test + Cost of package) / Final Yield
The cost of a chip is the cost of a wafer divided by
the number of good dies that are found in the initial testing.
Cost of Chip = Cost of wafer / (Chips per wafer * Die yield)
where Die yield is the fraction of good dies (chips). A
6-inch wafer (15 cm) costs about $550 (2-metal CMOS, ca. 1990) and an 8-inch
wafer (20 cm) costs about $3500 (4-metal CMOS ca. 1996), and a 12-inch wafer
costs $5000 to $6000 (4 to 6 metal CMOS, ca. 2000). The exact cost of the wafer
depends on its size and on the complexity of the process (i.e. number of steps).
The number of chips per wafer is the area of the wafer
divided by the area of the chip less the number of chip sites that are on the
edge of the wafer. If the wafer has separate test blocks, then we have to
remove these from the count as well.
[Equation not reproduced: chips per wafer in terms of D, A, and T]

Where D is the wafer diameter, A is the area of a
chip, and T is the number of test sites.
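A minimal sketch of the chips-per-wafer calculation follows. It assumes the form commonly given by Hennessy and Patterson, pi * (D/2)^2 / A - pi * D / sqrt(2 * A), with T subtracted for test sites; the edge-loss term in particular is an assumption here and may differ from the one used in these notes.

```python
import math


def chips_per_wafer(diameter_cm, chip_area_cm2, test_sites=0):
    """Estimated whole chip sites on a round wafer (assumed formula)."""
    wafer_area = math.pi * (diameter_cm / 2) ** 2
    # Assumed edge-loss term: partial sites around the circumference.
    edge_loss = math.pi * diameter_cm / math.sqrt(2 * chip_area_cm2)
    return math.floor(wafer_area / chip_area_cm2 - edge_loss - test_sites)


# 8-inch (20 cm) wafer, 1 sq. cm chips, 5 test sites (hypothetical count).
print(chips_per_wafer(20, 1.0, test_sites=5))
```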
The Yield per wafer is based on the defects per unit area (Dunit), the die area (A) and a measure of the complexity of the process (P). Our cost model must also take into account that a certain fraction of wafers Ywafer are entirely bad (due, for example, to having a mistake occur in performing a step).
[Equation not reproduced: die yield in terms of Ywafer, Dunit, A, and P]
A typical value of P for a 2-metal CMOS process (ca.
1992) is 2, for a 4-metal CMOS process (ca. 1996) it is 3, for a 6-metal CMOS process
(ca. 2000) it is 4. More complex processes such as BiCMOS and GaAs are higher. Dunit was
about 0.6 to 1.2 in 1996, down from approximately 2 in 1990. In 2000 it was
down further to 0.4 to 0.8. Processor sizes in 2000 typically range from 0.1
sq. cm for an embedded design to 2 sq. cm for some aggressive high-performance
designs.
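The yield model can be sketched in code. The form used below, Ywafer * (1 + Dunit * A / P) ** -P, is the one commonly given by Hennessy and Patterson and is assumed here. The inputs are taken from the ranges quoted above for a process ca. 2000.

```python
def die_yield(wafer_yield, defects_per_cm2, chip_area_cm2, complexity):
    """Fraction of good dies under the assumed yield model."""
    return wafer_yield * (
        1 + defects_per_cm2 * chip_area_cm2 / complexity
    ) ** (-complexity)


# Dunit = 0.8 per sq. cm, P = 4, wafer yield taken as 100%.
print(f"0.1 sq. cm embedded chip: {die_yield(1.0, 0.8, 0.1, 4):.2f}")
print(f"2.0 sq. cm aggressive chip: {die_yield(1.0, 0.8, 2.0, 4):.2f}")
```

Under this model, yield falls off steeply as die area grows, which is why large high-performance chips cost disproportionately more.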
The cost of testing is based on the time to test each
die
[Equation not reproduced: cost of test per die]
Time on a tester costs from $50/hour to $1000/hour,
depending on the speed of the chip and the number of pins (imposing
requirements on the capabilities of the tester, which affect its cost) and a
typical chip takes from a few seconds to several minutes to test, depending on
the complexity of the chip and how much test-support circuitry is built into
it. Test circuitry enables the tester to put the chip into a mode in which it
can directly access portions of the chip that might be hidden from the normal
program model (e.g. cache write buffer, shadow registers, etc.) as well as
providing for a simpler test procedure. Designers balance the cost of the
additional test circuitry (which isn’t useful for the operation of the
processor) against the cost of test. In many cases, there is simply a test-cost
target and the designers incorporate just enough test circuitry to reach this
goal.
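As a rough illustration of how tester time turns into cost per good chip, the sketch below assumes that the cost of test is the tester's hourly rate times the test time, divided by the die yield so that good chips carry the cost of testing bad ones. That form follows Hennessy and Patterson and is an assumption here; the example inputs are hypothetical values within the ranges quoted above.

```python
def cost_of_test(tester_dollars_per_hour, test_seconds, die_yield):
    """Tester cost charged to each good chip (assumed formula)."""
    hours = test_seconds / 3600
    return tester_dollars_per_hour * hours / die_yield


# Hypothetical: $500/hour tester, 30 seconds per die, half the dies good.
print(f"${cost_of_test(500, 30, 0.5):.2f} per good chip")
```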
An example of a test technique is to provide a mode in
which some of the pins become a test port. The port connects to a special set
of data paths in the chip that allow a test pattern to be shifted into the chip
and loaded into its registers. The chip is then clocked at least once and the
new pattern of the registers is shifted out through the port and compared with
an expected result. The chip may include registers that are used only for
testing so that inputs and outputs of particular functional units that are not
normally buffered can be controlled and examined.
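The shift-in, clock, shift-out procedure can be illustrated with a toy model. Everything in this sketch is hypothetical: the chip's registers are modeled as a single four-bit shift chain, and its logic as a function that inverts each bit.

```python
class ScanChain:
    """Toy model of a scan chain: registers linked into one shift register."""

    def __init__(self, length, logic):
        self.regs = [0] * length
        self.logic = logic  # hypothetical stand-in for the chip's normal logic

    def shift(self, bits_in):
        """Shift bits in at the head; return the bits that fall off the tail."""
        bits_out = []
        for bit in bits_in:
            bits_out.append(self.regs[-1])
            self.regs = [bit] + self.regs[:-1]
        return bits_out

    def clock(self):
        """One functional clock: the registers capture the logic outputs."""
        self.regs = list(self.logic(self.regs))


def run_scan_test(chain, pattern, expected):
    chain.shift(reversed(pattern))                     # load so regs == pattern
    chain.clock()                                      # clock the chip once
    observed = chain.shift([0] * len(pattern))[::-1]   # shift the result out
    return observed == expected


def invert(regs):
    """Fault-free logic: invert every bit."""
    return [1 - b for b in regs]


def stuck_at_zero(regs):
    """The same logic with output bit 2 stuck at 0."""
    out = invert(regs)
    out[2] = 0
    return out


pattern, expected = [1, 0, 0, 1], [0, 1, 1, 0]
print(run_scan_test(ScanChain(4, invert), pattern, expected))
print(run_scan_test(ScanChain(4, stuck_at_zero), pattern, expected))
```

The good chip returns the expected pattern, while the stuck bit in the faulty one shows up as a mismatch in the shifted-out result.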
Another form of test circuitry provides the ability to
test a chip once it has been mounted on a circuit board without having to
directly probe its leads (in fact, some packages such as those using
solder-balls, are impossible to probe once mounted). The in-circuit testing
standard is called JTAG (Joint Test Action Group) boundary scan architecture.
It essentially connects the pads of the chip together with a shift register. In
test mode, the pad drivers can be switched to take signals either from these
shift register elements or from the actual wires connected to the bonding area.
A single 4-pin port is then used to shift signals into the pads and shift them
out again. The boundary scan architecture permits easy verification that the
chip has good connections to the outside world. It also provides for limited
testing of components that are connected directly to the chip.
For a low-power chip with only a few I/O pins a
package may cost only a few cents. Packages with enough pins for a typical
processor, but a low power level (< 1W), such as a plastic quad flat pack
cost a few dollars (e.g., $2). Higher power packages such as ceramic pin grid
arrays cost tens of dollars (e.g., $20 to $60) but can handle several watts.
For really high power chips, special heat-sinks, fan-sinks, or chilled-fluid
cooling may be employed, with costs of over $100 per chip. Mounting a chip in a
typical package costs about $2. Then the packaged part is burned-in, which
costs only about 25 cents. Finally, a fraction of the chips fail during burn
in, and their cost must be amortized over the remaining good die.
A typical wafer yield is close to 100%, a
typical defect rate might be 0.6 per square cm, and a typical chip today is
about 1 square cm. Thus, we could expect to see fewer than one in three chips
actually work.
Doubling the size of a chip can more than double its
cost -- a 1-cm square chip might cost $25 while two 0.5-cm square chips would
cost about $14.
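The nonlinear effect of die size on cost can be sketched by combining the earlier pieces. The chips-per-wafer and yield forms below are the ones commonly given by Hennessy and Patterson and are assumptions here. Test and packaging costs are left out, and the inputs are illustrative, so the output will not necessarily match the dollar figures quoted above.

```python
import math


def cost_per_good_chip(wafer_cost, diameter_cm, area_cm2,
                       defects_per_cm2, complexity, wafer_yield=1.0):
    """Wafer cost divided by the number of good chips (test, package excluded)."""
    # Assumed chips-per-wafer form: wafer area / chip area minus edge loss.
    sites = math.floor(
        math.pi * (diameter_cm / 2) ** 2 / area_cm2
        - math.pi * diameter_cm / math.sqrt(2 * area_cm2)
    )
    # Assumed yield form: Ywafer * (1 + Dunit * A / P) ** -P.
    good_fraction = wafer_yield * (
        1 + defects_per_cm2 * area_cm2 / complexity
    ) ** (-complexity)
    return wafer_cost / (sites * good_fraction)


# Illustrative inputs: 8-inch (20 cm) wafer at $3500,
# 0.6 defects per sq. cm, P = 3.
large = cost_per_good_chip(3500, 20, 1.0, 0.6, 3)
small = cost_per_good_chip(3500, 20, 0.5, 0.6, 3)
print(f"one 1.0 sq. cm chip:  ${large:.2f}")
print(f"two 0.5 sq. cm chips: ${2 * small:.2f}")
```

With these inputs the single large chip costs more than the two half-size chips together, because it loses on both yield and edge waste.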
On top of these base costs, as mentioned above, the
nonrecurring costs must be amortized. In addition, the final price reflects
profit. And more often than not, the price also reflects knowledge of what the
market will bear, and the price point of the competition.
Cost of Assembled Systems
A processor chip in 2000 accounts for about 23% of the
system cost of a PC. In 1996 it was only about 6% (down from 10% to 15% in
1990). This shift in relative cost is mainly due to a sharp drop
in memory cost. In 1995 the memory in a PC accounted for roughly 1/3 of the
cost (and nearly half in 1990), but it is now less than 5%. The cost of the
processor, which is indirectly related to its design, can have a broader
overall impact on system cost. A faster processor requires faster (more
expensive) memory. A faster processor and memory require a higher quality
circuit board, more power, and greater cooling capacity (although these are
about 3% of the total cost, unless chilled cooling becomes necessary).
Costs unrelated to the expense of the CPU include the I/O
circuitry (video at 5%, down from 14% in 1995 due mainly to lower memory
cost; network; disk at 9%; secondary cache; etc.), which totals about 25%. The
monitor, keyboard and mouse are about 26% of the cost in 2000 (about the same
as in 1995, after an increase from 12% in 1990, due mostly to the additional
cost of a color monitor; early PCs typically had monochrome monitors),
and the chassis, cables, etc. are about 4% of the cost.
The point to note in this discussion is that the part
of the system that we focus on is only about 23% of the total cost. Thus, our
design choices may seem to have only a modest effect on cost. In reality, poor
choices can have a further impact on cost by requiring additional I/O circuitry
or more expensive RAM. In a commodity market, there is very little margin
between a product that is profitable and one that is not. Thus, even modest
differences in cost can mean success or failure of a product or a whole company.
Given that a competitive system must have certain
features, and must meet certain market-driven cost goals, we can see that the
computer architect has little room to choose with regard to the cost of the
system. Why then worry about architecture? Because the customer is not
concerned with price alone, but with the price/performance ratio. And it is in
the area of providing maximum performance within given cost limitations that
the architect's knowledge comes into play. Major manufacturers have comparable
technology for building processors. IBM may pull ahead briefly with copper,
SoI, or double-gated technology, but then Intel and Motorola soon catch up. To
gain an additional performance edge, architecture comes into play.