Microarchitecture and performance
Pentium Pro 512 KB in PGA package
Pentium Pro 1 MB in PPGA package
Uncapped Pentium Pro 256 KB
Pentium II Overdrive with heatsink removed. Flip-chip Deschutes core is on the left. 512 KB cache is on the right.
Belying its name, the Pentium Pro had a completely new microarchitecture, a departure from the Pentium rather than an extension of it. The Pentium Pro (P6) featured many advanced concepts not found in the Pentium, although it was not the first or only x86 processor to do so (see NexGen Nx586 or Cyrix 6x86). The Pentium Pro pipeline used extra decoding stages to translate IA-32 instructions dynamically into buffered micro-operation sequences, which could then be analysed, reordered, and renamed to detect parallelizable operations that could be issued to more than one execution unit at once. The Pentium Pro thus featured out-of-order execution, including speculative execution via register renaming. It also had a wider 36-bit address bus (usable by PAE).
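The renaming step described above can be illustrated with a toy model. This is only a sketch of the general idea, not the P6's actual hardware mechanism: the `rename` function, the `p0`, `p1`, ... physical register names, and the three-operand instruction tuples are all hypothetical and exist purely for illustration.

```python
from itertools import count


def rename(instructions, arch_regs):
    """Rewrite architectural register names as fresh physical registers.

    instructions: list of (dest, src1, src2) architectural register names.
    arch_regs: the architectural registers that exist before the sequence runs.
    Returns the same instructions expressed in (hypothetical) physical names.
    """
    fresh = count()
    # Each architectural register starts out mapped to its own physical register.
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        s1, s2 = mapping[src1], mapping[src2]
        # Every write gets a new physical register, so a later write to the
        # same architectural name no longer conflicts with earlier uses.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("edx", "eax", "ebx"),  # edx = eax op ebx (true dependency on the first)
    ("eax", "ecx", "ecx"),  # eax = ecx op ecx (reuses the name eax only)
]
renamed_program = rename(program, ["eax", "ebx", "ecx", "edx"])
```

After renaming, the third instruction writes a different physical register from the first, so in this model it no longer has to wait for the second instruction to finish reading the old value of `eax`. The true dependency between the first two instructions is preserved.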
Performance with 32-bit code was excellent, ahead of the older Pentium by 25-35% at the time. However, the Pentium Pro's 16-bit performance was only about 20% faster than that of a Pentium, because register renaming was done on full 32-bit registers only (this was fixed in the Pentium II).
This, along with the Pentium Pro's high price, caused its rather lackluster reception among PC enthusiasts, given the dominance at the time of 16-bit MS-DOS, 16/32-bit Windows 3.1x, and 32/16-bit Windows 95 (parts of Windows 95, such as USER.exe, were still mostly 16-bit). To gain the full advantage of the Pentium Pro's microarchitecture, one needed to run a fully 32-bit OS such as Windows NT 3.51, Unix, Linux, or OS/2.
After the microprocessor was released, a bug was discovered in the floating-point unit, commonly called the "Pentium Pro and Pentium II FPU bug" and referred to by Intel as the "flag erratum". The bug occurs under some circumstances during floating-point-to-integer conversion, when the floating-point number does not fit into the smaller integer format, causing the FPU to deviate from its documented behaviour. The bug is considered minor and occurs under such special circumstances that very few, if any, programs are affected.
An innovation in cache
Likely the Pentium Pro's most noticeable addition was its on-package L2 cache, which ranged from 256 KB at introduction to 1 MB in 1997. At the time, manufacturing technology did not feasibly allow a large L2 cache to be integrated into the processor core. Intel instead placed the L2 die(s) separately in the package, which still allowed the cache to run at the same clock speed as the CPU core. Additionally, unlike most motherboard-based cache schemes, which shared the main system bus with the CPU, the Pentium Pro's cache had its own backside bus (called dual independent bus by Intel). Because of this, the CPU could read main memory and cache concurrently, greatly reducing a traditional bottleneck. The cache was also "non-blocking", meaning that the processor could issue more than one cache request at a time (up to 4), reducing cache-miss penalties. (This is an example of MLP, memory-level parallelism.) These properties combined to produce an L2 cache that was far faster than the motherboard-based caches of older processors. This cache alone gave the CPU an advantage in input/output performance over older x86 CPUs. In multiprocessor configurations, the Pentium Pro's integrated cache greatly improved performance in comparison to architectures in which the CPUs shared a central cache.
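The benefit of a non-blocking cache can be sketched with a rough back-of-the-envelope model. The function name, the miss count, and the 50-cycle latency below are assumptions chosen for illustration, not measured Pentium Pro figures; the model also assumes that all misses are independent and ignores the cycles needed to issue each request.

```python
import math


def miss_stall_cycles(num_misses, miss_latency, max_outstanding):
    """Estimate cycles spent waiting on independent cache misses.

    With max_outstanding == 1 the cache is blocking, so every miss is
    served one after another. With a larger limit, up to that many
    misses overlap and are modelled as completing together in one batch.
    """
    batches = math.ceil(num_misses / max_outstanding)
    return batches * miss_latency


# Hypothetical workload: 8 independent misses, 50 cycles each.
blocking = miss_stall_cycles(8, 50, max_outstanding=1)
non_blocking = miss_stall_cycles(8, 50, max_outstanding=4)
```

Under these assumptions, the blocking case waits for eight full miss latencies while the four-request case waits for only two, which is the kind of overlap that memory-level parallelism refers to.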
However, this far faster L2 cache came with some complications. The Pentium Pro's "on-package cache" arrangement was unique. The processor and the cache were on separate dies in the same package and connected closely by a full-speed bus. The two or three dies had to be bonded together early in the production process, before testing was possible. This meant that a single, tiny flaw in any die made it necessary to discard the entire assembly, which was one of the reasons for the Pentium Pro's relatively low production yield and high cost. All versions of the chip were expensive, those with 1024 KB particularly so, since they required two 512 KB cache dies as well as the processor die.
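The effect on yield of bonding the dies before testing can be illustrated with simple arithmetic. The per-die yields below are hypothetical placeholders, not Intel figures, and the sketch assumes that defects in each die are independent, so a package is good only if every die in it is good.

```python
import math


def package_yield(die_yields):
    """Fraction of assembled packages in which every die is defect-free.

    Assumes independent defects and no opportunity to test and discard
    individual dies before they are bonded into the package.
    """
    return math.prod(die_yields)


# Hypothetical per-die yields, for illustration only.
CPU_DIE = 0.80
CACHE_DIE = 0.90

one_cache_die = package_yield([CPU_DIE, CACHE_DIE])
two_cache_dies = package_yield([CPU_DIE, CACHE_DIE, CACHE_DIE])
```

With these made-up numbers, a package with one cache die comes out good roughly 72% of the time, and one with two cache dies roughly 65% of the time. Each extra untested die multiplies in another chance of scrapping the whole assembly, which is consistent with the 1024 KB parts being the most expensive.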