Timeline

* 1989:
o HP begins investigating EPIC
* 1994:
o June: HP and Intel announce partnership
* 1995:
o September: HP, Novell, and SCO announce plans for a "high volume UNIX operating system" to deliver "64-bit networked computing on the HP/Intel architecture"
* 1996:
o October: Compaq announces it will use IA-64
* 1997:
o June: IDC predicts IA-64 systems sales will reach $38bn/yr by 2001
o October: Dell announces it will use IA-64
o December: Intel and Sun announce joint effort to port Solaris to IA-64
* 1998:
o March: SCO admits HP/SCO Unix alliance is now dead
o June: IDC predicts IA-64 systems sales will reach $30bn/yr by 2001
o June: Intel announces Merced will be delayed, from second half of 1999 to first half of 2000
o September: IBM announces it will build Merced-based machines
o October: Project Monterey is formed to create a common UNIX for IA-64
* 1999:
o February: Project Trillian is formed to port Linux to IA-64
o August: IDC predicts IA-64 systems sales will reach $25bn/yr by 2002
o October: Intel announces the Itanium name
o October: the term Itanic is first used
* 2000:
o February: Project Trillian delivers source code
o June: IDC predicts Itanium systems sales will reach $25bn/yr by 2003
o July: Sun and Intel drop Solaris-on-Itanium plans
o August: AMD releases specification for x86-64, a set of 64-bit extensions to Intel's own x86 architecture intended to compete with IA-64. It will eventually market this under the name "AMD64"
* 2001:
o June: IDC predicts Itanium systems sales will reach $15bn/yr by 2004
o June: Project Monterey dies
o July: Itanium is released
o October: IDC predicts Itanium systems sales will reach $12bn/yr by the end of 2004
o November: IBM's 320-processor Titan NOW Cluster at the National Center for Supercomputing Applications is listed on the TOP500 list at position #34
o November: Compaq delays Itanium product release due to problems with the processor
o December: Gelato is formed
* 2002:
o March: IDC predicts Itanium systems sales will reach $5bn/yr by end 2004
o June: Itanium 2 is released
* 2003:
o April: IDC predicts Itanium systems sales will reach $9bn/yr by end 2007
o April: AMD releases Opteron, the first processor with x86-64 extensions
o June: Intel releases the "Madison" Itanium 2
* 2004:
o February: Intel announces it has been working on its own x86-64 implementation (which it will eventually market under the name "Intel 64")
o June: Intel releases its first processor with x86-64 extensions, a Xeon processor codenamed "Nocona"
o June: Thunder, a system at LLNL with 4096 Itanium 2 processors, is listed on the TOP500 list at position #2
o November: Columbia, an SGI Altix 3700 with 10160 Itanium 2 processors at NASA Ames Research Center, is listed on the TOP500 list at position #2
o December: Itanium system sales for 2004 reach $1.4bn
* 2005:
o January: HP ports OpenVMS to Itanium
o February: IBM server design drops Itanium support
o June: An Itanium 2 sets a record SPECfp2000 result of 2,801 in a Hitachi, Ltd. computing blade
o September: Itanium Solutions Alliance is formed
o September: Dell exits the Itanium business
o October: Itanium server sales reach $619M/quarter in the third quarter
o October: Intel announces one-year delays for Montecito, Montvale, and Tukwila
* 2006:
o January: Itanium Solutions Alliance announces a $10bn collective investment in Itanium by 2010
o February: IDC predicts Itanium systems sales will reach $6.6bn/yr by 2009
o June: Intel releases the dual-core "Montecito" Itanium 2
* 2007:
o October: Intel releases the "Montvale" Itanium 2
o November: Intel renames the family back to Itanium

Competition

The Itanium 2 competes in the enterprise server and high-performance computing (HPC) markets. Itanium's major competitors include Sun Microsystems' UltraSPARC IV+, Fujitsu's SPARC64, IBM's POWER6, AMD's Opteron, and Intel's own Xeon servers.

Throughout its history, Itanium has had the best floating point performance relative to fixed-point performance of any general-purpose microprocessor. This capability is useful in HPC systems but is not needed for most enterprise server workloads.

By 2005, Itanium systems accounted for about 14% of HPC systems revenue, but the percentage has declined as the industry shifts to x86-64 clusters for this application.

Supercomputers & HPC

[Figure: Percentage of TOP500 systems (x86 includes x86-64)]

An Itanium-based computer first appeared on the TOP500 list of supercomputers in November 2001. The best position ever achieved by an Itanium 2-based system on the list was #2, reached in June 2004, when Thunder (LLNL) entered the list with an Rmax of 19.94 teraflops. In November 2004, Columbia entered the list at #2 with 51.8 teraflops, and there was at least one Itanium-based computer in the top 10 from then until June 2007. The number of Itanium-based machines on the list peaked in November 2004 at 84 systems (16.8%); by November 2008, this had dropped to nine systems (1.8%).

New Itanium implementations in high performance computing (HPC) are primarily for research areas (such as biochemical research) where typical workloads perform better on large, shared memory systems rather than distributed clusters. These systems typically have 16 to 64 processors, and are not comparable in size to the supercomputers on the TOP500 list.

Hardware support

Systems
Server manufacturers' Itanium products

Company    From   To     Latest product      CPUs
Compaq     2001   2001   ProLiant 590        1-4
IBM        2001   2005   x455                1-16
Dell       2001   2005   PowerEdge 7250      1-4
HP         2001   now    Integrity           1-128
SGI        2001   now    Altix 4000          1-2048
Hitachi    2001   now    BladeSymphony 1000  1-8
Bull       2002   now    NovaScale           1-32
Unisys     2002   now    ES7000/one          1-32
NEC        2002   now    Express5800/1000    1-32
Fujitsu    2005   now    PRIMEQUEST          1-32

As of 2008, several manufacturers offer Itanium systems, including HP, SGI, NEC, Fujitsu, Unisys, Hitachi, and Groupe Bull. In addition, Intel offers a chassis that can be used by system integrators to build Itanium systems. HP, the only one of the industry's top four server manufacturers to offer Itanium-based systems today, manufactures at least 80% of all Itanium systems. HP sold 7,200 systems in the first quarter of 2006. The bulk of systems sold are enterprise servers and machines for large-scale technical computing, with an average selling price per system in excess of US$200,000. A typical system uses eight or more Itanium processors.

Chipsets

The Itanium bus interfaces to the rest of the system via a chipset. Enterprise server manufacturers differentiate their systems by designing and developing chipsets that interface the processor to memory, interconnections, and peripheral controllers. The chipset is the heart of the system-level architecture for each system design. Development of a chipset costs tens of millions of dollars and represents a major commitment to the use of the Itanium. Currently, modern chipsets for Itanium are manufactured by HP, Fujitsu, SGI, NEC, Hitachi, and Unisys. IBM created a chipset in 2003, and Intel in 2002, but neither of them has developed chipsets to support newer technologies such as DDR2 or PCI Express.

The upcoming Itanium processor (Tukwila) has been designed to share a common chipset with the Intel Xeon processor EX (Intel’s Xeon processor designed for four-processor and larger servers). The goal is to provide system development and cost-saving synergies for server OEMs, many of whom develop both Itanium- and Xeon-based servers.

Software support

To allow more software to run on Itanium, Intel supported the development of effective compilers for the platform, especially its own suite of compilers. GCC, Open64, and Microsoft Visual Studio 2005 (and later) can also produce machine code for Itanium. As of 2008, Itanium is natively supported by Windows Server 2003 and Windows Server 2008; multiple Linux distributions (including Debian, Gentoo, Red Hat, and Novell SuSE); FreeBSD; and HP-UX, OpenVMS, and NonStop from HP. HP also sells a virtualization technology for Itanium called Integrity Virtual Machines. Itanium also supports the GCOS mainframe environment from Groupe Bull and several IA-32 operating systems via instruction set simulators. Using QuickTransit, application binaries for IRIX/MIPS and Solaris/SPARC can run on Linux/Itanium via "dynamic binary translation". According to the Itanium Solutions Alliance (ISA), as of early 2008 over 13,000 applications are available for Itanium-based systems, though Sun has contested Itanium application counts in the past. The Itanium Solutions Alliance also supports Gelato, an Itanium HPC user group and developer community that ports and supports open source software for Itanium.

The software requirements for Itanium were criticized by Donald Knuth, who said: "... The Itanium approach ... was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write."

Architecture

[Figure: The Intel Itanium architecture]

Intel has extensively documented the Itanium instruction set and microarchitecture, and the technical press has provided overviews. The architecture has been renamed several times during its history. HP originally called it PA-WideWord. Intel later called it IA-64, then Itanium Processor Architecture (IPA), before settling on Intel Itanium Architecture, but it is still widely referred to as IA-64. It is a 64-bit, register-rich, explicitly parallel architecture. The base data word is 64 bits, byte-addressable. The logical address space is 2^64 bytes. The architecture implements predication, speculation, and branch prediction. It uses a hardware register renaming mechanism rather than simple register windowing for parameter passing. The same mechanism is also used to permit parallel execution of loops. Speculation, prediction, predication, and renaming are under control of the compiler: each instruction word includes extra bits for this. This approach is the distinguishing characteristic of the architecture.
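As a rough illustration of what compiler-controlled predication means, the sketch below contrasts a conventional branch with an "if-converted" version in which both operations are written in straight-line form and each is guarded by a one-bit predicate. This is a conceptual model in Python, not Itanium code; the function names and the choice of an absolute-difference example are assumptions made for illustration.

```python
def abs_diff_branching(a, b):
    """Conventional form: a conditional branch selects one path."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """If-converted form: both operations appear, each guarded by a predicate."""
    # A single compare produces two complementary one-bit predicates.
    p1 = a > b
    p2 = not p1

    result = 0
    # Each "instruction" below carries a qualifying predicate and takes
    # effect only when that predicate is true, so no branch is needed.
    result = (a - b) if p1 else result  # guarded by p1
    result = (b - a) if p2 else result  # guarded by p2
    return result


if __name__ == "__main__":
    for a, b in [(7, 3), (3, 7), (5, 5)]:
        print(a, b, abs_diff_branching(a, b), abs_diff_predicated(a, b))
```

In this model the compiler, not the hardware, decides to replace the branch with guarded operations, which is the division of labor the paragraph above describes.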

The architecture implements 128 integer registers, 128 floating point registers, 64 one-bit predicates, and eight branch registers. The floating point registers are 82 bits long to preserve precision for intermediate results.

Instruction execution

Each 128-bit instruction word contains three instructions, and the fetch mechanism can read up to two instruction words per clock from the L1 cache into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units.
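The three-instructions-per-word packing can be sketched as follows. Intel's architecture manuals describe each 128-bit word (a "bundle") as a 5-bit template in the low-order bits followed by three 41-bit instruction slots; the helper names `split_bundle` and `pack_bundle` and the sample values are hypothetical, and the slot contents are treated as opaque integers rather than decoded instructions.

```python
TEMPLATE_BITS = 5   # low-order bits: template field
SLOT_BITS = 41      # three slots follow; 5 + 3 * 41 = 128
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three slot values into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template & ((1 << TEMPLATE_BITS) - 1)
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def split_bundle(bundle):
    """Split a 128-bit integer into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots


if __name__ == "__main__":
    word = pack_bundle(0x10, [1, 2, 3])
    print(split_bundle(word))
```

Fetching two such words per clock yields the six-instruction peak mentioned above.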

The execution unit groups include:

* Six general-purpose ALUs, two integer units, one shift unit
* Four data cache units
* Six multimedia units, two parallel shift units, one parallel multiply, one population count
* Two floating-point multiply-accumulate units, two "miscellaneous" floating-point units
* Three branch units

The compiler can often group instructions into sets of six that can execute at the same time. Since the floating-point units implement a multiply-accumulate operation, a single floating point instruction can perform the work of two instructions when the application requires a multiply followed by an add: this is very common in scientific processing. When it occurs, the processor can execute four FLOPs per cycle. For example, the 800 MHz Itanium had a theoretical rating of 3.2 GFLOPS and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS.
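The peak-rate arithmetic in this paragraph can be reproduced with a few lines of Python. This is a minimal sketch: the function name `peak_gflops` is hypothetical, and the calculation assumes only what the text states, namely two multiply-accumulate units that each count as two floating-point operations per cycle.

```python
def peak_gflops(clock_ghz, fma_units=2, flops_per_fma=2):
    """Theoretical peak: clock rate times FLOPs issued per cycle."""
    return clock_ghz * fma_units * flops_per_fma


if __name__ == "__main__":
    print(peak_gflops(0.8))                # 800 MHz Itanium -> 3.2
    print(round(peak_gflops(5 / 3), 2))    # clock of 5/3 GHz -> 6.67
```

A nominal clock of exactly 1.67 GHz would give about 6.68 rather than 6.67; the quoted figure is consistent with a clock slightly below that, around 1.667 GHz, which is an inference from the numbers rather than something the text states.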

Memory architecture

From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KB of Level 1 instruction cache and 16 KB of Level 1 data cache. The L2 cache was unified (both instruction and data) and was 256 KB in size. The Level 3 cache was also unified and varied in size from 1.5 MB to 24 MB. The 256 KB L2 cache contained sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU).

Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus has increased steadily with new processor releases. The bus transfers 2×128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/s and the 533 MHz Montecito bus transfers 17.056 GB/s.
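The bandwidth figures follow directly from the bus width and clock rate. The sketch below reproduces them; the function name `bus_bandwidth_gb_s` is hypothetical, and GB is taken to mean 10^9 bytes, which is the interpretation that matches the numbers in the text.

```python
def bus_bandwidth_gb_s(clock_mhz, bits_per_transfer=128, transfers_per_clock=2):
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_clock = bits_per_transfer * transfers_per_clock // 8  # 32 bytes
    return clock_mhz * bytes_per_clock / 1000


if __name__ == "__main__":
    print(bus_bandwidth_gb_s(200))  # McKinley bus
    print(bus_bandwidth_gb_s(533))  # Montecito bus
```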

Architectural changes

Itanium processors released prior to 2006 had hardware support for the IA-32 architecture to permit support for legacy server applications, but performance for IA-32 code was much worse than for native code and also worse than the performance of contemporaneous x86 processors. In 2005, Intel developed the IA-32 Execution Layer (IA-32 EL), a software emulator that provides better performance. With Montecito, Intel therefore eliminated hardware support for IA-32 code.

In 2006, with the release of Montecito, Intel made a number of enhancements to the basic processor architecture including:

* Hardware Multithreading: Each processor core maintains context for two threads of execution. When one thread stalls during memory access, the other thread can execute. Intel calls this "coarse multithreading" to distinguish it from the "hyperthreading technology" Intel integrated into some x86 and x86-64 microprocessors. Coarse multithreading is well matched to the Intel Itanium Architecture and results in an appreciable performance gain.
* Hardware Support for Virtualization: Intel added Intel Virtualization Technology (Intel VT), which provides hardware assists for core virtualization functions. Virtualization allows a software "hypervisor" to run multiple operating system instances on the processor concurrently.
* Cache Enhancements: Montecito added a split L2 cache, which included a dedicated 1 MB L2 cache for instructions. The original 256 KB L2 cache was converted to a dedicated data cache. Montecito also included up to 12 MB of on-die L3 cache.
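The hardware multithreading item above (switching to the other thread when one stalls on memory) can be illustrated with a toy timing model. This is a simplified sketch, not a description of Montecito's actual switching logic: each thread is modeled as a list of hypothetical (compute cycles, stall cycles) pairs, switching is assumed to be free, and the function names are invented for the example.

```python
def serial_cycles(threads):
    """Run threads one after another; the core idles during every stall."""
    return sum(compute + stall for thread in threads for compute, stall in thread)


def switch_on_stall_cycles(threads):
    """Run another ready thread whenever the current one stalls on memory."""
    n = len(threads)
    next_burst = [0] * n   # index of each thread's next (compute, stall) pair
    ready_at = [0] * n     # cycle at which each thread's pending stall ends
    now = 0
    while True:
        pending = [t for t in range(n) if next_burst[t] < len(threads[t])]
        if not pending:
            break
        ready = [t for t in pending if ready_at[t] <= now]
        if not ready:
            # Every remaining thread is waiting on memory: the core idles.
            now = min(ready_at[t] for t in pending)
            continue
        t = ready[0]
        compute, stall = threads[t][next_burst[t]]
        now += compute               # execute the compute burst
        ready_at[t] = now + stall    # the stall overlaps with other threads
        next_burst[t] += 1
    return max([now] + ready_at)


if __name__ == "__main__":
    work = [[(10, 30), (10, 30)], [(10, 30), (10, 30)]]
    print(serial_cycles(work), switch_on_stall_cycles(work))
```

In this model the gain comes entirely from overlapping one thread's memory stalls with the other thread's compute bursts; workloads with few stalls would see little benefit.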