Working Notes of Distributed Computing Group
Intel ‘Nehalem’ Processor
The desktop version of the Intel ‘Nehalem’ processor – Core i7 – came out almost simultaneously with the
AMD ‘Shanghai’ processor, but Intel held back the server version until the end of March 2009. We recently
published a benchmarking report on Shanghai [1]; this is an update on the long-awaited Nehalem
processor, which we compare against the AMD flagship. Most of the benchmarking results presented here
(and further tests) are available through the Disco Benchmarking Database (DBD) [2].
The Nehalem architecture is without doubt a big step for Intel. The delay was due, at least in part, to the
company wanting to get everything right from the start. Here is a brief summary of the main features:
45 nm process
Integrated memory controller
Triple channel DDR3
Hyper-Threading (HT aka SMT)
Turbo mode
Up to 8 MB shared L3 cache
New SSE 4.2 instructions
QuickPath Interconnect (QPI)
AMD has had on-chip memory controllers for years. Intel, having rejected the legacy of the Alpha processor in
2001, now has to catch up. The Front Side Bus (FSB) is finally abandoned in favour of a packet-based
point-to-point interconnect called QuickPath Interconnect (QPI, formerly CSI), which was designed to
compete with the HyperTransport used by AMD.
Currently only quad-core, two-socket models are available. Four-socket and eight-core models are
scheduled for later. These new models will be a much less radical change, as the new processor design is modular.
From the design perspective the chip is divided into the Core, containing the actual cores, and the Uncore. The
latter makes it easier to add or remove features such as cache size, memory channels, QPI links and so on.
The Uncore features, together with the clock frequency, make the product segmentation much more
diverse and, as a consequence, the selection of the right processor somewhat more difficult.
Stream Benchmark
Arguably, the major advancement in the