What do Virtualization and Cloud executives think about 2012? Find out in this VMblog.com series exclusive.
Almost a quarter century after Berlin, will the memory wall finally topple in 2012?
Contributed article by Jim Finnegan, senior vice president, silicon engineering at Netronome
It was Professor Sally McKee of Cornell University who first coined the phrase
"memory wall", which refers to the growing disparity between the rate of
processor performance improvement (~55% per annum) and the rate of memory
performance improvement (~10% per annum). As was correctly projected, this
would eventually make memory latency the bottleneck in computer architectures.
Moore's Law (popularly stated as processor performance doubling every eighteen
months) has continued unabated, although in recent years the gains have come
from multi-core architectures rather than from ever-faster conventional
single-core processors.
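To make the compounding concrete, here is a minimal sketch in Python that uses only the two growth rates quoted above. The function names are illustrative and are not taken from any referenced work.

```python
import math


def performance_gap(years, cpu_rate=0.55, mem_rate=0.10):
    """Ratio of processor to memory performance after `years`.

    Both are normalised to 1.0 at year zero and grow at the quoted
    annual rates (~55% and ~10%).
    """
    return ((1.0 + cpu_rate) / (1.0 + mem_rate)) ** years


def gap_doubling_time(cpu_rate=0.55, mem_rate=0.10):
    """Years needed for the processor/memory gap to double."""
    return math.log(2.0) / math.log((1.0 + cpu_rate) / (1.0 + mem_rate))


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        print(f"after {years:2d} years the gap is {performance_gap(years):,.1f}x")
    print(f"gap doubles every {gap_doubling_time():.1f} years")
```

At these rates the gap widens by about 41% a year, so it doubles roughly every two years.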
The "memory wall" problem is even more severe in networking applications,
where no locality can be assumed between packets, which can arrive as often as
every 5 or 6 nanoseconds in the case of 100G Ethernet. Furthermore, the real
issue is no longer packet processing but stateful flow processing.
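As a rough check on that arrival rate, the sketch below computes the wire time of a back-to-back minimum-size frame. It assumes a 64-byte minimum Ethernet frame and 20 bytes of per-frame overhead (8 bytes of preamble and 12 bytes of inter-frame gap). The function name and defaults are illustrative.

```python
def packet_interval_ns(frame_bytes=64, overhead_bytes=20, link_gbps=100.0):
    """Time between back-to-back frames on the wire, in nanoseconds.

    frame_bytes    -- frame size (64 is the Ethernet minimum)
    overhead_bytes -- preamble (8) plus inter-frame gap (12)
    link_gbps      -- line rate in gigabits per second
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    # bits divided by (gigabits per second) gives nanoseconds
    return bits_on_wire / link_gbps


if __name__ == "__main__":
    print(f"frame only:    {packet_interval_ns(overhead_bytes=0):.2f} ns")
    print(f"with overhead: {packet_interval_ns():.2f} ns")
```

The frame alone occupies the link for 5.12 ns, and 6.72 ns once preamble and inter-frame gap are counted, which brackets the "5 or 6 nanoseconds" figure. A single off-chip memory access typically takes several times longer than either number.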
Stateful flow processing at these rates makes the memory wall problem
extremely difficult. Fortunately, clever ideas have been advanced by
individuals such as Raj Yavatkar of Intel, and Jayaram Mudigonda and
Harrick Vin of the University of Texas. Their paper "Overcoming the Memory
Wall in Packet Processing: Hammers or Ladders?" describes two principles which
can be used to good effect.
- "Hammers" - the mechanisms that exploit locality in the workload to reduce
the number of costly off-chip memory accesses and the average access latency;
and,
- "Ladders" - hardware multi-threading and asynchronous memory, which exploit
inter- and intra-packet parallelism, respectively, in order to hide external
memory access latency.
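The two principles can be illustrated with simple back-of-the-envelope models. The sketch below is not taken from the paper: the formulas are the standard average-access-time and latency-hiding estimates, and every number in the example is a hypothetical placeholder rather than a measurement of any real device.

```python
import math


def average_access_ns(hit_rate, on_chip_ns, off_chip_ns):
    """"Hammer" view: locality turns a fraction of accesses into on-chip hits.

    hit_rate is the fraction of accesses served on-chip (0.0 to 1.0).
    """
    return hit_rate * on_chip_ns + (1.0 - hit_rate) * off_chip_ns


def threads_to_hide_latency(memory_latency_ns, compute_ns_per_access):
    """"Ladder" view: other threads do useful work while one waits on memory.

    One thread issues the access; enough additional threads are needed
    that their combined compute time covers the memory latency.
    """
    return 1 + math.ceil(memory_latency_ns / compute_ns_per_access)


if __name__ == "__main__":
    # Hypothetical figures: 10 ns on-chip, 100 ns off-chip, 50% hit rate.
    print(average_access_ns(0.5, 10.0, 100.0), "ns average access")
    # Hypothetical figures: 100 ns latency, 10 ns of compute between accesses.
    print(threads_to_hide_latency(100.0, 10.0), "threads to hide latency")
```

The first function shows why a hammer helps only as far as the workload has locality to exploit. The second shows why a ladder needs many hardware threads when per-packet compute is short relative to memory latency.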
These foundational concepts have been significantly elaborated and implemented
to great effect in Netronome's family of NFPs (Network Flow Processors), which
have hundreds of hardware threads and a heavily patented on-chip memory unit
with intelligent processing. This is all good news for the high-performance
networking system designer, who can now "smash-and-scale" the memory wall.
But, like Berlin, can we not just topple the memory wall entirely? Well,
maybe...
There has been a huge amount of advanced development applied to the TSV
(Through Silicon Via) technique [1], with practical solutions now available to
the fundamental challenges of fab processes, yield, reliability, and power.
The early beneficiaries of these techniques have been smartphones. Meanwhile,
TSV techniques are starting to be deployed for stacking memory.
Samsung, for example, prototyped its WSP (wafer-level stack process) in 2006,
while others, large (e.g. Micron, Elpida, NEC) and small (e.g. Tezzaron), have
been demonstrating TSV stacked memory prototypes. Besides the manufacturing
innovations achieved through TSV 3D stacking, additional progress is being
made with the development of new access protocols and efficiencies in signal
switching. The HMC (Hybrid Memory Cube) consortium is predicting significant
performance improvements (15x) and dramatic power reduction (70% less) for the
new DRAM memory architecture. These startling gains are made possible by
exploiting two factors. The first is the more obvious one: the distance to
memory is much shorter than with external memory devices, whether discrete
parts or DIMMs, where the path to the external devices includes the
transmission line effects of connectors, packages and PCB traces. The second
factor is the development of low-power I/O transceivers, which exploit the best
combination of area and energy scaling for the target silicon process
geometry, combining low-swing voltage-mode transmitters with more sensitive,
power-efficient receivers.
Recent announcements augur well for this technology. Intel and Micron
demonstrated a prototype of the HMC on September 15, building on prior work in
which Intel demonstrated an I/O prototype that achieved a mere 1.4 milliwatts
per gigabit per second for the hybrid stacked DRAM application. On October 10,
Samsung and Micron announced that they will collaborate in implementing an open
interface specification for HMC, and on December 1 IBM announced that it
intends to manufacture HMC parts using its advanced TSV manufacturing process
at its Fishkill, N.Y. fab.
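To put the 1.4 milliwatts per gigabit per second figure in perspective, the sketch below converts it to energy per bit and to total I/O power at a given bandwidth. The 1,000 Gb/s bandwidth in the example is a hypothetical value chosen for round numbers, not a published HMC specification.

```python
def energy_per_bit_pj(mw_per_gbps=1.4):
    """Energy per transferred bit, in picojoules.

    1 mW per Gb/s = 1e-3 J/s divided by 1e9 bit/s = 1e-12 J/bit = 1 pJ/bit,
    so the two figures are numerically identical.
    """
    return mw_per_gbps


def io_power_watts(bandwidth_gbps, mw_per_gbps=1.4):
    """Total I/O power in watts at the given aggregate bandwidth."""
    return bandwidth_gbps * mw_per_gbps / 1000.0


if __name__ == "__main__":
    print(f"{energy_per_bit_pj():.1f} pJ per bit")
    # Hypothetical aggregate bandwidth of 1,000 Gb/s (1 Tb/s).
    print(f"{io_power_watts(1000.0):.1f} W of I/O power at 1 Tb/s")
```

In other words, the quoted efficiency corresponds to 1.4 picojoules per bit, or about 1.4 watts of I/O power for every terabit per second moved.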
The memory wall is already teetering,
and 2012 could actually be the year when it finally topples!
[1] The seminal reference is Philip Garrou's "Handbook of 3D Integration".
###
About the Author
Jim Finnegan has over 20 years of senior management experience in the networks
and communications business. His career has included extended periods based in
Ireland, the UK and Santa Clara, CA, where he now resides.
Jim has had a career-long commitment to the development of an engineering
culture that expects first-pass success using disciplined software and silicon
development methodologies. This focus was most recently demonstrated at Intel
Corporation, where he oversaw the development of Intel's entire portfolio of
network processors, several of which went into production in first-pass silicon.
Jim began his career at Miles-33 in the UK and subsequently worked in senior
engineering and management positions with Digital Equipment Corporation, Tellabs
and Racal Data Group. Following Racal, he built the engineering organization at
Basis Communications and was a key member of the executive team that negotiated
the acquisition of Basis by Intel Corporation in 2000. At Intel, Jim became
General Manager of both the Network Processor Division and the Communication
Infrastructure Group's Technology Office.
Jim has bachelor's and master's degrees in Electronic Engineering from Queen's
University, Belfast.