Apr 2nd, 2012
Stacked DRAM in a multicore system: a closer look
A few weeks ago in Dresden, the annual “Design, Automation and Test in Europe” conference, better known as DATE, took place. ARM, IMEC and the Swiss Federal Institute of Technology (EPFL) gave an interesting presentation on the “Performance and Efficiency of 3D Stacked DRAM in a Multicore System”. iMicronews thought it worthy of “A Closer Look”.

The “Euro cloud” program is a 3D server-on-chip concept that integrates ARM processor cores with 3D DRAM to build very dense, low-power data centers for mobile cloud services aimed at handheld devices. The goal of this European Commission-funded project is to support hundreds of cores in a single server and to show the path that will make a data centre featuring 1 million cores viable. At the heart of the program is the efficient coupling of high-performance ARM® Cortex™ processors with 3D memory technology, targeting the mobile cloud services from Nokia that will serve millions of “mobile handsets”. The project started in January 2010 and will complete in December 2012.

The key motivation for using 3DIC is the ability to address the bandwidth (BW) and latency gap between processing cores and memory. The advantages of stacking 3D memory on SoCs are the ability to increase memory parallelism with TSVs, and much higher bandwidth at much lower energy/bit for transfers between the SoC and the memory. ARM has designed a multi-core Cortex-A9-based system for server applications that combines high bandwidth with increased memory parallelism to achieve the desired performance.

Low-power ARM cores enable the design of a many-core logic die
- Many-core = high performance
- Low power = low thermal load, which is critical to the technology

The energy/bit for a TSV-based interface is an order of magnitude lower than that of contemporary low-power DRAM interfaces (LPDDR2) and two orders of magnitude lower than that of the existing DDR3 interface, making the communication interface far more energy efficient. A fully stacked 3D-IC system can also be placed on a silicon interposer (2.5D) to trade off power density against a relatively small increase in energy/bit and bandwidth. First-generation JEDEC Wide-IO, operating at 1.2 V, defines the memory interface in units of 128-bit channels, with each channel giving a peak throughput of 3.2 GB/s.
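The per-channel figure can be checked with simple arithmetic. The sketch below is only an illustration: the 128-bit channel width and the 3.2 GB/s result come from the text above, while the 200 MHz single-data-rate clock and the count of four channels are assumptions taken from the commonly quoted first-generation Wide-IO parameters, not from the presentation itself.

```python
# Peak bandwidth of a Wide-IO style interface (back-of-the-envelope check).
# From the article: 128-bit channels, 3.2 GB/s peak per channel.
# Assumptions (not stated in the article): 200 MHz clock, single data rate,
# and four channels per device.

CHANNEL_WIDTH_BITS = 128
CLOCK_HZ = 200_000_000      # assumption: 200 MHz interface clock
TRANSFERS_PER_CLOCK = 1     # assumption: single data rate (one transfer per cycle)
NUM_CHANNELS = 4            # assumption: four independent channels


def peak_bandwidth_gb_per_s(width_bits, clock_hz, transfers_per_clock=1):
    """Return peak bandwidth in GB/s (1 GB = 1e9 bytes) for one channel."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_hz * transfers_per_clock / 1e9


per_channel = peak_bandwidth_gb_per_s(
    CHANNEL_WIDTH_BITS, CLOCK_HZ, TRANSFERS_PER_CLOCK
)
total = per_channel * NUM_CHANNELS

print(f"per channel: {per_channel:.1f} GB/s")
print(f"{NUM_CHANNELS} channels: {total:.1f} GB/s")
```

Under these assumptions, 16 bytes per transfer at 200 million transfers per second gives the 3.2 GB/s per channel quoted above.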


This presentation explores the effectiveness of the Wide-IO interface from both a performance and an energy-efficiency standpoint. The authors first explore the impact on performance of off-chip, LPDDR2-like memories versus the emerging JEDEC Wide-IO standard, which can be implemented using a silicon interposer or in a true 3DIC DRAM-on-logic stack. Their analysis shows that although 3D-stacked DRAM such as Wide-IO allows for wider buses by providing increased pin density, the additional throughput gained from wider buses saturates. The authors propose that, rather than increasing the width, adding more channels that are effectively managed by memory controllers leads to increased overall system performance.
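One way to see why width saturates while channels keep helping is a toy model, sketched below. It is not the authors' simulation: the function name, the 64-byte cache line, the 40 ns fixed access latency, the 5 ns clock, and the rule that each channel serves one request at a time are all hypothetical simplifications chosen for illustration.

```python
import math


def throughput_gb_per_s(num_channels, width_bits, line_bytes=64,
                        fixed_latency_ns=40.0, clock_ns=5.0):
    """Toy throughput model for cache-line sized memory requests.

    Assumptions (hypothetical, for illustration only):
    - every request moves one cache line of `line_bytes`;
    - each request pays a fixed access latency that bus width cannot shrink;
    - a channel serves one request at a time (no pipelining);
    - channels work fully in parallel on independent requests.
    """
    # Number of bus transfers needed to move one line; never fewer than one.
    beats = math.ceil(line_bytes * 8 / width_bits)
    request_ns = fixed_latency_ns + beats * clock_ns
    # bytes per nanosecond is numerically equal to GB/s (1 GB = 1e9 bytes)
    return num_channels * line_bytes / request_ns


# (channels, width in bits): widen one channel, then split the pins instead.
configs = [(1, 128), (1, 256), (1, 512), (1, 1024), (4, 128)]
for channels, width in configs:
    gbps = throughput_gb_per_s(channels, width)
    print(f"{channels} x {width:4d}-bit: {gbps:.2f} GB/s")
```

In this model, widening a single channel only shortens the data-transfer part of each request, so the gain flattens once a whole line fits in one transfer, whereas four 128-bit channels overlap the fixed latency of four requests. Real controllers pipeline requests within a channel, so the absolute numbers here carry no meaning; only the trend is of interest.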

Their thermal modeling of low-power and high-power chips under natural versus forced convection is shown below. 2.5D clearly offers advantages for thermally challenging systems. At a Wide-IO DRAM temperature of 96 °C, they conclude that “not all applications are feasible”.


