Phase-change materials exhibit two metastable states, namely a (poly)crystalline and an amorphous phase of high and low electrical conductivity, respectively. Switching to the amorphous phase (the RESET transition) is typically achieved in less than 50 ns but requires a relatively high current, whereas the transition to the crystalline phase (the SET transition) is slower, on the order of 100 ns.
PCM scores well in terms of most of the desirable attributes of a universal memory technology. In particular, it exhibits very good endurance, typically exceeding 100 million cycles, excellent retention, and superb scalability to sub-20-nm nodes and beyond. However, a number of technological challenges need to be addressed for PCM to become a universal memory.
Apart from the necessary RESET-current reduction and SET-speed improvement mentioned above, a significant challenge of PCM technology is a phenomenon known as (short-term) resistance drift: the resistance of a cell drifts upwards over time, with the amorphous state drifting more strongly than its crystalline counterpart.
Drift seriously affects the reliability of MLC storage in PCM because it reduces the sensing margin between adjacent, tightly packed resistance levels. Effective solutions to the drift problem are therefore a key factor in the cost competitiveness of PCM technology [2010-2].
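Resistance drift in PCM is commonly modeled as a power law, R(t) = R0 · (t/t0)^ν, where ν is the drift exponent. The sketch below illustrates why this erodes MLC sensing margins; the level resistances and drift exponents are made-up illustrative values, not measured device parameters.

```python
# Illustrative sketch of power-law resistance drift, R(t) = R0 * (t / t0) ** nu.
# The level resistances and drift exponents below are invented for
# illustration; they are not measured device parameters.

def drifted_resistance(r0, nu, t, t0=1.0):
    """Resistance after time t, given resistance r0 at reference time t0."""
    return r0 * (t / t0) ** nu

# Four MLC levels (2 bits/cell): lower resistance = more crystalline.
# More amorphous levels have larger drift exponents.
levels = [
    ("level 0", 1e4, 0.001),   # mostly crystalline: negligible drift
    ("level 1", 5e4, 0.02),
    ("level 2", 2e5, 0.05),
    ("level 3", 1e6, 0.10),    # fully amorphous: strongest drift
]

for name, r0, nu in levels:
    r = drifted_resistance(r0, nu, t=1e5)  # roughly one day after programming
    print(f"{name}: R0 = {r0:.0e} ohm -> R(1e5 s) = {r:.3g} ohm")
```

Because the higher-resistance (more amorphous) levels drift faster, the gaps between adjacent level distributions shift over time, so fixed read thresholds progressively misclassify cells unless drift is compensated.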
At IBM Research in Zurich we are working on various aspects of PCM technology, including PCM materials and memory cells with a focus on the enablement of MLC storage [2011-5], as well as device architectures and system-level integration.
In particular, we conduct fundamental research on phase-change materials to understand their properties and to guide the design of new materials with improved characteristics. We also apply finite-element simulations to study the impact of electrical transport and other material characteristics on memory cells [2014-1, 2014-3].
Furthermore, we engage in experimental characterization of PCM cells in various configurations, from single cells to large (multi-Mbit) cell arrays. Advanced characterization processes provide an abundance of data which serves as input for statistical modeling, and for the definition of effective algorithms that target memory reliability enhancement [2012-1].
We are conducting research into advanced signal-processing and coding schemes that improve reliability and thereby enable higher storage capacity, longer data retention, and higher endurance.
Moreover, we are designing and implementing novel circuitry for PCM chips to program and read out the memory-cell information reliably, with low latency, and with a small implementation area [2014-2, 2013-3].
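One simple, standard ingredient of coding for MLC memories, shown below as a minimal sketch, is Gray-coding the bit-to-level mapping so that the most likely error, confusing two adjacent resistance levels, corrupts only a single bit. This illustrates the general idea only; it is not necessarily the specific scheme used in the cited work.

```python
# Minimal sketch of a Gray-coded bit-to-level mapping for 2 bits/cell MLC.
# A standard technique shown for illustration, not the specific coding
# scheme of the cited work.

def gray_encode(n):
    """Map an integer to its reflected binary Gray code."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Invert the Gray-code mapping."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Level index 0..3 (e.g., ordered by increasing resistance) -> 2-bit pattern.
mapping = {level: format(gray_encode(level), "02b") for level in range(4)}
print(mapping)  # adjacent levels differ in exactly one bit

# A drift-induced error that shifts level 2 to level 1 flips a single bit:
assert bin(gray_encode(2) ^ gray_encode(1)).count("1") == 1
```

With this mapping, an adjacent-level misread translates into a single bit error, which keeps the raw bit-error rate low enough for conventional error-correcting codes to handle.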
Our research in memory reliability enhancement has led to the successful demonstration of reliable 2 bits/cell storage and long data retention in large arrays of PCM cells after they have been cycled 1 million times [2013-1].
Furthermore, we have recently demonstrated experimentally the successful storage and retention of 2 bits/cell and 3 bits/cell data on PCM cell arrays that had been pre-cycled 1 million times and had also undergone environmental stress through long exposure to temperatures of up to 80°C [2015-1].
This is the first time such levels of reliability have been reached with MLC PCM cell arrays, proving the viability of PCM technology for demanding enterprise (hybrid) memory applications.
In June 2012, IBM and SK hynix signed a joint development agreement to develop MLC PCM technology and to produce competitive PCM memory chips. This deal leverages IBM's expertise and leadership in MLC PCM technology on the one hand, and SK hynix's superior semiconductor manufacturing on the other, in order to introduce MLC PCM technology in future computing systems.
At the device level, we have developed a PCM-based storage subsystem, which is connected to the host over the PCI-e bus. In our research prototype, the PCM chips are connected to custom-designed PCM channel controllers and attached to a mezzanine card, which in turn is attached to an FPGA board.
Our custom PCM controller design employs a 2D channel configuration that allows the designer to trade off read performance for write performance, and vice versa, depending on the needs of their workloads. At a system level, we achieved a steady-state average latency of 35 μsec for random 4 kB reads and 61 μsec for random 4 kB writes. Most importantly, the latency is predictable and consistent: for several hours of sustained random writes, 99.9% of the requests completed within 240 μsec, and the highest latency observed was 2 msec [2014-5].
By contrast, for an MLC-based enterprise-class Flash PCI-e card we put to the same test, the latency for the 99.9th percentile was 3 msec (i.e., 12× higher) and the highest observed latency was 14 msec (i.e., 7× higher). Moreover, a TLC-based Flash SSD we tested showed a 99.9th percentile latency of 66 msec (i.e., 275× higher) and a highest observed latency of 122 msec (i.e., 61× higher).
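Tail-latency figures of the kind quoted above are typically computed from a trace of per-request completion times. The sketch below shows one common way to do this (the nearest-rank percentile method) on a synthetic latency trace; the data is invented and does not reproduce the measurements reported here.

```python
# Sketch of computing tail-latency statistics from a per-request latency
# trace. The trace below is synthetic, not measured data.
import random

random.seed(0)
# Synthetic latency trace in microseconds: mostly fast requests plus a
# small number of slow outliers.
latencies = [abs(random.gauss(61, 10)) for _ in range(100_000)]
latencies += [random.uniform(500, 2000) for _ in range(50)]

def percentile(samples, p):
    """p-th percentile of a list of samples (nearest-rank method)."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[k]

p999 = percentile(latencies, 99.9)
worst = max(latencies)
print(f"99.9th percentile: {p999:.0f} us, max: {worst:.0f} us")
```

Reporting the 99.9th percentile alongside the maximum, as above, captures both the consistency of the common case and the worst-case outliers, which is why both figures are quoted in the comparison.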
As part of this research activity, we are exploring various ways of integrating PCM at the system level and at the cluster level, including use cases in which PCM serves in the memory subsystem as well as in the storage subsystem.
The goal of this project is eventually to integrate PCM at a cluster and data center level using low-latency networking and appropriate support from system software, thereby enabling new use cases for data-intensive applications.