Flash endurance enhancement


In the past few years we have witnessed ever-increasing use of nonvolatile solid-state memory, notably in the form of NAND flash memory, in enterprise applications. This is mainly because solid-state memory offers superior random I/O performance and lower power consumption than hard disk drives.

Furthermore, to reduce total cost, multilevel-cell (MLC) flash memory technology is typically employed instead of the conventional single-level-cell (SLC) technology, at the price of much lower reliability and a modest latency penalty.

To guarantee high degrees of data integrity and availability, enterprises must cope with the reliability degradation that comes with the use of MLC technology.

We are designing signal processing and coding algorithms and schemes to enhance the reliability of MLC NAND flash memory and thus enable its employment in enterprise storage systems and servers. Our work includes advanced characterization and testing of flash memory chips to assess their raw performance and to extract and understand the various noise and distortion sources present in the writing and reading processes.

We are also developing comprehensive models of the write and read channel and fitting them to experimental data. These models are then used to guide the design of advanced signal processing schemes that mitigate the effects of impairments such as cell-to-cell interference, program and read disturb, and distribution shifts due to cycling and/or data retention (Fig. 1).


Fig. 1.  Illustration of the use of signal processing techniques to improve raw flash performance. By optimizing detection thresholds, a large improvement in raw bit error rate as a function of program-erase cycles is achieved.
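
The effect of threshold optimization can be illustrated with a toy model. The sketch below assumes, purely for illustration, that two adjacent cell levels are equally likely and have Gaussian threshold-voltage distributions. The function names and all numeric values are hypothetical and are not taken from measured data.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def raw_ber(threshold, mu_lo, sigma_lo, mu_hi, sigma_hi):
    """Error rate of one read decision between two equiprobable levels."""
    lo_read_as_hi = q_func((threshold - mu_lo) / sigma_lo)
    hi_read_as_lo = q_func((mu_hi - threshold) / sigma_hi)
    return 0.5 * (lo_read_as_hi + hi_read_as_lo)

def best_threshold(mu_lo, sigma_lo, mu_hi, sigma_hi, steps=2000):
    """Grid search between the two means for the lowest-error threshold."""
    candidates = [mu_lo + (mu_hi - mu_lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda t: raw_ber(t, mu_lo, sigma_lo, mu_hi, sigma_hi))

# Hypothetical distributions (volts) after cycling: the lower level has widened.
WORN = dict(mu_lo=1.2, sigma_lo=0.30, mu_hi=3.0, sigma_hi=0.15)
FIXED_THRESHOLD = 2.0  # hypothetical default read threshold

tuned_threshold = best_threshold(**WORN)
ber_fixed = raw_ber(FIXED_THRESHOLD, **WORN)
ber_tuned = raw_ber(tuned_threshold, **WORN)
print(f"fixed {FIXED_THRESHOLD:.2f} V: error rate = {ber_fixed:.2e}")
print(f"tuned {tuned_threshold:.2f} V: error rate = {ber_tuned:.2e}")
```

In this toy model the tuned threshold moves toward the narrower distribution and lowers the error rate relative to the fixed threshold. A real controller would have to estimate the distributions from the device rather than know them in advance.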

Error correction codes (ECC) are integral modules of flash controllers in storage systems. Historically, BCH codes have been used to correct errors in flash chips. The error-correcting power required of these codes has been increasing exponentially with every flash technology generation, with 40–50 correctable bits per 1-kByte BCH codeword being the norm for the current 19–20 nm technology node (Fig. 2).
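
To give a feel for what such correction strengths cost in redundancy, the sketch below applies the standard bound for binary BCH codes: a code over GF(2^m) that corrects t errors has length at most 2^m - 1 and needs at most m·t parity bits. It assumes that "1 kByte" refers to 1024 bytes of user data, with parity added on top; the function name is hypothetical.

```python
def bch_parameters(data_bits, t):
    """Return (m, parity_bits) for a shortened binary BCH code.

    Picks the smallest field size m such that data plus parity fit in a
    codeword of length at most 2^m - 1, using the bound parity <= m * t.
    """
    m = 1
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return m, m * t

DATA_BITS = 8 * 1024  # assumption: 1024 bytes of user data per codeword

for t in (40, 50):
    m, parity = bch_parameters(DATA_BITS, t)
    rate = DATA_BITS / (DATA_BITS + parity)
    print(f"t={t}: m={m}, parity<={parity} bits ({parity / 8:.1f} bytes), rate>={rate:.3f}")
```

Under these assumptions, correcting 40 errors takes up to 560 parity bits (70 bytes) and correcting 50 errors takes up to 700 parity bits, so every further increase in t consumes more of the spare area available per page.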

However, the industry is quickly approaching a regime of diminishing performance gains in return for large increases in complexity and thus silicon area and cost. In an effort to reverse this trend, alternative approaches to ECC design have recently been introduced in flash.

These approaches borrow from communications technology and are typically geared towards the use of soft information and low-density parity-check (LDPC) codes. These codes offer the potential for much better decoding performance than their BCH counterparts at similar code rates.
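
Soft information here means a per-bit reliability value rather than a hard 0/1 decision. A minimal sketch of one common way to obtain it follows, assuming that the cell is read at several thresholds and that the two neighboring levels are equiprobable with Gaussian distributions. Each region between thresholds is mapped to a log-likelihood ratio (LLR). The model, the function names and the numbers are illustrative assumptions, not a description of any particular controller.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def interval_prob(left, right, mu, sigma):
    """P(left < V < right) for V ~ N(mu, sigma^2), via tail differences."""
    a, b = (left - mu) / sigma, (right - mu) / sigma
    if a >= 0:
        return q_func(a) - q_func(b)      # interval lies in the upper tail
    return q_func(-b) - q_func(-a)        # Phi(b) - Phi(a), with Phi(x) = Q(-x)

def region_llrs(thresholds, mu_lo, sigma_lo, mu_hi, sigma_hi):
    """LLR = ln[P(region | low level) / P(region | high level)] per region."""
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    return [
        math.log(interval_prob(l, r, mu_lo, sigma_lo) /
                 interval_prob(l, r, mu_hi, sigma_hi))
        for l, r in zip(edges, edges[1:])
    ]

# Hypothetical level distributions (volts) and read thresholds.
LEVELS = dict(mu_lo=1.2, sigma_lo=0.30, mu_hi=3.0, sigma_hi=0.15)
hard_llrs = region_llrs([2.4], **LEVELS)            # one read: 2 regions
soft_llrs = region_llrs([2.2, 2.4, 2.6], **LEVELS)  # three reads: 4 regions
print("hard:", [round(v, 1) for v in hard_llrs])
print("soft:", [round(v, 1) for v in soft_llrs])
```

With three reads, cells that fall between the outer thresholds get LLRs of small magnitude, which tells a soft-decision decoder that those bits are unreliable. A single read cannot provide that distinction.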

Although the performance of BCH codes is easy to analyze and thus to predict in the low-error-rate regime, LDPC codes are much harder to analyze and often require simulation or implementation to verify their performance. Furthermore, extracting soft information from NAND flash chips requires multiple read operations and thus increases latency, which is at a premium in enterprise applications in particular.
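
The analyzability of BCH codes can be made concrete. Assuming an ideal bounded-distance decoder, a codeword that can correct t errors fails exactly when more than t of its n bits are wrong. If bit errors are further assumed to be independent with raw bit error rate p (a simplification, since real flash errors can be correlated), the failure probability is a binomial tail that can be evaluated directly, even at rates far too low to simulate. The codeword length below, 8192 data bits plus 560 parity bits, is an assumed example.

```python
import math

def log_binom_pmf(n, k, p):
    """Natural log of C(n, k) * p^k * (1 - p)^(n - k), for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def codeword_failure_prob(n, t, p):
    """P(more than t bit errors among n bits) under independent errors."""
    total = 0.0
    for k in range(t + 1, n + 1):
        term = math.exp(log_binom_pmf(n, k, p))
        total += term
        if term < total * 1e-18:  # remaining terms no longer change the sum
            break
    return total

N_BITS = 8192 + 560  # assumed codeword: 1024 data bytes + 560 parity bits
T = 40               # correctable bits per codeword

for p in (1e-3, 2e-3, 5e-3):
    print(f"raw BER {p:.0e}: P(fail) = {codeword_failure_prob(N_BITS, T, p):.2e}")
```

No comparably simple closed form exists for iteratively decoded LDPC codes, which is why their low-error-rate behavior usually has to be checked by simulation or in hardware.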

Our work is geared towards addressing all the tradeoffs involved in selecting proper coding schemes and verifying their correction performance, which are critical tasks for the controller design in flash-based storage systems.


Fig. 2.  To cope with the device reliability degradation that results from flash technology scaling, ECC correction strength needs to increase with each technology generation. The effect is even more pronounced with MLC technology, where ECC complexity increases exponentially.