Introduction
In magnetic recording—the central process in storage technology—the write head produces a write magnetic field that imprints its polarization upon the thin-film magnetic recording medium which forms the top layer of the disk. The design of the write head and disk recording medium is assisted by micromagnetic numerical simulation. The rapid increase in areal bit density (currently by a factor of 2× per year) is placing greater demands on all aspects of engineering design [1–3]. The difficulties being encountered may lead to a recording-technology paradigm shift from conventional longitudinal recording to perpendicular recording [4–8]. Simulating perpendicular recording is more demanding than simulating longitudinal recording and constitutes a major challenge in itself. Meeting this challenge is the subject of extensive current research [9–21].
The paradigm shift in recording technology is motivated by micromagnetic scaling, wherein higher bit density is linked to higher write fields. In longitudinal recording—the technology embodied in all existing products [1–8, 22, 23]—the write field lies in the disk plane, and there is a large air gap between the write pole pieces. This geometry keeps the write field relatively weak, which limits the achievable bit density. In perpendicular recording [4–8], a “keeper” layer of highly permeable magnetic material, the soft underlayer (SUL), is located between the recording medium and the disk body (Figure 1). The effective air gap between the high-permeability SUL and write-head elements of the magnetic circuit can be defined as the physical air gap plus the thickness of the high-coercive-field recording medium. This gap, which the magnetic field must cross, is reduced to a few tens of nanometers, enabling a substantial increase in the write field. The orientation of the written bit now follows the vertical polarization of the magnetic field in the medium (Figure 1), and hence lies perpendicular to the disk; thus, a recording medium with uniaxial anisotropy perpendicular to the disk plane is used.
Figure 1
In perpendicular recording, the write field is strongly dependent on such disk features as SUL properties and recording medium thickness, and it is expected that numerical simulations will be essential to permit the rapid tuning required in engineering design [1–8]. Simulation of the write head and disk combination necessitates including part of the head and a portion of both the recording medium [24, 25] and SUL in the simulation cell (Figure 1). The combined system is relatively large, on the scale of the required resolution, leading to a large number of elements; typically, a simulation cell of dimensions 2560 × 1280 × 620 nm may be required, with a resolution of 2 nm in the recording medium and 10 nm elsewhere, leading to of the order of N = 3 × 106 elements. The additional requirements that the simulation be run for about 105 time steps with an ensemble of at least 20 independent systems, and for numerous engineering datasets, turn this problem into a major challenge for simulation techniques.
To complete realistic simulations in a usefully short computation time, it is essential that a time step be completed in a linear time of order N or N log2 N, which we term scalability. A range of fundamentally different approaches is currently under investigation to achieve scalability in micromagnetics simulations. These approaches include finite element [9–13], multipole expansion [14–18], and the present technique [20, 21], with historical roots in earlier, not fully scalable, techniques [26–30].
Other applications of micromagnetics, such as to spintronics [31], magnetostrictive actuators [32], and novel approaches to computation [33], may also benefit from the development of scalable computational techniques.
In addition to scalability, the code must also be parallelizable, although large-scale parallelism is not required owing to the existence of the data parallel element of an ensemble of a large number of independent systems. In this paper we describe the scalable approach to magnetic recording that was developed in collaboration among personnel at the IBM Yorktown and Almaden research facilities. We also include the results of tests and an example of a write simulation.
Micromagnetic modeling
The discretization used in our methodology subdivides the orthorhombic simulation cell into orthorhombic elements which, in the present paper, are taken to be cubic elements of side d (the general methodology [20, 21] allows orthorhombic elements and multigridding). The number of elements in each side of the cell is defined as Nx, Ny, and Nz; the total number of elements is N = NxNyNz. A given element is denoted by the three-dimensional (3D) index i = (ix, iy, iz), where 0 < i < N , and = x, y, or z. (Note: Bold typeface indicates terms that are vectorized.) The magnetization (magnetic moment per unit volume) within each element i is assumed to be a constant, mi.
In micromagnetism, the time evolution of the magnetization of each element is controlled by the Landau–Lifschitz–Gilbert (LLG) equation,
 | (1) |
where g is the gyromagnetic ratio, i is the damping constant [34], msi is the saturation magnetization in the material, and Hi is the effective magnetic field. Hi consists of the contributions
 | (2) |
and may also include a Langevin noise term [35]. Here the first term is the exchange interaction, with A the exchange constant ( i is a unit magnetization vector, and nn = nearest neighbor). The second term is the uniaxial anisotropy, with Kui the anisotropy constant and the unit vector
defining the anisotropy axis. The third term is the long-range dipole–dipole (demag) interaction, and the fourth term is the external head field, which drives the write process.
In implementing the computation (Figure 2), the LLG equation is integrated [36] for one time step, updating the magnetization in all elements. Then Hi is updated, the inputs to this calculation being the set of updated magnetizations {mi} and the head field at that time step (the head field is a function of the bit sequence to be written and of the relative head–disk velocity).
Figure 2
The computational cost of most of these tasks, the LLG integration, the nearest-neighbor exchange, and the local anisotropy term, are of order N and hence scalable. However, the demag and external head field terms are in principle order N2 in computing time and are not scalable. Let us look at the demag field result in more detail.
Demag field
The expression for the demag field is written as a sum in real space,
 | (3) |
where
 | (4) |
and, defining
 | (5) |
Here the “response function” R ß(i) is the average magnetic field at element i caused by the magnetization within element 0, located at the origin. The response function R ß(i) has been written down analytically for the general case of two equal orthorhombic elements [26–30].
At long range the response function falls off as the inverse cube of distance, a very slow fall-off that necessitates summing over the entire simulation cell, giving unacceptable N2 scaling for the computation time of the demag field as a real-space sum.
Rapid-convergence approach
A scalable solution to the problem of calculating the demag field HDM is obtained by splitting up the calculation into a short-range part, calculated in real space, and a long-range part, calculated in Fourier space. The decomposition is written symbolically as
 | (6) |
where
 | (7) |
Here the subscript R implies a sum in real space, and K a sum in Fourier space; hR is designed to be a rapidly convergent sum in real space, and HK a rapidly convergent sum in Fourier space. The real-space part hR is the difference between the true demag field and the Fourier-space part expressed in real space (HR = HK).
In this technique, the long-range (in space) demag interaction is broken up into the sum of two components hR and HK, both of which are short-range in their respective spaces. However, HK primarily takes care of the long-range (in space) piece of the original dipole–dipole interaction, while hR primarily takes care of its short-range component. The real-space sum hR can be adjusted to fall off as a high inverse power of distance, while the K-space sum HK falls off with a controllable exponential factor. Hence, the combination enables calculation of the demag field with reasonable effort even if high accuracy is demanded.
Real-space part
The real-space part hR is derived in the same way as the demag field in Equations (3)–(5) in the foregoing section, except that a newly defined interaction function,
 | (8) |
replaces the dipole–dipole interaction  ß(r) in Equation (5). The new interaction function ß(r) contains a set of 2L arbitrary constants, p and ap, which are used to control its leading fall-off to the functional form 1/rL+3. For example, the L = 6 parameter set in Table 1 leads to a fall-off in the interaction function as 1/r9, which is a vast improvement on 1/r3 and is adequately short-range. The response function using the short-range expression (8) is a generalization of the standard one [26–30] and can be expressed analytically [20, 21].
|
Table 1
Coefficients for weighting function (K). |
|
|
|
|
|
p (d = 1 unit) | ap |
|
| 0.5 | 252 |
| 0.6 | -1050 |
| 0.7 | 1800 |
| 0.8 | -1575 |
| 0.9 | 700 |
| 1.0 | -126 |
|
Fourier-space part
A two-dimensional discrete Fourier transform (DFT) is defined in the xy-plane of any given layer i (a layer is normal to the z-axis). We use a notation in which bold capitals, e.g., I = (ix, iy), denote the two-dimensional (2D) index of an element within a layer. A function fI,i is transformed into fi(K) by the DFT,
 | (9) |
where the notation XI denotes the 2D location XI = Id of the element, and in Fourier space, K = 2 d-1(mx/Nx, my/Ny) (ma integers) is the 2D wavevector.
In this notation, the Fourier part of the demag field is given by
 | (10) |
where, defining the reciprocal lattice vectors G = 2 d-1(nx/ny) (n integers), and noting that, from (9), mi(K + G) = mi(K),
 | (11) |
and
 | (12) |
Here, form factors f, h, and g for the cubic element appear,
 | (13) |
together with vectors defining the divergence (div) operation in this representation,
 | (14) |
The constants p and ap reappear in the cutoff function
 | (15) |
in which exponential convergence of the sums (11) and (12) occurs at the rate exp(-K m), m being the minimum of the p.
The quantities i±(K) are given by the recursion relations
 | (16) |
with the boundary conditions
 | (17) |
Note that mNz,z(K) is the DFT of the magnetization in the external write head and Nz - 1 the top cell layer.
Application
There is a straightforward interpretation of the Fourier results (10–17). In (16), a magnetostatic charge density for a given K in each layer is created by applying the div operator [i.e., multiplying by the v±T(K) vectors] to the magnetization, and the magnetostatic potential from all of the layers is then summed by the recursion relation to give the i±(K) in each layer. In (11), multiplying by the form factor hf brings in the cubic shape of the elements in which the magnetostatic charge densities are located. The div operator is applied again to obtain the magnetic field from the magnetostatic potential, and finally another factor of hf averages the field over the element.
The demag field is obtained as the sum of the K-space part, the inverse DFT of (10), and the real-space part, (3) and (4) with (8) instead of (5). Numerically, the DFT is performed as a fast Fourier transform (FFT), giving a scaling as 6N log2(NxNy), there being six DFTs. The remaining parts of the K-space part, in particular the recursion relation, scale as N, as does the real-space part, owing to the extremely short range of the real-space interaction. In practice, the fast FFT used [37] is found not to dominate the calculation, which scales almost as N. Note that, by working with magnetostatic potential, the methodology expressed in Equations (10)–(17) gains a factor of 6 relative to methods [26–30] based on magnetic field, in addition to scalability.
Testing the methodology
A complete micromagnetic code, the Almaden–Yorktown Micromagnetic (AYM) simulator, has been developed. The simulator incorporates an LLG equation integrator, all of the local terms in the effective magnetic field H, and the new scalable approach to calculating the demag field. The real-space response function using (8)—which is a difference formula with short range—is calculated and stored for use at run time in a precomputation step using the exact, analytic formulae derived by us [20, 21].
First, let us check two major assertions: that the novel demag field technique is capable of high accuracy and that computing time scales linearly with system size. Second, we discuss parallelization of the code.
Accuracy and scalability
Figure 3 shows computing cost plotted as a function of demag field accuracy for our rapid-convergence methodology. High accuracies of the order of 10-6 are obtainable, with computational cost increasing by a factor of only about 2 for every order of magnitude in accuracy.
Figure 3
In Figure 3, the demag calculation has been implemented for six sets of parameters covering different choices of the ranges in real space (defined by L) and in K space (defined by m). The optimal (lowest curve) choices were a combination of m 0.70, with the real-space cutoff parameter transitioning from L = 5 (low requested accuracy) to L = 7 (high accuracy), showing that indeed the methodology of breaking up the long-range interaction into real-space and K-space pieces works. The optimal curve in Figure 3 is roughly straight as computer-processing-unit (CPU) time plotted against log of accuracy, showing that computational work increases only linearly, while the specified calculational precision increases exponentially.
In Figure 4 we illustrate the increase in computing time (on a relatively slow machine) with increasing system size. It is seen that the scaling with system size N is almost perfectly linear, the very slight supralinearity being due to the log term arising in the FFT work.
Figure 4
Parallelization
The method used for parallelization in the AYM code is to assign each layer i to a different processor. An advantage of this assignment is that FFT operations, which are intrinsically expensive in terms of communications bandwidth, are local to a processor. With regard to communications, the processors are structured to form a chain connected in the order of the layers i. The most global communication step occurs in the K-space calculation, where the recursion relation for i±(K + G) involves passing packets of 2nGNxNy words, where nG is the number of G vectors in Equations (11) and (12) down the entire chain in both directions.
Figure 5 illustrates results for efficiency of parallelization on various numbers of processors. In Figure 5(a), the parallelization on an IBM scalable parallel (SP*) system is seen to work up to about ten nodes, which is adequate for the present problem since there is a large data-parallel element not included in Figure 5(a), but which in practice will demand multiple nodes. Figure 5(b) represents results for a rather poorly connected system—an Intel** cluster in a 100-Mb/s Ethernet network; parallelization efficiency is poorer than in the SP system case. Analysis shows that bandwidth limitations in the communication system are responsible for the difference.
Figure 5
Summary of tests
It is seen that the rapid-convergence approach indeed achieves high accuracy and that the methodology is scalable with computing time proportional to system size. The parallelizability extends to an order of ten processors, which is adequate because the problems in simulating perpendicular recording involve extensive data parallelism—a factor of at least 20 for the number of ensemble members. Hence, the AYM code would work well on roughly 200 well-coupled processors.
Simulation of writing a tribit
Specifics of simulation
Realistic simulation of the data-writing process must take into account the specific structure of a typical data layer, the thin-film recording medium shown in Figure 1. Whereas the magnetically homogeneous head material and SUL may be modeled adequately using 10-nm cubic elements, the data layer is a grained material in which the grains have a large degree of magnetic independence of one another. The grain size distribution is log-normal, with an average grain diameter of 10 nm, but the grain shape must be modeled on a grid with a 2-nm resolution in order to capture the grain morphology. The methodology allows multigridding, but the choice of a 2-nm grid introduces a factor of 25 into FFT computing time within the data layer, raising problems of computing efficiency.
The solution adopted, described in more detail in [21], is again to exploit the mixed K-space/real-space representation in which the demag field is calculated in two pieces, a short- and a long-range piece. In the long-range piece, the magnetization is computed in Fourier space using a relatively coarse grid, such as 10 nm. In the short-range piece, a set of intergrain interactions—calculated to complement the long-range piece so that they sum to the correct result—are precomputed. The intergrain interactions turn out to be short-range, so again the computation is in linear time.
The simulation using the AYM code employs the basic setup illustrated in Figure 1. A write field is applied via (17) as a time-varying charge sheet to the top of the write and return poles where they intersect the upper surface of the simulation cell. A fairly realistic parameter set (see [21]) was employed. Results are presented for a single-member ensemble run on a single 2-GHz Intel processor using a 64 × 64 × 14-element array with a nonuniform z-spacing, and one grained data layer, for 30,000 time steps, a run that took five hours. The results are presented in Figure 6.
Figure 6
Simulation results
The initial magnetization was taken in the z-direction with polarization up in the data layer, and in the y-direction in the SUL and head layers. This initial condition in the SUL and head accounts for the asymmetric 45° structures generated in the SUL, which were also observed in previous simulations.1 The results presented in Figure 6 display the z-component of magnetization in the top SUL layer (left column) and the data layer (right column) according to the color codes shown in the legend.
Frame 5 in Figure 6 occurs after the write head has applied a “down” field for a short time and the magnetization in the data layer beneath the write pole has begun to turn over. In the SUL, the “down” magnetic field beneath the write pole is clearly evident, as is also the “up” field beneath the return pole. However, because the return pole has the same z-direction magnetic flux along with a larger area, its magnetic field is insufficient to turn over a significant fraction of the grains in the data layer. This effect is by design. It is important that the return pole not interfere with the write process. In the second time slice, the write field is beginning to turn off [left side of Frame 15], while the negative polarization of the data layer under the write head is complete.
Frames 16 and 23 show phenomena very similar to those just described, except that the write head field is now “up.” The result would be to obliterate the effect of the polarization written when the head field was “down,” but for the fact that the head has now moved to the left. Hence, a negatively polarized region of the data layer survives the magnetization reversal, forming a “down” bit.
Frames 28 and 32 show a repeat of the phenomena in Frames 5 and 15: A “down” head field is reversing the data layer magnetization beneath the write pole. Again due to head motion, an “up” polarized region in the data layer survives the second magnetization reversal, forming the second bit. Finally, in Frame 49, the write field is turned off, leaving behind a tribit, manifested by the blue/red/blue (down, up, down) pattern in the data layer.
This realistic simulation demonstrates the essential features of perpendicular magnetic recording, the “graininess” of the written bits, the repeatability of the write operation, and the relatively undisturbed nature of the region beneath the return pole. Perpendicular recording is shown to satisfactorily write 64-nm-length bits with appropriate engineering parameters.
The 64 × 64-tribit simulation took five hours of computing time on a 2-GHz personal computer (PC). The time required to implement a full engineering 128 × 128 simulation with 20 ensembles having statistically independent grain and
distributions in the data layer and independent Langevin noise distributions can be estimated on the basis of the foregoing scaling analyses. We then readily determine that the full simulation would take ten hours on a 40-way cluster built from machines equivalent in performance to the 2-GHz PC. Hence, with reasonable investment, the methodologies we have developed can lead to realistic simulations of the perpendicular recording write process overnight. It is clearly possible to test designs using detailed computational studies of perpendicular recording.
Concluding remarks
In this paper, a new suite of techniques is presented for improving the efficiency and accuracy of simulations of the perpendicular magnetic recording technology. The new methods were designed specifically to improve the calculation of the demag field in layered magnetic materials and to make the fast, efficient simulation of perpendicular magnetic recording feasible for sufficiently small computational cost so that many designs and materials can be tested. First, the accuracy, efficiency, and parallel scaling of the new techniques were demonstrated on challenging model problems designed to probe for weaknesses. The AYM code was then employed to perform a full-scale simulation of perpendicular magnetic recording, a nanoscale machine writing a tribit in a grained data layer, clearly demonstrating the applicability of the techniques to challenging and important technological problems.
Indeed, with a reasonable-sized PC cluster or IBM SP system, it is now possible to simulate the perpendicular recording process and, for example, to obtain overnight engineering statistics of the written bit profile.
Footnotes
*Trademark or registered trademark of International Business Machines Corporation.
**Trademark or registered trademark of Intel Corporation.
1M. E. Schabes and B. H. Lengsfeld III, private communications.
Received July 31, 2003;
accepted
for publication September 5, 2003; Internet publication February 24, 2004 |