IBM Systems Journal  
Volume 40, Number 2, 2001
Deep computing for the life sciences
DOI: 10.1147/sj.402.0297

Computational protein folding: From lattice to all-atom

by Y. Duan and P. A. Kollman
Understanding the mechanism of protein folding is often referred to as the second half of genetics. Computational approaches have been instrumental in the efforts. Simplified models have been applied to understand the physical principles governing the folding processes and will continue to play important roles in the endeavor. Encouraging results have been obtained from all-atom molecular dynamics simulations of protein folding. A recent microsecond-length molecular dynamics simulation on a small protein, villin headpiece subdomain, with an explicit atomic-level representation of both protein and solvent, has marked the beginning of direct and realistic simulations of the folding processes. With growing computer power and increasingly accurate representations together with the advancement of experimental methods, such approaches will help us to achieve a detailed understanding of protein folding mechanisms.

Proteins support life by carrying out important biological functions, which are determined primarily by their structures. Subjected to evolutionary pressure, only those proteins that are helpful to the survival of living beings have been retained. Though their folding time may not be subject to active evolutionary refinement, proteins are required to be able to adopt well-defined structures soon after being synthesized and transported to their designated locations within cells to perform their functions. Such a requirement sets the upper limit for their folding time and is one of the important aspects of proteins that sets them apart from other polymers, including other nonprotein polypeptides.1,2 The astronomically large number of possible conformations suggests that proteins use some sort of “directed” mechanism to fold. An elucidation of protein folding mechanisms must address how proteins fold into their well-defined three-dimensional structures within a limited time. We review briefly the history of computational protein folding studies, discuss the recent developments in more detail, and present a perspective of the future.

Under the right physiological conditions, proteins can fold into and subsequently maintain well-defined structures, determined by sequences,3 through delicate balances4 of enthalpy and entropy,5,6 weak interactions, including van der Waals, electrostatic, and hydrogen-bonding forces, and a balance between protein intramolecular interactions and the interactions with solvent that also play major roles in protein folding.7 A major motivation for the mechanistic studies has been the need to understand the roles of these interactions in determining protein structures, since such an understanding can help to improve the accuracy of protein structure prediction. Because of the close association between protein structures and their functions, understanding how protein sequences determine their structures has often been referred to as the second half of genetics. With the explosive growth of genomic sequence data, the need for reliable structural prediction methods that can complement the existing experimental approaches such as X-ray crystallography and NMR (nuclear magnetic resonance) spectroscopy is compelling. In this regard, an appealing aspect of the physically based modeling is its generality. These models use the physical interaction energies as the primary criteria to analyze protein structures. The same set of physical principles that drives protein folding also dictates substrate and ligand binding as well as the induced conformational changes that are often associated with protein functions and are important for a detailed understanding of biochemical processes. Understanding protein folding would inevitably aid in the understanding of these processes. The relatively recent discovery of folding-related diseases8-18 reinforces such a need. 
Despite great progress made using a variety of approaches, it is still difficult to establish detailed descriptions of the protein folding processes and such descriptions are the necessary steps toward the comprehensive understanding of the mechanisms of folding.

Lattice models

Computational studies of protein folding have come of age. Among the early successes was an Ising model simulation on the unfolding and hydrogen exchange of proteins19 in which a two-state transition was observed, which was not surprising given the three-dimensional nature of the model. Ptitsyn and Rashin20 studied the folding of myoglobin without using a computer by representing the protein at the secondary structure level and treating each alpha helix as a uniform rigid body cylinder. Using the highly simplified representation, they concluded that the folding was a nucleation process, similar to that of crystal growth. A similar representation21 has been applied recently in combination with a Brownian dynamics approach in the study of the folding of a four-helix bundle.

A more detailed representation also appeared22-24 in the late 1970s. Using a combination of Langevin dynamics and energy minimization, Levitt and Warshel studied folding of BPTI (bovine pancreatic trypsin inhibitor)22 and carp myogen.24 In these studies, the authors represented each amino acid by two particles. They observed highly complex folding processes in which secondary structures were seen both forming and breaking, and challenged the notion that folding was preceded by forming stable secondary structures first. This pioneering work marked the beginning of physically based models in the studies of protein folding, albeit at a somewhat crude level. The level of approximation, both in the representation and in the parameters, naturally implied a certain level of uncertainty and sometimes even significant error, as pointed out by Hagler and Honig.25

Levitt and Warshel also noted that hydrogen bonds seem to slow down the folding process,22 a finding that has yet to be clarified by further studies. Nevertheless, we should recognize the pioneering nature of the work, which helped to topple the then-popular view that stable secondary structure always forms first in the folding process. The fact that most current structure prediction methods use a similar representation to that of Levitt and Warshel is a strong testament to the power of such an approach. Fifteen years later, using a residue-level lattice model, Skolnick and Kolinski26 have successfully simulated the folding of some small proteins. Interestingly, the parameters were obtained by analyzing protein structures deposited in the Protein Data Bank (PDB),27 similar to the approaches of Miyazawa and Jernigan.28

The advantages of lattice models are clear. The highly simplified models allow efficient sampling of conformational space. This was particularly important at the time when the speed of the most powerful computer was many orders of magnitude slower than a current personal computer. When designed properly, the model can give a well-defined global energy minimum that can be calculated analytically. In fact, one can enumerate all energy states and calculate the corresponding free energies in such models. One can also control other features of the energetic surface. When carefully parameterized, lattice models can be applied to structure prediction and can give encouraging results.26 The lattice model also allows Monte Carlo simulations that give ensemble averages. This was a critical advantage as well, because at the time all experiments were conducted macroscopically and could only give ensemble-averaged results. Single molecule studies were much later developments.29,30 This type of model has enjoyed widespread application and has contributed a great deal to our understanding of protein folding mechanisms.
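The statement that all energy states of a small lattice model can be enumerated may be illustrated with a minimal sketch. It assumes a two-dimensional square lattice and a two-letter hydrophobic/polar (HP) energy function in which every non-bonded contact between two hydrophobic beads contributes -1. The sequence, the energy scale, and all function names are illustrative assumptions and do not reproduce any specific model from the cited work.

```python
import math

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def enumerate_walks(n):
    """Yield every self-avoiding walk of n beads on a 2D square lattice.
    The first bond is fixed along +x to remove rotated copies."""
    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                occupied.remove(nxt)
                path.pop()
    if n == 1:
        yield ((0, 0),)
        return
    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))

def hp_energy(sequence, path):
    """-1 for each pair of H beads that are lattice neighbours
    but not neighbours along the chain (assumed toy energy function)."""
    index = {pos: i for i, pos in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # visit each lattice edge once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy

def density_of_states(sequence):
    """Map each energy level to the number of conformations that have it."""
    counts = {}
    for path in enumerate_walks(len(sequence)):
        e = hp_energy(sequence, path)
        counts[e] = counts.get(e, 0) + 1
    return counts

def free_energy(counts, kT):
    """Helmholtz free energy F = -kT ln Z from the exact density of states."""
    z = sum(g * math.exp(-e / kT) for e, g in counts.items())
    return -kT * math.log(z)

if __name__ == "__main__":
    dos = density_of_states("HPPH")  # hypothetical 4-bead sequence
    print(dos, free_energy(dos, kT=1.0))
```

For the hypothetical four-bead sequence `HPPH`, the enumeration visits nine conformations, two of which bring the terminal hydrophobic beads into contact. The cost grows exponentially with chain length, which is why exhaustive enumeration is restricted to short chains and Monte Carlo sampling is used beyond that.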

There are two types of lattice model simulations, aimed at two distinct objectives. One, pioneered by Go and coworkers,31 was designed to understand the basic physics governing the protein folding process. A key feature of this type of lattice model is its simplicity (the size can range from 32 to 53 lattice points). A good example of such an approach has been shown by Wolynes and coworkers who, through lattice model simulations, postulated that proteins have a funnel-like energy landscape with a minimally frustrated character that “guides” proteins toward their native states.32,33 The postulate deviated markedly from the old pathway doctrine and elevated our understanding at the conceptual level.34 Another useful example was done by Dill and coworkers,7 who emphasized the importance of hydrophobic interactions. Other examples include studies by Li et al.1,2 and by Shakhnovich and coworkers.35-38 Some of the work has been reviewed previously.39 Recently, this type of approach has been extended to residue-level off-lattice models.40-43 Similar to the approaches of Muñoz et al.,40,41 Zhou and Karplus42,43 assured the foldability of the model by systematically biasing the energetic surface toward the native state of that particular protein under study in a process consistent with the diffusion-collision model.44,45 Because this type of model has not been designed for real proteins, tests on these models have been limited to the studies of general features of protein folding. Nevertheless, a good deal can be learned from these studies. For example, Dill and coworkers7,46 have argued that a small set of amino acids (hydrophobic and hydrophilic) can be combined to produce foldable protein-like peptides, a prediction that has been confirmed recently by experiments.47

Lattice models by Skolnick and coworkers26 and by Miyazawa and Jernigan28,48 belong to the second category. These models are geared toward realistic folding of real proteins and are therefore parameterized using real proteins as templates by statistical sampling of the available structures28,48,49 and are often referred to as statistical potentials (or knowledge-based potentials). Works by Crippen,50 by Eisenberg and coworkers,51,52 and by Sippl and coworkers53 are also good examples in this category that have been reviewed before.54-56 Along the same line was the approach by Scheraga and coworkers, who developed a residue-based off-lattice model.57-60 Because the residue-level representations are applied to real proteins that have large numbers of energy minima, in contrast to the simplified lattice models described above, their energetic surface can no longer be described exactly, even though exhaustive sampling can be conducted for short sequences (shorter than 100 amino acids).61 More importantly, the pair-wise discrete neighboring “energy” for the interactions between the nearest neighbors allows only a small number of possible conformations. The lattice coordinates also impose restrictions to the representation, though a high-coordination lattice model has been developed as well.62 Given their highly simplified approaches, the successes in predicting protein structure are indeed very encouraging.
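The idea behind a statistical (knowledge-based) potential can be sketched as follows. The example converts observed residue-residue contact counts into effective energies with an inverse-Boltzmann relation, energy = -kT ln(observed/expected), where the expected count assumes random mixing of residue types. This is a simplified illustration under stated assumptions, not the actual procedure of any of the cited authors; the two-letter alphabet and the contact counts are hypothetical.

```python
import math
from collections import Counter

def contact_potential(contacts, kT=1.0):
    """contacts: iterable of (type_a, type_b) pairs observed in contact
    in a structure database (here: toy data).
    Returns {(a, b): energy} with energy = -kT * ln(observed / expected)."""
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    total = sum(pair_counts.values())

    # How often each residue type takes part in a contact.
    type_counts = Counter()
    for (a, b), n in pair_counts.items():
        type_counts[a] += n
        type_counts[b] += n

    potential = {}
    for (a, b), n in pair_counts.items():
        fa = type_counts[a] / (2 * total)
        fb = type_counts[b] / (2 * total)
        # Random-mixing reference state; unlike pairs can form in two orders.
        expected = total * fa * fb * (1 if a == b else 2)
        potential[(a, b)] = -kT * math.log(n / expected)
    return potential

if __name__ == "__main__":
    # Hypothetical counts: H-H contacts are over-represented.
    toy = [("H", "H")] * 6 + [("H", "P")] * 2 + [("P", "P")] * 2
    for pair, energy in sorted(contact_potential(toy).items()):
        print(pair, round(energy, 3))
```

Pairs seen more often than the reference state predicts receive negative (favorable) energies, and under-represented pairs receive positive ones. The choice of reference state is the main modeling decision in such potentials and is deliberately kept naive here.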

Off-lattice models

A constant driving force in the computational study of protein folding has been the need to develop methods that can reliably differentiate native states from the non-native ones. The most widely used approaches in protein structure prediction have been based on residue-level models (either lattice or off-lattice models) with typically statistical “potentials” obtained from the structural database (PDB). A growing trend in the community has been the development of atomic-level statistical potentials63-68 in attempts to improve the accuracy. The application of all-atom representation with physical potentials in structural prediction, on the other hand, has been limited. A typical application would be at the final stage—a minor refinement of the structures using limited energy minimization designed to eliminate the bad contacts. It has been pointed out that the gas phase energy calculated by all-atom molecular mechanics is a poor descriptor of the “quality” of the structures.54,69-72 This is not surprising given the critical role that solvent plays in determining protein structures and in fact is reassuring, because gas-phase energy alone should not be able to discriminate good structures from the bad ones. As expected, the accuracy was dramatically improved with the inclusion of the solvent effect.69,70,73–77

An improved level of accuracy has been obtained through a combination of an all-atom representation of protein and a continuum model of solvent. Sung studied the folding of alanine-based peptides78,79 and noted interesting features from the simulations, including the role of electrostatic interactions between the successive amides, which favored extended conformations and caused energy barriers to helix folding, intermediate states, and formation of both 3₁₀ and alpha helices. Karplus and coworkers74,80-82 adopted a similar approach and applied it to the studies of the folding free energy of chymotrypsin inhibitor 2 (CI2)83 and that of G-peptide,84 using unfolding simulations, and tested this approach on a set of proteins73 and on two peptides.81,82 Continuing along this line was the work by Wu and Sung,85 who proposed the use of the mean solvation force to represent solvent, and tested this method on alanine dipeptide. This type of model tries to strike a balance between the accuracy of the representation and the computational cost. Application of the continuum solvent model can significantly reduce the number of particles included in the calculation and, hence, the computational cost, even after considering the overhead due to the added complexity of the continuum solvent model. It is interesting to note that such development came about two decades after the first residue-level simulation of Warshel and Levitt.22 The new development reflects considerable improvement in the level of sophistication, in addition to the improvement due to the differences between all-atom and residue-level models of protein. Compared to the ad hoc approach of Warshel and Levitt in parameter generation,23 the present parameters were based on quantum mechanical calculations and refined against experiments. 
The solvent model has also been improved substantially from initially simple solvation free-energy approaches23 to today's solvent model based on macroscopic electrostatics.74 As pointed out by many researchers in the field, a common deficiency of the continuum solvent models is that the simulated events can occur at time scales much shorter than those found in experiments,86 which, in many cases, can be corrected by taking into account the viscosity of the solvent. A more serious problem can arise when solvent plays a structural role. This becomes an important issue in protein folding, since proteins can have substantial numbers of solvent molecules in the interior in some important states, such as molten globule states. Furthermore, studies have suggested that solvent plays a role as the lubricant prior to reaching the native state87,88 and ejection of a solvent molecule from the interior may contribute a nontrivial portion to the free-energy barriers.89

All-atom models

At an even higher level of sophistication is the all-atom representation of both solvent and protein. A hallmark of models of this type is that their parameters are obtained through high-level quantum mechanical calculations on short peptide fragments. Such an approach has several advantages. It assures the generality and allows further refinement upon the availability of more accurate quantum mechanical methods and upon the need for such an improvement. Such models also allow further extension. For instance, active efforts have been undertaken to parameterize polarization energy that can be integrated seamlessly into present simulation methods.90-92 Some of the earlier developments have been reviewed.93 Because the detailed models require both a large number of particles, typically more than 10,000, and a small time step of one to two femtoseconds (10⁻¹⁵ seconds), direct simulation of the folding processes, which take place on a microsecond or longer time scale, has been difficult. Therefore, such models have been applied to study the unfolding processes of small proteins that can be accelerated substantially by raising the simulation temperature,88,94-100 by changing solvent condition,88,101,102 by applying external forces,103-105 and by applying pressure.102 The detailed representation has allowed direct comparisons with experiments and encouraging results have been obtained.98,106 Limited refolding simulations were also attempted starting from partially unfolded structures generated from the unfolding simulations and considerable fluctuations were observed.97,100 These short-time refolding simulations have also identified the transition states in the vicinity of the native state.100,106 Care must be taken, though, because the short-time refolding simulations can only sample the conformational space in the vicinity of the unfolding trajectories. 
Equilibration of water in this type of short-time refolding simulation is needed to avoid simulating a trivial collapse process of water equilibration when the system is brought to room temperature and to restore faithfully the room-temperature solvent condition that has been distorted significantly due to the entropy-enthalpy imbalance107 at high temperature. Such an imbalance inherent in the typical unfolding simulations may also be reduced by conducting the unfolding simulations at moderate unfolding temperatures108 such that both temperature and pressure can be maintained at experimentally relevant conditions.
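The difficulty of direct folding simulation noted above follows from simple arithmetic on the time step and system size quoted in the text. The short sketch below works it out; the helper names are illustrative, and the pair count is an upper bound that ignores the distance cutoffs and other approximations real simulation codes use.

```python
FEMTOSECOND = 1e-15  # seconds
MICROSECOND = 1e-6   # seconds

def md_steps(simulated_time_s, time_step_s):
    """Number of integration steps needed to cover a simulated time span."""
    return round(simulated_time_s / time_step_s)

def atom_pairs(n_atoms):
    """Number of distinct atom pairs (no cutoff applied)."""
    return n_atoms * (n_atoms - 1) // 2

if __name__ == "__main__":
    print(md_steps(1 * MICROSECOND, 2 * FEMTOSECOND))  # steps at 2 fs
    print(md_steps(1 * MICROSECOND, 1 * FEMTOSECOND))  # steps at 1 fs
    print(atom_pairs(10_000))                          # pairs for 10,000 atoms
```

One microsecond therefore requires between five hundred million and one billion force evaluations on a system of at least ten thousand particles, which explains why accelerated unfolding simulations were explored first.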

A powerful extension of unfolding simulations is the attempt to reconstruct the free-energy landscape. Using the weighted histogram method,109 Brooks and coworkers calculated the free-energy landscapes of folding a three-helix bundle,110 the segment B1 of streptococcal protein G,87 and the Betanova89 from restrained unfolding simulations. They demonstrated funnel-shaped free-energy landscapes and the existence of multiple folding pathways, and showed that the shapes of the funnels are also dependent on the type of proteins (i.e., alpha helical or alpha/β). They also observed that ejection of water from the interior of the intermediate state contributes to the free-energy barrier of folding,89 suggesting the role that water may play in the folding process in addition to its role as solvent. Such an observation is only possible with the explicit inclusion of solvent in the simulation. It is noteworthy that the application of the restraint functions is an integral part of the methodology, because it ensures a sufficient number of transitions between neighboring states and hence ensures the reversibility that is absent in the unrestrained unfolding simulations. Nevertheless, the weighted histogram method has also been applied in the analysis of unrestrained unfolding trajectories.83 Free-energy profiles (or probability profiles) have also been generated directly from the unfolding trajectories,100 but it is unclear at what temperature the profiles were generated.

A central question concerning the elucidation of the protein folding mechanism is: how do proteins reach their native state? Therefore, direct simulation of protein folding using an all-atom model has been termed the “holy grail.”111 Encouraging developments have been made in the simulations of the folding processes of small peptide fragments with explicit representations of both solvent and peptides.112-116 Tobias and Brooks studied the formation of a β-turn in aqueous solution.112 Case and coworkers studied the transitions between two conformations of a β-turn motif and most of the simulated distances agree with NMR data.113 Daura et al.114 studied the folding and unfolding processes of a short β-peptide in methanol at temperatures below, around, and above Tm of the peptide. They observed reversible formation of secondary structure in a simple two-state manner within 50 nanoseconds. The estimated folding free energies, based on the population of the states, are in qualitative agreement with experimental observations. Chipot et al. studied the formation of undecamer peptides at the water-hexane interface.115,116 Wang and coworkers recently developed a method called Self-Guided Molecular Dynamics (SGMD)117,118 and applied it to study the folding of a 16-residue peptide.119 These studies provided detailed atomic-level descriptions of the formation of isolated secondary structure motifs in their respective solvent environments.
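Estimating a folding free energy from state populations, as mentioned for reversible two-state simulations, can be sketched with the standard relation ΔG = -RT ln(p_folded / p_unfolded). The example below is a minimal illustration, assuming each trajectory frame has already been classified as folded or unfolded by some criterion of the analyst's choosing; the function name and the sample data are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)

def folding_free_energy(is_folded, temperature_k):
    """Two-state folding free energy in kcal/mol from per-frame labels.
    is_folded: sequence of booleans, one per trajectory frame.
    Negative values mean the folded state is more populated."""
    n_folded = sum(1 for f in is_folded if f)
    n_unfolded = len(is_folded) - n_folded
    if n_folded == 0 or n_unfolded == 0:
        raise ValueError("both states must be sampled to estimate a free energy")
    return -R_KCAL * temperature_k * math.log(n_folded / n_unfolded)

if __name__ == "__main__":
    # Hypothetical trajectory: 750 folded frames, 250 unfolded frames.
    labels = [True] * 750 + [False] * 250
    print(folding_free_energy(labels, temperature_k=300.0))
```

The estimate is only meaningful when the trajectory contains many transitions between the two states, which is why reversible folding and unfolding within the simulation time matters.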

Direct simulations of protein folding with all-atom models

An exciting development was the application of such models in the direct simulations of the early stages of the folding process of small proteins, including a 36-residue villin headpiece subdomain (HP-36)120,121 and a zinc-finger-like protein BBA1,122 on the microsecond time scale.123,124 Such “top-down” simulations were perceived as impossible for the study of protein folding “in the foreseeable future.”39 Even though both the villin headpiece subdomain and BBA1 are small, they both share the features common to other proteins. The villin headpiece subdomain is a helical protein with well-defined tertiary and secondary structures.119 Its three helices form a unique type of fold with Helix 1 aligned perpendicular to the plane formed by Helices 2 and 3. The three helices are held together by a tightly packed hydrophobic core. Its melting temperature Tm is about 70 degrees Centigrade122 and the estimated folding time is about 10 microseconds,125,126 making it one of the fastest folding small stable proteins. BBA1 was designed by Imperiali and coworkers122 using the zinc finger as the template. It has a three-turn alpha helix packed against a short β sheet. Unfolding simulations suggested that BBA1 might fold by forming its secondary structures first.108

The simulation on HP-36 indicated that its initial collapse phase was accompanied by partial formation of the native helices and reduction of hydrophobic surface in a simple downhill process.123 The observed time scale of helix formation, 60 nanoseconds, agrees qualitatively with experimental observations on other proteins.86,127,128 The importance of the initial collapse phase has been understated somewhat in the past and has often been termed the dead-time “burst phase,” perhaps due to the fact that experimental studies of these ultrafast processes have been difficult until recently.86,127-129 Because early-stage species can often form in the burst phase and they may lead to the subsequent formation of other species and intermediates, the characteristics of these early-stage species may affect the folding kinetics, whether or not they themselves are productive intermediates. The observed concomitant formations of both helical domains and hydrophobic clusters in the simulation suggest a way to lower the entropy cost in the subsequent folding processes by reducing the protein internal entropy in the early stages. The reduced protein entropy can be partly compensated by the entropy gained due to releasing water from the surface, as shown in the simulation by the strong correlation between the radius of gyration (Rγ) and the solvation free energy (SFE) and by the large decrease of the SFE during the collapse process.130 These early-stage nascent domains may (1) dissipate, (2) aggregate, or (3) grow. Some of these domains may well be the “nuclei” for the later-stage intermediate structures. Formation of native-like domains in the early stage helps the formation of the later-stage intermediate structures and perhaps the formation of the native structure. Conversely, the non-native domains will eventually dissipate and those that are retained in the later-stage intermediate species will be difficult to dissipate and contribute to the free-energy barriers. 
Correlation calculations indicated that the collapse was driven both by burial of hydrophobic surface and a lowering of the internal energy of the protein.130
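The two quantities used in this analysis, the radius of gyration of a conformation and the correlation between two time series, are straightforward to compute. The sketch below shows one conventional way to do so; it is an illustration under stated assumptions rather than the authors' analysis code, the function names are hypothetical, and the solvation free-energy series is assumed to be computed elsewhere and passed in.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D points.
    coords: list of (x, y, z); masses: optional list, defaults to equal masses."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    mean_sq = sum(m * sum((c[k] - center[k]) ** 2 for k in range(3))
                  for m, c in zip(masses, coords)) / total
    return math.sqrt(mean_sq)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

if __name__ == "__main__":
    # Hypothetical frames: a chain that becomes more compact over time.
    frames = [
        [(0, 0, 0), (4, 0, 0), (8, 0, 0)],
        [(0, 0, 0), (3, 0, 0), (6, 0, 0)],
        [(0, 0, 0), (2, 0, 0), (4, 0, 0)],
    ]
    rg_series = [radius_of_gyration(f) for f in frames]
    sfe_series = [-10.0, -14.0, -18.0]  # placeholder values, not simulation data
    print(rg_series, pearson(rg_series, sfe_series))
```

Applied per frame along a trajectory, `radius_of_gyration` gives a compactness time series that can then be correlated with any other per-frame quantity.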

Considerable fluctuations between compact and extended states were also observed in the simulation, suggesting a shallow free-energy landscape in the vicinity of extended conformations. The residence times of the compact conformations are much longer than those of the extended conformations, suggesting the compact conformations are energetically more favorable than the extended conformations. These indicated that the free-energy landscape is rugged and the existence of intermediate states is likely.

Proteins can fold from fully unfolded states to the native state by going through many intermediate states of varying degrees of stability. In fact, since these intermediate states can be referred to as the “landmarks” of the protein folding free-energy landscape, studying the mechanism of protein folding, in a sense, is to study these intermediate states, the relationship between them, and the relationship between them and the native state. One microsecond, even though it was more than two orders of magnitude longer than the longest simulation conducted up to 1998 on proteins in water, is still an order of magnitude shorter than the shortest estimated folding time for a protein; thus, it is unrealistic to expect the protein to reach the native state during such a simulation. However, this time scale appears to be sufficient to observe some marginally stable intermediates, if they exist. This was indeed the case in the simulation. A marginally stable intermediate was observed in the simulation and lasted for about 150 nanoseconds. As shown in Figure 1, the main-chain structure of the intermediate was remarkably similar to the native structure, including partial formation of Helices 2 and 3 and a closely packed hydrophobic cluster. The solvation free energy, calculated using the method and parameters developed by Eisenberg and McLachlan,131 reached a level comparable to that of the native structure. Further analyses using the MM-PB/SA132,133 method indicated that the free energy of the intermediate state is the lowest among all the states sampled during the simulation134 and both were significantly higher in free energy than the simulation starting from the native structure. This suggests that the reason the simulation did not reach the native state was because of kinetics and not force-field artifacts. 
Both the solvation free-energy and MM-PB/SA (molecular mechanics-Poisson Boltzmann/surface area) free-energy calculations were independent from the simulation, yet both of their results were consistent with the simulations. Both calculations indicated that the intermediate state was the most favorable one sampled during the simulation, consistent with the long residence time of the state.

Figure 1

It is generally perceived, perhaps even among most specialists in the field, that molecular dynamics simulation is deterministic in a way that is different from stochastic algorithms, such as Monte Carlo, in which the built-in randomness ensures that the simulation will asymptotically approach the ergodic limit. Because of that, it has been argued, molecular dynamics simulation methods are nonergodic. This holds true, however, only to a limited extent. Recent studies by two groups135,136 demonstrated that molecular dynamics simulations on complex systems are inherently chaotic. Zhou and Wang135 demonstrated that near-identical simulations differing by a root-mean-square deviation (RMSD) of 0.02 Å (angstrom) resulted in two very different trajectories within 1 nanosecond and the resulting structures can differ by an RMSD of as much as 5.0 Å. Yet the RMSD can be reduced substantially to within 2 Å after rigid-body alignment, clearly indicating that the trajectories sampled the same conformational free-energy basin.137 Moult and coworkers further demonstrated that despite short-time (<1 picosecond) chaos, which is due to physical interaction (e.g., van der Waals forces), not the algorithmic instability, all trajectories remained close to each other with an RMSD of less than 2 Å,136 similar to the typical thermal fluctuation.

Because the source of chaotic behavior is the nonlinearity of the physical interactions, it is not surprising that earlier tests on simple harmonic systems failed to reveal such an important aspect. This has important implications to our simulations of protein folding. Because of the presence of chaos in the system, which is a source of randomness, the simulations have a stochastic character in the long time scale, hence can asymptotically approach the ergodicity limit, as well as have deterministic behavior in the short time scale. Simulations of similar conditions are expected to sample the areas that are close to each other in the phase space and produce similar trajectories differing in detail. Due to the randomness, which is a source of uncertainty, one should focus on the qualitative behavior in the analyses of individual trajectories, such as the time scale of the events and general trend.123 The randomness also contributes to the level of fluctuations exhibited in the folding simulations.123 When one wants to focus on the detail, such as the role of individual hydrogen bonds and contacts, multiple simulations are needed for statistically meaningful results.

We therefore conducted two additional simulations on HP-36, each to 0.5 microseconds starting from different states.138 In both simulations, HP-36 reached compact states within about 50 nanoseconds with radii of gyration comparable to that of the native state. More importantly, substantial secondary structures were formed during the initial collapse phase. Both the time scale and the concomitant occurrence of the hydrophobic collapse and the secondary structures were consistent with our earlier simulation123,130 and with experiments.86,127,128 One of these two trajectories also reached a state with structure resembling that of the intermediate state found earlier, including similarities of both topology and formation of a tightly packed hydrophobic cluster, as shown in Figure 2.

Figure 2

Perspective

Understanding the mechanisms of protein folding has been an evolving field. Increasingly detailed studies, both in structural resolution and in time scale, have highlighted recent developments. For instance, earlier studies would try to address questions like the cooperativity, high rate of folding, ability to converge on the native state from so many starting conformations in the unfolded state, and secondary structures. Many current studies try to address questions on transition states, the relationship between protein structures and the folding processes, and characterization of intermediate and molten globule states. Many of these developments have been catalyzed by the advancements of experimental methods, which have been summarized recently.139 Mutagenesis studies have allowed extensive and detailed characterization of the transition states for a few small proteins.140-142 Hydrogen exchange experiments have been applied to study the dynamics of proteins.143,144 An atomic force microscope (AFM) can probe the stability of proteins145 in a way that can be compared directly with simulations.146 Solution structure techniques can be used to study some of the intermediate states, including molten globule,147,148 partially folded,149 and unfolded150,151 states. Ultrafast kinetic experiments can provide increasingly detailed information on the early-stage folding processes.40,86,127,128 Single-molecule techniques have made it possible to monitor the dynamics and folding and unfolding processes of individual molecules.30

Encouraging progress has been made recently, and we are now in an era of active application of molecular dynamics simulations to the study of the folding process. Because of the vital importance of water in protein folding and in the cell, the explicit representation of both solvent and proteins is expected to play an increasingly important role in our understanding of protein folding mechanisms. With the promise of even more accurate models and considerably faster computers, together with the advancement of experimental approaches, the goal of understanding the folding of small proteins should be achievable in the future.

Besides the understanding of protein folding mechanisms, a major goal is the prediction of structures from sequences. It has become increasingly clear that structure prediction methods have made considerable progress,152-154 along with the development of protein design,122,155-157 despite our lack of a comprehensive understanding of the folding mechanisms. The quality of knowledge-based structure prediction methods depends on the available experimental structures. They can reach reasonable accuracy for proteins whose sequence homology to proteins with known structures is above 50 percent, but the quality decreases substantially at lower sequence homologies.154 Other related issues include the need to predict the binding affinities of small molecules to proteins when the flexibility of either ligands or proteins plays a role, and the need to understand conformational changes using simulation methods. These reinforce the need for further understanding of protein folding mechanisms.

The subject of protein folding is perhaps one of the most challenging areas in biophysical chemistry. Given the complexity of protein structures, diverse folding processes should be expected, including the possible role of certain folding-assisting domains within large proteins.158 Furthermore, the complexity of in vivo folding processes is yet another challenge.159-163 A full understanding of protein folding mechanisms is therefore a daunting task. Despite this, recent developments lead us to believe that an eventual solution may lie ahead. From a theoretical perspective, an immediate objective is to accurately replicate on computers the complete folding process of small fast-folding proteins, including atomic details. Such simulation results would provide the data for developing abstract models at a conceptual level that describe general and unambiguous features of protein folding mechanisms. The success of such simulations would itself be a strong testament to the accuracy of the method and parameters. The diversity of protein structures and the complexity of the in vivo folding process can, in principle, be dealt with by a combination of experiments and further simulations. We may also be able to answer questions such as whether a particular part of a protein is designed to assist the folding of the rest of the protein, as found in “intramolecular chaperones,” and how this assistance occurs. The mechanism of chaperone-assisted folding processes can also be better understood. But without a basic understanding of the folding process of single-domain, small, fast-folding proteins, an understanding of more complex folding processes will be more difficult to achieve.

Acknowledgment

We are indebted to Ken Dill, Michael Levitt, and Tack Kuntz for helpful discussions. Computer time was provided by the Pittsburgh Supercomputer Center and by Silicon Graphics, Inc. This work has been supported by the National Institutes of Health (Grant GM-29072) and a University of California Biotechnology Star grant from Amgen Inc. (to Dr. Kollman).

Cited references

Accepted for publication January 2, 2001.