6.4 How Do Polypeptides Fold into Three-Dimensional Protein Structures?

core of the protein. The extensively H-bonded nature of α-helices and β-sheets is ideal for this purpose, and these structures effectively stabilize the polar groups of the peptide backbone in the protein core.

The framework of sheets and helices in the interior of a globular protein is typically constant and conserved in both sequence and structure. The surface of a globular protein is different in several ways. Typically, much of the protein surface is composed of the loops and tight turns that connect the helices and sheets of the protein core, although helices and sheets may also be found on the surface. The result is that the surface of a globular protein is often a complex landscape of different structural elements. These complex surface structures can interact in certain cases with small molecules or even large proteins that have complementary structure or charge (Figure 6.20). These regions of complementary, recognizable structure are formed typically from the peptide segments that connect elements of secondary structure. They are the basis for enzyme–substrate interactions, protein–protein associations in cell signaling pathways, antigen–antibody interactions, and more.

The segments of the protein that are neither helix, sheet, nor turn have traditionally been referred to as coil or random coil. Both of these terms are misleading. Most of these "loop" segments are neither coiled nor random, in any sense of the words. These structures are every bit as organized and stable as the defined secondary structures. They just don't conform to any frequently recurring pattern. These so-called coil structures are strongly influenced by side-chain interactions with the rest of the protein.

Waters on the Protein Surface Stabilize the Structure

A globular protein's surface structure also includes water molecules.
Many of the polar backbone and side chain groups on the surface of a globular protein make H bonds with solvent water molecules. There are often several such water molecules per amino acid residue, and some are in fixed positions (Figure 6.21). Relatively few water molecules are found inside the protein.

In some globular proteins (Figure 6.22), it is common for one face of an α-helix to be exposed to the water solvent, with the other face toward the hydrophobic interior of the protein. The outward face of such an amphiphilic helix consists mainly of polar and charged residues, whereas the inward face contains mostly nonpolar, hydrophobic residues. A good example of such a surface helix is that of residues 153 to 166 of flavodoxin from Anabaena (Figure 6.22a). Note that the helical wheel presentation of this helix readily shows that one face contains four hydrophobic residues and that the other is almost entirely polar and charged.

Less commonly, an α-helix can be completely buried in the protein interior or completely exposed to solvent. Citrate synthase is a dimeric protein in which α-helical segments form part of the subunit–subunit interface. As shown in Figure 6.22b, one of these helices (residues 260 to 270) is highly hydrophobic and contains only two polar residues, as would befit a helix in the protein core. On the other hand, Figure 6.22c shows the solvent-exposed helix (residues 74 to 87) of calmodulin, which consists of 10 charged residues, 2 polar residues, and only 2 nonpolar residues.

Packing Considerations

The secondary and tertiary structures of ribonuclease A (Figure 6.19) and other globular proteins illustrate the importance of packing in tertiary structures. Secondary structures pack closely to one another and also intercalate with (insert between) extended polypeptide chains.
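The helical wheel view described earlier in this section for amphiphilic surface helices can also be sketched numerically. The Python example below is an illustration only, not a method taken from this chapter. It assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue), it uses a simplified, assumed set of nonpolar residues, and both 14-residue sequences are hypothetical; they are not the flavodoxin, citrate synthase, or calmodulin helices of Figure 6.22.

```python
import math

# Residues treated as nonpolar in this sketch (a deliberate simplification).
HYDROPHOBIC = set("AVLIMFWC")


def wheel_angles(sequence, degrees_per_residue=100.0):
    """Angle of each residue on the wheel when an ideal helix is viewed end on."""
    return [(i * degrees_per_residue) % 360.0 for i in range(len(sequence))]


def hydrophobic_clustering(sequence):
    """Length (0 to 1) of the mean direction vector of the nonpolar residues.

    Values near 1 mean the nonpolar residues lie on one face of the wheel;
    values near 0 mean they are spread around the helix.
    """
    angles = [
        math.radians(angle)
        for angle, residue in zip(wheel_angles(sequence), sequence)
        if residue in HYDROPHOBIC
    ]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)


# Hypothetical sequences (one-letter codes), invented for illustration.
one_sided = "LKEDLSKIEESWKR"   # nonpolar residues at positions 1, 5, 8, 12
scattered = "LIVWKEDSKEESKR"   # nonpolar residues at positions 1, 2, 3, 4

for name, seq in (("one-sided", one_sided), ("scattered", scattered)):
    print(name, round(hydrophobic_clustering(seq), 2))
```

In the first sequence the four nonpolar residues recur with the i, i+4, i+7, i+11 spacing, so they fall within a narrow arc of the wheel and the score is high. In the second they are consecutive, so they point in four different directions and the score is low.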
If the sum of the van der Waals volumes of a protein's constituent amino acids is divided by the volume occupied by the protein, packing densities of 0.72 to 0.77 are typically obtained. These packing densities are similar to those of a collection of solid spheres. This means that even with close packing, approximately 25% of the total volume of a protein is not occupied by protein atoms. Nearly all of this space is in the form of very small cavities. Cavities the size of water molecules or larger do occasionally occur, but they make up only a small fraction of the total protein volume. It is likely that such cavities provide flexibility for proteins and facilitate conformation changes and a wide range of protein dynamics (discussed later).

FIGURE 6.20 The surfaces of proteins are complementary to the molecules they bind. PEP carboxykinase (shown here, pdb id = 1K3D) is an enzyme from the metabolic pathway that synthesizes glucose (gluconeogenesis; see Chapter 22). In the so-called "active site" (yellow) of this enzyme, catalysis depends on complementary binding of substrates. Shown in this image are ADP (brown), a Mg²⁺ ion (blue), and AlF₃⁻ (a phosphate analog, in green, above the Mg²⁺).

FIGURE 6.21 The surfaces of proteins are ideally suited to form multiple H bonds with water molecules. Shown here are waters (blue and white) associated with actinidin, an enzyme from kiwi fruit that cleaves polypeptide chains at arginine residues (pdb id = 2ACT). The polar backbone atoms and side chain groups on the surface of actinidin are extensively H-bonded with water.

Chapter 6 Proteins: Secondary, Tertiary, and Quaternary Structure

ACTIVE FIGURE 6.22 The so-called helical wheel presentation can reveal the polar or nonpolar character of α-helices. If the helix is viewed end on, and the residues are numbered with residue 1 closest to the viewer, it is easy to see how polar and nonpolar residues are distributed to form a wheel. (a) The α-helix consisting of residues 153–166 (red) in flavodoxin from Anabaena is a surface helix and is amphipathic (pdb id = 1RCF). (b) The two helices (orange and red) in the interior of the citrate synthase dimer (residues 260–270 in each monomer) are mostly hydrophobic (pdb id = 5CSC). (c) The exposed helix (residues 74–87, red) of calmodulin is entirely accessible to solvent and consists mainly of polar and charged residues (pdb id = 1CLL).
[Wheel diagrams; residue labels are listed in the order they were extracted, which is not necessarily sequence order.
(a) α-Helix from flavodoxin (residues 153–166), positions 1–14: Asp 153, Lys, Lys, Ala, Asp, Ser, Ser, Glu, Glu, Arg, Trp, Leu, Ile, Val.
(b) α-Helix from citrate synthase (residues 260–270), positions 1–11: Leu 260, Gly, Ala, Ala, Ser, Leu, Phe, Met, Ala, Ala, Asn.
(c) α-Helix from calmodulin (residues 74–87), positions 1–14: Arg 74, Asp, Ile, Glu, Lys, Arg, Thr, Glu, Glu, Met, Asp, Glu, Lys, Ser.]

HUMAN BIOCHEMISTRY
Collagen-Related Diseases

Collagen provides an ideal case study of the molecular basis of physiology and disease. For example, the nature and extent of collagen crosslinking depend on the age and function of the tissue. Collagen from young animals is predominantly un-crosslinked and can be extracted in soluble form, whereas collagen from older animals is highly crosslinked and thus insoluble. The loss of flexibility of joints with aging is probably due in part to increased crosslinking of collagen.

Several serious and debilitating diseases involving collagen abnormalities are known. Lathyrism occurs in animals due to the regular consumption of seeds of Lathyrus odoratus, the sweet pea, and involves weakening and abnormalities in blood vessels, joints, and bones. These conditions are caused by β-aminopropionitrile (see figure), which covalently inactivates lysyl oxidase, preventing intramolecular crosslinking of collagen and causing abnormalities in joints, bones, and blood vessels.

Scurvy results from a dietary vitamin C deficiency and involves the inability to form collagen fibrils properly. This is the result of reduced activity of prolyl hydroxylase, which is vitamin C–dependent, as previously noted. Scurvy leads to lesions in the skin and blood vessels, and in its advanced stages, it can lead to grotesque disfiguration and eventual death. Although rare in the modern world, it was a disease well known to sea-faring explorers in earlier times who did not appreciate the importance of fresh fruits and vegetables in the diet.

A number of rare genetic diseases involve collagen abnormalities, including Marfan's syndrome and the Ehlers–Danlos syndromes, which result in hyperextensible joints and skin. The formation of atherosclerotic plaques, which cause arterial blockages in advanced stages, is due in part to the abnormal formation of collagenous structures in blood vessels.

[Structure: β-aminopropionitrile, N≡C–CH₂–CH₂–NH₃⁺]

FIGURE 6.23 TonEBP is a DNA-binding protein consisting of two distinct domains. The N-terminal domain is shown here on the right, with DNA (orange) in the middle, and the C-terminal domain on the left (pdb id = 1IMH).

Protein Domains Are Nature's Modular Strategy for Protein Design

Proteins range in molecular weight from a thousand to more than a million.
It is tempting to think that the size of unique globular, folded structures would increase with molecular weight, but this is not what has been observed. Proteins composed of about 250 amino acids or less often have a simple, compact globular shape. However, larger globular proteins are usually made up of two or more recognizable and distinct structures, termed domains or modules: compact, folded protein structures that are usually stable by themselves in aqueous solution. Figure 6.23 shows a two-domain DNA-binding protein, TonEBP, in which the two distinct domains are joined by a short segment of the peptide chain. Most domains consist of a single continuous portion of the protein sequence, but in some proteins the domain sequence is interrupted by a sequence belonging to some other part of the protein that may even form a separate domain (Figure 6.24). In either case, typical domain structures consist of hydrophobic cores with hydrophilic surfaces (as was the case for ribonuclease, Figure 6.19). Importantly, individual domains often possess unique functional behaviors (for example, the ability to bind a particular ligand with high affinity and specificity), and an individual domain from a larger protein often expresses its unique function within the larger protein in which it is found. Multidomain proteins typically possess the sum total of functional properties and behaviors of their constituent domains.

It is likely that proteins consisting of multiple domains (and thus multiple functions) evolved by the fusion of genes that once coded for separate proteins. This would require gene duplication to be common in nature, and analysis of completed genomes has confirmed that approximately 90% of domains in eukaryotes have been duplicated. Thus, the protein domain is a fundamental unit in evolution.
Many proteins have been "assembled" by duplicating domains and then combining them in different ways. Many proteins are assemblies constructed from several individual domains, and some proteins contain multiple copies of the same domain. Figure 6.25 shows the tertiary structures of nine domains that are frequently duplicated, and Figure 6.26 presents several proteins that contain multiple copies of one or more of these domains.

FIGURE 6.24 Malonyl CoA:ACP transacylase (pdb id = 1NM2) is a metabolic enzyme consisting of two subdomains. The large subdomain (blue) includes residues 1–132 and 198–316 and consists of a β-sheet surrounded by 12 α-helices. The small subdomain (gold; residues 133–197) consists of a four-stranded antiparallel β-sheet and two α-helices.

FIGURE 6.25 Ribbon structures of several protein modules used in the construction of complex multimodule proteins (scale bar: 1 nm). (a) The complement control protein module (pdb id = 1HCC). (b) The immunoglobulin module (pdb id = 1T89). (c) The fibronectin type I module (pdb id = 1Q06). (d) The growth factor module (pdb id = 1FSB). (e) The kringle module (pdb id = 1HPK). (f) The GYF module (pdb id = 1GYF). (g) The γ-carboxyglutamate module (pdb id = 1CFI). (h) The FF module (pdb id = 1UZC). (i) The DED domain (pdb id = 1A1W).

Classification Schemes for the Protein Universe Are Based on Domains

The astounding diversity of properties and behaviors in living things can now be explored through the analysis of vast amounts of genomic information. Assessment of sequence and structural data from several million proteins in both protein and genome databases has shown that there is a relatively limited number of structurally distinct domains in proteins. Several comprehensive projects have organized the available information in defined hierarchies or levels of protein structure.
FIGURE 6.26 A sampling of proteins that consist of mosaics of individual protein modules. The modules shown include γCG, a module containing γ-carboxyglutamate residues; G, an epidermal growth factor–like module; K, the "kringle" domain, named for a Danish pastry; C, which is found in complement proteins; F1, F2, and F3, first found in fibronectin; I, the immunoglobulin superfamily domain; N, found in some growth factor receptors; E, a module homologous to the calcium-binding E–F hand domain; and LB, a lectin module found in some cell surface proteins. (Adapted from Baron, M., Norman, D., and Campbell, I., 1991. Protein modules. Trends in Biochemical Sciences 16:13–17.)
[Module diagrams. Proteins shown: factors VII, IX, X and protein C; factor XII; tPA; C1r, C1s; C2, factor B; fibronectin; twitchin; ELAM-1; N-CAM; NGF receptor; IL-2 receptor; PDGF receptor. The plasma membrane is indicated.]

The Structural Classification of Proteins database (SCOP, http://scop.mrc-lmb.cam.ac.uk/scop) recognizes five overarching classes, which encompass most proteins. SCOP is based on hierarchical levels that embody the evolutionary and structural relationships among known proteins, and protein classification in SCOP is essentially a manual process using visual inspection and comparison of structures. CATH is another hierarchical classification system (http://www.cathdb.info) that groups protein domain structures into evolutionary families and structural groupings, depending on sequence and structure similarities. CATH differs from SCOP in that it combines manual analysis with automation based on quantitative algorithms to classify protein structures.
Figure 6.27 compares the hierarchical structures of SCOP and CATH and defines the different levels of structure.

Although the hierarchical names in SCOP and CATH differ somewhat, there are common threads shared in these schemes. Class is determined from the overall composition of secondary structure elements in a domain. A fold describes the number, arrangement, and connections of these secondary structure elements. A superfamily includes domains of similar folds and usually similar functions, thus suggesting a common evolutionary ancestry. A family usually includes domains with closely related amino acid sequences (in addition to folding similarities). Although the numbers of unique folds, superfamilies, and families increase as more genomes are known and analyzed, it has become apparent that the number of protein domains in nature is large but limited.

How many proteins can we expect to identify and understand someday? There are approximately $10^{3}$ to $10^{5}$ genes per organism and approximately 13.6 million species of living organisms on earth (and this latter number is likely an underestimate). Thus, there may be approximately $(10^{3} \times 1.36 \times 10^{7})$ to $(10^{5} \times 1.36 \times 10^{7})$, or roughly $10^{10}$ to $10^{12}$, different proteins in all organisms on earth. Still, this vast number of proteins may well consist of only about $10^{5}$ sequence domain families (Figure 6.27) and approximately $10^{3}$ protein folds of known structure, a remarkably small number compared to the total number of protein-coding genes (see Table 1.6). It is anticipated that most newly identified proteins will resemble other known proteins and that most structures can be broken into two or more domains, which resemble tertiary structures observed in other proteins.

FIGURE 6.27 SCOP and CATH are hierarchical classification systems for the known proteins. Proteins are classified in SCOP by a manual process, whereas CATH combines manual and automated procedures. Numbers indicate the population of each category.
The CATH Hierarchy: Class (4) → Architecture (40) → Topology (1084) → Homologous Superfamily (2091) → Sequence Family (7794) → Domain (93885)
The SCOP Hierarchy: Class (7) → Fold (1086) → Superfamily (1777) → Family (3464) → Domain (97178)

Because structure depends on sequence, and because function depends on structure, it is tempting to imagine that all proteins of a similar structure should share a common function, but this is not always true. For example, the TIM barrel is a common protein fold consisting of eight α-helices and eight β-strands that alternate along the peptide backbone to form a doughnutlike tertiary structure. The TIM barrel is named for triose phosphate isomerase, an enzyme that interconverts ketone and aldehyde substrates in the breakdown of sugars (see Chapter 18). However, other TIM barrel proteins carry out very different functions (Figure 6.28a), including the reduction of aldose sugars and hydrolysis of phosphate esters. Moreover, not all proteins of similar function possess similar domains. Both proteins shown in Figure 6.28b catalyze the same reaction, but they bear little structural similarity to each other.

Denaturation Leads to Loss of Protein Structure and Function

Whereas the primary structure of proteins arises from covalent bonds, the secondary, tertiary, and quaternary levels of protein structure are maintained by weak, noncovalent forces. The environment of a living cell is exquisitely suited to maintain these weak forces and to preserve the structures of its many proteins.
However, a variety of external stresses (for example, heat or chemical treatment) can disrupt these weak forces in a process termed denaturation: the loss of protein structure and function. An everyday example is the denaturation of the protein ovalbumin during the cooking of an egg (Figure 6.29). About 10% of the mass of an egg white is protein, and 54% of that is ovalbumin. When a chicken egg is cracked open, the "egg white" is a nearly transparent, viscous fluid. Cooking turns this fluid to a solid, white mass. The egg white proteins have unfolded and have precipitated out of solution, and the unfolded proteins have aggregated into a solid mass.

FIGURE 6.28 (a) Same domain type, different functions: some proteins share similar structural features but carry out quite different functions (triose phosphate isomerase, pdb id = 8TIM; aldose reductase, pdb id = 1ADS; phosphotriesterase, pdb id = 1DPM). (b) Same function, different structures: proteins with quite different structures can carry out similar functions (yeast aspartate aminotransferase, pdb id = 1YAA; D-amino acid aminotransferase, pdb id = 3DAA).

As a typical protein solution is heated slowly, the protein remains in its native state until it approaches a characteristic melting temperature, $T_m$. As the solution is heated further, the protein denatures over a narrow range of temperatures around $T_m$ (Figure 6.30). Denaturation over a very small temperature range such as this is evidence of a two-state transition between the native and the unfolded states of the protein, and this implies that unfolding is an all-or-none process: When weak forces are disrupted in one part of the protein, the entire structure breaks down.
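The sharpness of a two-state transition can be illustrated with a short calculation. The Python sketch below is not taken from this chapter. It assumes a simple van't Hoff-style model in which the free energy of unfolding is ΔG(T) = ΔH_m(1 − T/T_m), ignoring any heat capacity change, and the values chosen for T_m (60 °C) and ΔH_m (400 kJ/mol) are hypothetical, picked only to show the shape of the curve.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def fraction_native(temp_c, tm_c=60.0, dh_m=400e3):
    """Fraction of protein in the native state for a two-state model.

    temp_c : temperature in degrees Celsius
    tm_c   : assumed melting temperature in degrees Celsius (hypothetical)
    dh_m   : assumed enthalpy of unfolding at Tm in J/mol (hypothetical)
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    dg_unfold = dh_m * (1.0 - t / tm)            # zero at T = Tm
    k_unfold = math.exp(-dg_unfold / (R * t))    # [unfolded] / [native]
    return 1.0 / (1.0 + k_unfold)


# Tabulate the curve from 40 to 80 degrees Celsius in 5-degree steps.
for temp in range(40, 85, 5):
    print(f"{temp:3d} C  fraction native = {fraction_native(temp):.3f}")
```

With these assumed values the protein is almost entirely native well below T_m and almost entirely unfolded well above it, and the change is concentrated in a window of a few degrees around T_m. A larger ΔH_m gives a narrower transition.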
Most proteins can also be denatured below the transition temperature by a variety of chemical agents, including acid or base, organic solvents, detergents, and particular denaturing solutes. Guanidine hydrochloride and urea are examples of the latter (Figure 6.31). Denaturation in all these cases involves disruption of the weak forces that stabilize proteins. Covalent bonds are not affected. Acids and bases cause protonation and deprotonation of dissociable groups on the protein, altering ionic interactions and hydrogen bonds. Organic solvents and detergents disrupt hydrophobic interactions that bury nonpolar groups in the protein interior. The effects of guanidine hydrochloride and urea are more complex. Recent research indicates that these agents denature proteins by both direct effects (binding to hydrophilic groups on the protein) and indirect effects (altering the structure and dynamics of the water solvent). Also, both guanidine hydrochloride and urea are good H-bond donors and acceptors.

FIGURE 6.29 The proteins of egg white are denatured (as evidenced by their precipitation and aggregation) during cooking. More than half of the protein in egg whites is ovalbumin. Ovalbumin pdb id = 1OVA.
Photo credit: © Vladimir Glazkov/iStockphoto.com

FIGURE 6.30 Proteins can be denatured by heat, with commensurate loss of function. Ribonuclease A (blue) and ribonuclease B (red) lose activity above about 55°C. These two enzymes share identical protein structures, but ribonuclease B possesses a carbohydrate chain attached to Asn34. (Adapted from Arnold, U., and Ulbrich-Hofmann, R., 1997. Kinetic and thermodynamic thermal stabilities of ribonuclease A and ribonuclease B. Biochemistry 36:2166–2172.)
[Plot: fraction of native protein (0.00 to 1.00) versus temperature (20 to 80 °C).]

FIGURE 6.31 Proteins can be denatured (unfolded) by high concentrations of guanidine-HCl or urea. The denaturation of chymotrypsin is plotted here. (Adapted from Fersht, A., 1999. Structure and Mechanism in Protein Science. New York, W. H. Freeman.)
[Panel (a): structures of guanidine HCl, (H₂N)₂C=NH₂⁺ Cl⁻, and urea, (H₂N)₂C=O. Panel (b): fraction unfolded (0 to 1.0) versus [GdmCl] (0 to 6 M).]

Anfinsen's Classic Experiment Proved That Sequence Determines Structure

As noted earlier (Section 6.2), all the information needed to fold a polypeptide into its native structure is contained in the amino acid sequence. This simple but profound truth of protein structure was confirmed in the 1950s by the elegant studies of denaturation and renaturation of proteins by Christian Anfinsen and his co-workers at the National Institutes of Health. For their pivotal studies, they chose the small enzyme ribonuclease A from bovine pancreas, a protein with 124 residues and four disulfide bonds (Figures 6.19 and 6.32). (Ribonuclease cleaves chains of ribonucleic acid. Only ribonuclease in its native structure possesses enzyme activity, so loss of activity in a denaturation experiment was proof of loss of structure.) They treated solutions of ribonuclease with a combination of urea, which unfolded the protein, and mercaptoethanol, which reduced the disulfide bridges. This treatment destroyed all enzymatic activity of ribonuclease.

Anfinsen discovered that removing the mercaptoethanol but not the urea restored only 1% of the enzyme activity. This was attributed to the formation of random disulfide bridges by the still-denatured protein. With eight Cys residues, there are 105 possible ways to make four disulfide bridges; thus, a residual activity of 1% made sense to Anfinsen. (The first Cys to form a disulfide has seven possible partners, the next Cys has five possible partners, the next has three, and the last Cys has only one choice: $7 \times 5 \times 3 \times 1 = 105$.) However, if Anfinsen removed mercaptoethanol and urea at the same time, the polypeptide was able to fold into its native structure, the correct set of four disulfides reformed, and full enzyme activity was recovered (Figure 6.32). This experiment demonstrated that the information needed for protein folding resided entirely within the amino acid sequence of the protein itself. Many subsequent experiments with a variety of proteins have confirmed this fundamental postulate. For his studies of the relationship of sequence and structure, Anfinsen shared the 1972 Nobel Prize in Chemistry (with William H. Stein and Stanford Moore).

FIGURE 6.32 Ribonuclease can be unfolded by treatment with urea, and β-mercaptoethanol (MCE) cleaves disulfide bonds. If β-mercaptoethanol is then removed (but not urea) under oxidizing conditions, disulfide bonds reform in the still-unfolded protein (one possible hypothetical inactive form is shown). If urea is removed in the presence of a small amount of β-mercaptoethanol with gentle warming, ribonuclease returns to its native structure (with the correct set of disulfide bonds), and full enzymatic activity is restored. This experiment demonstrated that the information required for folding of globular proteins is contained in the primary structure.
[Diagram: ribonuclease with Cys residues 26, 40, 58, 65, 72, 84, 95, and 110 marked, drawn in three forms labeled "Active", "Inactive", and "Hypothetical Inactive Form (Note random formation of disulfides)". Arrow labels: "+ MCE + Urea"; "– MCE + Oxygen"; "– MCE – Urea + Oxygen"; "Small amount of MCE w/gentle warming".]

Is There a Single Mechanism for Protein Folding?

Christian Anfinsen's experiments demonstrated that proteins can fold reversibly. A corollary result of Anfinsen's work is that the native structures of at least some globular proteins are thermodynamically stable states. But the matter of how a given protein achieves such a stable state is a complex one.
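The count of 105 disulfide pairings quoted in the ribonuclease discussion above can be checked by direct enumeration. The Python sketch below is an illustration added here, not part of Anfinsen's analysis. It uses the eight Cys positions labeled in Figure 6.32, and the function names are hypothetical. It makes no claim about which pairing is the native one.

```python
def count_pairings(n_cys):
    """Number of ways to pair n_cys cysteines into disulfides.

    The first Cys has n_cys - 1 possible partners, the next unpaired Cys
    has n_cys - 3, and so on down to 1.
    """
    if n_cys % 2 != 0:
        raise ValueError("an even number of cysteines is required")
    count = 1
    for partners in range(n_cys - 1, 0, -2):
        count *= partners
    return count


def enumerate_pairings(cysteines):
    """Yield every complete set of disulfide pairs for a list of Cys positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


# Cys positions of ribonuclease A as labeled in Figure 6.32.
rnase_cys = [26, 40, 58, 65, 72, 84, 95, 110]

all_pairings = list(enumerate_pairings(rnase_cys))
print(count_pairings(len(rnase_cys)), len(all_pairings))
print(f"chance of one specific pairing at random: {1 / len(all_pairings):.2%}")
```

Both routes give the same count, and the probability of hitting any one specific arrangement by chance is a little under 1%, which is consistent with the residual activity of about 1% described in the text.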
Cyrus Levinthal pointed out in 1968 that so many conformations are possible for a typical protein that the protein does not have sufficient time to reach its most stable conformational state by sampling all the possible conformations. This argument, termed Levinthal's paradox, goes as follows: Consider a protein of 100 amino acids. Assume that there are only two conformational possibilities per amino acid, or $2^{100} = 1.27 \times 10^{30}$ possibilities. Allow $10^{-13}$ sec for the protein to test each conformational possibility in search of the overall energy minimum:

$$(10^{-13}\ \text{sec})(1.27 \times 10^{30}) = 1.27 \times 10^{17}\ \text{sec} = 4 \times 10^{9}\ \text{years}$$

Four billion years is the approximate age of the earth.

Levinthal's paradox led protein chemists to hypothesize that proteins must fold by specific "folding pathways," and many research efforts have been devoted to the search for these pathways. Several consistent themes have emerged from these studies. Each of them may well play a role in the folding process:

• Secondary structures (helices, sheets, and turns) probably form first.
• Nonpolar residues may aggregate or coalesce in a process termed a hydrophobic collapse.
• Subsequent steps probably involve formation of long-range interactions between secondary structures or involving other hydrophobic interactions.
• The folding process may involve one or more intermediate states, including transition states and what have become known as molten globules.

The folding of most globular proteins may well involve several of these themes. For example, even in the denatured state, many proteins appear to possess small amounts of residual structure due to hydrophobic interactions, with strong inter-residue contacts between side chains that are relatively distant in the native protein structure. Such interactions, together with small amounts of secondary structure, may act as sites of nucleation for the folding process.
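Levinthal's estimate earlier in this section is easy to reproduce. The Python sketch below is a back-of-the-envelope check only. It uses the same assumptions as the text (100 residues, two conformations per residue, $10^{-13}$ sec per conformation), and the function name and the year length of 365.25 days are choices made here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year


def levinthal_search_time(n_residues=100, states_per_residue=2,
                          seconds_per_state=1e-13):
    """Time needed to sample every conformation one after another.

    Returns (number of conformations, seconds, years).
    """
    conformations = states_per_residue ** n_residues
    seconds = conformations * seconds_per_state
    years = seconds / SECONDS_PER_YEAR
    return conformations, seconds, years


conformations, seconds, years = levinthal_search_time()
print(f"conformations: {conformations:.3g}")
print(f"search time:   {seconds:.3g} sec = {years:.2g} years")
```

Raising the number of conformations per residue from two to three, which is still a conservative figure, lengthens the search by many orders of magnitude, so the conclusion that folding cannot be a random search does not depend on the exact numbers chosen.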
A bit further in the folding process, the molten globule is postulated to be a flexible but compact form characterized by significant amounts of secondary structure, virtually no precise tertiary structure, and a loosely packed hydrophobic core. Moreover, it is likely that any one