6.4 How Do Polypeptides Fold into Three-Dimensional Protein Structures?

of these themes is more important for some proteins than for others. The process of folding is clearly complex, but sophisticated simulations have already provided reasonable models of folding (and unfolding) pathways for many proteins (Figure 6.33).

One school of thought suggests that for any given protein there may be multiple folding pathways. For these cases, Ken Dill has suggested that the folding process can be pictured as a funnel of free energies, an energy landscape (Figure 6.34). The rim at the top of the funnel represents the many possible unfolded states for a polypeptide chain, each characterized by high free energy and significant conformational entropy. Polypeptides fall down the wall of the funnel as contacts made between residues establish different folding possibilities. The narrowing of the funnel reflects the smaller number of available states as the protein approaches its final state, and bumps or pockets on the funnel walls represent partially stable intermediates in the folding pathway. The most stable (native) folded state of the protein lies at the bottom of the funnel.

What Is the Thermodynamic Driving Force for Folding of Globular Proteins?

The free energy change for the folding of a globular protein must be negative if the folded state is more stable than the unfolded state. The free energy change for folding depends, in turn, on changes in enthalpy and entropy for the process:

$$\Delta G = \Delta H - T\Delta S$$

When ΔH, −TΔS, and ΔG are measured separately for the polar side chains and for the nonpolar side chains of the protein, an important insight is apparent. The enthalpy and entropy changes for polar residues largely cancel each other out, and the ΔG of folding for the polar residues is approximately zero.
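As a worked illustration of the relation ΔG = ΔH − TΔS, the short Python sketch below evaluates ΔG for one set of inputs and converts it into a fraction of folded molecules. The ΔH and ΔS values are hypothetical numbers chosen only to give a free energy of a few tens of kJ/mol; they are not measurements for any real protein. The two-state (folded versus unfolded) equilibrium used in `fraction_folded` is also an assumption introduced here for illustration, not something stated in the text.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def folding_free_energy(delta_h, delta_s, temperature):
    """Return ΔG = ΔH − TΔS in kJ/mol.

    delta_h is in kJ/mol, delta_s in kJ/(mol*K), temperature in kelvin.
    """
    return delta_h - temperature * delta_s


def fraction_folded(delta_g, temperature):
    """Two-state estimate (an assumption): K = [folded]/[unfolded] = exp(−ΔG/RT)."""
    k_fold = math.exp(-delta_g / (R * temperature))
    return k_fold / (1.0 + k_fold)


# Hypothetical illustrative inputs, not data for a real protein.
delta_h = 50.0   # kJ/mol, unfavorable enthalpy term
delta_s = 0.27   # kJ/(mol*K), favorable entropy term
T = 298.15       # K

dg = folding_free_energy(delta_h, delta_s, T)
print(f"ΔG = {dg:.1f} kJ/mol")
print(f"fraction folded (two-state assumption) = {fraction_folded(dg, T):.6f}")
```

With these inputs the entropy term outweighs the enthalpy term, so ΔG comes out negative even though ΔH is positive. Changing the sign or size of either input shows how sensitive the balance is.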
To understand the behavior of the nonpolar residues, it is helpful to distinguish the ΔH and −TΔS contributions for the polypeptide chain and for the water solvent. Both ΔH and −TΔS for the nonpolar residues of the peptide chain are positive and thus make unfavorable contributions to the folding free energy. However, large numbers of water molecules restricted and immobilized around nonpolar residues in the unfolded protein are liberated in the folding process. The burying of nonpolar residues in the folded protein's core produces a dramatic entropy increase for these liberated water molecules. This is just enough to make the overall ΔG for folding negative (and thus favorable).

The crucial results:
• The largest contribution to the stability of a folded protein is the entropy change for the water molecules associated with the nonpolar residues.
• The overall free energy change for the folding process is not large, typically −20 to −40 kJ/mol.

FIGURE 6.33 Computer simulations of folding and unfolding of proteins can reveal possible folding pathways. Molecular dynamics simulations of the unfolding of small proteins such as chymotrypsin inhibitor 2 (CI2) and barnase are presented here on a reversed time scale, to show how folding may occur. D = denatured, I = intermediate, TS = transition state, N = native. Snapshot labels for CI2: D (2 ns), N (0 ns), I (0.7 ns), TS (0.15 ns), D (4 ns). Snapshot labels for barnase: D (70 ns), N (0 ns), I (30 ns), TS (0.26 ns), D (94 ns). (Adapted from Daggett, V., and Fersht, A. R., 2003. Is there a unifying mechanism for protein folding? Trends in Biochemical Sciences 28:18-25. Figures provided by Alan Fersht and Valerie Daggett.)

Chapter 6 Proteins: Secondary, Tertiary, and Quaternary Structure

Marginal Stability of the Tertiary Structure Makes Proteins Flexible

A typical folded protein is only marginally stable.
The hundreds of van der Waals interactions and hydrogen bonds in a folded structure are compensated and balanced by a dramatic loss of entropy suffered by the polypeptide as it assumes a compact folded structure. Because stability seems important to protein and cellular function, it is tempting to ask what the advantage of marginal stability might be. The answer appears to lie in flexibility and motion. All chemical bonds undergo a variety of motions, including vibrations and (for single bonds) rotations. This propensity to move, together with the marginal stability of protein structures, means that the many noncovalent interactions within a protein can be interrupted, broken, and rearranged rapidly.

FIGURE 6.34 A model for the steps involved in the folding of globular proteins. The funnel represents a free energy surface or energy landscape for the folding process. The protein folding process is highly cooperative. Rapid and reversible formation of local secondary structures is followed by a slower phase in which establishment of partially folded intermediates leads to the final tertiary structure. Substantial exclusion of water occurs very early in the folding process.

Motion in Globular Proteins

Proteins are best viewed as dynamic structures. Most globular proteins oscillate and fluctuate continuously about their average or equilibrium structures (Figure 6.35). This flexibility is essential for a variety of protein functions, including ligand binding, enzyme catalysis, and enzyme regulation, as shown throughout the remainder of this text.

The motions of proteins may be motions of individual atoms, groups of atoms, or even whole sections of the protein. Furthermore, they may arise either from thermal energy or from specific, triggered conformational changes in the protein.
Atomic fluctuations such as vibrations typically are random, are very fast, and usually occur over small distances, as shown in Table 6.2. These motions arise from the kinetic energy within the protein and are a function of temperature. In the tightly packed interior of the typical protein, atomic movements of an angstrom or less are typical. The closer to the surface of the protein, the more movement can occur, and on the surface atomic movements of several angstroms are possible.

A class of slower motions, which may extend over larger distances, is collective motions. These are movements of a group of atoms covalently linked in such a way that the group moves as a unit. Such a group can range from a few atoms to hundreds of atoms. These motions are of two types: (1) those that occur quickly but infrequently, such as tyrosine ring flips, and (2) those that occur slowly, such as the hinge-bending movement between protein domains. For example, the two antigen-binding domains of immunoglobulins move as relatively rigid units to selectively bind separate antigen molecules. These collective motions also arise from thermal energies in the protein and operate on a time scale of 10⁻¹² to 10⁻³ sec. It is often important to distinguish the time scale of the motion itself from the frequency of its occurrence. A tyrosine ring flip takes only a picosecond (1 × 10⁻¹² sec), but such flips occur only about once every millisecond (1 × 10⁻³ sec).

Conformational changes involve motions of groups of atoms (individual side chains, for example) or even whole sections of proteins. These motions occur on a time scale of 10⁻⁹ to 10³ sec, and the distances covered can be as large as 1 nm. These motions may occur in response to specific stimuli or arise from specific interactions within the protein (hydrogen bonding, electrostatic interactions, or ligand binding; see Chapters 14 and 15).
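The distinction drawn above between how long a motion takes and how often it happens can be made concrete with a little arithmetic. The sketch below uses only the two tyrosine ring-flip numbers quoted in the text (about 1 picosecond per flip, about one flip per millisecond). The function name `duty_cycle` is a hypothetical helper introduced for this illustration.

```python
def duty_cycle(event_duration_s, mean_interval_s):
    """Fraction of total time occupied by an event of the given duration
    that recurs, on average, once per mean_interval_s."""
    return event_duration_s / mean_interval_s


flip_duration = 1e-12  # s, duration of one tyrosine ring flip (from the text)
flip_interval = 1e-3   # s, roughly one flip per millisecond (from the text)

flips_per_second = 1.0 / flip_interval
fraction_flipping = duty_cycle(flip_duration, flip_interval)

print(f"flips per second: {flips_per_second:.0f}")
print(f"fraction of time spent mid-flip: {fraction_flipping:.0e}")
```

The ratio shows why such motions are described as fast but infrequent: the ring spends only about one part in a billion of its time actually flipping.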
The cis–trans isomerization of proline residues in proteins (Figure 6.36) occurs over an even longer time scale, typically 10¹ to 10⁴ sec. Conversion of even a single proline from its cis to its trans configuration can alter a protein structure dramatically. Proline cis–trans isomerizations sometimes act as switches to activate a protein or open a channel across a membrane (see Chapter 9).

FIGURE 6.35 Proteins are dynamic structures. The marginal stability of a tertiary structure leads to flexibility and motion in the protein. Determination of structures of proteins (such as the SH3 domain of the α-chain of spectrin, shown here) by nuclear magnetic resonance produces a variety of stable tertiary structures that fit the data. Such structural ensembles provide a glimpse into the range of structures that may be accessible to a flexible, dynamic protein (pdb id = 1M8M).

TABLE 6.2 Motion and Fluctuations in Proteins

Type of Motion | Spatial Displacement (Å) | Characteristic Time (sec) | Source of Energy
Atomic vibrations | 0.01–1 | 10⁻¹⁵–10⁻¹¹ | Kinetic energy
Collective motions (1. Fast: Tyr ring flips; methyl group rotations. 2. Slow: hinge bending between domains) | 0.01–5 or more | 10⁻¹²–10⁻³ | Kinetic energy
Triggered conformation changes | 0.5–10 or more | 10⁻⁹–10³ | Interactions with triggering agent
Proline cis–trans isomerization | 3–10 | 10¹–10⁴ | Kinetic energy or enzyme driven

Adapted from Petsko, G. A., and Ringe, D., 1984. Fluctuations in protein structure from X-ray diffraction. Annual Review of Biophysics and Bioengineering 13:331–371.

The Folding Tendencies and Patterns of Globular Proteins

Globular proteins adopt the most stable tertiary structure possible. To do this, the peptide chain must both (1) satisfy the constraints inherent in its own structure and (2) fold so as to “bury” the hydrophobic side chains, minimizing their contact with solvent.
The polypeptide itself does not usually form simple straight chains. Even in chain segments where helices and sheets are not formed, an extended peptide chain, being composed of L-amino acids, has a tendency to twist slightly in a right-handed direction. As shown in Figure 6.37, this tendency is apparently the basis for the formation of a variety of tertiary structures having a right-handed sense. Principal among these are the right-handed twists in β-sheets and right-handed cross-overs in parallel β-sheets. Right-handed twisted β-sheets are found at the center of a number of proteins (Figure 6.38) and provide an extended, highly stable structural core.

FIGURE 6.36 The cis and trans configurations of proline residues in a peptide chain are almost equally stable. Proline cis–trans isomerizations, often occurring over relatively long time scales, can alter protein structure significantly. (Panels: trans, cis.)

FIGURE 6.37 (a) The natural right-handed twist exhibited by polypeptide chains, and (b) the types of connections between β-strands. (Panel labels in (b): antiparallel hairpin; cross-overs, parallel right-handed and parallel left-handed.)

Connections between β-strands are of two types: hairpins and cross-overs. Hairpins, as shown in Figure 6.37, connect adjacent antiparallel β-strands. Cross-overs are necessary to connect adjacent (or nearly adjacent) parallel β-strands. Nearly all cross-over structures are right-handed. In many cross-over structures, the cross-over connection itself contains an α-helical segment. This creates a βαβ-loop.
As shown in Figure 6.37, the strong tendency in nature to form right-handed cross-overs, the wide occurrence of α-helices in the cross-over connection, and the right-handed twists of β-sheets can all be understood as arising from the tendency of an extended polypeptide chain of L-amino acids to adopt a right-handed twist structure. This is a chiral effect. Proteins composed of D-amino acids would tend to adopt left-handed twist structures.

The second driving force that affects the folding of polypeptide chains is the need to bury the hydrophobic residues of the chain, protecting them from solvent water. From a topological viewpoint, then, all globular proteins must have an “inside” where the hydrophobic core can be arranged and an “outside” toward which the hydrophilic groups must be directed. The sequestration of hydrophobic residues away from water is the dominant force in the arrangement of secondary structures and nonrepetitive peptide segments to form a given tertiary structure. Globular proteins can be classified mainly on the basis of the particular kind of core or backbone structure they use to accomplish this goal. The term hydrophobic core, as used here, refers to a region in which hydrophobic side chains cluster together, away from the solvent. Backbone refers to the polypeptide backbone itself, excluding the particular side chains.

Globular proteins can be pictured as consisting of “layers” of backbone, with hydrophobic core regions between them. More than half the known globular protein structures have two layers of backbone (separated by one hydrophobic core). Roughly one-third of the known structures are composed of three backbone layers and two hydrophobic cores. There are also a few known four-layer structures and at least one five-layer structure. A few structures are not easily classified in this way, but it is remarkable that most proteins fit into one of these classes. Examples of each are presented in Figure 6.38.
FIGURE 6.38 Examples of protein domains with different numbers of layers of backbone structure. (a) Cytochrome c′ with two layers of α-helix (labeled Layer 1 and Layer 2; hydrophobic residues are buried between layers). (b) Domain 2 of phosphoglycerate kinase, composed of a β-sheet layer between two layers of helix, three layers overall. (c) An unusual five-layer structure, domain 2 of glycogen phosphorylase, a β-sheet layer sandwiched between four layers of α-helix. (d) The concentric “layers” of β-sheet (inside) and α-helix (outside) in triose phosphate isomerase. Hydrophobic residues are buried between these concentric layers in the same manner as in the planar layers of the other proteins. The hydrophobic layers are shaded yellow. (Original art courtesy of Jane Richardson.)

Most Globular Proteins Belong to One of Four Structural Classes

In addition to classification based on layer structure, proteins can be grouped according to the type and arrangement of secondary structure (Figure 6.39). There are four such broad groups: all α proteins and all β proteins (in which the structures are dominated by α-helices and β-sheets, respectively), α/β proteins (in which helices and sheets are intermingled), and α+β proteins (in which α-helical and β-sheet domains are separated for the most part).

It is important to note that the similarities of tertiary structure within these groups do not necessarily reflect similar or even related functions. Instead, functional homology usually depends on structural similarities on a smaller and more intimate scale.
Molecular Chaperones Are Proteins That Help Other Proteins to Fold

To a first approximation, all the information necessary to direct the folding of a polypeptide is contained in its primary structure. On the other hand, the high protein concentration inside cells may adversely affect the folding process because hydrophobic interactions may lead to aggregation of some unfolded or partially folded proteins. Also, it may be necessary to suppress or reverse incorrect or premature folding. A family of proteins known as molecular chaperones is essential for the correct folding of certain polypeptide chains in vivo; for their assembly into oligomers; and for preventing inappropriate liaisons with other proteins during their synthesis, folding, and transport. Many of these proteins were first identified as heat shock proteins, which are induced in cells by elevated temperature or other stress. The most thoroughly studied proteins are Hsp70, a 70-kD heat shock protein, and the so-called chaperonins, also known as Cpn60s or Hsp60s, a class of 60-kD heat shock proteins. A well-characterized Hsp60 chaperonin is GroEL, an E. coli protein that has been shown to affect the folding of several proteins. The mechanism of action of chaperones is discussed in Chapter 31.

Some Proteins Are Intrinsically Unstructured

Remarkably, it is now becoming clear that many proteins exist and function normally in a partially unfolded state. Such proteins, termed intrinsically unstructured proteins (IUPs) or natively unfolded proteins, do not possess uniform structural properties but are nonetheless essential for basic cellular functions. These proteins are characterized by an almost complete lack of folded structure and an extended conformation with high intramolecular flexibility.

Intrinsically unstructured proteins contact their targets over a large surface area (Figure 6.40).
The p27 protein complexed with cyclin-dependent protein kinase 2 (Cdk2) and cyclin A shows that p27 is in contact with its binding partners across its entire length. It binds in a groove consisting of conserved residues on cyclin A. On Cdk2, it binds to the N-terminal domain and also to the catalytic cleft. One of the most appropriate roles for such long-range interactions is assembly of complexes involved in the transcription of DNA into RNA, where large numbers of proteins must be recruited in macromolecular complexes. Thus, the transactivator domain catenin-binding domain (CBD) of Tcf3 is bound to several functional domains of β-catenin (Figure 6.40).

Can amino acid sequence information predict the existence of intrinsically unstructured regions on proteins? Intrinsically unstructured proteins are characterized by a unique combination of high net charge and low overall hydrophobicity. Compared with ordered proteins, IUPs have higher levels of E, K, R, G, Q, S, and P, and low amounts of I, L, V, W, F, Y, C, and N. These features provide a rationale for prediction of regions of disorder from amino acid sequence information, and experimental evidence shows that such predictions are better than 80% accurate.

Genomic analysis of disordered proteins indicates that the proportion of the genome encoding IUPs and proteins with substantial regions of disorder tends to increase with the complexity of organisms. Thus, predictive analysis of whole genomes indicates that 2% of archaeal, 4.2% of bacterial, and 33% of eukaryotic proteins probably contain long regions of disorder.

FIGURE 6.39 Four major classes of protein structure (as defined in the SCOP database). (a) All α proteins, where α-helices dominate the structure; (b) All β proteins, in which β-sheets are the primary feature; (c) α/β proteins, where α-helices and β-sheets are mixed within a domain; (d) α+β proteins, in which α-helical and β-sheet domains are separated to at least some extent. Structures shown: leucine-rich repeat variant (pdb id = 1LRV); peridinin-chlorophyll protein (a “solenoid”, pdb id = 1PPR); endoglucanase A (an α-helical barrel, pdb id = 1CEM); cat allergen (pdb id = 1PUO); human growth hormone (pdb id = 1HGU); Rieske iron protein (a 3-layer β-sandwich, pdb id = 1RIE); hemopexin C-terminal domain (a 4-bladed propeller, pdb id = 1HXN); pleckstrin domain of protein kinase B/AKT (pdb id = 1UNQ); lectin from R. solanacearum (a 6-bladed propeller, pdb id = 1BT9); mannose-specific agglutinin (a prism, pdb id = 1JPC); hevamine (a “TIM barrel”, pdb id = 2HVM); hepatocyte growth factor (N-terminal domain, pdb id = 2HGF); human bactericidal permeability-increasing protein (pdb id = 1BP1); prokaryotic ribosomal protein L9 (pdb id = 1DIV); MurA (an α–β prism, pdb id = 1EYN); porcine ribonuclease inhibitor (a “horseshoe”, pdb id = 2BNH); RuvA protein (pdb id = 1CUK); ribonuclease H (pdb id = 1RNH); L-arginine:glycine amidinotransferase (a metabolic enzyme, pdb id = 4JDW); thymidylate synthase (pdb id = 3TMS); equine leucocyte elastase inhibitor (pdb id = 1HLE).

Some proteins are disordered throughout their length, whereas others may contain stretches of 30 to 40 residues or more that are disordered and embedded in an otherwise folded protein. The prevalence of disordered segments in proteins may reflect two different cellular needs. (1) Disordered proteins are more malleable and thus can adapt their structures to bind to multiple ligands, including other proteins.
Each such interaction could provide a different function in the cell. (2) Compared with compact, folded proteins, disordered segments in proteins appear to be able to form larger intermolecular interfaces to which ligands, such as other proteins, could bind (Figure 6.40). Folded proteins might have to be two to three times larger to produce the binding surface possible with a disordered protein. Larger proteins would increase cellular crowding or could increase cell size by 15% to 30%. The flexibility of disordered proteins may thus reduce protein, genome, and cell sizes.

FIGURE 6.40 Intrinsically unstructured proteins (IUPs) contact their target proteins over a large surface area. (a) p27Kip1 (yellow) complexed with cyclin-dependent kinase 2 (Cdk2, blue) and cyclin A (CycA, green). (b) The transactivator domain CBD of Tcf3 (yellow) bound to β-catenin (blue). Note: Part of the β-catenin has been removed for a clear view of the CBD. (c) Bob 1 transcriptional coactivator (yellow) in contact with its four partners: TAFII105 (green oval), the Oct 1 domains POU SD and POU HD (green), and the Ig promoter (blue). (From Tompa, P., 2002. Intrinsically unstructured proteins. Trends in Biochemical Sciences 27:527–533.) (d–g) Some intrinsically unstructured proteins (in red and yellow) bind to their targets by wrapping around them. Shown here are (d) SNAP-25 bound to BoNT/A, (e) SARA SBD bound to Smad 2 MH2, (f) HIF-1α interaction domain bound to the TAZ1 domain of CBP, and (g) HIF-1α interaction domain bound to asparagine hydroxylase FIH. (From Trends in Biochemical Sciences, Vol. 27, No. 10, page 530. October 2002.)

HUMAN BIOCHEMISTRY
α₁-Antitrypsin: A Tale of Molecular Mousetraps and a Folding Disease

In the human lung, oxygen and CO₂ are exchanged across the walls of alveoli, air sacs surrounded by capillaries that connect the pulmonary veins and arteries. The walls of alveoli consist of the elastic protein elastin. Inhalation expands the alveoli, and exhalation compresses them. A pair of human lungs contains 300 million alveoli, and the total area of the alveolar walls in contact with capillaries is about 70 m², an area about the size of a tennis court!

In the lungs, neutrophils (a type of white blood cell) naturally secrete elastase, a protein-cleaving enzyme essential to tissue repair. However, elastase also can attack and break down the elastin of the alveolar walls if it spreads from the site of inflammation repair. To prevent this, the liver secretes into the blood α₁-antitrypsin, a 52-kD protein belonging to the serpin (serine protease inhibitor) family, which blocks elastase action, preventing alveolar damage.

α₁-Antitrypsin is a molecular mousetrap, with a flexible peptide loop (blue in the figure) that contains a Met residue as “bait” for the elastase and that can swing like the arm of a mousetrap. When elastase binds to the loop at the Met residue, it cuts the peptide loop. Now free to move, the loop slides into the middle of a large β-sheet (green), at the same time dragging elastase to the opposite side of the α₁-antitrypsin structure. At this new binding site, the elastase structure is distorted, and it cannot complete its reaction and free itself from the α₁-antitrypsin. Cellular scavenger enzymes then attack the elastase–antitrypsin complex and destroy it. By sacrificing itself in this way, the α₁-antitrypsin has prevented damage to the alveolar elastin.

Defects in α₁-antitrypsin can cause serious lung and liver damage. The gene for α₁-antitrypsin is polymorphic (that is, it occurs as many different sequence variants), and many variants of α₁-antitrypsin are either poorly secreted by the liver or function poorly in the lungs. Even worse, tobacco smoke oxidizes the critical Met residue in the flexible loop of α₁-antitrypsin, and smokers, especially those who carry mutants of this protein, often develop emphysema, the destruction of the elastin connective tissue in the lungs.

The flexible loop of α₁-antitrypsin (its mousetrap spring) is also its Achilles' heel. Mutations in this loop make the protein vulnerable to aberrant conformational changes. The Z-mutation of α₁-antitrypsin is an interesting case, with a Lys in place of Glu at residue 342 (indicated by the arrow in M) at the base of the flexible loop. This causes partial loop insertion in the large β-sheet (M*). This induces the modified β-sheet to accept the flexible loop of another α₁-antitrypsin, forming a dimer. Repetition of these events forms polymers, which are trapped in the liver (often leading to cirrhosis and death). Z variants that manage to make it to the lungs associate so slowly with elastase that they are ineffective in preventing lung damage.

Figure: (a) Elastase (dark gray) is inactivated by binding to α₁-antitrypsin. When elastase binds, cleaving the flexible loop at a Met residue, the rest of the loop (the red β-strand) rotates more than 180° and inserts into the green β-sheet, swinging the elastase to the other end of the molecule. At this new location, the elastase is distorted and inactivated. (b) In the Z-mutant of α₁-antitrypsin, the flexible loop is only partially inserted in the large β-sheet, promoting polymer formation and trapping α₁-antitrypsin at its site of synthesis in the liver. The consequences of this are cirrhosis of the liver, as well as lung damage, since the small amount of α₁-antitrypsin that reaches the lungs is ineffective in preventing lung damage. Individual monomers in the α₁-antitrypsin polymer are colored red, blue, and gold (far right). (From Lomas, D. A., et al., 2005. Molecular mousetraps and serpinopathies. Biochem. Soc. Transactions 33:321–330. Figure provided by David Lomas.)

HUMAN BIOCHEMISTRY
Diseases of Protein Folding

A number of human diseases are linked to abnormalities of protein folding. Protein misfolding may cause disease by a variety of mechanisms. For example, misfolding may result in loss of function and the onset of disease. The following table summarizes several other mechanisms and provides an example of each.

Disease | Affected Protein | Mechanism
Alzheimer's disease | β-Amyloid peptide (derived from amyloid precursor protein) | Misfolded β-amyloid peptide accumulates in human neural tissue, forming deposits known as neuritic plaques.
Familial amyloidotic polyneuropathy | Transthyretin | Aggregation of unfolded proteins. Nerves and other organs are damaged by deposits of insoluble protein products.
Cancer | p53 | p53 prevents cells with damaged DNA from dividing. One class of p53 mutations leads to misfolding; the misfolded protein is unstable and is destroyed.
Creutzfeldt-Jakob disease (human equivalent of mad cow disease) | Prion | Prion protein with an altered conformation (PrPSc) may seed conformational transitions in normal PrP (PrPC) molecules.
Hereditary emphysema | α₁-Antitrypsin | Mutated forms of this protein fold slowly, allowing its target, elastase, to destroy lung tissue.
Cystic fibrosis | CFTR (cystic fibrosis transmembrane conductance regulator) | Folding intermediates of mutant CFTR forms don't dissociate freely from chaperones, preventing the CFTR from reaching its destination in the membrane.

HUMAN BIOCHEMISTRY
Structural Genomics

The prodigious advances in genome sequencing in recent years, together with advances in techniques for protein structure determination, have not only provided much new information for biochemists but have also spawned a new field of investigation: structural genomics, the large-scale analysis of protein structures and functions based on gene sequences.
The scale of this new endeavor is daunting: hundreds of thousands of gene sequences are rapidly being determined, and current estimates suggest that there are probably fewer than 10,000 distinct and stable polypeptide folding patterns in nature. The feasibility of large-scale, high-throughput structure determination programs is being explored in a variety of pilot studies in Europe, Asia, and North America. These efforts seek to add 20,000 or more new protein structures to our collected knowledge in the near future; from this wealth of new information, it should be possible to predict and determine new structures from sequence information alone. This effort will be vastly more complex and more expensive than the Human Genome Project. It presently costs about $100,000 to determine the structure of the typical globular protein, and one of the goals of structural genomics is to reduce this number to $20,000 or less. Advances in techniques for protein crystallization, X-ray diffraction, and NMR spectroscopy, the three techniques essential to protein structure determination, will be needed to reach this goal in the near future.

The payoffs anticipated from structural genomics are substantial. Access to large amounts of new three-dimensional structural information should accelerate the development of new families of drugs. The ability to scan databases of chemical entities for activities against drug targets will be enhanced if large numbers of new protein structures are available, especially if complexes of drugs and target proteins can be obtained or predicted. The impact of structural genomics will also extend, however, to functional genomics (the study of the functional relationships of genomic content), which will enable the comparison of the composite functions of whole genomes, leading eventually to a complete biochemical and mechanistic understanding of all organisms, including humans.
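As one small illustration of inferring a structural property from sequence alone, the compositional signature of intrinsically unstructured proteins described earlier in this section (enriched in E, K, R, G, Q, S, and P; depleted in I, L, V, W, F, Y, C, and N; high net charge) can be sketched in Python. This is a toy calculation, not one of the published predictors behind the accuracy figure quoted in the text. The function names, the `margin` threshold, the crude charge rule, and both example sequences are hypothetical and chosen only for illustration.

```python
DISORDER_PROMOTING = set("EKRGQSP")  # residues the text lists as enriched in IUPs
ORDER_PROMOTING = set("ILVWFYCN")    # residues the text lists as depleted in IUPs


def composition_signature(sequence):
    """Return simple composition statistics for a one-letter protein sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    disorder = sum(aa in DISORDER_PROMOTING for aa in seq) / n
    order = sum(aa in ORDER_PROMOTING for aa in seq) / n
    # Crude assumption: K and R count as +1, D and E as -1, His is ignored.
    net_charge = (sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)) / n
    return {
        "disorder_fraction": disorder,
        "order_fraction": order,
        "net_charge_per_residue": net_charge,
    }


def looks_disordered(sequence, margin=0.2):
    """Toy rule: flag a sequence when disorder-promoting residues outnumber
    order-promoting residues by more than `margin` (a hypothetical cutoff)."""
    sig = composition_signature(sequence)
    return sig["disorder_fraction"] - sig["order_fraction"] > margin


# Made-up sequences for demonstration only; they are not real proteins.
for name, seq in [("charged, polar", "SPEKRGQSPEEKKRSG"),
                  ("hydrophobic", "ILVWFYCNILVAVLFI")]:
    print(name, composition_signature(seq), looks_disordered(seq))
```

A realistic predictor would score windows along the sequence and would be calibrated against experimentally characterized proteins; the single whole-sequence cutoff here only shows how the residue lists in the text translate into a computable quantity.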