CHAPTER 1
INTRODUCTION TO MACROMOLECULAR X-RAY CRYSTALLOGRAPHY

The 1901 Nobel Prize in Physics was awarded to Roentgen for his discovery of X-rays. X-rays are electromagnetic waves with wavelengths in the range of about 0.1-100 Å. They are produced when rapidly moving electrons strike a solid metal target and their kinetic energy is converted into radiation. The wavelength of the emitted radiation depends on the energy of the electrons. In 1912, von Laue's group discovered the diffraction of X-rays by crystals. This discovery opened a very rich scientific period and created a new academic field, X-ray crystallography. One year later, W. L. Bragg determined the first crystal structure. Since then, the crystal structures of both inorganic and organic molecules have been determined on a broad scale.

X-ray crystallography is now a commonly used technique for determining the three-dimensional structures of biomolecules. The methodology is fairly robust because the experimental and computational methods for these studies are now well developed. Advanced protein expression and purification procedures, crystallization robots and powerful synchrotron radiation sources have enabled high-throughput structure determination. This chapter briefly discusses the concepts and methods used in macromolecular X-ray crystallography.

1.1 MACROMOLECULAR CRYSTALLIZATION

To perform X-ray crystallography, it is necessary to grow crystals with edge lengths of around 0.1-0.3 mm. Crystals form as the conditions in a supersaturated solution slowly change. For small molecules, growing large crystals is relatively simple. Proteins are difficult to crystallize because of their complexity, high molecular weight and flexibility. In addition, purifying a protein to homogeneity is a very tedious process.
The strategy for crystallizing a protein is to guide the protein/solvent system very slowly toward a state of reduced solubility, by modifying either the properties of the solvent or the character of the macromolecule. This is most frequently accomplished by increasing the concentration of a precipitating agent or by altering a physical property (e.g., pH or temperature) until supersaturation is reached. Effort must then go into refining and optimizing the crystallization conditions so that they promote specific contacts between molecules and the growth of larger single crystals, and stabilize the crystals once they have formed.

The 'salting in' and 'salting out' properties of proteins have both been used to push them into supersaturation. Although the 'salting in' effect can be used for crystallization, most proteins are not stable in a low-salt environment, so the 'salting out' property is exploited more commonly. A number of methods have been devised to bring a protein gradually from an unsaturated into a supersaturated state. The most commonly used is the vapor diffusion method: a drop of protein solution is suspended over a reservoir containing buffer and precipitant. Water evaporates from the drop and diffuses to the more concentrated reservoir solution, slowly raising the protein and precipitant concentrations in the drop until, ideally, conditions suitable for crystal growth are reached. Other methods include batch crystallization, micro-batch crystallization and dialysis.

1.2 BASIC CONCEPTS OF CRYSTALLOGRAPHY

1.2.1 Crystal, unit-cell and asymmetric unit

Protein crystals typically contain about 40-60% solvent by volume and are therefore fragile and sensitive to drying out. In a crystal, molecules are arranged in a regularly repeating, symmetric pattern. A unit-cell is the basic repeating volume which, when translated along its three edges, reproduces the entire crystal. Its dimensions are described by the edge lengths (a, b, c) and the interaxial angles (α, β, γ).
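The six cell parameters listed above also fix the unit-cell volume V, which reappears in the electron-density equations of Section 1.4. The short Python sketch below uses the standard triclinic volume formula (not given in this chapter); the cell dimensions are made-up illustrative values, not data from any real crystal.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from edge lengths (a, b, c) and interaxial angles in degrees.

    General (triclinic) formula:
    V = abc * sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma) + 2 cos(alpha) cos(beta) cos(gamma)),
    which reduces to V = abc when all three angles are 90 degrees.
    """
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


# Hypothetical cell parameters (Å and degrees), chosen only for illustration.
v_ortho = unit_cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)   # orthorhombic: abc
v_mono = unit_cell_volume(50.0, 60.0, 70.0, 90.0, 105.0, 90.0)   # monoclinic: abc * sin(beta)
print(round(v_ortho, 1), round(v_mono, 1))
```

If a, b and c are in Å, the volume comes out in Å³.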
The positions of atoms within a unit-cell can be listed in a Cartesian coordinate system or, more conveniently for diffraction calculations, as fractional coordinates along the cell axes. The asymmetric unit is the smallest volume within the unit-cell from which the complete unit-cell can be generated by rotations and translations. Only the symmetry operators of the crystal's own symmetry may be used for this construction. Although the asymmetric unit commonly contains only one molecule, or one subunit of a multimeric protein, it can contain more than one.

1.2.2 Lattice, point group and space group

A lattice is classically defined as an array of points arranged in space such that each point has an identical environment. There are 14 types of unit-cell in crystallography, giving the 14 Bravais lattices: the distinct lattice types that, when repeated, fill all of space. They are classified as primitive (lattice points only at the corners of the cell), face centered (the primitive lattice plus a point at the center of each of the six faces of the cell), body centered (an extra point at the center of the cell) and end centered (an extra point at the center of one pair of opposite faces). The cubic crystal system (which has a cubic unit-cell) can have a primitive, body centered or face centered lattice; the tetragonal system a primitive or body centered lattice; the orthorhombic system a primitive, face centered, body centered or end centered lattice; the hexagonal system only a primitive lattice, while the trigonal system can have a rhombohedral lattice; the monoclinic system a primitive or end centered lattice; and the triclinic system only a primitive lattice.

Molecules obey certain symmetry operations when they pack into a crystal. Besides unit translations along the three unit-cell axes (three-dimensional translational symmetry), the symmetry elements are rotation, reflection and inversion.
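As a concrete illustration of how symmetry operators turn the asymmetric unit into the full unit-cell, the sketch below assumes space group P2₁ with unique axis b, whose two operators are (x, y, z) and (−x, y + ½, −z) acting on fractional coordinates. The atom position and the function name `expand_asymmetric_unit` are hypothetical; real programs take the operators for each space group from tabulated data.

```python
import numpy as np

# Symmetry operators of space group P2_1 (unique axis b), each written as
# (rotation matrix R, translation t) acting on fractional coordinates: x' = R x + t.
P21_OPERATORS = [
    (np.eye(3), np.zeros(3)),                                 # x, y, z
    (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.0])),  # -x, y+1/2, -z
]


def expand_asymmetric_unit(frac_coords, operators=P21_OPERATORS):
    """Generate all symmetry copies inside one unit-cell from asymmetric-unit atoms."""
    frac_coords = np.atleast_2d(np.asarray(frac_coords, dtype=float))
    copies = []
    for rot, trans in operators:
        moved = frac_coords @ rot.T + trans
        copies.append(np.mod(moved, 1.0))  # wrap back into the cell, 0 <= x < 1
    return np.vstack(copies)


asu = [[0.10, 0.20, 0.30]]  # one hypothetical atom in the asymmetric unit
print(expand_asymmetric_unit(asu))
```

Because P2₁ has two operators, its asymmetric unit is half of the unit-cell, and each atom appears twice in the expanded list.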
A combination of these symmetry elements that leaves at least one point fixed is called a crystallographic point group. The simplest point groups consist of proper rotations about a symmetry axis: the point groups 1, 2, 3, 4 and 6. Altogether, 11 crystallographic point groups involve only proper rotations. The remaining point groups also contain improper rotations and conform to one of six general types described by Buerger, including the axial combinations PII, IPI, IIP and P/I P/I P/I. There are 21 such point groups, giving 32 crystallographic point groups in total (Buerger, 1956). Combining a rotation or a reflection with a translation generates screw-axis or glide-plane symmetry, respectively. Combining the lattices with the point groups (including their allowed screw axes and glide planes) gives 230 distinct ways of arranging the allowed symmetry operations in a crystal, known as space groups. Because proteins contain only L-amino acids, and a mirror plane, glide plane or inversion center would turn an L-amino acid into a D-amino acid, not all 230 space groups are possible for protein crystals; only 65 are (McRee, 1999).

1.2.3 hkl plane

A convenient way to study the crystal lattice is through sets of hkl planes. The index h gives the number of planes in the set per unit-cell along the X direction or, equivalently, the number of parts into which the set of planes divides the X edge of each cell. Similarly, the indices k and l give the number of such planes per unit-cell along the Y and Z directions. The family of planes with indices hkl is called the (hkl) family of planes. This concept is very useful in explaining the diffraction of X-rays by crystals.

1.3 PRINCIPLES OF X-RAY DIFFRACTION

Diffraction occurs when waves interact with a regular structure whose repeat distance is similar to the wavelength. X-rays have wavelengths of the order of Angstroms, comparable to typical interatomic distances in crystalline solids.
X-rays can therefore be diffracted by crystalline solids such as minerals, which by definition have regularly repeating atomic structures. When certain geometric requirements are satisfied, X-rays scattered from a crystalline solid interfere constructively and produce a diffracted beam. These geometric requirements were first explained by Bragg.

1.3.1 Bragg's law

Diffraction depends on the spacing between the scattering bodies and on the wavelength of the incident radiation. In Bragg's model, diffraction is treated as reflection from sets of parallel planes (Fig. 1.1), and each set of planes can give rise to one diffracted X-ray beam. Bragg showed that a set of parallel planes with indices hkl and interplanar spacing dhkl produces a diffracted beam when X-rays of wavelength λ strike the planes at an angle θ and are reflected at the same angle, only if θ satisfies

$$2 d_{hkl} \sin\theta = n\lambda \tag{1.1}$$

[Figure 1.1: Bragg's law. The condition that produces diffracted rays.]

In Fig. 1.1, sin θ = BC/AB, so BC = AB sin θ = dhkl sin θ. If the additional distance 2BC travelled by the more deeply penetrating ray R2 is an integral multiple of λ, rays R1 and R2 interfere constructively.

Notice that sin θ is inversely proportional to the interplanar spacing dhkl (sin θ = nλ/2dhkl). This implies that large unit-cells, with large spacings, give small diffraction angles and hence produce many reflections within a convenient angle of the incident beam. Small unit-cells, on the other hand, give large diffraction angles and fewer measurable reflections. In other words, the number of measurable reflections depends on how many reflections a unit-cell can produce under the given experimental conditions. Each set of parallel planes in a crystal produces one reflection.
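Bragg's law in the form 2dhkl sin θ = nλ can be checked numerically. The sketch below assumes an orthogonal cell (α = β = γ = 90°), for which 1/dhkl² = h²/a² + k²/b² + l²/c², and Cu Kα radiation (λ = 1.5418 Å); the cell edges are illustrative. Comparing the same (100) reflection for a 5 Å and a 100 Å cell shows the point made above: the larger cell diffracts at a much smaller angle.

```python
import math


def d_spacing_orthogonal(h, k, l, a, b, c):
    """Interplanar spacing dhkl (Å) for a cell with alpha = beta = gamma = 90 degrees."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def bragg_two_theta(d, wavelength, n=1):
    """Scattering angle 2θ in degrees from Bragg's law 2 d sinθ = n λ.

    Returns None when n λ / (2 d) > 1, i.e. the reflection cannot be reached.
    """
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(s))


wavelength = 1.5418  # Cu K-alpha, Å
for a in (5.0, 100.0):  # small and large hypothetical cubic cells
    d = d_spacing_orthogonal(1, 0, 0, a, a, a)
    print(f"a = {a:6.1f} Å  d100 = {d:6.2f} Å  2θ = {bragg_two_theta(d, wavelength):6.2f}°")
```

Spacings smaller than λ/2 return None, which is why the shortest measurable spacing depends on the wavelength.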
The intensity of a reflection depends on the electron distribution in the unit-cell, summed along the direction of the planes that produce that reflection.

1.3.2 Reciprocal lattice

Although Bragg's law gives a simple and convenient way to calculate the spacings of crystallographic planes, further analysis is needed to calculate the intensity of scattering from the spatial distribution of electrons within each unit-cell. The reciprocal lattice is a lattice of points, each of which corresponds to one set of real-lattice planes (hkl) and hence to one possible reflection. Each reciprocal lattice vector is perpendicular to the set of real lattice planes from which it is derived. The dimensions of the reciprocal lattice are inversely related to those of the real lattice: large unit-cells give a closely spaced reciprocal lattice, and small unit-cells give a widely spaced one.

Fig. 1.2 shows how a reciprocal lattice is constructed. Take O as the origin. Through a neighboring lattice point N, draw one plane from each of the sets (110), (120) and so forth, whose interplanar spacings are d110, d120 and so on. From the origin, draw a line normal to the (110) plane; the point on this line at a distance 1/d110 from O defines the reciprocal lattice point 110. Repeat for (120) and the other sets. The points defined in this way form a lattice with the chosen origin: this new lattice is the reciprocal lattice. If the real unit-cell angles α, β and γ are all 90°, the reciprocal axis a* lies along the real cell edge a and has length 1/a; b* and c* are defined in the same way. If the axial lengths are expressed in Angstroms, the reciprocal lattice spacings are in reciprocal Angstroms (1/Å or Å⁻¹).

1.3.3 Ewald sphere

Reciprocal lattice points give the crystallographer a convenient way to compute the directions of the beams diffracted by all sets of parallel planes in the crystal lattice (real space).
The following geometrical interpretation of diffraction was formulated by Ewald.

[Figure 1.2: The reciprocal lattice.]

Assume that an X-ray beam (arrow XO in Fig. 1.3) strikes the crystal. The point O is arbitrarily chosen as the origin of the reciprocal lattice; O is also the origin of the real lattice in the crystal. Draw a circle of radius 1/λ whose center C lies on XO and which passes through O. This circle represents the wavelength of the X-rays in reciprocal space.

[Figure 1.3: The Ewald sphere.]

Rotating the crystal about O also rotates the reciprocal lattice about O, successively bringing reciprocal lattice points such as P and P' into contact with the circle. Because the triangle PBO is inscribed in a semicircle, it is a right-angled triangle, and sin θ = OP/BO = OP/(2/λ). Because P is a reciprocal lattice point, the length of OP is 1/dhkl, where h, k and l are the indices of the set of planes represented by P. Hence 1/OP = dhkl and 2dhkl sin θ = λ, which is Bragg's law with n = 1.

The line defining a reciprocal lattice point is normal to the set of planes having the same indices as that point, so BP, which is perpendicular to OP, is parallel to the planes producing the reflection P in Fig. 1.3. A line drawn parallel to BP through C, the center of the circle, represents a plane of this set that reflects the X-ray beam under these conditions. The beam strikes this plane at an angle θ, is reflected at the same angle, and therefore leaves C deviated by 2θ from the incident direction, which takes it exactly through the point P. CP thus gives the direction of the reflected ray R. In conclusion, reflection occurs in the direction CP when the reciprocal lattice point P touches the circle (in three dimensions, the sphere of reflection or Ewald sphere). As the crystal is rotated in the X-ray beam, every reciprocal lattice point lying within a distance 2/λ of O is eventually brought into contact with this sphere.
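The Ewald construction can also be written with vectors: with the beam passing through the reciprocal-lattice origin O and the sphere center C placed 1/λ upstream, a point P at vector d* from O reflects when its distance from C equals 1/λ, and CP gives the diffracted direction. The Python sketch below is a minimal illustration under these assumptions; the wavelength, spacing and the function name `ewald_reflection` are hypothetical choices.

```python
import numpy as np


def ewald_reflection(d_star, wavelength, beam=(1.0, 0.0, 0.0), tol=1e-6):
    """Check whether reciprocal-lattice point P (vector d_star from O, in Å^-1) is on the Ewald sphere.

    The beam travels along `beam` and passes through the reciprocal-lattice origin O.
    The sphere has radius 1/λ and center C = O - s0, where s0 = beam_unit / λ,
    so P is on the sphere when |s0 + d_star| = 1/λ. Returns (unit vector along CP,
    2θ in degrees) if a reflection occurs, otherwise None.
    """
    s0 = np.asarray(beam, dtype=float)
    s0 = s0 / (np.linalg.norm(s0) * wavelength)
    cp = s0 + np.asarray(d_star, dtype=float)          # vector from C to P
    if abs(np.linalg.norm(cp) - 1.0 / wavelength) > tol:
        return None                                    # P is not touching the sphere
    cos_2theta = np.dot(cp, s0) / (np.linalg.norm(cp) * np.linalg.norm(s0))
    two_theta = np.degrees(np.arccos(np.clip(cos_2theta, -1.0, 1.0)))
    return cp / np.linalg.norm(cp), two_theta


# Illustrative numbers: λ = 1.0 Å and planes with d = 2.0 Å, so |d*| = 1/d = 0.5 Å^-1.
wavelength, d = 1.0, 2.0
theta = np.arcsin(wavelength / (2.0 * d))              # Bragg angle for this spacing
# Place P where rotating the crystal would bring it onto the sphere.
d_star = np.array([np.cos(2 * theta) - 1.0, np.sin(2 * theta), 0.0])
direction, two_theta = ewald_reflection(d_star, wavelength)
print(np.linalg.norm(d_star), two_theta, np.degrees(2 * theta))
print(ewald_reflection([0.5, 0.0, 0.0], wavelength))   # same |d*|, wrong orientation
```

The second call uses a point with the same |d*| but a different orientation, so it lies off the sphere and gives no reflection, mirroring the need to rotate the crystal.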
Each reciprocal lattice point that touches the sphere produces a beam along the line from the center of the sphere through that point. This model of diffraction also implies that the directions of the reflections, as well as their number, depend only on the unit-cell dimensions (and the wavelength) and not on the contents of the unit-cell.

1.4 FOURIER TRANSFORM

1.4.1 The Fourier series

A Fourier series, named after Joseph Fourier, is an expansion of a periodic function f(x) as an infinite sum of sines and cosines; it makes use of the orthogonality of the sine and cosine functions. The computation and study of Fourier series is known as harmonic analysis. It is extremely useful for breaking an arbitrary periodic function into a set of simple terms that can be handled individually and then recombined to recover the original function, or an approximation to it of whatever accuracy is required.

Each reflection results from diffraction by the atoms in the unit-cell, and because waves are periodic, they can be analyzed by Fourier methods. The basic idea of Fourier analysis is that any periodic function f(x), here with period 1 (one unit-cell in fractional coordinates), can be approximated by sums of the type

$$f(x) = \sum_{h=0}^{n} |F_h|\,\left[\cos 2\pi(hx) + i\sin 2\pi(hx)\right] \tag{1.2}$$

Here f(x) describes the resulting diffracted wave as a sum of Fourier terms, one for each frequency h. Each term is a simple wave with its own amplitude |Fh|, its own frequency h and, implicitly, its own phase αh.
Since

$$\cos\theta + i\sin\theta = e^{i\theta} \tag{1.3}$$

the Fourier series above can be written as

$$f(x) = \sum_{h} |F_h|\, e^{2\pi i (hx)} \tag{1.4}$$

In three dimensions, the Fourier series becomes

$$f(x, y, z) = \sum_{h}\sum_{k}\sum_{l} |F_{hkl}|\, e^{2\pi i (hx + ky + lz)} \tag{1.5}$$

Here each term in the series is a simple three-dimensional wave whose frequency is h in the X direction, k in the Y direction and l in the Z direction. For each possible set of values h, k and l, the associated wave has an amplitude |Fhkl|.

1.4.2 The Fourier transform

The Fourier transform relates a signal in the time domain to its representation in the frequency domain. Because it is a transform, no information is created or lost in the process: the original signal can be recovered from its Fourier transform, and vice versa. Fourier demonstrated that for any function f(x) there exists another function F(h) such that

$$F(h) = \int_{-\infty}^{+\infty} f(x)\, e^{2\pi i (hx)}\, dx \tag{1.6}$$

where F(h) is called the Fourier transform (FT) of f(x), and the unit of the variable h is the reciprocal of the unit of x. The Fourier transform is reversible: the same mathematical operation that gives F(h) from f(x) can be carried out in the opposite direction, with the sign of the exponent reversed, to give f(x) from F(h), provided x and h are reciprocal to each other:

$$f(x) = \int_{-\infty}^{+\infty} F(h)\, e^{-2\pi i (hx)}\, dh \tag{1.7}$$

The functions f(x) and F(h) above are one-dimensional. In three dimensions, the Fourier transform is

$$F(h, k, l) = \int_{x}\int_{y}\int_{z} f(x, y, z)\, e^{2\pi i (hx + ky + lz)}\, dx\, dy\, dz \tag{1.8}$$

and the reverse Fourier transform is

$$f(x, y, z) = \int_{h}\int_{k}\int_{l} F(h, k, l)\, e^{-2\pi i (hx + ky + lz)}\, dh\, dk\, dl \tag{1.9}$$

1.4.3 Electron density and structure factor

The Fourier series is directly applicable to the study of crystals because the electron density in a crystal is a periodic function.
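Because the electron density is periodic, it can be sampled on a grid over one unit-cell, transformed to Fourier coefficients and transformed back without loss, as stated in Section 1.4.2. The sketch below is a one-dimensional toy with V = 1 and two Gaussian "atoms" of hypothetical shape; it relies on NumPy's `ifft` and `fft` using the e^{+2πihx} and e^{−2πihx} signs of Eqs. (1.6) and (1.7), respectively. Truncating the series at low h gives a smoothed approximation, the discrete analogue of approximating a function with a finite Fourier series.

```python
import numpy as np

N = 64                                   # grid points along one cell edge
x = np.arange(N) / N                     # fractional coordinates 0 <= x < 1


def gaussian_atom(x, center, width=0.03, height=1.0):
    """Periodic Gaussian 'atom' (a hypothetical density shape) at fractional position `center`."""
    dx = (x - center + 0.5) % 1.0 - 0.5  # shortest distance to `center`, wrapping around the cell
    return height * np.exp(-dx**2 / (2.0 * width**2))


rho = gaussian_atom(x, 0.25) + gaussian_atom(x, 0.60, height=0.6)

# Discrete form of Eq. (1.6) with the e^{+2πihx} sign and V = 1:
# F_h = (1/N) Σ_n ρ(x_n) e^{+2πi h n/N}, which is what numpy's ifft computes.
F = np.fft.ifft(rho)

# Discrete form of Eq. (1.7) with the e^{-2πihx} sign:
# ρ(x_n) = Σ_h F_h e^{-2πi h n/N}, which is numpy's (unnormalized) fft.
rho_back = np.fft.fft(F)
print(np.max(np.abs(rho_back - rho)))    # round-off only: no information is lost

# Keeping only the low-frequency terms |h| <= 5 gives a smoothed approximation,
# as when a Fourier series is truncated.
h = np.fft.fftfreq(N, d=1.0 / N)         # integer frequency of each array element
rho_low = np.fft.fft(np.where(np.abs(h) <= 5, F, 0.0)).real
print(np.max(np.abs(rho_low - rho)))
```

The F₀ term is simply the mean density over the cell, which is why it sets the overall level of the map.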
Although a protein structure is reported as the Cartesian coordinates of each atom, what the crystallographer actually observes is the electron density: the cloud of electrons surrounding each atomic nucleus, with which the X-rays interact. The unit-cell can be represented as an assembly of small volume elements, each containing some electron density. The density of a volume element centered at (x, y, z) is roughly the average value of ρ(x, y, z) over that region; the smaller the volume elements, the more closely these averages approach the true value of ρ(x, y, z) at every point. The electron density is written as

$$\rho(x, y, z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\, e^{-2\pi i (hx + ky + lz)} \tag{1.10}$$

where Fhkl is the structure factor, whose Fourier transform is the electron density, and vice versa. Conversely, the structure factor is

$$F_{hkl} = \int_{V} \rho(x, y, z)\, e^{2\pi i (hx + ky + lz)}\, dV \tag{1.11}$$

where the integral is taken over the volume V of the unit-cell. In other words, the structure factor is the resultant of the N waves scattered in the direction of the reflection hkl by the N atoms in the unit-cell. Each of these waves has an amplitude proportional to fj, the scattering factor of atom j, and a phase angle αj relative to the origin of the unit-cell.

Crystallographers represent each structure factor as a complex vector. The length of this vector is the amplitude of the structure factor, |Fhkl|, which is proportional to the square root of the intensity of the reflection hkl, (Ihkl)^1/2. The phase is the angle α that the vector makes with the real axis when the origin of the vector is placed at the origin of the complex plane. The structure factor F can thus be represented as a vector A + iB in this plane (Fig. 1.4): the projection of F on the real axis is its real part A, a vector of length |A|, and the projection of F on the imaginary axis is its imaginary part iB, a vector of length |B|.
[Figure 1.4: Real and imaginary components of the structure factor.]

From Fig. 1.4,

$$\cos\alpha = \frac{|A|}{|F|} \quad\text{and}\quad \sin\alpha = \frac{|B|}{|F|} \tag{1.12}$$

so that

$$|A| = |F|\cos\alpha \quad\text{and}\quad |B| = |F|\sin\alpha \tag{1.13}$$

$$F = |A| + i|B| = |F|(\cos\alpha + i\sin\alpha) \tag{1.14}$$

Expressing the complex term in parentheses as an exponential gives

$$F = |F|\, e^{i\alpha} \tag{1.15}$$

Substituting this expression for Fhkl into Equation 1.10 gives

$$\rho(x, y, z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F_{hkl}|\, e^{i\alpha_{hkl}}\, e^{-2\pi i (hx + ky + lz)} \tag{1.16}$$

The structure factor of the reflection hkl can also be written as a sum over the atoms:

$$F_{hkl} = \sum_{j=1}^{n} f_j\, e^{2\pi i (h x_j + k y_j + l z_j)} \tag{1.17}$$

where fj is the scattering factor and (xj, yj, zj) are the fractional coordinates of atom j in the unit-cell.

In X-ray crystallography, the structure factor F(hkl) of any X-ray reflection (diffracted beam) hkl is the quantity that expresses both the amplitude and the phase of that reflection. It plays a central role in the determination and refinement of crystal structures because it is related to the intensity of the reflection, depends on the structure that gives rise to that reflection, and is independent of the method and conditions used to observe it. The set of structure factors for all the reflections is the primary quantity needed to derive the three-dimensional distribution of electron density, which is the image of the crystal structure, calculated by Fourier methods. This image is the crystallographic analogue of the image formed in a microscope by recombining the rays scattered by the object. In a microscope this recombination is performed physically by lenses, but in crystallography the diffracted beams must be recombined by mathematical calculation.

1.5 THE PHASE PROBLEM

In a diffraction experiment, we measure the intensities of the waves scattered from planes (denoted by hkl) in a crystal.
The amplitude of each wave, |Fhkl|, is proportional to the square root of the intensity of the reflection measured by the detector. To calculate the electron density at a position (xyz) in the unit-cell, we need to evaluate the sum in Equation 1.16 over all the hkl planes. In words: the electron density at (xyz) is the sum of the contributions to that point of the waves scattered from all possible planes, each with an amplitude that depends on the distribution of electrons in the unit-cell, added with the correct relative phases. In Equation 1.16, V is the volume of the unit-cell and αhkl is the phase associated with the structure-factor amplitude |Fhkl|. The amplitudes can be measured, but the phases cannot be measured in a diffraction experiment. This is the phase problem of X-ray crystallography.

If some prior knowledge of the electron density or of the structure is available, or can be assumed, the phase angles can be calculated; this is the basis of all phasing methods. Determining a crystal structure therefore involves applying a technique suited to that particular crystal to obtain approximate phases for at least some of the X-ray reflections. During structure refinement, this initial phase information is extended to all reflections as accurately as possible.

1.5.1 Solving the phase problem

Four methods are used to solve the phase problem in macromolecular structure determination: direct methods, the heavy-atom method (or isomorphous replacement), the anomalous scattering method (also called anomalous dispersion) and molecular replacement. All of these methods yield phase estimates for only a limited set of reflections, and in some cases these must be improved before an interpretable electron density map can be obtained. Subsequently, phases are assigned to as many reflections as possible.
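The phase problem can be seen in a small numerical experiment. The sketch below builds structure factors for three hypothetical point atoms with the one-dimensional form of Eq. (1.17), then synthesizes the density with Eq. (1.16) twice: once with the true phases, and once with the same amplitudes but all phases set to zero. The atoms, scattering factors and resolution cut-off are illustrative assumptions.

```python
import numpy as np

# Hypothetical 1D "unit cell": three point atoms with real scattering factors f_j
# at fractional coordinates x_j (no temperature factors, no anomalous terms).
f = np.array([6.0, 8.0, 7.0])
xj = np.array([0.12, 0.35, 0.71])
hs = np.arange(-20, 21)                  # reflections h = -20 ... 20

# One-dimensional Eq. (1.17): F_h = Σ_j f_j exp(2πi h x_j)
F = (f * np.exp(2j * np.pi * np.outer(hs, xj))).sum(axis=1)
amplitudes = np.abs(F)                   # obtainable from intensities: |F_h| = sqrt(I_h)
phases = np.angle(F)                     # α_h: not measured in the experiment


def density(amp, phase, x):
    """One-dimensional Eq. (1.16) with V = 1: ρ(x) = Σ_h |F_h| e^{iα_h} e^{-2πihx}."""
    waves = np.exp(-2j * np.pi * np.outer(hs, x))            # shape (n_reflections, n_points)
    return ((amp * np.exp(1j * phase))[:, None] * waves).sum(axis=0).real


x = np.linspace(0.0, 1.0, 200, endpoint=False)
rho_true = density(amplitudes, phases, x)                    # correct phases
rho_zero = density(amplitudes, np.zeros_like(phases), x)     # same amplitudes, all phases set to 0
print(x[np.argmax(rho_true)], x[np.argmax(rho_zero)])
```

With the true phases the highest peak falls on the heaviest atom; with zero phases the same amplitudes produce a map whose strongest feature sits at the origin, unrelated to any atom.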
1.5.2 Direct methods

If a crystal is assumed to be made up of similarly shaped atoms that all have positive electron density, then statistical relationships exist between sets of structure factors, and these relationships can be used to deduce likely values for the phases. Direct methods exploit such relationships and can solve small-molecule structures relatively easily. They estimate initial phases for a selected set of reflections using triplet relationships and then extend the phases to more reflections. In a triplet of reflections h, k and h−k, the phase of one reflection can be estimated from the phases of the other two (φh ≈ φk + φh−k), and the estimate is more reliable when all three reflections are strong. A number of initial phase sets are tested, and the best are selected. Unfortunately, these statistical relationships become weaker as the number of atoms increases, so direct methods are normally limited to structures with at most a few hundred atoms in the unit-cell. The prime requirement for direct methods to succeed in protein crystallography is very high resolution data (1.2 Å or better). This has limited the usefulness of ab initio phase determination in protein crystallography, although direct methods have been used to phase proteins of up to 1000 atoms.

1.5.3 Molecular replacement (MR)

Molecular replacement can succeed when a structural model (the search model) that is highly homologous to the protein under study is available. The principles of this method were first described by Michael Rossmann and David Blow (Rossmann, 1962). Usually, the Patterson function of the search model is first oriented correctly in the new unit-cell by means of rotation functions. The correctly oriented model is then translated within the new unit-cell to find the best fit, which should be supported by a convincing correlation factor and residual factor (the residual factor is discussed in Section 1.8).
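The Patterson function mentioned above (and the difference Pattersons of Section 1.5.4) can be computed from intensities alone, P(u) = (1/V) Σ |Fh|² cos 2π(hu), and its peaks lie at interatomic vectors rather than at atomic positions. The one-dimensional sketch below assumes two identical point atoms and V = 1; the positions are made up for illustration.

```python
import numpy as np

# Hypothetical 1D cell with two identical point atoms at x = 0.10 and x = 0.35.
f = np.array([1.0, 1.0])
xj = np.array([0.10, 0.35])
hs = np.arange(-30, 31)
F = (f * np.exp(2j * np.pi * np.outer(hs, xj))).sum(axis=1)
intensities = np.abs(F) ** 2             # all that is measured: the phases are lost

# Patterson function with V = 1: P(u) = Σ_h |F_h|^2 cos(2π h u). No phases are needed.
u = np.arange(400) / 400
P = (intensities[:, None] * np.cos(2 * np.pi * np.outer(hs, u))).sum(axis=0)

# Local maxima on the periodic grid; the three strongest are the origin peak and
# the vectors ±(0.35 - 0.10), i.e. u = 0.25 and u = 0.75 (= -0.25 + 1).
is_peak = (P > np.roll(P, 1)) & (P > np.roll(P, -1))
strongest = np.argsort(P[is_peak])[::-1][:3]
peaks = np.sort(u[is_peak][strongest])
print(peaks)
```

The peaks reveal the distance between the atoms but not where they sit in the cell, which is why Patterson methods locate a few heavy atoms well but cannot by themselves phase a whole protein.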
1.5.4 Multiple isomorphous replacement (MIR)

Heavy-atom substitution was used very early by small-molecule crystallographers to solve the phase problem. Max Perutz and John Kendrew were the first to apply this method to proteins (Perutz, 1956; Kendrew et al., 1958). They soaked protein crystals in heavy-atom solutions to create isomorphous heavy-atom derivatives (same unit-cell, same orientation of the protein in the unit-cell), which gave measurable intensity changes that could be used to deduce the positions of the heavy atoms.

In this method, crystals of the native protein whose structure is to be determined are grown in the usual manner. Once fully grown, they are soaked in solutions of heavy-atom compounds. The goal is to obtain derivative crystals in which the heavy atoms bind specifically and consistently to each protein molecule in the unit-cell. After soaking, the positions of the heavy atoms are determined using difference Patterson maps. For this step to succeed, only a few heavy atoms should bind in each asymmetric unit. Once the initial heavy-atom sites have been located, the coordinates, occupancy and temperature factor of each heavy atom are refined. At least two isomorphous derivatives are needed for structure determination by MIR, whereas when anomalous scattering data are also used (isomorphous replacement with anomalous scattering), a single isomorphous derivative can be sufficient. In practice, data from several derivatives are combined to refine the heavy-atom parameters and to calculate MIR or MIRAS (multiple isomorphous replacement with anomalous scattering) phases.

1.5.5 Anomalous scattering

The atomic scattering factor has three components: f0, a normal scattering term that depends on the Bragg angle, and two terms, f′ and f″, that depend not on the scattering angle but on the wavelength.
These latter two terms represent the anomalous scattering that occurs near an absorption edge, where the X-ray photon energy is sufficient to promote an electron from an inner shell. The dispersive term f′ reduces f0, whereas the absorption term f″ is 90° ahead in phase of f′. This leads to a breakdown of Friedel's law (|Fhkl| = |F−h−k−l|), giving rise to anomalous differences that can be used to locate any anomalous scatterers in the crystal. The anomalous (Bijvoet) difference can be used in the same way as the isomorphous difference, in Patterson or direct methods, to locate the anomalous scatterers. Phases for the native structure factors can then be derived in a way similar to single or multiple isomorphous replacement (SIR or MIR). Anomalous scattering can also be used to break the phase ambiguity of a single isomorphous replacement experiment, giving single isomorphous replacement with anomalous scattering (SIRAS).

1.5.5.1 MAD

Isomorphous replacement suffers from several problems: non-isomorphism between crystals (unit-cell changes, reorientation of the protein, conformational changes, changes in bound salt and solvent ions), difficulty in locating all the heavy atoms, difficulty in refining heavy-atom positions, occupancies and thermal parameters, and errors in intensity measurements. The multiwavelength anomalous diffraction (MAD) method overcomes the non-isomorphism problems. Data are collected at several wavelengths, typically three, chosen to maximize the absorption and dispersive effects. The changes in structure-factor amplitudes caused by anomalous scattering are generally small and require accurate intensity measurements. The actual profile of the absorption curve must be determined experimentally by a fluorescence scan of the crystal at the synchrotron, because the environment of the anomalous scatterers can affect the details of the absorption. Excellent optics are needed to set the wavelength accurately with minimal wavelength dispersion.
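The breakdown of Friedel's law described in Section 1.5.5 follows directly from writing the scattering factor of the anomalous atom as f = f0 + f′ + if″. The sketch below computes |F(h)| and |F(−h)| for a hypothetical one-dimensional cell; the f′ and f″ values are placeholders chosen for illustration, not tabulated values for any element.

```python
import numpy as np

# Hypothetical 1D cell: two "light" atoms and one anomalous scatterer.
# The f' and f'' values are illustrative only, not tabulated data for any element.
f0 = np.array([6.0, 8.0, 30.0])
f_prime = np.array([0.0, 0.0, -8.0])         # dispersive term: reduces f0
f_double_prime = np.array([0.0, 0.0, 4.0])   # absorption term: 90 degrees ahead in phase
xj = np.array([0.12, 0.35, 0.71])


def structure_factor(h, anomalous=True):
    """F_h = Σ_j f_j exp(2πi h x_j), with f_j = f0 + f' + i f'' when anomalous=True."""
    f = f0.astype(complex)
    if anomalous:
        f = f + f_prime + 1j * f_double_prime
    return np.sum(f * np.exp(2j * np.pi * h * xj))


for h in (1, 2, 3):
    f_plus, f_minus = structure_factor(h), structure_factor(-h)
    # Friedel's law would require |F(h)| = |F(-h)|; the difference is the Bijvoet difference.
    print(h, round(abs(f_plus), 3), round(abs(f_minus), 3), round(abs(f_plus) - abs(f_minus), 3))
```

Setting `anomalous=False` restores real scattering factors, and the Friedel pairs then have equal amplitudes again.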
In MAD experiments, all data are generally collected from a single frozen crystal with high redundancy, to increase the statistical significance of the measurements, and with as high a completeness as possible.

1.5.5.2 SAD

Single-wavelength anomalous dispersion (SAD) is a subset of MAD. It is becoming increasingly practical to collect data only at the absorption peak and to use density-modification protocols to break the phase ambiguity and produce interpretable maps.

1.6 PHASE IMPROVEMENT

Experimentally determined phases are generally not accurate enough to give a fully interpretable electron-density map. They are therefore often the starting point for phase improvement by a variety of density modification methods, which are also based on prior knowledge of the structure. Solvent flattening, histogram matching and non-crystallographic averaging are the main techniques used to modify electron density and improve phases. Solvent flattening is a powerful technique that removes negative electron density and sets the electron density in the solvent regions to a typical value of 0.33 e Å⁻³, in contrast to a typical protein electron density of 0.43 e Å⁻³. Density modification is usually cyclic: the modified electron-density map is back-transformed to give modified phases, these are recombined with the experimental phases (so that the experimental information is not discarded), and a new map is calculated and modified again, iterating until convergence. Such methods can also provide phases beyond the resolution of the experimental phase information, provided higher-resolution native data have been collected. In that case, the modified map is back-transformed to a slightly higher resolution in each cycle, providing new phases for higher-resolution reflections.
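The solvent-flattening step described above can be sketched in a few lines. The Python example assumes a one-dimensional map with V = 1, a known solvent mask and hypothetical noisy phases; it performs only the real-space modification and the back-transformation to new phases. The phase-combination step mentioned in the text is deliberately left out, so this illustrates the mechanics of one cycle, not a working density-modification program.

```python
import numpy as np

SOLVENT_LEVEL = 0.33  # e/Å^3, the typical solvent density quoted in the text


def flatten_solvent(rho, solvent_mask, solvent_level=SOLVENT_LEVEL):
    """Real-space step: set solvent grid points to a constant level, remove negative density."""
    modified = np.where(solvent_mask, solvent_level, rho)
    return np.maximum(modified, 0.0)


def modification_cycle(f_amp, phases, solvent_mask):
    """One hypothetical 1D cycle (V = 1): map -> flatten -> back-transform -> new phases.

    The map uses the e^{-2πihx} sign of Eq. (1.10) (numpy fft) and the back-transform
    the e^{+2πihx} sign of Eq. (1.11) (numpy ifft).
    """
    # Noisy phases break Friedel symmetry, so the map is complex; keep its real part.
    rho = np.fft.fft(f_amp * np.exp(1j * phases)).real
    rho_mod = flatten_solvent(rho, solvent_mask)
    # A real program would now combine these phases with the experimental ones.
    return np.angle(np.fft.ifft(rho_mod))


# Toy example: a protein "bump" on a flat solvent background, plus noisy phases.
N = 64
x = np.arange(N) / N
rho_true = SOLVENT_LEVEL + 0.4 * np.exp(-((x - 0.3) ** 2) / 0.005)
f_true = np.fft.ifft(rho_true)
solvent_mask = np.abs(x - 0.3) > 0.15
rng = np.random.default_rng(0)
noisy_phases = np.angle(f_true) + rng.normal(0.0, 0.8, N)    # hypothetical phase errors
new_phases = modification_cycle(np.abs(f_true), noisy_phases, solvent_mask)
```

In a full procedure this function would be called repeatedly, with the new phases recombined with the experimental ones before each map is recalculated.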
1.7 MODEL BUILDING

A model of the protein under study is produced by fitting the components of the structure into the experimentally derived electron density map, followed by refinement. In protein crystallography, generating an atomic model of the molecule(s) is a crucial step in the structure-determination process. Once an atomic model is available, the large body of geometrical knowledge about protein structures can be applied during refinement to generate better phases and a better atomic model. In practice, an atomic model can be built only when sufficient phase information has been obtained, either experimentally or from known homologous structures, to produce an interpretable electron-density map. Model building may be far from straightforward, because the phase information may be poor and the resolution of the diffraction data limited. An initial model built into an experimental map, or into a poorly phased molecular replacement map, will usually contain many errors. To produce an accurate model, crystallographic refinement must be combined with rebuilding at the graphics display, in a cycle of gradual improvement. Depending on the size of the structure, either the automatic part (refinement) or the manual part (rebuilding) may be rate-limiting.

Refinement is the process of adjusting the parameters of a model to find the values most nearly compatible with the observations. In least-squares refinement, this means minimizing the weighted sum of squared differences between the observed and calculated structure-factor amplitudes:

$$\sum_{hkl} w_{hkl}\left(|F_o| - |F_c|\right)^2$$

For maps at high resolution (d ≤ 2.2 Å) and with good starting phases, automated model building has been highly successful in recent years (Perrakis et al., 1999), and automation has greatly reduced the time spent on manual model building with computer graphics programs.
Various approaches are currently being developed to improve the recognition of protein structural features in electron-density maps (Terwilliger, 2003; Holton et al, 2000; Levitt, 2001), so that automated model building can cope with lower-resolution data and poorer phase information. Nonetheless, as the resolution and the quality of the phases decrease, map interpretation becomes increasingly unreliable.

1.8 REFINEMENT

Once all the atoms in a structure have been located, the final stage of the process is to refine them. An atomic model can never be perfect, but it can be improved a great deal by refinement, in which the atomic model is adjusted to improve its agreement with the measured diffraction data. Formally, refinement is the optimization of a function of a set of observations by varying the parameters of a model. During the refinement of a protein structure, no data cut-off should be applied, and all observed reflections, including those at low resolution, should generally be used. Low-resolution data must be collected for a proper evaluation of the structure, because they are needed to model the bulk solvent. Omitting them can lead to underestimated or even negative B-factors. The highest possible resolution limit should be used, because this maximizes the accuracy and precision of the structure. This limit is judged from the signal-to-noise ratio [I/σ(I)], completeness and redundancy of the data in the highest-resolution shell. It is important not to over-refine a structure by building the model into density of inadequate quality. Such over-interpretation can generally be revealed by visual inspection of the electron-density maps and by the presence of high B-factors (i.e. atoms with high thermal mobility). The majority of structures are refined with isotropic B-factors, which assume that an atom moves equally in all directions.
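Two rough calculations help explain the role of B-factors and the cost of anisotropic refinement discussed in the next paragraph.

The isotropic B-factor damps an atom's scattering by $\exp(-B\sin^2\theta/\lambda^2) = \exp(-B/4d^2)$, since $\sin\theta/\lambda = 1/(2d)$ from Bragg's law. B is related to the mean-square displacement along one direction by $B = 8\pi^2\langle u^2\rangle$.

The number of unique reflections to resolution d can be estimated as $2\pi V_{asu}/(3d^3)$, assuming complete data with Friedel pairs merged. An isotropic model has 4 parameters per atom (x, y, z, B), whereas an anisotropic model has 9 (x, y, z and six $U_{ij}$).

The sketch below ignores occupancies and stereochemical restraints, and its function names are illustrative.

```python
import math

# x, y, z + B  versus  x, y, z + six U_ij components (occupancy ignored)
PARAMS_PER_ATOM = {"isotropic": 4, "anisotropic": 9}


def debye_waller(b_iso, d):
    """Isotropic attenuation exp(-B sin^2(theta)/lambda^2) = exp(-B / (4 d^2)).

    b_iso in Å^2, d (resolution of the reflection) in Å.
    """
    return math.exp(-b_iso / (4.0 * d ** 2))


def rms_displacement(b_iso):
    """RMS displacement (Å) along a single direction, from B = 8 pi^2 <u^2>."""
    return math.sqrt(b_iso / (8.0 * math.pi ** 2))


def unique_reflections(v_asu, d_min):
    """Rough number of unique reflections to d_min for asymmetric-unit volume v_asu (Å^3).

    A reciprocal sphere of radius 1/d_min holds (4/3) pi V_cell / d_min^3 lattice points;
    dividing by the number of asymmetric units and by 2 (Friedel pairs) gives
    2 pi V_asu / (3 d_min^3). Assumes 100% completeness.
    """
    return 2.0 * math.pi * v_asu / (3.0 * d_min ** 3)


def obs_per_param(v_asu, d_min, n_atoms, model="isotropic"):
    """Observations available per refined parameter (restraints not counted)."""
    return unique_reflections(v_asu, d_min) / (n_atoms * PARAMS_PER_ATOM[model])
```

Consider a hypothetical asymmetric unit of 30,000 Å³ containing 1,000 atoms. At 2.0 Å, the estimate gives about 7,850 unique reflections. That is roughly 2 observations per parameter for an isotropic model, but fewer than 1 for an anisotropic one. At 1.0 Å there are about 62,800 reflections, or about 7 per parameter even with anisotropic B-factors.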
Anisotropic refinement allows the motion of an atom in each direction to be refined individually, but the larger number of parameters means that more unique reflections are required. Consequently, anisotropic B-factor refinement is only possible for structures at atomic resolution. Crystallographic refinement procedures are self-monitoring, but the data they use contain experimental errors, so the electron-density map can be difficult to interpret (Kleywegt, 2000). This is further complicated by model bias: the structural model itself influences the appearance of the electron-density map. Simulated-annealing procedures, which help refinement escape false energy minima, can reduce model bias and should be applied to models of proteins and their bound ligands. The procedure simulates gradual heating and cooling of the molecule, and it also has the advantage of correcting many small errors in the model. Visual inspection of the model during refinement nevertheless remains most important, because it can reveal unexpected errors. Modern refinement programs use automatic routines to add water molecules to structures, and the number of ordered water molecules in the model increases with increasing resolution. Most proteins are crystallized in the presence of salts, and the ions are often difficult to distinguish from water molecules (particularly similarly sized ions, e.g. NH₄⁺ and Na⁺) or from noise in the electron-density map. In these circumstances, hydrogen-bonding patterns and the coordination geometry of the ions can be used to tell water molecules and ions apart. After several rounds of refinement and map fitting, the model gradually converges to the final model. The agreement of the structural model with the X-ray diffraction data is measured by a 'residual' or 'reliability' factor (R-factor).
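The R-factor introduced here is defined in equation (1.20) below. A minimal NumPy sketch follows, computing R together with Rcryst and Rfree from a randomly flagged test set. It assumes that the calculated amplitudes are already on the same scale as the observed ones. The 5% default test fraction and the function names are assumptions chosen for illustration.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum ||Fo| - |Fc|| / sum |Fo| over the given reflections."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc)).astype(float)
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))


def flag_test_set(n_reflections, test_fraction=0.05, seed=0):
    """Randomly flag reflections for the test set (5% default is an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.random(n_reflections) < test_fraction


def r_cryst_and_free(f_obs, f_calc, is_test):
    """Rcryst over the working set and Rfree over the held-out test set."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc)
    is_test = np.asarray(is_test, dtype=bool)
    work = ~is_test
    return r_factor(f_obs[work], f_calc[work]), r_factor(f_obs[is_test], f_calc[is_test])
```

In real refinement, the flagged reflections must be excluded from the refinement target from the very first cycle and kept fixed afterwards. Otherwise Rfree loses its value as an unbiased check.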
The progress of iterative real- and reciprocal-space refinement is monitored by computing the difference between the observed structure-factor amplitudes $|F_{obs}|$ and the amplitudes $|F_{calc}|$ calculated from the current model:

$$R = \frac{\sum_{hkl} \big|\,|F_{obs}| - |F_{calc}|\,\big|}{\sum_{hkl} |F_{obs}|} \qquad (1.20)$$

As the model converges towards the correct structure, the differences between the observed and calculated amplitudes decrease, and so does R. A desirable target R-factor for a protein model is below 0.3; occasionally, a small and well-ordered protein structure may refine to about R = 0.1. Modern refinement programs split the diffraction data into a 'test set', used to calculate Rfree, and a 'working set', used to calculate Rcryst, as global measures of refinement. The test-set reflections are excluded from the refinement itself. Rfree is particularly important for guarding against over-fitting, i.e. changes that lower Rcryst without genuinely improving the model, and it is also sensitive to the presence of errors (Brunger et al, 1992 and Kleywegt et al, 1996). R-factor values alone can be misleading about the quality of a structure, because they are influenced by several factors, such as the omission of weak data by a sigma cut-off. Inappropriate refinement procedures can produce artificially low R values (i.e. the fit appears to be better than it really is). Including the weak diffraction data is essential to obtain a complete model of the structure, including conformational variations of the amino-acid side chains and bound solvent. R-factors must therefore be examined critically when judging the reliability of a model, and even properly refined structures with 'acceptable' R-factors can sometimes have significant errors.

1.9 VALIDATION AND DEPOSITION

Because building and refining a model of a biomacromolecule from crystallographic data involves subjective decisions, quality-control techniques are required to assess the validity of such models.
Errors in model building are almost unavoidable, but it is the crystallographer's task to remove as many of them as possible before analysis, publication and deposition of the structure. There are many ways to reduce or avoid these errors, including (i) the use of information derived from databases of well-refined structures during model building, (ii) various local quality checks, and (iii) global quality indicators. Many statistics, methods and programs have been developed to help identify errors in protein models. These generally fall into two classes: those that consider only the coordinates and B-factors, and those that analyze both the model and the crystallographic data. In a well-refined model, the root-mean-square deviation (RMSD) from ideal values should not exceed 0.02 Å for bond lengths and 4° for bond angles. There should be no D-amino acid residues present. Peptide groups must be nearly planar, and the backbone torsion angles φ and ψ should fall in the allowed regions of the Ramachandran plot. Side-chain torsion angles should lie within a few degrees of stable, staggered conformations. Finally, the well-refined model is deposited in the Protein Data Bank (http://pdbdep.protein.osaka-u.ac.jp/adit/).
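Some of the geometric checks listed above are easy to reproduce. The sketch below computes the RMSD of bond lengths from ideal values. It also computes the backbone torsion angles φ (C(i-1)-N-CA-C) and ψ (N-CA-C-N(i+1)) from Cartesian coordinates, using the standard four-point dihedral formula with the IUPAC sign convention. The input layout (coordinate arrays and index pairs) and the function names are assumptions. Real validation software also compares φ/ψ against empirical Ramachandran distributions, which is not shown here.

```python
import numpy as np


def bond_length_rmsd(coords, bonds, ideal):
    """RMSD (Å) of model bond lengths from ideal values.

    coords: (n_atoms, 3) array; bonds: sequence of (i, j) atom-index pairs;
    ideal: ideal length for each bond, in the same order.
    """
    coords = np.asarray(coords, dtype=float)
    i, j = np.asarray(bonds).T
    lengths = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float(np.sqrt(np.mean((lengths - np.asarray(ideal, dtype=float)) ** 2)))


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180 (IUPAC sign)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1), in degrees."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Problematic residues could then be flagged by comparing `bond_length_rmsd` against the 0.02 Å threshold quoted above and by checking each (φ, ψ) pair against the allowed regions of the Ramachandran plot.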