necessary and fresh crystals from the same drop were flash-cooled in liquid nitrogen and sent to a synchrotron X-ray facility for data collection. HLH24-82-L and its selenomethionine-labelled version, HLH24-82-L-Se-Met, were sent to the Argonne National Laboratory synchrotron for data collection. The native crystal yielded a 3.1 Å resolution dataset that indexed in space group P2₁2₁2₁ with unit cell parameters a = 68.052 Å, b = 86.803 Å, c = 93.638 Å, α = β = γ = 90.00°. Based on the unit-cell volume and a molecular weight of 8.3 kDa, the Matthews coefficient was 2.77 Å³/Da, corresponding to a solvent content of 55%, assuming six molecules (three dimers) per asymmetric unit. A self-rotation function calculated in Molrep (CCP4 suite) (Lebedev, et al., 2008) clearly showed 2-fold and 3-fold symmetry, consistent with three dimers per asymmetric unit. A MAD dataset was collected for HLH24-82-L-Se-Met at three energies, peak (12,658.3 eV), inflection (12,656.5 eV) and remote (13,058.3 eV), over a resolution range of 50–2.5 Å. Only the peak dataset was used to locate the selenium sites; it was indexed and scaled in space group P3₂21 with unit cell parameters a = b = 51.5 Å, c = 111.72 Å, α = β = 90°, γ = 120°. The Matthews coefficient was 2.52 Å³/Da (51% solvent content), assuming two molecules (one dimer) in the asymmetric unit. In parallel, a 2.1 Å native dataset was collected for the longer construct N-HLH82-L at the Brookhaven National Laboratory synchrotron. Auto-indexing and scaling were done in space group P3₁21 with unit cell parameters a = b = 51.62 Å, c = 111.47 Å, α = β = 90°, γ = 120°. The reflections in the kl plane of reciprocal space are shown in Figure 12. The Matthews coefficient was 1.98 Å³/Da (40% solvent content), assuming two molecules (one dimer) in the asymmetric unit. The data collection statistics for this native dataset and the selenomethionine dataset agreed closely in crystal system (trigonal), unit cell dimensions (Table 8) and Matthews estimates. They also had
very similarly shaped crystals, even though they were grown under different conditions (Figure 11).

Table 8: Crystallographic Data Collection Statistics. Values for the highest-resolution shell are given in parentheses.

Parameter | Native | Selenomethionine
Detector | CCD (ADSC Q315) | CCD (MAR300)
Wavelength (Å) | 1.0809 | 0.9794
Detector distance (mm); rotation/image (°); number of images | 240 and 180 and 90 (merged) | 275; 180
Crystal data | |
Space group* | P3₁ | P3₂
Unit cell dimensions a, b, c (Å) | 51.62, 51.62, 111.47 | 51.5, 51.5, 111.72
Unit cell angles | α = β = 90°, γ = 120° | α = β = 90°, γ = 120°
Diffraction data | |
Resolution (Å) | 50–2.1 (2.18–2.1) | 55.86–2.5 (2.64–2.5)
No. of observed reflections | 125386 | 104381
No. of unique reflections | 10569 | 10160
Average mosaicity (°) | 0.53 | 0.55
Rmerge† (%) | 5.8 | 17.5
I/σ(I) | 40.6 (4.2) | 7.8 (2.1)
Completeness (%) | 100 (100) | 100 (100)
Multiplicity | 11.9 (10.2) | 10.3 (10.5)

† $R_{\mathrm{merge}} = \sum_{hkl}\sum_{i} \lvert I_i(hkl) - \langle I(hkl)\rangle \rvert \,/\, \sum_{hkl}\sum_{i} I_i(hkl)$, where $I_i(hkl)$ and $\langle I(hkl)\rangle$ are the intensity of measurement $i$ and the mean intensity of the reflection with indices $hkl$, respectively.
* See Section 4.1 for an explanation of the differing space groups.

Figure 12: HKL view of reflections in the kl plane of reciprocal space for the N-HLH82-L crystal at 2.1 Å resolution.

CHAPTER 4: RESULTS AND DISCUSSION (Structure Solution and Insights)

4.1 Structure Solution and Refinement

The PHENIX suite (Adams, et al., 2002) was used for most of the structure solution steps. The top hit from a sequence alignment of the protein against structures in the PDB was 2QL2. After removal of the DNA, monomers and dimers from chains A and C were used as starting models for PHENIX.AUTOMR, an interface to the Phaser molecular replacement (MR) program. Based on the Matthews estimates of the number of molecules in the asymmetric unit, searches with 1–6 monomer ensembles and 1–3 dimer ensembles were tested, but none yielded a solution. Other HLH structures used as templates also failed to give solutions, so the Se-Met dataset was analyzed instead.
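The Matthews estimates given for each dataset, which set how many copies were searched for during MR, follow from the unit-cell volume, the number of asymmetric units in the cell and the molecular weight. The sketch below reproduces the P2₁2₁2₁ estimate for the native HLH24-82-L crystal. It assumes the standard relations V_M = V_cell / (Z · n · M) and solvent fraction ≈ 1 − 1.23/V_M (protein density of about 1.35 g/cm³); the function names are illustrative, not part of CCP4 or PHENIX.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """General (triclinic) unit-cell volume in cubic angstroms; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def matthews(volume, z, n_mol, mw_da):
    """Return (V_M in A^3/Da, approximate solvent fraction).

    volume: unit-cell volume (A^3)
    z:      asymmetric units per cell (4 for P2(1)2(1)2(1), 6 for P3(1)21 / P3(2)21)
    n_mol:  molecules per asymmetric unit
    mw_da:  molecular weight of one molecule (Da)
    """
    vm = volume / (z * n_mol * mw_da)
    # 1.23 A^3/Da assumes a protein partial specific volume of ~0.74 cm^3/g
    solvent = 1.0 - 1.23 / vm
    return vm, solvent

# Native HLH24-82-L crystal: P2(1)2(1)2(1), 8.3 kDa monomer, 6 molecules (3 dimers)
v = cell_volume(68.052, 86.803, 93.638)
vm, solv = matthews(v, z=4, n_mol=6, mw_da=8300)
print(f"V_M = {vm:.2f} A^3/Da, solvent = {solv:.0%}")
```

This gives V_M ≈ 2.78 Å³/Da and ≈ 56% solvent, which agrees with the quoted 2.77 Å³/Da and 55% to within rounding. For the trigonal cells, pass `gamma=120.0` to `cell_volume` and use `z=6`; repeating the calculation for n = 1 to 6 molecules is one way to bracket the number of copies to try in MR.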
A MAD dataset was collected for HLH24-82-L-Se-Met, but only the peak-energy dataset was needed to identify the selenium sites with SOLVE in PHENIX over a resolution range of 50–2.5 Å. The dataset was initially indexed and scaled in space group P3₁21. SOLVE found peaks corresponding to M33 and M62 of chain A and M39 and M62 of chain B. However, these sites did not fit the density well, even though the map showed a fairly clear protein envelope. SOLVE was therefore re-run with the data assigned to the alternative (enantiomorphic) space group P3₂21, which placed the Se-Met residues within the density and clearly showed two ID2 monomers. Phasing statistics are given in Table 9. Unfortunately, the highest-resolution shell statistics for this structure were poor, and refinement was only acceptable when the data were truncated to around 3 Å resolution. This structure was therefore used as a search model for MR against the native datasets.

Table 9: Phasing statistics of Se-Met construct HLH24-82-L-Se-Met. Values in parentheses are for the highest-resolution shell.

Phasing statistics | HLH24-82-L-Se-Met
Ranom† (%) | 7.7
Rpim (%) | 6.3
Selenium sites | 
Anomalous multiplicity | 5.5 (5.4)
Anomalous completeness (%) | 100 (100)
DelAnom correlation between half-sets | 0.495
Mid-slope of anomalous normal probability | 1.139

† $R_{\mathrm{anom}} = \sum \lvert \langle I^{+}\rangle - \langle I^{-}\rangle \rvert \,/\, \sum \left(\langle I^{+}\rangle + \langle I^{-}\rangle\right)$, where $\langle I^{+}\rangle$ and $\langle I^{-}\rangle$ are the mean intensities of the two members of each Bijvoet pair.

MR on the native HLH24-82-L dataset using the same strategy as before still yielded no viable solution and was abandoned in favour of the higher-resolution dataset from the longer form of ID2, N-HLH82-L. For this dataset, MR searches used either one copy of the Se-Met dimer or two copies of the Se-Met monomer, with the scaled input data, originally indexed in P3₁21, treated in the alternative space group P3₂21. Both searches were successful, and the resulting coordinates were used for automated model building with PHENIX.AUTOBUILD, which produced a model with optimized phases.
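The two intensity agreement statistics defined in the footnotes to Tables 8 and 9, Rmerge and Ranom, are simple ratios over repeated or Bijvoet-related measurements. The sketch below implements both definitions, plus the mean multiplicity, on a tiny set of hypothetical intensities. The data layout (a dict keyed by unique reflection) and the function names are illustrative assumptions, not the output format of HKL2000, SOLVE or any other processing program.

```python
from statistics import mean

def r_merge(obs):
    """Rmerge = sum_hkl sum_i |I_i(hkl) - <I(hkl)>| / sum_hkl sum_i I_i(hkl).

    obs maps each unique reflection (h, k, l) to the list of its measured
    intensities I_i(hkl), already reduced to the asymmetric unit.
    """
    num = den = 0.0
    for intensities in obs.values():
        avg = mean(intensities)
        num += sum(abs(i - avg) for i in intensities)
        den += sum(intensities)
    return num / den

def r_anom(pairs):
    """Ranom = sum |<I+> - <I->| / sum (<I+> + <I->).

    pairs maps each acentric reflection to a tuple
    (I+ measurements, I- measurements) for its Bijvoet mates.
    """
    num = den = 0.0
    for plus, minus in pairs.values():
        ip, im = mean(plus), mean(minus)
        num += abs(ip - im)
        den += ip + im
    return num / den

def multiplicity(obs):
    """Mean number of observations per unique reflection."""
    return sum(len(v) for v in obs.values()) / len(obs)

# Hypothetical toy data, for illustration only
obs = {(1, 0, 2): [100.0, 110.0, 90.0], (0, 2, 3): [50.0, 54.0]}
pairs = {(1, 0, 2): ([105.0, 101.0], [95.0, 99.0])}
print(f"Rmerge = {100 * r_merge(obs):.1f}%, multiplicity = {multiplicity(obs):.1f}")
print(f"Ranom = {100 * r_anom(pairs):.1f}%")
```

Applied to the reflection counts in Table 8, the same multiplicity definition gives 125386 / 10569 ≈ 11.9 for the native data and 104381 / 10160 ≈ 10.3 for the Se-Met data, matching the tabulated values.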
Subsequently, the rest of the model was built manually into 2Fo–Fc and Fo–Fc maps using COOT (Emsley, et al., 2004). At the start of refinement, CNS (Brunger, et al., 1998) was used to calculate simulated-annealing composite omit maps to monitor model bias. A randomly selected 10% of the reflections was assigned to the Rfree set for cross-validation. Further rounds of manual model building were alternated with cycles of coordinate (x, y, z) and isotropic B-factor refinement in PHENIX.REFINE. The final model, a four-helix bundle, was refined to 2.1 Å with an Rfree of 25% and no Ramachandran outliers (Table 10, Figure 13). PyMOL (DeLano, 2002) was used to generate all structural figures in the following sections and chapters.

Table 10: Refinement statistics for the native ID2 N-HLH82-L construct.

Refinement | Native N-HLH82-L
Space group | P3₂
Resolution (Å) | 44.71–2.10
No. of reflections | 10026
Rwork/Rfree† (%) | 22.5/25.0
No. of atoms: protein | 870
No. of atoms: water | 24
No. of atoms: potassium | 
Average isotropic (or equivalent) B factor (Å²): macromolecule | 54.4
Average isotropic (or equivalent) B factor (Å²): solvent | 55.3
R.m.s. deviation from ideal bond angles (°) | 1.07
R.m.s. deviation from ideal bond lengths (Å) | 0.007
Ramachandran favoured (%) | 97.09
Ramachandran allowed (%) | 2.91
Ramachandran outliers (%) | 0

† $R_{\mathrm{work}} = \sum_{hkl \notin T} \bigl\lvert \lvert F_{\mathrm{obs}} \rvert - k \lvert F_{\mathrm{calc}} \rvert \bigr\rvert \,/\, \sum_{hkl \notin T} \lvert F_{\mathrm{obs}} \rvert$; $R_{\mathrm{free}} = \sum_{hkl \in T} \bigl\lvert \lvert F_{\mathrm{obs}} \rvert - k \lvert F_{\mathrm{calc}} \rvert \bigr\rvert \,/\, \sum_{hkl \in T} \lvert F_{\mathrm{obs}} \rvert$, where $T$ is the test set.

Figure 13: Ramachandran plot of ID2 N-HLH82-L generated by RAMPAGE (http://www-cryst.bioc.cam.ac.uk/rampage/) (Lovell, et al., 2003). Panels: General, Glycine, Pre-Pro and Proline (φ and ψ from −180° to 180°). Residues in favoured regions: 100 (97.1%; ~98.0% expected); allowed regions: 3 (2.9%; ~2.0% expected); outlier regions: 0 (0.0%).

4.2 Overall Structure

The structure of the ID2 homodimer was solved to 2.1 Å resolution. The asymmetric unit contained two monomers of the HLH domain, chains A and B (Figure 14A). Although the construct spanned residues 1–82, covering the N-terminus up to the end of the predicted HLH domain, the first 31 residues had no interpretable density. The final model clearly defined the boundaries of the HLH domain as approximately residues 32–82 in chain A and 39–81 in chain B, with the loop region of both chains located between residues 51 and 59. Overall, chain A contained 59 residues: residues 30–82 of ID2 plus residues 83–88 belonging to the stabilizing polypeptide. Chain B contained 47 residues corresponding to residues 35–82 of ID2, with no sign of the stabilizing polypeptide.
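The Rwork/Rfree values in Table 10 rest on the cross-validation procedure described in Section 4.1: a random 10% of reflections is set aside as the test set T, and the R factor from the Table 10 footnote is computed separately over the working and test reflections. The sketch below illustrates that bookkeeping on synthetic amplitudes. The least-squares scale factor k fitted on the working set only is an assumption for illustration (refinement programs such as PHENIX.REFINE apply their own scaling and bulk-solvent models), and all function names are hypothetical.

```python
import random

def flag_free_set(n_refl, fraction=0.10, seed=0):
    """Randomly flag about `fraction` of the reflections as the Rfree test set T."""
    rng = random.Random(seed)
    return [rng.random() < fraction for _ in range(n_refl)]

def ls_scale(f_obs, f_calc):
    """Least-squares scale k minimising sum (|Fobs| - k|Fcalc|)^2."""
    num = sum(abs(fo) * abs(fc) for fo, fc in zip(f_obs, f_calc))
    return num / sum(fc * fc for fc in f_calc)

def r_factor(f_obs, f_calc, k):
    """R = sum ||Fobs| - k|Fcalc|| / sum |Fobs| over the supplied reflections."""
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)

def r_work_free(f_obs, f_calc, is_free):
    """Return (Rwork, Rfree); the scale is fitted on the working set only."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_free) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_free) if t]
    k = ls_scale(*zip(*work))  # test reflections never influence k
    return r_factor(*zip(*work), k), r_factor(*zip(*test), k)

# Synthetic data: "observed" = 2 x model amplitudes with ~10% random error
rng = random.Random(1)
f_calc = [rng.uniform(10.0, 200.0) for _ in range(1000)]
f_obs = [2.0 * fc * (1.0 + rng.gauss(0.0, 0.1)) for fc in f_calc]
free = flag_free_set(len(f_obs))
rw, rf = r_work_free(f_obs, f_calc, free)
print(f"Rwork = {100 * rw:.1f}%, Rfree = {100 * rf:.1f}%, test set = {sum(free)} reflections")
```

Because this synthetic "model" has only one fitted parameter, Rwork and Rfree come out similar; in real refinement a growing gap between them is the usual sign that the model is being fitted to noise in the working set.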