Short-chain dehydrogenases/reductases (SDRs)
Coenzyme-based functional assignments in completed genomes

Yvonne Kallberg 1,2, Udo Oppermann 1, Hans Jörnvall 1 and Bengt Persson 1,2

1 Department of Medical Biochemistry and Biophysics and 2 Stockholm Bioinformatics Centre, Karolinska Institutet, Sweden

Short-chain dehydrogenases/reductases (SDRs) are enzymes of great functional diversity. Even at sequence identities of typically only 15–30%, specific sequence motifs are detectable, reflecting common folding patterns. We have developed a functional assignment scheme based on these motifs and we find five families. Two of these families were known previously and are called the ‘classical’ and ‘extended’ families, but they are now distinguished at a further level based on coenzyme specificities. This analysis gives seven subfamilies of classical SDRs and three subfamilies of extended SDRs. We find that NADP(H) is the preferred coenzyme among most classical SDRs, while NAD(H) is preferred among most extended SDRs. Three families are novel entities, denoted ‘intermediate’, ‘divergent’ and ‘complex’, encompassing short-chain alcohol dehydrogenases, enoyl reductases and multifunctional enzymes, respectively. The assignment scheme was applied to the genomes of human, mouse, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Saccharomyces cerevisiae. In the animal genomes, the extended SDRs amount to around one quarter or less of the total number of SDRs, while in the A. thaliana and S. cerevisiae genomes, the extended members constitute about 40% of the SDR forms. The numbers of NAD(H)-dependent and NADP(H)-dependent SDRs are similar in human, mouse and plant, while the proportions of NAD(H)-dependent enzymes are much lower in fruit fly, worm and yeast. We show that, in spite of the great diversity of the SDR superfamily, the primary structure alone can be used for functional assignments and for predictions of coenzyme preference.
Keywords: short-chain dehydrogenases/reductases; genome; coenzyme; sequence patterns; bioinformatics.

Short-chain dehydrogenases/reductases (SDRs) are enzymes of ~250-residue subunits catalysing NAD(P)(H)-dependent oxidation/reduction reactions. The concept of SDRs was established in 1981 [1], at a time when the only members known were a prokaryotic ribitol dehydrogenase and an insect alcohol dehydrogenase. Since then, the SDR family has grown enormously, both in the number of known members and in the diversity of their functions. Already some years ago, over 1000 forms were ascribed to the SDR superfamily [2], and currently at least 3000 members, including species variants, are known, with a substrate spectrum ranging from alcohols, sugars, steroids and aromatic compounds to xenobiotics.

The N-terminal region binds the coenzymes NAD(H) or NADP(H), while the C-terminal region constitutes the substrate-binding part. Although the residue identity is as low as 15–30%, the 3D folds are quite similar, except for the C-terminal regions. The SDRs have been divided into two large families, ‘classical’ and ‘extended’, with different Gly-motifs in the coenzyme-binding regions and different chain lengths: around 250 residues in classical SDRs and 350 in extended SDRs [3]. Few residues are completely conserved, but several sequence motifs are distinguishable within the families. It is desirable to define distinct characteristics of these families for functional assignments of new sequences added to the SDR superfamily. We have now defined characteristic differences for all SDR types and distinguish five SDR families. Furthermore, seven subfamilies are delineated within the classical SDRs and three subfamilies within the extended SDRs.
These characteristics can be used for functional predictions of further, novel structures, and the assignment system developed is now applied to the genomes of human, mouse, Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans and Saccharomyces cerevisiae.

Correspondence to B. Persson, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, S-171 77 Stockholm, Sweden. Fax: +46 8 337 462, Tel.: +46 8 728 7730, E-mail: bengt.persson@mbb.ki.se
Abbreviation: SDR, short-chain dehydrogenase/reductase.
(Received 25 April 2002, revised 16 July 2002, accepted 24 July 2002)
Eur. J. Biochem. 269, 4409–4417 (2002) © FEBS 2002 doi:10.1046/j.1432-1033.2002.03130.x

MATERIALS AND METHODS

We trained a Hidden Markov model [4] on a set of 95 SDR sequences extracted from SWISSPROT with less than 70% identity in pairwise comparisons, using a manually curated alignment based on human SDRs as seed sequences. The resulting Hidden Markov model was subsequently used to search the databases SWISSPROT [5] and KIND [6], selecting every sequence that had an expect value below 10⁻¹⁵ as a candidate SDR. When these candidate sequences were aligned, they separated into five clusters (Fig. 1), two of which were the classical and extended families [3] and three of which were the specific families of insect alcohol dehydrogenase, enoyl reductase and multifunctional enzymes. These three novel families were named ‘intermediate’, ‘divergent’ and ‘complex’, respectively. The first level of assignments would then be to sort sequences into these five families using a motif-based approach.

Based on a nonredundant set (<80% identity; 100 classical, 80 extended, 7 intermediate, 12 divergent and 12 complex) of known SDR members in SWISSPROT, we developed sequence motifs covering the most conserved parts of the sequences. Three sequence motifs were developed for each family (Fig. 2) to optimize specificity and sensitivity.
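The redundancy reduction mentioned above (pairwise identity below 70% or 80%) can be illustrated with a short sketch. This is not the authors' procedure: the identity definition (identical residues over columns where both sequences have a residue), the greedy selection order, the function names and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def nonredundant(aligned: dict, max_identity: float) -> list:
    """Greedily keep sequences whose identity to every kept one is below max_identity."""
    kept = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) < max_identity for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy alignment, not taken from any database.
toy = {
    "seqA": "MKVAL-TGAS",
    "seqB": "MKVALITGAS",  # identical to seqA wherever both have a residue
    "seqC": "MRLCV-SGGT",
}
print(nonredundant(toy, max_identity=80.0))
```

With a greedy filter the result depends on input order, so a real pipeline would need a deliberate choice of which representative to keep.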
Within each family, 40 of the most preserved amino acid residues in the alignment were selected. The amino acid types ‘accepted’ at a position were those observed together with those with similar amino acid properties, e.g. if Ile and Val are observed, then Leu is also accepted at that position. During an iterative process, an automated sorting procedure was developed. The aligned sequences were scored against the sequence motifs in the following manner. The presence of an accepted amino acid residue type at a motif position increases the sequence score by one point. If instead a gap is found at that position, the score is decreased by one point.

A large part of the motifs covers the coenzyme-binding region. Other enzyme families that also bind NAD(P)(H) might be detected with this profile and introduce false positives in our set. Thus, in order to separate SDRs from other enzymes, key positions in the classical and extended motifs were deduced from multiple sequence alignments. These key positions (bold in Fig. 2) render a score of +3 if present and a score of −3 if absent. Thus, each sequence is associated with five different scores, one for each family.

Incomplete sequences can pose a problem when using sequence-based methods, because such sequences might render a low score and thus be classified incorrectly. In this report, sequences with more than 20% gap positions in the alignment were removed from the data set and not subjected to the scoring process.

The sorting procedure, with the groups and thresholds, is shown in Fig. 3. The thresholds were obtained through a systematic iterative procedure. The scores were used to sort the sequences into one of the five families. There are members of the SDR superfamily that do not meet any of the family requirements, i.e. the scores are below the thresholds. Rather than lowering the thresholds or extending the motifs, such sequences are sorted into an artificial group called ‘unclassified’ SDR.
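The scoring rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `MotifPosition` type, the function names and the toy motif are hypothetical, ‘absent’ at a key position is read as "anything other than an accepted residue", the gap fraction is taken over all alignment columns, and the family thresholds of Fig. 3 are not reproduced.

```python
from dataclasses import dataclass

GAP = "-"


@dataclass(frozen=True)
class MotifPosition:
    column: int          # 0-based alignment column of this motif position
    accepted: frozenset  # residue types accepted at this position
    key: bool = False    # True for key positions (bold in Fig. 2)


def gap_fraction(aligned_seq: str) -> float:
    """Fraction of gap characters; the text excludes sequences above 0.20."""
    return aligned_seq.count(GAP) / len(aligned_seq)


def motif_score(aligned_seq: str, motif: list) -> int:
    """Score one aligned sequence against one family motif."""
    score = 0
    for pos in motif:
        residue = aligned_seq[pos.column]
        if pos.key:
            # Key positions: +3 if an accepted residue is present, -3 otherwise.
            score += 3 if residue in pos.accepted else -3
        elif residue == GAP:
            score -= 1  # gap at an ordinary motif position
        elif residue in pos.accepted:
            score += 1  # accepted residue at an ordinary motif position
        # Any other residue leaves the score unchanged.
    return score


# Hypothetical four-position motif; real family motifs have 40 positions.
toy_motif = [
    MotifPosition(0, frozenset("G"), key=True),
    MotifPosition(2, frozenset("ILV")),  # Ile and Val observed, Leu also accepted
    MotifPosition(4, frozenset("G")),
    MotifPosition(6, frozenset("KR")),
]

for seq in ("GALTGSK", "GAMT-SK", "AALTGSK"):
    if gap_fraction(seq) > 0.20:
        continue  # would be treated as a partial sequence
    print(seq, motif_score(seq, toy_motif))
```

Each sequence would be scored this way against all five family motifs, giving the five scores that feed the sorting procedure.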
Another artificial group, ‘potential’ SDR, is also used. It consists of sequences that are not SDR members as far as can be judged today, but that have some properties in common with the SDR family.

For the structural comparisons, the 3D structures of members within the SDR superfamily were superimposed using ICM (version 2.7, Molsoft LLC, San Diego, CA, USA) [7].

RESULTS

Five SDR families

In order to get functional assignments for the members of the SDR superfamily, we developed an assignment system to distinguish families with specific characteristics. The SDR superfamily divides into five families (Fig. 1), of which two are the previously established classical and extended families, and three are novel entities, denoted intermediate, divergent and complex. The classical family encompasses oxidoreductases (EC 1), such as steroid dehydrogenases and carbonyl reductases. The extended family consists of isomerases (EC 5), e.g. galactose epimerases, and lyases (EC 4), such as glucose dehydratases, but several oxidoreductases are also found within this family, e.g. in multifunctional enzymes such as the 3β-hydroxysteroid dehydrogenase/Δ4,5 isomerase cluster.

Fig. 1. The two levels of classification within the SDR superfamily. At the first level, the members of the SDR superfamily are separated into five families. At the second level, members of the classical and extended families are separated into seven and three subfamilies, respectively, based upon coenzyme-binding residue patterns.

Fig. 2. Conserved sequence motifs in the SDR families as derived from a multiple sequence alignment. For each of the five SDR families, specific sequence patterns exist. Three motif segments, with a total of 40 preserved positions that cover the coenzyme-binding and active site regions, have been chosen for each family. Multiple amino acid occurrences at a position are written within brackets. ‘x’ denotes any amino acid residue or gap, and when present the subsequent number indicates the number of x residues/gaps. Amino acid residues written in bold indicate positions of special importance in the classical and extended motifs. Because the motifs are based upon the sequence majority, insertions in single sequences do not affect the patterns.

The intermediate family exhibits an atypical Gly-motif (G/AxxGxxG/A) that resembles patterns of extended SDRs, except that Ala is highly represented instead of Gly. However, the remaining parts of the sequences are more closely related to the classical SDRs, e.g. with an NGAG motif (corresponding to the NNAG motif in β4, Table 4), and with a subunit size (~250 residues) like that of the classical SDRs. In this family, thus denoted intermediate, we find fruit fly alcohol dehydrogenases, constituting a set of SDRs that divides into three lines with around 35% sequence identity between them in pair-wise comparisons.

The divergent family, with enoyl reductases from bacteria and plants, constitutes a set of NADH-dependent enzymes with three patterns that deviate from those typical of most SDRs. First, the Gly-motif is differently spaced, with five residues instead of three between the first two glycine residues. Second, in bacteria the second and third glycines have been replaced with serine and alanine, i.e. the motif is GxxxxxSxA. Third, there is a methionine instead of a tyrosine in the active site motif, while the tyrosine is found three positions upchain, i.e. YxxMxxxK instead of YxxxK. The 3D structures of FabI from Escherichia coli (PDB code 1qsg) and Mycobacterium tuberculosis (1bvr) reveal that the tyrosine and lysine residues are close in space. They are located within an α-helix, and the spacing between the two residues makes them face the same side with a similar distance between Tyr-Oη and Lys-Nζ as for the classical SDRs, i.e. with a 1.3 Å difference compared to the 3α,20β-hydroxysteroid dehydrogenase, and with spatial freedom for the lysine residue to move closer to the tyrosine residue. Thus, they can function the same way as when the residues are only three positions apart [8,9].

The complex family is named after its members, which are parts of multifunctional enzyme complexes present in all forms of life, e.g. fatty acid synthase. They are NADP(H)-binding proteins in which the SDR region has a β-ketoacyl reductive function. This group has the unique motif YxxxN at the active site rather than the typical YxxxK.

Using a Hidden Markov model, candidate SDR sequences were extracted from SWISSPROT and KIND. These sequences were aligned and sorted into the five families, classical, extended, intermediate, divergent and complex, using a motif-based approach (for details please see the Materials and methods section). The two databases show the same ratios for the different families (Table 1). The family of classical SDRs is the largest, capturing half of the sequences, while the family of extended SDRs is second in size with a quarter of the sequences.

Even when the most divergent sequences have been assigned to families, there is still large sequence variation among the members of the classical and the extended SDRs. The sequence identity is as low as 8% (classical) and 10% (extended) in pair-wise comparisons (Table 1). Thus, these two families are subject to a further assignment procedure, at a second level, based upon coenzyme specificity.

Table 1. Number of SDR family members in the SWISSPROT and KIND databases.

Family         SWISSPROT                       KIND
               Group size   Residue identity   Group size   Residue identity
Classical      253 (50%)    8–99%              1512 (47%)   8–99%
Extended       125 (25%)    10–99%             856 (27%)    6–99%
Intermediate   62 (12%)     27–99%             158 (5%)     25–99%
Divergent      16 (3%)      28–98%             53 (2%)      24–99%
Complex        12 (2%)      20–74%             133 (4%)     15–99%
unclassified   16 (3%)      –                  97 (3%)      –
potential      17 (3%)      –                  128 (4%)     –
partial        12 (2%)      –                  267 (8%)     –
Total          513          –                  3204         –

Fig. 3. Flow chart of the family assignment procedure. Each sequence is scored against the five different family motifs. Depending on these scores, the sequences are sorted into seven groups: the five families and two ‘artificial’ groups. The conditions for each selection are given within boxes.

Coenzyme-based subfamily assignments

The coenzyme-binding residues were used in the subfamily assignments. A βαβ-fold, part of the Rossmann fold [10], has been found to be common to enzymes that bind NAD(H), NADP(H) or FAD [11]. An acidic residue is often present at the C-terminal end of the second β-strand in enzymes that are NAD(H)-binding [12]. This residue forms hydrogen bonds to the 2′- and 3′-hydroxyl groups of the adenine ribose moiety. NADP(H)-preferring enzymes instead have two basic residues (Arg or Lys) that bind to the 2′-phosphate [cf. 13]. The first of these basic residues is found in the Gly-motif, immediately preceding the second glycine. The second basic residue is positioned directly after the crucial acidic residue of NAD(H)-preferring enzymes, i.e. at the first loop position after the second β-strand. The pattern of charged residues was used to distinguish subfamilies within the classical and extended SDR families.

Subfamilies within the classical SDR family

We superimposed experimentally solved 3D structures of classical SDRs, and compared residues within 4 Å of the coenzyme.
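Selecting residues within a distance cutoff of a bound coenzyme can be sketched as below. This is an illustrative sketch only and does not reproduce the ICM workflow used in the study: the function name, the residue labels and the coordinates are hypothetical, and "within 4 Å" is assumed to mean that at least one residue atom lies no further than 4 Å from any coenzyme atom.

```python
import math


def residues_near_ligand(residue_atoms: dict, ligand_atoms: list, cutoff: float = 4.0) -> list:
    """Return labels of residues with at least one atom within `cutoff` Å of the ligand."""
    near = []
    for label, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            near.append(label)
    return near


# Hypothetical coordinates in Å; real input would come from a PDB entry.
residues = {
    "res_A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "res_B": [(10.0, 0.0, 0.0)],
    "res_C": [(0.0, 3.9, 0.0)],
}
coenzyme = [(0.0, 0.0, 3.0), (0.0, 7.0, 0.0)]

print(residues_near_ligand(residues, coenzyme))
```

Applying the same selection to each superimposed structure gives comparable sets of coenzyme-contacting positions.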
NAD(H)-preferring enzymes (3α,20β-hydroxysteroid dehydrogenase, 7α-hydroxysteroid dehydrogenase, 2,3-dihydroxybiphenyl dehydrogenase, 2,3-butanediol dehydrogenase, 3-hydroxyacyl-CoA dehydrogenase type 2 and dihydropteridine reductase; PDB codes 2hsd, 1ahh, 1bdb, 1geg, 1e3w and 1dhr) have an acidic residue present at the end of the second β-strand (key position 36 in Table 2). Presence of the Asp residue at this position alone seems to determine the preference for NAD(H) over NADP(H), as neither a basic residue adjacent to this acidic residue (1bdb) nor a basic residue in the Gly-motif (2hsd) alters the coenzyme preference. NADP(H)-binding enzymes seem to be less strict in their requirement for two basic residues. Three structures (carbonyl reductase, tropinone reductase II and sepiapterin reductase; PDB codes 1cyd, 2ae2 and 1oaa) have both these residues (key positions 15 and 37 in Table 2), while trihydroxynaphthalene reductase (1ybv) and 3-oxoacyl reductase (1edo) have only the first, and 17β-hydroxysteroid dehydrogenase type 1 (1fdu) has only the second basic residue.

Because only a few structures have been experimentally solved, we created an alignment including all classical SDRs with coenzyme specificity annotated in SWISSPROT. The sequences were aligned using a Hidden Markov model trained on sequences from the classical family only, to avoid artefacts due to the great diversity of the SDR superfamily. We found that the correlations between patterns of charged residues and coenzyme specificity are generally applicable.

Sequence motifs based upon the patterns of charged residues were developed and used to sort the classical SDRs into four subfamilies of NAD(H)-binding proteins (Fig. 1). These subfamilies were denoted cD1d, cD1e, cD2 and cD3.
Sequences that bind NAD(H) and have a negatively charged amino acid residue present at the end of the second β-strand (key position 36, Table 2) are sorted into subfamily cD1d if this charged residue is aspartic acid or subfamily cD1e if it is glutamic acid. Sequences that instead have a negatively charged residue at the first or second position after the second β-strand (key positions 37 or 38, Table 2) are sorted into subfamily cD2 or cD3, respectively.

The NADP(H)-binding proteins are sorted into three subfamilies. Sequences with a basic residue in the Gly-motif (key position 15, Table 2) are sorted into subfamily cP1, while those with a basic residue at the first position after the second β-strand (key position 37, Table 2) are sorted into subfamily cP2. The cP3 subfamily is formed from sequences that have basic residues at both these positions.

The new sorting process was applied to every classical SDR sequence in SWISSPROT and KIND, giving the distribution of subfamilies shown in Table 2. NADP(H)-binding is twice as frequent as NAD(H)-binding (~60% vs. 30%), indicating that there are more forms catalysing the reductive reactions than the oxidative reactions. Only about 10% of the sequences do not have any of the typical patterns and thus cannot be classified.

For all but six of the 218 assigned classical SDRs, the coenzyme specificity is correctly predicted, as judged by agreement with the annotations in the SWISSPROT database entries. Scrutinizing the six deviating cases, we find that in four (Dhb1_Human, Dhb7_Mouse, Dhpr_Rat and Idno_Ecoli) there are experimental studies [14–17] that support our predictions. The remaining two cases are sequences involved in fatty acid biosynthesis (Fabg_Thema and Fag2_Syny3). They are annotated as NADPH-binding in SWISSPROT, and other proteins of the same functional type indeed use NADPH as coenzyme. However, in contrast to them, these two sequences have an aspartic acid at the last position of the second β-strand and are thus predicted to be NAD(H)-binding by our method (subfamily cD1d). It has not yet been experimentally verified whether these two sequences bind NADH or bind NADPH in an atypical manner.

Table 2. Number of classical SDRs within the SWISSPROT and KIND databases, divided into different coenzyme-binding subfamilies. Key position numbers refer to 3α,20β-hydroxysteroid dehydrogenase (PDB code 2hsd).

Subfamily      Key positions              SWISSPROT   KIND
               15     36     37     38
cD1d           –      D      –      –     64 (25%)    389 (26%)
cD1e           –      E      –      –     2 (1%)      16 (1%)
cD2            –      –      D/E    –     2 (1%)      6 (<1%)
cD3            –      –      –      D/E   8 (3%)      28 (2%)
cP1            K/R    –      –      –     24 (10%)    120 (8%)
cP2            –      –      K/R    –     41 (16%)    280 (19%)
cP3            K/R    –      K/R    –     77 (30%)    530 (35%)
Unclassified   –      –      –      –     35 (14%)    143 (9%)
Total                                     253         1512

Subfamilies within the extended SDR family

The number of experimentally solved 3D structures for the extended family is lower than for the classical family. At present, there are two known structures for NAD(H)-preferring enzymes (UDP-galactose 4-epimerase and dTDP-glucose 4,6-dehydratase; PDB codes 1ek6 and 1bxk). As for the NAD(H)-preferring enzymes of the classical type, those of the extended family also present the acidic residue (at key position 33, Table 3), and it is concluded to be the exclusive determinant of an NAD(H)-preferring enzyme.

There are two structures of NADP(H)-preferring enzymes (GDP-fucose synthetase and ADP-L-glycero-D-mannoheptose 6-epimerase; PDB codes 1bsv and 1eq2). However, when superimposing these structures the root mean square deviation is 10 Å, and one of the main differences between the structures is in the coenzyme-binding region. The second structure (1eq2) is atypical of the family [18,19], as it prefers NADP(H) but still has the aspartic acid at the end of the second β-strand typical of NAD(H)-binding. Thus, the assignment of NADP(H)-preferring enzymes of the extended type is based only on the alignment of known annotated members of this type.
In the alignment, we find that the basic residue present in the Gly-motif among the classical SDRs does not have a counterpart among the extended SDRs. The second basic residue, in the loop after the second β-strand, is conserved among extended SDRs as well (key position 34, Table 3).

For the extended SDRs, two NAD(H)-binding subfamilies (eD1 and eD2) and one NADP(H)-binding subfamily (eP1) were defined based on the alignment. NAD(H)-binding sequences with an acidic residue at the end of the second β-strand (key position 33, Table 3) are sorted into the eD1 subfamily, and those that have an acidic residue two positions downchain are sorted into the eD2 subfamily. The eP1 subfamily consists of NADP(H)-binding sequences that have a basic residue at the first loop position after the second β-strand (key position 34, Table 3).

Table 3 displays the results when this classification system is applied to the SWISSPROT and KIND databases. In contrast to the results for the classical SDRs, a majority of the extended SDRs are predicted to be NAD(H)-binding rather than NADP(H)-binding. The NAD(H)-binding enzymes are twice as many as the NADP(H)-binding ones, indicating that there are more dehydrogenases than reductases in the extended SDR family. Around 10% of the sequences lack charged residues at the deterministic positions.

For all but eight of the 118 assigned extended SDRs, the predicted coenzyme specificities agree with those annotated in SWISSPROT. There are three ADP-L-glycero-D-mannoheptose 6-epimerases that are predicted to be NAD(H)-binding. The sequences harbour an aspartic acid residue at the NAD(H)-deterministic position, but these enzymes prefer NADP(H) rather than NAD(H). The structure of the E. coli enzyme (1eq2) shows that the Asp residue is in a more open conformation than in other NAD(H)-preferring enzymes, and that NADP(H) can therefore be accommodated [18,19].
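The subfamily rules for classical and extended SDRs described in the text (key positions of Tables 2 and 3) can be summarised in a short sketch. It is a hedged illustration rather than the authors' sorting code: the function names are hypothetical, the residues at the key positions are assumed to have been read from an alignment already, and the order in which the patterns are tested is an assumption wherever the text does not state a precedence. Only the dominance of the acidic residue at position 36 (classical) or 33 (extended) is taken from the text.

```python
ACIDIC = set("DE")
BASIC = set("KR")


def classical_subfamily(pos15: str, pos36: str, pos37: str, pos38: str) -> str:
    """Subfamily of a classical SDR from the residues at key positions 15, 36, 37, 38."""
    # An acidic residue at position 36 is described as decisive for NAD(H),
    # regardless of basic residues at positions 15 or 37.
    if pos36 == "D":
        return "cD1d"
    if pos36 == "E":
        return "cD1e"
    # Assumption: the remaining NAD(H) patterns are tested before NADP(H) patterns.
    if pos37 in ACIDIC:
        return "cD2"
    if pos38 in ACIDIC:
        return "cD3"
    basic15, basic37 = pos15 in BASIC, pos37 in BASIC
    if basic15 and basic37:
        return "cP3"
    if basic15:
        return "cP1"
    if basic37:
        return "cP2"
    return "unclassified"


def extended_subfamily(pos33: str, pos34: str, pos35: str) -> str:
    """Subfamily of an extended SDR from the residues at key positions 33, 34, 35."""
    if pos33 in ACIDIC:
        return "eD1"
    # Assumption: eD2 is tested before eP1; the text gives no precedence here.
    if pos35 in ACIDIC:
        return "eD2"
    if pos34 in BASIC:
        return "eP1"
    return "unclassified"


def predicted_coenzyme(subfamily: str) -> str:
    """Map a subfamily label to the predicted coenzyme preference."""
    if subfamily[1:2] == "D":
        return "NAD(H)"
    if subfamily[1:2] == "P":
        return "NADP(H)"
    return "unknown"


# Toy residue combinations, chosen only to exercise each branch.
for residues in (("R", "D", "K", "A"), ("K", "A", "R", "V"), ("S", "L", "R", "A")):
    label = classical_subfamily(*residues)
    print(residues, label, predicted_coenzyme(label))
```

A rule set of this kind will mispredict enzymes such as the heptose epimerases discussed here, where the decisive residue is present but adopts an unusual conformation.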
Apart from these three epimerases, there are five sequences where the predicted coenzyme preferences are in disagreement with the annotated preferences. One enzyme (galactose epimerase, Gale_Vibch) is predicted to prefer NADP(H), but as galactose epimerases normally prefer NAD(H), the prediction is probably misled by a misalignment due to a deletion of nine residues. Another NADP(H)-predicted sequence (Noel_Rhifr) is annotated as NAD(H)-preferring, but also as a mannose dehydratase, an enzyme type that in general prefers NADP(H) to NAD(H). There are no experimental data to support either alternative. The last three sequences are dTDP-4-dehydrorhamnose reductases (Rbd1_Ecoli, Rbd2_Ecoli and Rfbd_Salty) with around 80% pair-wise residue identity. They are predicted to be NAD(H)-preferring but are annotated to be NADP(H)-preferring. However, the enzyme from S. enterica (Rfbd_Salty) has been shown to have dual coenzyme specificity, with a slight preference for NADH [20].

Table 3. Number of extended SDRs within the SWISSPROT and KIND databases, assigned into different coenzyme-binding subfamilies. Key position numbers refer to UDP-galactose 4-epimerase (PDB code 1ek6).

Subfamily      Key positions       SWISSPROT   KIND
               33     34     35
eD1            D/E    –      –     79 (63%)    469 (55%)
eD2            –      –      D/E   3 (2%)      9 (1%)
eP1            –      K/R    –     36 (29%)    277 (32%)
Unclassified   –      –      –     7 (6%)      101 (12%)
Total                              125         856

Application to genome data

We also applied our method to six of the genome databases available, i.e. human [21], mouse (July 2001; Celera Genomics, Rockville, MD), C. elegans [22], D. melanogaster [23], A. thaliana [24] and S. cerevisiae [25]. In Fig. 4, results of the assignments are displayed. The numbers of SDRs found are similar when comparing the human and mouse genomes. These genomes were released recently and cannot be considered to be complete. Thus, the number of SDRs in these genomes can be expected to increase [26].

For the human and mouse genomes, the distribution between classical (gray) and extended (white) families is similar to that in the general protein databases, where the extended members amount to around 25% or less of the total SDR number. However, in the S. cerevisiae and A. thaliana genomes about 40% of the SDR forms are extended. Yeast has a much smaller genome than the others, with only 19 SDRs in total, and the seven extended SDRs might reflect a critical minimum of extended SDRs [2]. In the plant (A. thaliana) genome the extended members are close to half of the total SDR forms, reflecting the different metabolic requirements in plants involving several carbohydrate rearrangements. The total number of SDR forms is greater in A. thaliana than in other species, compatible with the large number of gene duplications in plants [27]. However, the ratio between extended and classical forms is still the same when reducing the data set for homology at the 60% and 80% levels.

The absolute numbers of extended SDRs are similar in the animal species (10–18). The number of classical SDRs is between 39 and 48 in human, mouse and fruit fly, while the worm has 72 classical SDRs. The worm shows a considerable gene duplication tendency [28], which, if affecting classical and extended SDRs differently, could explain this difference.

Also shown in Fig. 4 are the results of the subfamily assignments within the classical and extended SDRs. The pie charts show the relative number of NAD(H)-preferring sequences (lined pattern) vs. NADP(H)-preferring sequences (solid) in each genome. The number of NAD(H)-dependent SDRs is close to the number of NADP(H)-dependent SDRs in human, mouse and A. thaliana. In contrast, the NAD(H)-dependent enzymes amount to only one quarter in fruit fly and one eighth in worm and yeast. The observation that classical SDRs most frequently utilize NADP(H) is remarkable.
In the worm genome, 60 sequences are sorted into the NADP(H) classes, while only eight are sorted into NAD(H) classes. For extended SDRs, the observation that most of them in general are NAD(H)-dependent is not valid for fruit fly and yeast, where most extended SDRs instead bind NADP(H), or for A. thaliana, where the numbers of NAD(H)- and NADP(H)-dependent forms are close to equal (34 vs. 27).

DISCUSSION

Database quality considerations

Our method for functional assignments was applied to completed eukaryotic genomes, revealing that the SDR subfamily patterns vary considerably between different species. However, the genome databases are often preliminary and contain errors. Exons might be missing, resulting in partial sequences. Falsely ascribed exon borders will result in sequences with erroneous deletions and/or insertions. A motif-based method that depends on a correct alignment is of course sensitive to these types of error. Still, bearing in mind that several genome sequences are preliminary, this type of classification is valuable for deducing early functional assignments.

Automated annotation methods are developed to assign functions to newly sequenced proteins. A drawback with automated annotation is that errors might be introduced [29]. Manual annotation should be of higher quality but is very time-consuming, which leads to difficulties in keeping pace with the genome sequencing projects. In this study, we detected some errors in the annotation of coenzyme specificity in SWISSPROT, a database that is manually annotated and thereby believed to be reliable. There were three different types of discrepancy between the keywords and the references in these database entries. First, the quoted publications reported different coenzyme specificities, but the keywords only mentioned one of them. Second, there were entries where the quoted publications stated one type of coenzyme while the keyword stated a different type.
Third, there were entries where the keywords reported a coenzyme specificity without any verifying reference, and the keywords did not say ‘probable’ or ‘by similarity’, or any other word to inform about the uncertainty. Thus, it is still necessary to perform database assignment checks, and the present method is useful for this purpose, in addition to its value in primary assignments.

Classical SDRs vs. extended SDRs

The multiple sequence alignments of classical and extended SDRs (Fig. 5) show that even though these families are highly divergent, there are conserved regions that can serve as fingerprints in the identification of novel SDR members (Fig. 6). In these regions, used to identify classical and extended SDR family members (see Materials and methods), some motifs are of special interest. These are listed in Table 4.

Fig. 4. Classical and extended SDRs and their coenzyme preference shown for the genomes investigated. The pie charts display the proportions between classical (gray) and extended (white) SDRs with specificity for NAD(H) (lined pattern) and NADP(H) (solid), for each of the six genomes studied. The number of SDR enzymes with their coenzyme specificity assigned is given within parentheses.

In the N-terminal region, we find the pattern of three glycine residues that is characteristic of NAD(P)(H)-binding enzymes. These residues are spaced differently in classical and extended SDRs (Table 4). In both families there is a conserved aspartic acid residue, in the loop between β3 and α3, required for stabilization of the adenine-binding pocket [13,30]. In the extended family this residue is often followed by another charged residue two positions downchain.

The motif positioned in and adjacent to β4 (Table 4) is less conserved among extended SDRs than among classical SDRs. Typically, extended SDRs prefer a histidine residue rather than an asparagine residue at the end of this β-strand.
In classical SDRs, the NNAG motif has a role to stabilize the b-strands within the central b-sheetandtopositionthis central b-sheet [30]. There is a motif in a4 that is especially well conserved among the extended SDRs. The a4 motif is also conserved among the classical SDRs. Here, the asparagine residue is involved in building the active site geometry by positioning the lysine residue and being part of a postulated proton relay [30]. The active site residues in b5anda5 (serine, tyrosine and lysine) are found in both classical and extended SDRs. The extended SDRs have a conserved proline residue preceding the tyrosine residue, and also a conserved negatively charged residue four residues downchain of the lysine residue. Neither of these two residues are conserved in the Fig. 6. 3D structure of a classical SDR enzyme with motifs indicated. The spheres show the coenzyme-deterministic positions for NAD(H) in red and NADP(H) in blue. Regions used to identify SDR members (cf. Figure 2) are shown by blue ribbons. The coenzyme is coloured magenta. The structure is 3a,20b-hydroxysteroid dehydrogenase (PDB code 2hsd). The figure was made using the programme ICM . Fig. 5. Multiple sequence alignments of classical and extended SDRs. Thefirstthreecolumnsgivethe SWISSPROT sequence identifier, PDB identifier and subfamily membership. The secondary structure elements of 3a,20b-hydroxysteroid dehydrogenase (PDB code 2hsd) are shown above the classical SDR alignment, while the secondary structure elements of UDP-galactose 4-epimerase (PDB code 1ek6) is shown below the extended SDR alignment. Boxed residues denote key positions in coenzyme binding. Coloured residues represent conservation of 60%, as calculated for a larger data set (red ¼ acidic, green ¼ polar, light blue ¼ hydrophobic, dark blue ¼ basic, purple ¼ Gly or Pro). Arrows 1, 2 and 3 above the alignment show the key positions 15, 36 and 37 (cf. Table 2). 
Arrows 1, 2 and 3 below the alignment show the key positions 33, 34 and 35 (cf. Table 3). Ó FEBS 2002 Coenzyme-based functional assignments of SDRs (Eur. J. Biochem. 269) 4415 classical family, instead, they have a conserved aspartic acid residue about 13 positions downchain from the lysine residue. Coenzyme specificity as classification basis The two-level classification system divides members of the SDR superfamily into families and subfamilies, using a motif-based approach. For the five families detected at the first level – classical, extended, intermediate, divergent and complex – specific sequence patterns were extracted (Table 2). The patterns for families with few and/or closely related members (i.e. the intermediate, divergent and complex families) might be necessary to update when further members are added, to avoid a bias towards the presently known sequences. At the second level, the sequences belonging to the classical and extended families were further divided into seven and three subfamilies, respectively. These subfamilies were defined based on coenzyme specificity and patterns of charged residues in the coenzyme-binding region. The human 17b-hydroxysteroid dehydrogenase type 1 is an NADP(H)-preferring enzyme with a serine residue (Ser12) at the position before the second glycine residue of the glycine motif. There is an arginine residue (Arg37) at the first position after the second b-strand. Site-directed mutagenesis experiments show that an exchange of Ser12 to lysine increased the specificity for NADP(H), while a substitution of Leu36 to an aspartic acid changed the preference from NADP(H) to NAD(H) [34], supporting the crystallographic analysis and our motif-based assignments. The specificity might also depend on other factors than the sequence patterns defined thus far. Some enzymes show dual coenzyme specificity and might bind alternative coenzymes in different tissues and in different cellular compartments. 
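The mutagenesis results described above suggest a simple rule of thumb, which can be sketched in Python as follows. This is only a minimal illustration and not the assignment scheme of Table 2: the function name `predict_coenzyme`, its three arguments and the residue sets are hypothetical, and the precedence given to an acidic residue at the end of the second β-strand is an assumption based solely on the Leu36-to-Asp result reported for human 17β-hydroxysteroid dehydrogenase type 1.

```python
# Hypothetical residue classes; the paper's exact definitions are not given here.
ACIDIC = set("DE")   # Asp, Glu
BASIC = set("KR")    # Lys, Arg


def predict_coenzyme(pre_gly2: str, end_beta2: str, post_beta2: str) -> str:
    """Guess coenzyme preference from three one-letter residues.

    pre_gly2   -- residue before the second glycine of the glycine motif
    end_beta2  -- last residue of the second beta-strand
    post_beta2 -- first residue after the second beta-strand

    Assumption: an acidic residue at the end of the second beta-strand
    overrides basic residues elsewhere (as in the Leu36 -> Asp variant).
    """
    if end_beta2 in ACIDIC:
        return "NAD(H)"
    if pre_gly2 in BASIC or post_beta2 in BASIC:
        return "NADP(H)"
    # No charged residue at the key positions: no prediction is attempted.
    return "unassigned"


# Residues taken from the text for human 17beta-HSD type 1 (Ser12, Leu36, Arg37).
print(predict_coenzyme("S", "L", "R"))  # wild type       -> NADP(H)
print(predict_coenzyme("K", "L", "R"))  # Ser12 -> Lys    -> NADP(H)
print(predict_coenzyme("S", "D", "R"))  # Leu36 -> Asp    -> NAD(H)
```

The sketch deliberately returns "unassigned" when none of the three positions carries a charge, since in that case the sequence pattern alone gives no basis for a prediction.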
Molecular modelling using docking calculations might be helpful in the prediction of coenzyme preference [35].

There are members of the classical type where no motifs for coenzyme specificity were established, as no charged residues are found at the key positions otherwise identified as crucial for this task (Table 2). This is the situation for 11β-hydroxysteroid dehydrogenases type 2 and human 17β-hydroxysteroid dehydrogenase type 2. However, charged residues are found further downchain, and their roles might be clarified when the 3D structures become known.

The retinol dehydrogenases (RDH) constitute a group where experiments show that bovine RDH is NAD+-dependent [36], while the rat RDH is NADP+-dependent [37]. These two sequences are very similar in the Gly-region and identical at the positions used to distinguish between NAD(H) and NADP(H) enzymes. Based on homology modelling of rat and bovine RDH [38], a basic residue further downchain (Lys64) in rat RDH is believed to enable NADP+ to bind. The corresponding residue in bovine RDH is polar (Thr61). Only when their respective 3D structures have been experimentally determined will it be possible to check which residues are responsible for the difference between NAD(H) and NADP(H) specificity in these enzymes.

In summary, we have shown that functional assignments can be made and coenzyme preferences can be predicted from the amino acid sequence alone for SDR enzymes. For this divergent superfamily, we could distinguish families and subfamilies, which will help future assignments. The present approach using hidden Markov models and sequence patterns is general and can be extended to further enzyme families.

Table 4. Conserved sequence motifs in the classical and the extended SDR families. In the motifs, 'a' denotes an aromatic residue, 'c' a charged residue, 'h' a hydrophobic residue, 'p' a polar residue and 'x' any residue. Alternative amino acids at a motif position are given within brackets.

Secondary structure   SDR motif               SDR motif             Suggested function                               Reference
element               (classical)             (extended)
β1+α1                 TGxxxGhG                TGxxGhaG              Structural role in coenzyme binding region       [1,2,31]
β3+α3                 Dhx[cp]                 DhxD                  Adenine ring binding of coenzyme                 [30]
β4                    GxhDhhhNNAGh            [DE]xhhHxAA           Structural role in stabilizing central β-sheet   [30]
α4                    hNhxG                   hNhhGTxxhhc           Part of active site                              [30]
β5                    GxhhxhSSh               hhhxSSxxhaG           Part of active site                              [2,31]
α5                    Yx[AS][ST]K             PYxx[AS]Kxxh[DE]      Part of active site                              [2,31]
β6                    h[KR]h[NS]xhxPGxxxT     h[KR]xxNGP            Structural role, reaction direction              [32,33]

ACKNOWLEDGEMENTS

Financial support from the Swedish Research Council, the Swedish Foundation for Strategic Research, the Swedish Society for Medical Research, the Swedish Society of Medicine, the Novo Nordisk Foundation and Karolinska Institutet is gratefully acknowledged.

REFERENCES

1. Jörnvall, H., Persson, M. & Jeffery, J. (1981) Alcohol and polyol dehydrogenases are both divided into two protein types, and structural properties cross-relate the different enzyme activities within each type. Proc. Natl Acad. Sci. USA 78, 4226–4230.
2. Jörnvall, H., Höög, J.-O. & Persson, B. (1999) SDR and MDR: completed genome sequences show these protein families to be large, of old origin, and of complex nature. FEBS Lett. 445, 261–264.
3. Jörnvall, H., Persson, B., Krook, M., Atrian, S., Gonzalez-Duarte, R., Jeffery, J. & Ghosh, D. (1995) Short-chain dehydrogenases/reductases (SDR). Biochemistry 34, 6003–6013.
4. Karplus, K., Barrett, C. & Hughey, R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics 14, 846–856.
5. Bairoch, A. & Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48.
6. Kallberg, Y. & Persson, B. (1999) KIND – a nonredundant protein database. Bioinformatics 15, 260–261.
7. Abagyan, R. & Totrov, M. (1994) Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 235, 983–1002.
8. Stewart, M.J., Parikh, S., Xiao, G., Tonge, P.J. & Kisker, C. (1999) Structural basis and mechanism of enoyl reductase inhibition by triclosan. J. Mol. Biol. 290, 859–865.
9. Rozwarski, D.A., Vilcheze, C., Sugantino, M., Bittman, R. & Sacchettini, J.C. (1999) Crystal structure of the Mycobacterium tuberculosis enoyl-ACP reductase, InhA, in complex with NAD+ and a C16 fatty acyl substrate. J. Biol. Chem. 274, 15582–15589.
10. Rossmann, M.G., Liljas, A., Brändén, C.-I. & Banaszak, L.J. (1975) The Enzymes, 3rd edn (Boyer, P.D., eds), Vol. 11, pp. 61–102. Academic Press, New York.
11. Wierenga, R.K., de Maeyer, M.C. & Hol, W.G. (1985) Interaction of pyrophosphate moieties with α-helices in dinucleotide binding proteins. Biochemistry 24, 1346–1357.
12. Wierenga, R.K., Terpstra, P. & Hol, W.G. (1986) Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J. Mol. Biol. 187, 101–107.
13. Tanaka, N., Nonaka, T., Nakanishi, M., Deyashiki, Y., Hara, A. & Mitsui, Y. (1996) Crystal structure of the ternary complex of mouse lung carbonyl reductase at 1.8 Å resolution: the structural origin of coenzyme specificity in the short-chain dehydrogenase/reductase family. Structure 4, 33–45.
14. Breton, R., Housset, D., Mazza, C. & Fontecilla-Camps, J.C. (1996) The structure of a complex of human 17beta-hydroxysteroid dehydrogenase with estradiol and NADP+ identifies two principal targets for the design of inhibitors. Structure 4, 905–915.
15. Nokelainen, P., Peltoketo, H., Vihko, R. & Vihko, P. (1998) Expression cloning of a novel estrogenic mouse 17 beta-hydroxysteroid dehydrogenase/17-ketosteroid reductase (m17HSD7), previously described as a prolactin receptor-associated protein (PRAP) in rat. Mol. Endocrinol. 12, 1048–1059.
16. Varughese, K.I., Skinner, M.M., Whiteley, J.M., Matthews, D.A. & Xuong, N.H. (1992) Crystal structure of rat liver dihydropteridine reductase. Proc. Natl Acad. Sci. USA 89, 6080–6084.
17. Bausch, C., Peekhaus, N., Utz, C., Blais, T., Murray, E., Lowary, T. & Conway, T. (1998) Sequence analysis of the GntII (subsidiary) system for gluconate metabolism reveals a novel pathway for L-idonic acid catabolism in Escherichia coli. J. Bacteriol. 180, 3704–3710.
18. Deacon, A.M., Ni, Y.S., Coleman, W.G. Jr & Ealick, S.E. (2000) The crystal structure of ADP-L-glycero-D-mannoheptose 6-epimerase: catalysis with a twist. Structure Fold. Des. 8, 453–462.
19. Ni, Y., McPhie, P., Deacon, A., Ealick, S. & Coleman, W.G. Jr (2001) Evidence that NADP+ is the physiological cofactor of ADP-L-glycero-D-mannoheptose 6-epimerase. J. Biol. Chem. 276, 27329–27334.
20. Graninger, M., Nidetzky, B., Heinrichs, D.E., Whitfield, C. & Messner, P. (1999) Characterization of dTDP-4-dehydrorhamnose 3,5-epimerase and dTDP-4-dehydrorhamnose reductase, required for dTDP-L-rhamnose biosynthesis in Salmonella enterica serovar Typhimurium LT2. J. Biol. Chem. 274, 25069–25077.
21. Venter, J.C. et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
22. Wilson, R.K. (1999) How the worm was won. The C. elegans genome sequencing project. Trends Genet. 15, 51–58.
23. Adams, M.D. et al. (2000) The genome sequence of Drosophila melanogaster. Science 287, 2185–2195.
24. Huala, E. et al. (2001) The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 29, 102–105.
25. Mewes, H.W. et al. (1997) Overview of the yeast genome. Nature 387, 7–65.
26. Kallberg, Y., Oppermann, U., Jörnvall, H. & Persson, B. (2002) Short-chain dehydrogenase/reductase (SDR) relationships: a large family with eight clusters common to human, animal, and plant genomes. Protein Sci. 11, 636–641.
27. Bancroft, I. (2000) Insights into the structural and functional evolution of plant genomes afforded by the nucleotide sequences of chromosomes 2 and 4 of Arabidopsis thaliana. Yeast 17, 1–5.
28. Semple, C. & Wolfe, K.H. (1999) Gene duplication and gene conversion in the Caenorhabditis elegans genome. J. Mol. Evol. 48, 555–564.
29. Devos, D. & Valencia, A. (2001) Intrinsic errors in genome annotation. Trends Genet. 17, 429–431.
30. Filling, C., Berndt, K.D., Benach, J., Knapp, S., Prozorovski, T., Nordling, E., Ladenstein, R., Jörnvall, H. & Oppermann, U. (2002) Critical residues for structure and catalysis in short-chain dehydrogenases/reductases (SDR). J. Biol. Chem. 277, 25677–25684.
31. Oppermann, U.C., Filling, C., Berndt, K.D., Persson, B., Benach, J., Ladenstein, R. & Jörnvall, H. (1997) Active site directed mutagenesis of 3 beta/17 beta-hydroxysteroid dehydrogenase establishes differential effects on short-chain dehydrogenase/reductase reactions. Biochemistry 36, 34–40.
32. Filling, C., Nordling, E., Benach, J., Berndt, K.D., Ladenstein, R., Jörnvall, H. & Oppermann, U. (2001) Structural role of conserved Asn179 in the short-chain dehydrogenase/reductase scaffold. Biochem. Biophys. Res. Commun. 289, 712–717.
33. Ghosh, D. & Vihko, P. (2001) Molecular mechanisms of estrogen recognition and 17-keto reduction by human 17beta-hydroxysteroid dehydrogenase 1. Chem. Biol. Interact. 130–132, 637–650.
34. Huang, Y.W., Pineau, I., Chang, H.J., Azzi, A., Bellemare, V., Laberge, S. & Lin, S.X. (2001) Critical residues for the specificity of cofactors and substrates in human estrogenic 17beta-hydroxysteroid dehydrogenase 1: variants designed from the three-dimensional structure of the enzyme. Mol. Endocrinol. 11, 2010–2020.
35. Peralba, J.M., Cederlund, E., Crosas, B., Moreno, A., Julià, P., Martínez, S.E., Persson, B., Farrés, J., Parés, X. & Jörnvall, H. (1999) An NADP(H)-dependent stomach alcohol dehydrogenase. Structural and enzymatic properties of a gastric NADP(H)-dependent and retinal-active alcohol dehydrogenase. J. Biol. Chem. 274, 26021–26026.
36. Simon, A., Hellman, U., Wernstedt, C. & Eriksson, U. (1995) The retinal pigment epithelial-specific 11-cis retinol dehydrogenase belongs to the family of short chain alcohol dehydrogenases. J. Biol. Chem. 270, 1107–1112.
37. Chai, X., Boerman, M.H., Zhai, Y. & Napoli, J.L. (1995) Cloning of a cDNA for liver microsomal retinol dehydrogenase. A tissue-specific, short-chain alcohol dehydrogenase. J. Biol. Chem. 270, 3900–3904.
38. Tsigelny, I. & Baker, M.E. (1996) Structures important in NAD(P)(H) specificity for mammalian retinol and 11-cis-retinol dehydrogenases. Biochem. Biophys. Res. Commun. 226, 118–127.

Date posted: 23/03/2014, 21:21
