In silico analysis of the adenylation domains of the freestanding enzymes belonging to the eucaryotic nonribosomal peptide synthetase-like family

Leonardo Di Vincenzo1, Ingeborg Grgurina1 and Stefano Pascarella1,2

1 Dipartimento di Scienze Biochimiche ‘A. Rossi Fanelli’, Università di Roma ‘La Sapienza’, Roma, Italy
2 Centro Interdipartimentale di Ricerca per l’Analisi dei Modelli e dell’Informazione nei Sistemi Biomedici (CISB), Università di Roma ‘La Sapienza’, Roma, Italy

Keywords
nonribosomal peptide synthetase; homology modelling; docking; specificity-conferring code; freestanding NRPSs

Correspondence
S. Pascarella, Dipartimento di Scienze Biochimiche ‘A. Rossi Fanelli’, Università di Roma ‘La Sapienza’, 00185 Roma, Italy
Fax: +39 06 49917566
Tel: +39 06 49917574
E-mail: Stefano.Pascarella@uniroma1.it
Website: http://w3.uniroma1.it/bio_chem/homein.html

(Received August 2004, revised 30 November 2004, accepted December 2004)

This work presents a computational analysis of the molecular characteristics shared by the adenylation domains from traditional nonribosomal peptide synthetases (NRPSs) and the group of the freestanding homologous enzymes: α-aminoadipate semialdehyde dehydrogenase, α-aminoadipate reductase and the protein Ebony. The results of systematic sequence comparisons allow us to conclude that a specificity-conferring code, similar to that described for the NRPSs, can be recognized in such enzymes. The structural and functional roles of the residues involved in substrate selection and binding are proposed through the analysis of the predicted interactions between the model active sites and their respective substrates. The indications derived from this study may be useful for designing experiments aimed at a better characterization and at the engineering of this emerging group of single NRPS modules, which are responsible for amino acid selection, activation and modification in the absence of other NRPS assembly line components.
doi:10.1111/j.1742-4658.2004.04522.x

FEBS Journal 272 (2005) 929–941 © 2005 FEBS

Abbreviations
A domain, adenylation domain; AASDH, α-aminoadipate semialdehyde dehydrogenase; ACV synthetase, [L-δ-(α-aminoadipoyl)-L-cysteine-D-valine] synthetase; AS, putative amine-selecting domain; GrsA, gramicidin S synthetase A; HMM, hidden Markov model; NRPS, nonribosomal peptide synthetase; PQQ, pyrroloquinoline quinone; T domain, thiolation domain; RMSD, root mean square deviation.

Nonribosomal peptide synthetases (NRPSs) are multidomain, multifunctional enzymes involved in the biosynthesis of many bioactive microbial peptides [1,2]. This class of natural products includes a variety of compounds with interesting biological activities (phytotoxins, siderophores, biosurfactants and antiviral agents), as well as several clinically valuable drugs [3,4]. NRPSs are organized in iterative modules, one for each amino acid to be built into the peptide product. The minimal module required for a single monomer addition consists of a condensation domain (C), an adenylation domain (A) and a peptidyl carrier protein (PCP), also denoted as the thiolation (T) domain. The A domain is involved in the selection and activation of the amino acid substrate, which is then covalently attached to the enzyme via a thioester bond with the phosphopantetheine residue of the T domain. C domains are localized between every consecutive pair of A domains and PCPs and catalyze the formation of the peptide bond between the upstream aminoacyl or peptidyl moiety tethered to the phosphopantetheinyl group and the free amino group of the downstream aminoacyl moiety, thus facilitating the translocation of the growing chain onto the next module. The structural diversity of NRPS products is enriched through the occasional presence of epimerization (E), cyclization (Cy), N-methylation (N-Met) and oxidation (Ox) domains [1]. Intense research work carried out in the last decade
led to the characterization of a number of new gene clusters and to the discovery of nonclassical NRPS systems [2,5]. The crystal structures of three members of the adenylate-forming enzyme family, firefly luciferase of Photinus pyralis [6], the A domain of the gramicidin S synthetase A (GrsA) from Bacillus brevis [7] and, recently, DhbE (the 2,3-dihydroxybenzoate-activating module) [8], have been solved. Likewise, the structure of VibH, representative of C domains, is now available [9]. The wealth of sequence and structure information pertaining to the A domains has been exploited to understand the molecular bases of their substrate specificity [10,11]. Systematic comparative analyses identified 10 sequence positions lining the active site pocket that are responsible for substrate recognition and selection. The nature of the residues at these positions was correlated with the known substrates, and a specificity-conferring code with predictive potential was proposed [10].

Recently, it was pointed out that modules composed of an adenylation and a thiolation domain, followed by a domain having a redox function and not inserted in the context of a typical NRPS cluster, can be found in eucaryotes [12]. Indeed, α-aminoadipate semialdehyde dehydrogenase (AASDH) and α-aminoadipate reductase (Lys2), enzymes involved in lysine metabolism in eucaryotes, display a three-domain architecture in which the two N-terminal domains are homologous to the A and T domains of NRPS systems and the C-terminal part contains a redox cofactor binding site for either pyrroloquinoline quinone (PQQ) or NADPH. In particular, AASDH, which contains a PQQ-binding domain, is supposed to be involved in lysine degradation and to convert α-aminoadipate semialdehyde to α-aminoadipate [12]. Lys2, which possesses an NADPH-binding domain, is involved in lysine biosynthesis; it converts α-aminoadipate to α-aminoadipate semialdehyde [13,14]. Furthermore, the protein Ebony, an enzyme from Drosophila melanogaster
involved in the conjugation of β-alanine to histamine and sharing homology with the NRPS A and T domains, was recently characterized [15]. The occurrence in evolutionarily higher organisms of gene arrangements typically encountered in the microbial world is intriguing. It appears worthwhile to investigate more deeply the extent of similarity between the A domains of the freestanding aminoacyl adenylate-forming enzymes and those of the traditional NRPS systems. In particular, how many sequences of freestanding A domains are known, and what are their evolutionary relationships to the NRPSs? Can the nonribosomal code of the traditional NRPS systems be applied to the freestanding A domains and, if so, what is the potential role of the residues involved? To address these issues, systematic sequence comparisons, homology modelling and docking simulations were employed to predict the structure of the active site of such enzymes and to propose functional roles for the conserved residues.

Results and Discussion

Databank searches and sequence comparison

The available sequences of the freestanding NRPS modules from eucaryotic organisms were collected by means of exhaustive databank searches. The PSI-BLAST [16] suite was applied over the NR and UniProt databanks. Query sequences were Ebony from Drosophila melanogaster, AASDH from Mus musculus and Lys2 from yeast. Each sequence is representative of a domain pattern: A-T-AS (AS stands for putative amine-selecting domain [15]), A-T-PQQ and A-T-NADPH, respectively. Only the A and T domains were included in the query sequence. Table 1 reports the homologous sequences collected by these databank searches, including 10 sequences from genes coding for putative A domains not yet annotated in the protein databanks, which were predicted through genome scans. Overall, 39 sequences were identified from different eucaryotic species, and the domain assignments were confirmed by CDD [17] and Pfam [18] queries. The sequence
subset formed by the A-T domains was aligned utilizing the HMMER package [19]. A set of 62 sequences, corresponding to the A domain of microbial NRPSs, was extracted from the seed alignment of the Pfam AMP-binding family (Pfam code: PF00501). The adjacent T domain was subsequently added to each sequence. The extended sequences were aligned with CLUSTALW [20], and the final alignment was manually refined to match functionally important residues such as Asp235, which binds the α-amino group of the substrate [11], and Ser573, the site of phosphopantetheine attachment. The resulting alignment was then utilized to train an HMM, which was used to align a subset of the A-T domains listed in Table 1. This alignment was manually refined and used in turn to train the final HMM, now specific for the eucaryotic A-T domains, which was used to align all 39 sequences (Fig. 1).

Table 1. List of freestanding and NRPS-like enzymes retrieved from databanks. All accession numbers refer to the UniProt database except where noted. Boldface names denote in silico predicted proteins not included in databanks. A stands for adenylation domain, T for thiolation, C for condensation, AS for amine-selecting, PQQ for PQQ-binding domain, NADPH for NADPH-binding domain, X for other domains not commonly present in NRPSs. Numeric subscripts to parentheses indicate the repetition of those modules. MNRPS stands for monomodular NRPS. Question marks denote unassigned function. Every PSI-BLAST search was performed with three iterations using as a probe the sequence of GrsA (P14687).

Accession number        Source                     Domains            Length  Function  BLAST E-value
P14687                  Bacillus brevis            A-T-C              1098    NRPS
P07702                  Saccharomyces cerevisiae   A-T-NADP           1392    Lys2      e-145
EAA62703^a              Aspergillus nidulans       A-T-NADP           1421    Lys2      e-135
Q75BB3                  Eremothecium gossypii      A-T-NADP           1385    Lys2      e-126
Q6SV64                  Cryptococcus neoformans    A-T-NADP           1359    Lys2      e-135
EAA73900^a              Gibberella zeae            A-T-NADP           1042    Lys2      e-105
S_MIKATAE^c,f           Saccharomyces mikatae      A-T-NADP           1384    Lys2      0.0
S_CASTELLII^c,g         Saccharomyces castellii    A-T-NADP           1386    Lys2      0.0
Q8NJ21                  Kluyveromyces lactis       A-T-NADP           1384    Lys2      e-141
Q9P3Y3                  Pichia farinosa            A-T-NADP           1398    Lys2      e-143
P40976                  Schizosaccharomyces pombe  A-T-NADP           1419    Lys2      e-138
Q12572                  Candida albicans           A-T-NADP           1391    Lys2      e-134
O74298                  Penicillium chrysogenum    A-T-NADP           1409    Lys2      e-133
Q9P3Q7                  Neurospora crassa          A-T-NADP           1174    Lys2      e-133
Q9HDP9                  Acremonium chrysogenum     A-T-NADP           1196    Lys2      e-129
Q873Z1                  Leptosphaeria maculans     A-T-NADP-NADP      1282    MNRPS     e-118
Q8J0L6                  Claviceps purpurea         A-T-C              1308    MNRPS     0.0
Q8NJX1                  Hypocrea virens            X-(C-A-T)18-NADP   20925   NRPS      0.0
O_SATIVA1^d,h           Oryza sativa               A-T-C              1225    NRPS      5e-41
O_SATIVA2^d,i           Oryza sativa               A-T-C              995     NRPS      3e-33
O_SATIVA3^d,j           Oryza sativa               A-T-PQQ            1285    ?         2e-51
Q8L5Z8                  Arabidopsis thaliana       A-T-PQQ            1040    ?         3e-84
Q95Q02                  Caenorhabditis elegans     C-A-T-C-T-C-A-T    2870    NRPS      3e-83
Q9XUJ4                  Caenorhabditis elegans     A-T-PQQ            707     AASDH     4e-59
Q17301                  Caenorhabditis briggsae    X-X-X-.-C-A-T      4767    NRPS      3e-66
ENSCBRP00000001007^b    Caenorhabditis briggsae    A-T-PQQ            866     AASDH     4.8e-28
O76858                  Drosophila melanogaster    A-T-AS             879     Ebony     e-114
Q9VLL0                  Drosophila melanogaster    A-T-PQQ            1012    AASDH     1e-57
Q7QKF0                  Anopheles gambiae          A-T-AS             881     Ebony     3e-94
Q7Q0Y5                  Anopheles gambiae          A-T-PQQ            824     AASDH     8e-55
Q80WC9                  Mus musculus               A-T-PQQ            1100    AASDH     1e-85
ENSRNOP00000002907^b    Rattus norvegicus          A-T-PQQ            1152    AASDH     2e-78
Q7Z5Y3                  Homo sapiens               A-T-PQQ            998     AASDH     9e-73
D_RERIO^e,k             Danio rerio                A-T-PQQ            1003    AASDH     4.6e-131
F_RUBRIPES^e,l          Takifugu rubripes          A-T-PQQ            1088    AASDH     3.1e-128
C_INTESTINALIS^e,m      Ciona intestinalis         A-T-PQQ            1074    AASDH     1e-75
C_SAVIGNY^e,n           Ciona savignyi             A-T-PQQ            1167    AASDH     1e-74
P_TROGLODITES^e,o       Pan troglodytes            A-T-PQQ            1343    AASDH     7.4e-221
ENSGALP00000022317^b    Gallus gallus              A-T-PQQ            1125    AASDH     1e-76

a EMBLCDS entry name. b EnsEMBL peptide databank. c,d,e The TBLASTN searches against genomes used as input query sequences P07702, Q8L5Z8 and Q80WC9, respectively. The genes were predicted from the nucleotide sequences: f EMBL accession number AABZ01000259 (positions coding for the protein: 3300–7700), g EMBL AACF01000123 (15700–20800), h EMBL AAAA01021459 (854–2040), i EMBL AAAA01023971 (610–2070), j EMBL AAAA01000789 (18780–31200), k EnsEMBL ctg11952 (800001–1000000), l EnsEMBL Chr_scaffold_632 (38782–48782), m EMBL AABS01000029, n EMBL AACT01000010, o EnsEMBL scaffold_37623 (4535897–4735897).

Fig. 1. Multiple sequence alignment of adenylation domains. Only conserved portions from the multiple sequence alignment obtained as described in Results are shown. Dashes represent insertions and deletions. Numbers above the sequences refer to the sequence numbering of the gramicidin synthetase; a separator symbol marks the boundaries between blocks. The sequence positions equivalent to those involved in the nonribosomal specificity-conferring code described for the A domain of the gramicidin synthetase are marked with blue triangles. The positions of the core motifs are marked underneath with grey bars labelled according to [1]. Secondary structure assignments are shown for GrsA: α-helices and β-strands are rendered as squiggles and arrows, respectively; T stands for turn; blank for coil and irregular conformations; dots represent gaps introduced in the alignment. Identically conserved residues are displayed as white characters on a red background. Conserved regions are denoted by boxed red characters.

On the basis of the structural equivalencies contained in this multiple alignment, the occurrence of a specificity-conferring code similar to that described for the NRPS systems [10] was tested. Substrate specificities were assigned either on the basis of literature data or by use of the NRPS prediction server [11]. The sequence positions equivalent, in the multiple sequence alignment, to those involved in the described nonribosomal specificity code [10] are reported in Table 2. The residues equivalent to those which in GrsA were observed to interact with the α-amino and α-carboxyl groups of the amino acid substrate, Asp235 and Lys517, respectively [7], are conserved. The only exceptions are the freestanding NRPS from Leptosphaeria maculans (UniProt accession no. Q873Z1), which lacks Asp235, Lys2 from Acremonium chrysogenum (UniProt accession no. Q9HDP9), which lacks Lys517, and the sequences of Ebony from Drosophila melanogaster and from Anopheles gambiae (UniProt accession no. Q7QKF0), where Asp235 is missing. In this latter case, the absence of Asp235 can be explained in the light of the model of the interaction of the substrate with the active site (vide infra).

Table 2. Nonribosomal specificity-conferring code in the freestanding enzymes. All accession numbers refer to the UniProt database except those noted. Boldface codes denote in silico predicted proteins not included in databanks. MNRPS is monomodular NRPS; ACV stands for [L-δ-(α-aminoadipoyl)-L-cysteine-D-valine] tripeptide synthetase. Question marks denote unassigned function or substrate. L-α-Aa stands for L-α-aminoadipate, L-α-Aas for L-α-aminoadipate semialdehyde, β-Ala for β-alanine, Hty for hydroxytyrosine; the other three-letter codes are standard amino acid abbreviations.

Protein ID              Source                     Function  Activated substrate
P14687                  Bacillus brevis            NRPS      Phe
Q873Z1                  Leptosphaeria maculans     MNRPS     Thr^d
Q8J0L6                  Claviceps purpurea         MNRPS     Leu^d
Q8NJX1                  Hypocrea virens            NRPS      Phe^d
O_SATIVA1               Oryza sativa               NRPS      Thr^d
O_SATIVA2               Oryza sativa               NRPS      Leu/Ile/Val^d
O_SATIVA3               Oryza sativa               ?         ?
Q8L5Z8                  Arabidopsis thaliana       ?         ?
Q95Q02                  Caenorhabditis elegans     NRPS      Leu^d
Q95Q02.2                Caenorhabditis elegans     NRPS      ?
Q17301                  Caenorhabditis briggsae    NRPS      Hty^d
O76858                  Drosophila melanogaster    Ebony     β-Ala
Q7QKF0                  Anopheles gambiae          Ebony     β-Ala
P07702                  Saccharomyces cerevisiae   Lys2      L-α-Aa
EAA62703^a              Aspergillus nidulans       Lys2      L-α-Aa
Q75BB3                  Eremothecium gossypii      Lys2      L-α-Aa
Q6SV64                  Cryptococcus neoformans    Lys2      L-α-Aa
EAA73900^a              Gibberella zeae            Lys2      L-α-Aa
S_MIKATAE               Saccharomyces mikatae      Lys2      L-α-Aa
S_CASTELLII             Saccharomyces castellii    Lys2      L-α-Aa
Q8NJ21                  Kluyveromyces lactis       Lys2      L-α-Aa
Q9P3Y3                  Pichia farinosa            Lys2      L-α-Aa
P40976                  Schizosaccharomyces pombe  Lys2      L-α-Aa
Q12572                  Candida albicans           Lys2      L-α-Aa
O74298                  Penicillium chrysogenum    Lys2      L-α-Aa
Q9P3Q7                  Neurospora crassa          Lys2      L-α-Aa
Q9HDP9                  Acremonium chrysogenum     Lys2      L-α-Aa
Q9XUJ4                  Caenorhabditis elegans     AASDH     L-α-Aa
ENSCBRP00000001007^b    Caenorhabditis briggsae    AASDH     L-α-Aa
Q9VLL0                  Drosophila melanogaster    AASDH     L-α-Aa
Q7Q0Y5                  Anopheles gambiae          AASDH     L-α-Aa
Q80WC9                  Mus musculus               AASDH     L-α-Aa
ENSRNOP00000002907^b    Rattus norvegicus          AASDH     L-α-Aa
Q7Z5Y3                  Homo sapiens               AASDH     L-α-Aa
F_RUBRIPES              Takifugu rubripes          AASDH     L-α-Aa
D_RERIO                 Danio rerio                AASDH     L-α-Aa
C_INTESTINALIS          Ciona intestinalis         AASDH     L-α-Aa
C_SAVIGNY               Ciona savignyi             AASDH     L-α-Aa
ENSGALP00000022317^b    Gallus gallus              AASDH     L-α-Aa
P_TROGLODITES           Pan troglodytes            AASDH     L-α-Aa
P26046 M1^c             Penicillium chrysogenum    ACV       L-α-Aa

Residue position according to GrsA A domain numbering: 235 236 239 278 299 301 322 330 331
D A V G V L H H I N V D D P P P P P P P P P P P P P P P P P P P P P P P P P P P P T L S M N L N L Y L F V V H H H H H H H H H H H H H H Q Q Q Q Q Q Q Q M Q F Q Q N I W V V I M I V Q V T V V F F F F F F F F F F F F F F L L L V A A A A A A V A L I A N G G G G S S G G G S S V V V V V V V V V V A V L G L A L L E Y I F F M M W M M M M M M M M M M L V V I I I I I V L I I L V E I I V N I I G G V A V G G R R H R R R R R R R R R R R C C C C S S C S C S S S C F C A D D D D D D D D D V V D D D D D D D D D D D D D D D D D D D D D D D D D D D D E W W F G W F E E A L S A S R R R R R R R R R R R R R R V V V L V V V V V V V V V R V V V G G G G G G G G G V G G G V Y H F D D Y F W D D A A L A A A A A S S S A A A W W W W W W W W W W W W V

a EMBLCDS entry name. b EnsEMBL peptide databank. c Module of ACV synthetase is included for comparison with the code of Lys2 and AASDH. d Predicted using the NRPS prediction BLAST server [11].

It should be noted that the specificity code for the A domains recognizing the substrate β-alanine (Table 2) is similar to that already predicted for the A module of exochelin synthetase from Mycobacterium smegmatis (UniProt accession no. O87313) [11], the only differences being, in Ebony, at positions 239 (Ser vs. Thr), 278 (Val vs. Leu), 299 (Val vs. Ile) and 322 (Phe vs. Ser). The specificity codes of Lys2 and AASDH share the residues Asp235 and Pro236. The residue Pro236 seems to be specific for the aminoadipate substrates. Indeed, the only other system in which it is present in the same position is the module of the chloroeremomycin synthetase (UniProt accession no. O52821) from Amycolatopsis orientalis specific for 3,5-dihydroxy-L-phenylglycine [11]. The specificity code of the module of ACV [L-δ-(α-aminoadipoyl)-L-cysteine-D-valine] synthetase from Penicillium chrysogenum that activates L-α-aminoadipate displays strong similarities to the Lys2 code (Table 2), with notable differences at position 235, where a Glu residue replaces the conserved Asp, and at position 330, where a Phe residue replaces the conserved Arg/His. The marginal resemblance of the AASDH code to that of Lys2 and ACV module
provides a structural basis for the current view that the physiological substrate of the dehydrogenase is L-α-aminoadipate semialdehyde rather than L-α-aminoadipate.

Traditional and freestanding A and T domains also share some conserved core motifs. In particular, the core motifs A3 to A10 [1] are conserved in the eucaryotic NRPS-like domains, while the motifs A1 and A2 are positioned in a nonconserved section of the alignment (not shown in the figure). However, A1 and A2 are distant from the active site and are probably conserved in the NRPSs only for structural reasons [1].

Phylogenetic analysis

To visualize the evolutionary relationships among the freestanding NRPS A domains and the corresponding domains of the traditional NRPSs in a phylogenetic tree, the A domains of 25 bacterial NRPSs and the A domain of the ACV synthetase from Penicillium chrysogenum were added to the multiple sequence alignment shown in Fig. 1. The 25 bacterial NRPS sequences were selected by taking one representative from each of the different substrate specificity groups defined by Challis et al. [11], to give a view of the substrate range utilized by these enzymes. The phylogenetic tree shown in Fig. 2A was built from the portion of the multiple sequence alignment shown in Fig. 1 between positions 190 and 331, which contains the specificity code residues and the core motifs A3 to A5, using the neighbor-joining method as implemented in the module neighbor of the PHYLIP package [21]. The tree accuracy was tested with 1000 bootstrap replicates. On the assumption that the nine amino acids lining the binding pocket determine substrate specificity [11], we used the maximum parsimony method implemented in the program protpars of the PHYLIP package [21] to establish a relationship between these important residues and the substrate specificities in the 65 A domains considered, i.e. 39 freestanding plus 26 NRPS A domains.
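As an illustration of how a comparison can be restricted to the pocket-lining positions, the following Python sketch maps residue numbers of a reference sequence onto alignment columns, extracts the residues found at selected positions and counts the mismatches between two such signatures. It is only a minimal sketch under stated assumptions: the function names, the `first_residue` parameter, the toy alignment rows and the simple mismatch count are hypothetical and introduced here for illustration; they are not the PHYLIP protpars procedure used in this work. The list of positions is the one given in the header of Table 2.

```python
from typing import Dict, List

# Pocket positions in GrsA A-domain numbering, as listed in the header of Table 2.
CODE_POSITIONS: List[int] = [235, 236, 239, 278, 299, 301, 322, 330, 331]

# Characters treated as alignment gaps (assumption: dashes and dots).
GAP_CHARS = "-."


def residue_to_column(ref_row: str, first_residue: int = 1) -> Dict[int, int]:
    """Map residue numbers of the reference sequence to alignment columns.

    ref_row is the gapped reference row of the alignment; first_residue is
    the residue number of its first non-gap character (hypothetical parameter).
    """
    mapping: Dict[int, int] = {}
    number = first_residue
    for column, char in enumerate(ref_row):
        if char not in GAP_CHARS:
            mapping[number] = column
            number += 1
    return mapping


def extract_signature(row: str, columns: List[int]) -> str:
    """Return the residues of an aligned row at the given columns."""
    return "".join(row[c] for c in columns)


def mismatches(sig_a: str, sig_b: str) -> int:
    """Count the positions at which two equally long signatures differ."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a != b for a, b in zip(sig_a, sig_b))


if __name__ == "__main__":
    # Toy rows (not real sequences); positions 2, 3 and 5 stand in for the code.
    reference = "AC-DEF"
    other = "AGQDEW"
    col_of = residue_to_column(reference)
    columns = [col_of[p] for p in (2, 3, 5)]
    ref_sig = extract_signature(reference, columns)
    other_sig = extract_signature(other, columns)
    print(ref_sig, other_sig, mismatches(ref_sig, other_sig))
```

With a real alignment, the GrsA row would serve as the reference and `CODE_POSITIONS` would replace the toy positions, provided that `first_residue` is set to the GrsA residue number at which the aligned region starts.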
Therefore, the tree in Fig. 2B was derived considering only nine sequence positions, corresponding to the eight involved in the nonribosomal specificity code [11] plus Asp235, which was included because it is not always conserved. In contrast, Lys517 was not included because it was conserved in all cases considered. The resulting tree obviously has no phylogenetic meaning.

The phylogenetic tree based on positions 190–331 of the complete alignment revealed two clusters, containing the α-aminoadipate reductases from fungi and the α-aminoadipate semialdehyde dehydrogenases from metazoa, which segregate independently from the bacterial sequences. This pattern parallels that observed in the specificity code tree reported in Fig. 2B and confirms that Lys2 and AASDH recognize different substrates. Another independent cluster in both trees is formed by the two Ebony proteins (UniProt accession nos Q7QKF0 and O76858), the A domain of the exochelin synthetase module from Mycobacterium smegmatis (UniProt accession no. O87313) and the two plant hypothetical NRPS-like proteins (UniProt accession no. Q8L5Z8 and the in silico predicted protein O_SATIVA3 in Table 1). This segregation could suggest that β-alanine or a very similar compound might be the substrate of the two plant proteins. The ACV synthetase module from Penicillium chrysogenum (UniProt accession no. P26046) displays a substrate specificity identical to that of fungal Lys2, although its sequence is more similar to that of the metazoan AASDH. Finally, it is interesting to observe the unexpected position in the trees of the protein sequences from Caenorhabditis elegans and Caenorhabditis briggsae (UniProt accession nos Q95Q02 and Q17301). These proteins, containing 2870 and 4767 residues, respectively, display a typical NRPS modular structure and, in the phylogenetic tree, are grouped with bacterial NRPSs. Two sequences from the same species that are homologous to AASDH cluster, as expected, in the AASDH group (UniProt accession no. Q9XUJ4 and EnsEMBL accession no. ENSCBRP00000001007).

Fig. 2. Phylogenetic trees based on the multiple alignment of A domain sequences. Metazoa, plants, fungi and bacteria are represented in red, green, brown and black, respectively. All names and numbers used in the phylogenetic trees are defined in the tables, except for the following UniProt accession numbers: P35854, D-alanine activating enzyme, Lactobacillus casei; Q50857, saframycin Mx1 synthetase B, Myxococcus xanthus; O87313, FxbB, Mycobacterium smegmatis; O30409, tyrocidine synthetase 3, Brevibacillus brevis; Q9Z4X5, CDA peptide synthetase II, Streptomyces coelicolor; P19828, AngR protein, Listonella anguillarum; Q45295, LchAA protein, Bacillus licheniformis; P39845, putative fengycin synthetase, Bacillus subtilis; P45745, dhbF, Bacillus subtilis; O68008, bacitracin synthetase 3, Bacillus licheniformis; O68006, bacitracin synthetase 1, Bacillus licheniformis; O52819, PCZA363.3, Amycolatopsis orientalis; O68007, bacitracin synthetase 2, Bacillus licheniformis; O87606, peptide synthetase, Bacillus subtilis; Q9ZGA6, FK506 peptide synthetase, Streptomyces sp.; O07944, Pristinamycin I synthetase 3 and 4, Streptomyces pristinaespiralis; P11454, enterobactin, Escherichia coli; O52820, PCZA363.4, Amycolatopsis orientalis; O52821, PCZA363.5, Amycolatopsis orientalis; P71717, phenyloxazoline synthetase MBTB, Mycobacterium tuberculosis; Q9Z4X6, CDA peptide synthetase I, Streptomyces coelicolor; Q50858, saframycin Mx1 synthetase A, Myxococcus xanthus; O69246, LchAB protein, Bacillus licheniformis; P26046, N-(5-amino-5-carboxypentanoyl)-L-cysteinyl-D-valine synthetase, Penicillium chrysogenum. The ‘M’ followed by a number in bacterial NRPSs refers to the module whose A domain was used for building the trees. Enzyme substrates are indicated at the end of the databank code with the standard one-letter code for amino acids or with the following abbreviations: Aa, L-α-aminoadipate; Orn, L-ornithine; DHPG, 3,5-dihydroxy-L-phenylglycine; PGly, L-phenylglycine; β-A, β-alanine; Aas, L-α-aminoadipate semialdehyde; 3hTyr, 3-hydroxy-L-tyrosine; HPG, 4-hydroxy-L-phenylglycine; 3h4mF, 3-hydroxy-4-methyl-phenylalanine. (A) Neighbor-joining phylogenetic tree based on the comparison of alignment positions 190–331. The numbers on the branches indicate the number of times the partition of the species into the two sets separated by that branch occurred among the 1000 bootstrap trees. (B) Maximum parsimony tree calculated with the nine amino acids lining the substrate binding pocket of adenylation domains.

Evolutionary trace analysis [22] (results not shown) was also applied to confirm the presence of functionally important residues conserved at different levels of partition of the freestanding NRPS family. This method exploits the information inherent in a family of homologous proteins by dividing it so as to maximize functional similarity within the groups and functional variation between the groups. The analysis was conducted using the TraceSuite II server (http://www.cryst.bioc.cam.ac.uk/jiye/evoltrace/evoltrace.html) with the same multiple sequence alignment used for building the tree shown in Fig. 2A. The results showed that the core motifs A3 to A5 are conserved in almost all partitions and are characteristic of the NRPS A domains. Furthermore, the variability of the residues of the specificity code confirms that they are group-specific, except for the residues Asp235 and Pro236, which are shared by the two groups, AASDH and Lys2, that bind similar substrates (Table 2).

Modelling of active sites and docking studies

Molecular modelling and manual and automated docking were used to map the conserved residues onto a hypothetical active site structure, to understand the role of these residues and to predict their interactions with the substrates. Figures 3, 4 and 5 report the model active sites of Ebony from Drosophila melanogaster, Lys2 from Saccharomyces cerevisiae and AASDH from Homo sapiens, respectively.

Fig. 3. Model structure of Ebony from Drosophila melanogaster. The Ebony model is represented as a teal blue cartoon. The AMP molecule is rendered as a stick model. The specificity code residues are shown as stick models with superimposed slate blue CPK models. Carbon, oxygen, nitrogen and phosphorus atoms are displayed in green, red, blue and purple, respectively. β-Alanine is represented as a stick model with grey carbon atoms. Dashes indicate hydrogen bonds. This figure was rendered using PyMOL [31].

Fig. 4. Model structure of the active site of Lys2 from Saccharomyces cerevisiae. The AMP molecule is shown as a stick model. Carbon, oxygen, nitrogen and phosphorus atoms are displayed in green, red, blue and purple, respectively. The two possible arrangements of the substrate L-α-aminoadipate (L-α-Aa) are superimposed and represented as sticks. Carbon atoms are colored in two different ways: cyan for the L-α-Aa in which the δ-carboxyl group forms a hydrogen bond with Lys517; green for the L-α-Aa in which the α-carboxyl group forms a hydrogen bond with Lys517. The other atoms are colored as in AMP. All the residues in the active site are rendered as CPK and colored in slate blue. This figure was rendered using PyMOL [31].

Fig. 5. Model structure of AASDH from Homo sapiens. The AASDH main chain is represented as a teal blue cartoon; AMP is shown as a stick model. Carbon, oxygen, nitrogen and phosphorus atoms are displayed in violet, red, blue and purple, respectively. L-α-Aminoadipate semialdehyde (L-α-Aas) is represented as sticks, with carbon atoms in green. The specificity code residues are shown as sticks and CPK. Sticks are colored as in AMP except for carbon atoms, which are grey, and CPK models are colored in marine blue. This figure was rendered using PyMOL [31].

The reliability of docking experiments using homology models built at a sequence identity to the template of 25–30%, as in the reported case, can be questionable. Indeed, the superposition of the three structures related to the freestanding A domains, namely GrsA, firefly luciferase and DhbE, which share 16% sequence identity on average, shows that the average RMSD over the Cα atoms of the entire structures is 2.6 Å. In contrast, the average RMSD calculated over the Cα atoms enclosed in a sphere of radius [...] Å centered at the GrsA residue Asp235 in the active site is 0.95 Å. Indeed, the active sites of enzymes tend to be structurally more conserved during evolution [23]. Therefore, the error affecting the active site is expected to be lower than that affecting the rest of the protein. Consequently, the docking studies can still provide useful and testable indications.

In the active site of Ebony (Fig. 3), two residues of the traditional nonribosomal code, Asp235 and Pro236, are replaced by Val and Asp, respectively. The aspartate in position 236 can form a hydrogen bond to the β-amino group of the β-alanine substrate, which also interacts via hydrogen bonds with Ser301 and Asp331. The other residues line the active site pocket. A bulky aromatic residue (Phe322) serves as the floor of the active site pocket. Apparently, the rearrangement of the side chains at the active site enabled the enzyme to recognize a substrate with a β-amino instead of an α-amino group. Interestingly, substitution of Asp235 is indicative of the substrate structure. For example, in the case of DhbE, position 235 is occupied by Asn and the corresponding substrate lacks an α-amino group [8].

It has been proposed, for Lys2 from S. cerevisiae, that the L-α-aminoadipate substrate could be adenylated at the δ-carboxylate rather than the α-carboxylate, and that the α-amino and α-carboxyl groups of the substrate bind at the bottom of the pocket, interacting with Arg239 and Glu322 [14]. An analogous arrangement
was proposed also for the binding of L-α-aminoadipate to the adenylation domain of the ACV synthetase from Penicillium chrysogenum [7]. The results of the docking experiments (Fig. 4) indicated that the possible binding modes cluster into two solutions. According to the first possibility, the substrate α-aminoadipate is bound to the active site with a salt bridge between the α-amino group and the carboxyl group of Asp235 and a hydrogen bond to the carbonyl group of Arg330. In yeast Lys2, the δ-carboxylate group of the substrate forms a salt bridge with Arg239. Finally, the substrate α-carboxylate interacts via hydrogen bonds with the ε-amino group of Lys517. The other residues of the putative specificity code line the walls of the active site. In particular, the conserved Pro236 shapes the pocket to host the substrate. An alternative way of binding of the substrate to the active site involves the formation of a salt bridge between the δ-carboxylate group and the ε-amino group of Lys517, and between the α-carboxylate and Arg239. The α-amino group interacts via hydrogen bonds with the carbonyl oxygens of Met322, Gly324 and Arg330. The first binding mode of the substrate (the α-carboxylate interacting with Asp235) is supported by the invariance of Asp235, which usually stabilizes the α-amino group of the amino acid substrate. The importance of Asp235 in Lys2 is also evidenced by mutational analysis, which showed a complete loss of catalytic activity for the mutant Asp235→Asn, while the mutant Asp235→Glu retained only 4% of catalytic activity [24]. This binding mode is also in line with the absence, at position 322 of the putative α-aminoadipate specificity code (Table 2), of a negatively charged side chain whose role would be to stabilize the α-amino group of the substrate. Such a residue (Glu322) is present in ACV synthetase. However, most importantly, the same binding mode does not account for the experimental evidence of the existence of the
α-aminoadipoyl-C6-AMP [13], which can be explained by the binding mode with the δ-carboxylate in proximity of Asp235.

The results of the docking studies of α-aminoadipate semialdehyde, assumed to be the substrate of AASDH [12] (Fig. 5), show that the substrate can interact with the active site in only one orientation. It involves the formation of a salt bridge between the α-amino group of the substrate and the carboxylic group of Asp235, and a hydrogen bond to the carbonyl oxygen of Ser330. The δ-aldehyde group of the substrate interacts with Gln278. Finally, the substrate α-carboxylate, as expected, interacts via hydrogen bonds with the ε-amino group of Lys517 in both enzymes. Once again, this binding mode can explain the invariance of Asp235, and the model can account for the lack, at position 322 of the putative specificity code (Table 2), of a negatively charged side chain able to stabilize the α-amino group of the substrate; such a residue is instead present in ACV synthetase (Glu322).

The results reported in this work demonstrate that a specificity-conferring code can also be recognized in the freestanding eucaryotic NRPS-like enzymes. A role for some of the specificity residues could be predicted on the basis of in silico studies. These indications can be useful for planning experiments aimed at a better characterization and at the engineering of this emerging group of single NRPS modules responsible for amino acid selection, activation and modification in the absence of other NRPS assembly line components.

Experimental procedures

Databank searches, gene prediction, sequence comparisons and evolutionary analysis

The UniProt and NR databanks, available at the EBI (http://www.ebi.ac.uk) and NCBI (http://www.ncbi.nlm.nih.gov/entrez) web sites, respectively, were searched with the psiblast [16] program. Genomes were accessed through the EnsEMBL (http://www.ensembl.org), TIGR (http://www.tigr.org) and NCBI portals. The CDD [17]
and Pfam [18] databanks were used as references for the identification of protein families and domains. NRPS-like A domains not yet included in the protein databanks were searched for in the genomes of Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Fugu rubripes, Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans, Caenorhabditis briggsae, Saccharomyces cerevisiae, Saccharomyces castellii, Saccharomyces mikatae, Schizosaccharomyces pombe, Candida albicans, Ciona savignyi, Gallus gallus, Pan troglodytes, Oryza sativa and Zea mays, for which draft genomic sequences were available. The tblastn module of the blast [16] package at NCBI, EnsEMBL or TIGR was used to search for putative NRPS domain-related genes in these genomes. Gene predictions were subsequently validated and checked through the sequential application of several programs. The genomic sequences identified by tblastn were further analyzed with the program genomescan [25]: the target DNA sequence plus 10 kilobases upstream and downstream was extracted to ensure that the entire gene structure, including promoter, coding region, terminator, etc., was captured. The putative gene structures were further confirmed by EST (expressed sequence tag) clustering analysis, performed with the estcluster software available in the GCG package (Wisconsin package, version 10.2, Genetics Computer Group, Madison, WI, USA) on the EST database of the respective organism.

Multiple sequence alignments were built with the hmmalign program of the hmmer 2.0 software package [19] and with the clustalw software [20]. The resulting alignments were visually inspected and, when appropriate, manually adjusted. The sequence alignments were displayed with the program espript [26]. Phylogenetic trees relied on the modules protdist, neighbor, protpars and drawgram of the phylip package [21].

Molecular modelling and docking studies

The crystal structure of the adenylation domain of the gramicidin synthetase (GrsA, Protein Data Bank code 1AMU) was used as a
template for the construction of homology-derived models of human α-aminoadipate semialdehyde dehydrogenase (AASDH, UniProt accession no. Q7Z5Y3), Ebony β-alanyl biogenic amine synthetase from D. melanogaster (UniProt accession no. O76858) and yeast α-aminoadipate reductase (Lys2, UniProt accession no. P07702). GrsA shares 26% sequence identity with human AASDH, 27% with Ebony and 30% with Lys2. Two other potential templates, firefly luciferase (PDB code 1LCI) and DhbE (PDB code 1MD9), have lower percentages of sequence identity with the target sequences. Indeed, the sequence identities between DhbE and AASDH, Ebony and Lys2 are 19, 23 and 21%, respectively, and they are even lower for firefly luciferase (11, 12 and 11%). Moreover, DhbE binds a substrate without an amino group and has a deletion of one of the residues of the specificity code, corresponding to Ile330 of GrsA. For all these reasons, GrsA was judged the best template and selected for modelling.

Homology modelling was based on the multiple sequence alignment obtained as described in the previous section. Models were calculated with the modeller-4 package [27]. Twenty different models were derived for each target protein using the highest built-in refinement procedure, and the one displaying the lowest objective function, which measures the extent of violation of constraints from the templates, was taken as the representative model. An AMP molecule, a buried water molecule and a magnesium ion from the crystal structure of GrsA were maintained in the homology models. The stereochemical quality, and the packing and solvent exposure, of the resulting models were validated by procheck [28] and prosaii [29] analyses, respectively.

The models of the Lys2 substrate, L-α-aminoadipate, the Ebony substrate, β-alanine, and the putative AASDH substrate, L-α-aminoadipate semialdehyde, were built using the builder package in the program
insightii (Version 2000, Accelrys, San Diego, CA, USA) and were positioned into the predicted enzyme active sites following the binding mode of the phenylalanine substrate in GrsA. During the manual docking, steric clashes were removed and the position of the substrate at the active site was adjusted to establish stabilizing interactions between the atoms of the ligand and those constituting the active site.

To optimize the conformation of the active site side chains and the position of the substrate, the enzyme-substrate complex was further processed by energy minimization as implemented in discover 2.9 of the insightii package. The cff91 forcefield, a distance-dependent dielectric constant and a cut-off distance of 28 Å were used during each simulation. An initial minimization was performed to relax the hydrogens added to the model. Positions of the heavy atoms of the binary complex were fixed, and 100 steepest descent steps were performed, until the maximum energy derivative was less than 41.8 kJ·mol⁻¹·Å⁻¹. Subsequently, while main chain atoms were kept fixed, the side chains of every residue contained in a sphere of [...] Å centered on the phosphate atom of the AMP were subjected to a gradually decreasing tethering force (from 4180 to 209 kJ·Å⁻²) using steepest descents, until the maximum derivative was less than 4.18 kJ·mol⁻¹·Å⁻¹. Finally, a side chain minimization including charges was performed for 100 steepest descent steps, until the maximum energy derivative was less than 0.42 kJ·mol⁻¹·Å⁻¹.

To verify the predicted binding mode of the substrate, its position was calculated again with an automatic method that does not rely on the information contained in the template structure. The autodock 3.0 suite [30], which exploits the Lamarckian genetic algorithm, was used. The docking grid was prepared with the autogrid utility of autodock and set to 90 × 90 × 90 points with a grid
spacing of 0.200 Å. The grid center was placed at the center of the active site pocket. The grid boxes included the entire binding site of the enzyme and provided enough space for the translational and rotational walk of the ligand. For each of the three enzymes, AASDH, Lys2 and Ebony, 50 runs were performed and, for each run, a maximum of 27 000 genetic algorithm operations were generated on a single population of 50 individuals. The maximum number of energy evaluations was set to 250 000. Other parameters for the docking were: a random starting position and conformation, a maximal mutation of [...] Å in translation and 50 degrees in rotation, an elitism of 1, a mutation rate of 0.02, a crossover rate of 0.8 and a local search rate of 0.06. Simulations were ranked according to the docked energy between the protein and the ligand, a summation of internal ligand energy and intermolecular energy terms.

Acknowledgements

This research was supported in part by a grant from the Italian Ministero dell’Istruzione, Università e Ricerca (MIUR) and by a PRIN-2002 grant to IG. Part of this work will be submitted by LDV in partial fulfillment of the requirements of the degree of Dottorato di Ricerca at Università di Roma ‘La Sapienza’. The authors are grateful to Daniele Tronelli for his skilful help and to Professor Francesco Bossa and Professor Donatella Barra for their encouraging support and advice.

References

1 Marahiel MA, Stachelhaus T & Mootz HD (1997) Modular peptide synthetases involved in nonribosomal peptide synthesis. Chem Rev 97, 2651–2673.
2 Mootz HD, Schwarzer D & Marahiel MA (2002) Ways of assembling complex natural products on modular nonribosomal peptide synthetases. Chembiochem 3, 490–504.
3 Du L, Sanchez C, Chen M, Edwards DJ & Shen B (2000) The biosynthetic gene cluster for the antitumor drug bleomycin from Streptomyces verticillus ATCC15003 supporting functional interactions between nonribosomal peptide synthetases and a polyketide synthase.
Chem Biol 7, 623–642.
4 Walsh CT (2004) Polyketide and nonribosomal peptide antibiotics: modularity and versatility. Science 303, 1805–1810.
5 Guenzi E, Galli G, Grgurina I, Gross DC & Grandi G (1998) Characterization of the syringomycin synthetase gene cluster: a link between procaryotic and eucaryotic peptide synthetases. J Biol Chem 273, 32857–32863.
6 Conti E, Franks NP & Brick P (1996) Crystal structure of firefly luciferase throws light on a superfamily of adenylate-forming enzymes. Structure 4, 287–298.
7 Conti E, Stachelhaus T, Marahiel MA & Brick P (1997) Structural basis for the activation of phenylalanine in the non-ribosomal biosynthesis of gramicidin S. EMBO J 16, 4174–4183.
8 May JJ, Kessler N, Marahiel MA & Stubbs MT (2002) Crystal structure of DhbE, an archetype for aryl acid activating domains of modular nonribosomal peptide synthetases. Proc Natl Acad Sci USA 99, 12120–12125.
9 Keating TA, Marshall CG, Walsh CT & Keating AE (2002) The structure of VibH represents nonribosomal peptide synthetase condensation, cyclization and epimerization domains. Nat Struct Biol 9, 522–526.
10 Stachelhaus T, Mootz HD & Marahiel MA (1999) The specificity-conferring code of adenylation domains in nonribosomal peptide synthetases. Chem Biol 6, 493–505.
11 Challis GL, Ravel J & Townsend CA (2000) Predictive, structure-based model of amino acid recognition by nonribosomal peptide synthetase adenylation domains. Chem Biol 7, 211–224.
12 Kasahara T & Kato T (2003) A new redox-cofactor vitamin for mammals. Nature 422, 832.
13 Sinha AK & Bhattacharjee JK (1971) Lysine biosynthesis in Saccharomyces: conversion of α-aminoadipate into α-aminoadipic δ-semialdehyde. Biochem J 125, 743–749.
14 Ehmann DE, Gehring AM & Walsh CT (1999) Lysine biosynthesis in Saccharomyces cerevisiae: mechanism of α-aminoadipate reductase (Lys2) involves posttranslational phosphopantetheinylation by Lys5. Biochemistry 38, 6171–6177.
15 Richard A, Kemme T, Wagner S, Schwarzer D, Marahiel MA & Hovemann BT (2003) Ebony, a novel nonribosomal peptide
synthetase for β-alanine conjugation with biogenic amines in Drosophila. J Biol Chem 278, 41160–41166.
16 Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W & Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402.
17 Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler GH, Mazumder R, Nikolskaya AN, Panchenko AR, Rao BS, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ & Bryant SH (2003) CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 31, 383–387.
18 Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C & Eddy SR (2004) The Pfam Protein Families Database. Nucleic Acids Res 32, D138–D141.
19 Eddy SR (1996) Hidden Markov Models. Curr Opin Struct Biol 6, 361–365.
20 Thompson JD, Higgins DG & Gibson TJ (1994) clustalw: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
21 Felsenstein J (1996) Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol 266, 418–427.
22 Lichtarge O, Bourne HR & Cohen FE (1996) An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 257, 342–358.
23 Irving JA, Whisstock JC & Lesk A (2001) Protein structural alignments and functional genomics. Proteins Struct Func Genet 42, 378–382.
24 Guo S & Bhattacharjee JK (2003) Site-directed mutational analysis of the novel catalytic domains of α-aminoadipate reductase (Lys2p) from Candida albicans.
Mol Genet Genomics 269, 271–279.
25 Yeh RF, Lim LP & Burge C (2001) Computational inference of homologous gene structures in the human genome. Genome Res 11, 803–816.
26 Gouet P, Courcelle E, Stuart DI & Metoz F (1999) ESPript: analysis of multiple sequence alignments in PostScript. Bioinformatics 15, 305–308.
27 Šali A & Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234, 779–815.
28 Laskowski RA, MacArthur MW, Moss DS & Thornton JM (1993) procheck: a program to check the stereochemical quality of protein structures. J Appl Cryst 26, 283–291.
29 Sippl MJ (1993) Recognition of errors in three-dimensional structures of proteins. Proteins Struct Func Genet 17, 355–362.
30 Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK & Olson AJ (1998) Automated docking using a Lamarckian genetic algorithm and empirical binding free energy function. J Comp Chem 19, 1639–1662.
31 DeLano WL (2002) The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA.
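
The comparison between the RMSD over all Cα atoms (2.6 Å) and the RMSD over the Cα atoms within a sphere around Asp235 (0.95 Å) can be illustrated with a minimal Python sketch. It assumes that the structures have already been superposed and that equivalent Cα atoms are listed in the same order. The function names, the toy coordinates and the 5 Å radius are illustrative assumptions and are not taken from this work.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points, without refitting."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def within_sphere(coords, center, radius):
    """Indices of the points lying within `radius` of `center`."""
    return [i for i, p in enumerate(coords) if math.dist(p, center) <= radius]


def average_pairwise_rmsd(structures, indices=None):
    """Mean RMSD over all pairs of structures, optionally restricted to `indices`."""
    if len(structures) < 2:
        raise ValueError("at least two structures are required")
    values = []
    for a, b in combinations(structures, 2):
        if indices is not None:
            a = [a[i] for i in indices]
            b = [b[i] for i in indices]
        values.append(rmsd(a, b))
    return sum(values) / len(values)


# Toy, made-up Calpha coordinates (in Å) for two pre-superposed structures.
struct_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
struct_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 3.0, 0.0), (20.0, 4.0, 0.0)]

# Hypothetical active-site selection: atoms within 5 Å of the first atom.
site = within_sphere(struct_a, center=(0.0, 0.0, 0.0), radius=5.0)

print("global RMSD:", average_pairwise_rmsd([struct_a, struct_b]))
print("active-site RMSD:", average_pairwise_rmsd([struct_a, struct_b], site))
```

In this toy case the deviations are confined to the atoms far from the chosen center, so the restricted RMSD is lower than the global one, which is the pattern reported for the three template structures.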

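
The convergence thresholds and tethering forces quoted in the Experimental procedures (41.8, 4.18 and 0.42 kJ·mol⁻¹·Å⁻¹; 4180 and 209 kJ·Å⁻²) look like round values in kcal converted to kJ, and the docking grid of 90 × 90 × 90 points at 0.200 Å spacing corresponds to a box of roughly 18 Å per side. The short Python sketch below checks this arithmetic. The reading of the numbers as converted kcal values is an inference rather than a statement from the text, the helper names are hypothetical, and the box edge is approximated as the number of points times the spacing.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kj_to_kcal(value_kj):
    """Convert an energy-based quantity from kJ to kcal."""
    return value_kj / KJ_PER_KCAL


def box_edge(n_points, spacing):
    """Approximate edge length of a cubic grid (points times spacing)."""
    return n_points * spacing


# Values as reported in the minimization protocol (kJ-based units).
reported = {
    "hydrogen relaxation threshold": 41.8,
    "tethered side-chain threshold": 4.18,
    "final side-chain threshold": 0.42,
    "initial tethering force": 4180.0,
    "final tethering force": 209.0,
}

for label, value in reported.items():
    print(f"{label}: {value} kJ is about {kj_to_kcal(value):.2f} kcal")

print(f"docking box edge: about {box_edge(90, 0.200):.1f} Å")
```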