Scientific report: The astacin protein family in Caenorhabditis elegans

The astacin protein family in Caenorhabditis elegans

Frank Möhrlen¹, Harald Hutter² and Robert Zwilling¹
¹Institute of Zoology, University of Heidelberg; ²Max-Planck-Institute for Medical Research, Heidelberg, Germany

In the nematode Caenorhabditis elegans, 40 genes code for astacin-like proteins (nematode astacins, NAS). The astacins are metalloproteases present in bacteria, invertebrates and vertebrates and serve a variety of physiological functions such as digestion, hatching, peptide processing, morphogenesis and pattern formation. With the exception of one distorted pseudogene, all C. elegans astacins are expressed and are evidently functional. For 13 genes we found splicing patterns differing from the Genefinder predictions in WormBase, sometimes markedly. The GFP expression pattern for NAS-4 shows a specific localization in anterior pharynx cells and in the whole digestive tract (as the secreted form). In contrast, NAS-7 is found in the head of adult hermaphrodites, but not in pharynx cells or in the lumen of the digestive tract. In embryos, NAS-7 fluorescence becomes detectable just before hatching. In C. elegans astacins, three basic structural and functional moieties can be discerned: a prepro portion, the central catalytic chain and long C-terminal extensions with presumably regulatory functions. Within the regulatory moiety, EGF-like, CUB, SXC, and TSP-1 domains can be distinguished. Based on structural differences of the regulatory unit we established six NAS subgroups, which seemingly represent different functional and evolutionary clusters. This pattern, deduced exclusively from the domain arrangement in the regulatory moiety, is perfectly reflected in an evolutionary tree constructed solely from amino acid sequence information of the catalytic chain. Related catalytic chains tend to have related regulatory extensions. One notable gene, NAS-39, shows a striking resemblance to human BMP-1 and the tolloids.
Keywords: astacin family; Astacus astacus; Caenorhabditis elegans; protein evolution; metalloproteases.

The first evidence for the existence of the astacin protein family can be traced back to 1967, when one of us (R. Zwilling) observed a proteolytic activity in the digestive fluid of the decapod crayfish Astacus astacus that differed from all other proteases known at that time [1]. Investigations of the cleavage and inhibition specificity confirmed this notion [2–4], and the elucidation of its unique amino acid sequence demonstrated definitively that the crayfish protease represented a new protein family [5]. In subsequent studies, the X-ray crystal structure of the Astacus protease, for which we proposed the name 'astacin', was solved to a resolution of 1.8 Å [6]. Astacin was recognized to be a metalloprotease exhibiting a penta-coordinated zinc ion in its active center [7]. In addition, the site of biosynthesis [8], genome organization [9], and mode of activation [10,11] have been elucidated, which made the crayfish protease a prototype for the astacins.

A second member of the astacin protein family was identified when Wang et al. and Wozney et al. (1988) studied the human bone-inducing factor BMP-1, into which a domain with high resemblance to crayfish astacin is inserted [12,13]. After that, many more astacin-like proteins or genes were described in rapid succession in vertebrates, invertebrates and even in prokaryotes [14], where they serve physiological functions as different as food digestion, hatching, peptide processing, morphogenesis and pattern formation (for an overview see [15]). In the crayfish Astacus astacus, a second astacin gene can be found in the embryo that is activated only during a narrow time window just before hatching [16]. In the model organism Caenorhabditis elegans, metalloproteases are present in great variety, as we have seen in data bank analysis (see also [17]).
On the other hand, we have shown recently that the bulk of total proteolytic activity found in crude extracts of mixed stage populations consists of acidic aspartyl proteases [18,19]. However, with regard to the number of expressed astacin genes, C. elegans surpasses any other organism studied so far. This investigation was therefore stimulated by the question of why this 959-cell organism would need more than 30 different and active astacin genes.

Correspondence to R. Zwilling, Zoologisches Institut, Universität Heidelberg, Im Neuenheimer Feld 230, D-69120 Heidelberg, Germany. Fax: +49 6221 544913, Tel.: +49 6221 545887, E-mail: RobertZwilling@t-online.de

Abbreviations: cDNA, complementary DNA; dsRNA, double-stranded RNA; EST, expressed sequence tag; GFP, green fluorescent protein; L1-4, larval stage 1–4; OST, open reading frame sequence tag; RNAi, RNA interference; RT-PCR, reverse transcription-polymerase chain reaction; NAS, nematode astacin.

Note: Supplementary figures are available at http://www.zoo.uni-heidelberg.de/moehrlen

Note: The sequences and the alignment reported in this paper have been submitted to the GenBank/EMBL/DDBJ databank with accession numbers AJ561200, AJ561201, AJ561202, AJ561203, AJ561204, AJ561205, AJ561206, AJ561207, AJ561208, AJ561209, AJ561210, AJ561211, AJ561212, AJ561213, AJ561214, AJ561215, AJ561216, AJ561217, AJ561218, AJ561219, AJ561220, AJ561221 and ALIGN_000543.

(Received 3 September 2003, revised 15 October 2003, accepted 22 October 2003)

Eur. J. Biochem. 270, 4909–4920 (2003) © FEBS 2003 doi:10.1046/j.1432-1033.2003.03891.x

Materials and methods

Preparation of C. elegans

The C. elegans wild-type strain N2 variant Bristol was grown as a liquid culture in S-medium [20] supplemented with Escherichia coli OP50 as food source. The cultures were incubated at 18 °C for 6–8 days under vigorous shaking. When the E.
coli food source appeared to have been nearly exhausted, the nematodes, representing a mixed population of adults, all four larval stages and eggs, were harvested and separated from bacteria as described elsewhere [20].

RNA purification

For the isolation of RNA, 100 µg fresh or frozen nematode pellets from a liquid culture were ground with a pestle in a mortar containing liquid nitrogen. Total RNA was extracted from the resulting powder following the protocol of Chomczynski and Sacchi [21]. Contamination by genomic DNA was avoided by treating total RNA with DNase I (RNase-free, Boehringer). Poly(A)-rich RNA was isolated by the Oligotex mRNA procedure (Qiagen, Germany).

DNA purification

Genomic DNA was isolated from 1 mL fresh nematodes from a liquid culture using a standard protocol [22].

PCR amplification and cloning

Polyadenylated RNA (1 µg) was converted into single-stranded cDNA using a d(T)17 primer or a random hexamer primer as described [23]. For the amplification of the predicted astacin-like cDNA fragments, specific oligonucleotide primers derived from the genome sequencing data were used. Primer sequences are available at http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm. PCR amplification was performed on single-stranded cDNA, or on genomic DNA as a control, with 2 U high fidelity Taq DNA polymerase (Invitrogen, Germany) to diminish the mutation rate inherent to the PCR reaction. The cycling conditions were 94 °C for 3 min, followed by 35 cycles of 94 °C for 40 s, 55 °C for 40 s and 68 °C for 1 min per kb, and a final step of 68 °C for 8 min. After PCR, samples were analyzed in 2% agarose gels, and discrepancies between the expected and observed size of any PCR product were readily detected on visual inspection of the gels. The PCR products were then excised from 1.5% agarose gels and purified with a NucleoSpin gel-extraction kit (Macherey and Nagel, Germany).
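The cycling conditions above can be turned into a rough estimate of the programmed run time. The following is a minimal sketch, assuming that the three steps at 94 °C, 55 °C and 68 °C are the ones repeated for 35 cycles and that ramp times between temperatures are ignored; the function name is hypothetical and not part of the original protocol.

```python
def pcr_hold_time_seconds(amplicon_kb, cycles=35):
    """Sum of programmed hold times for the RT-PCR protocol in the text.

    Assumptions (not stated explicitly in the paper): the denaturation,
    annealing and extension steps are the cycled ones, and temperature
    ramping between steps takes no time.
    """
    initial_denaturation = 3 * 60      # 94 °C for 3 min
    denaturation = 40                  # 94 °C for 40 s
    annealing = 40                     # 55 °C for 40 s
    extension = 60 * amplicon_kb       # 68 °C for 1 min per kb
    final_extension = 8 * 60           # 68 °C for 8 min
    per_cycle = denaturation + annealing + extension
    return initial_denaturation + cycles * per_cycle + final_extension


for kb in (1, 2):
    total = pcr_hold_time_seconds(kb)
    print(f"{kb} kb amplicon: {total} s ({total / 60:.1f} min)")
```

Because the extension step scales with amplicon length, doubling the product size from 1 kb to 2 kb adds 35 extra minutes to the run under these assumptions.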
The purified fragments were subjected to the SureClone Ligation procedure and cloned into a pUC18 vector according to the manufacturer's instructions (Pharmacia, Sweden). Plasmid DNA was prepared and nucleotide sequences were subsequently determined by double-strand sequencing according to the dideoxynucleotide chain-termination method, using T7 DNA polymerase (Amersham, Sweden). Universal M13 primers were used for sequencing. All sequences have been deposited in EMBL/GenBank/DDBJ under accession numbers AJ561200 to AJ561221.

GFP fusion genes for expression studies

The genomic sequence data in WormBase [24] were used to identify a genomic DNA fragment suitable for fusion to a GFP reporter gene. To make sure that the gene-specific promoter and all cis-elements necessary for guiding tissue-specific expression were included in the reporter, the whole upstream region between the gene of interest and the neighboring upstream gene was used. For PCR amplification of the genomic DNA fragment the forward primers NAS-4:GFP/SacI/F1 (5′-CGA GCT CTT GAG TGA AGA TGC CAA GA-3′), NAS7:GFP/BamHI/F1 (5′-CGG GAT CCT TCC GCC AAA GTC ATT TAG-3′), NAS-15:GFP/PstI/F1 (5′-AAC TGC AGC TTT TCG GAA GAC TTT TGC-3′), NAS33:GFP/KpnI/F1 (5′-GGG GTA CCC CGG ACC ACA GTA AAG AAT-3′) and the corresponding reverse primers NAS4:GFP/KpnI/R1 (5′-GGG GTA CCC TGA CAC GCT GAC CCA TAC-3′), NAS7:GFP/KpnI/R1 (5′-GGG GTA CCC GATC CTC GCA TTC TA-3′), NAS15:GFP/KpnI/R1 (5′-GGG GTA CCC GCT GGG TAG TGG AGT TG-3′), NAS33:GFP/SacI/F1 (5′-CGA GCT CTG ACA AGA AAG GCA CAA AG-3′) were used. An 8–10 kb PCR fragment containing approximately 3–5 kb upstream sequences down to the last 30–50 codons of the astacin genes was fused in frame to the reporter gene GFP.
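Each primer name above carries the restriction enzyme used for cloning, and the matching recognition site should sit in the 5′ tail of the primer. The sketch below checks this for the eight primers as printed. The recognition sequences (SacI GAGCTC, BamHI GGATCC, PstI CTGCAG, KpnI GGTACC) are standard values that are not given in the paper, and the helper name is hypothetical.

```python
# Standard recognition sequences; these are not quoted in the paper.
RECOGNITION_SITES = {
    "SacI": "GAGCTC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
    "KpnI": "GGTACC",
}

# Primer names and sequences copied from the Methods text (5' to 3').
PRIMERS = {
    "NAS-4:GFP/SacI/F1": "CGA GCT CTT GAG TGA AGA TGC CAA GA",
    "NAS7:GFP/BamHI/F1": "CGG GAT CCT TCC GCC AAA GTC ATT TAG",
    "NAS-15:GFP/PstI/F1": "AAC TGC AGC TTT TCG GAA GAC TTT TGC",
    "NAS33:GFP/KpnI/F1": "GGG GTA CCC CGG ACC ACA GTA AAG AAT",
    "NAS4:GFP/KpnI/R1": "GGG GTA CCC TGA CAC GCT GAC CCA TAC",
    "NAS7:GFP/KpnI/R1": "GGG GTA CCC GATC CTC GCA TTC TA",
    "NAS15:GFP/KpnI/R1": "GGG GTA CCC GCT GGG TAG TGG AGT TG",
    "NAS33:GFP/SacI/F1": "CGA GCT CTG ACA AGA AAG GCA CAA AG",
}


def site_offset(primer_name, primer_seq):
    """Return the 0-based offset of the named enzyme's site, or -1 if absent."""
    enzyme = primer_name.split("/")[1]          # e.g. "SacI"
    sequence = primer_seq.replace(" ", "").upper()
    return sequence.find(RECOGNITION_SITES[enzyme])


offsets = {name: site_offset(name, seq) for name, seq in PRIMERS.items()}
for name, offset in offsets.items():
    status = f"site at offset {offset}" if offset >= 0 else "site NOT found"
    print(f"{name}: {status}")
```

A check of this kind is a quick way to catch transcription errors in primer tables before ordering oligonucleotides.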
Thus, the intergenic region as well as the protein coding regions of the astacin-like genes NAS-4, NAS-7, NAS-15 and NAS-33 were amplified with 2 U Elongase DNA polymerase (Invitrogen, Germany), gel purified (NucleoSpin gel-extraction kit, Macherey and Nagel, Germany) and cloned in frame into a pBD95.85 vector (having the S65C mutation and artificial introns to increase the expression of GFP; A. Fire Vector Kit, Baltimore, USA) according to standard protocols [23,25]. The molecular details of all fusion constructs are available on request. The construct, together with the marker plasmid pBx, was introduced into pha-1 hermaphrodites, and the worms carrying the constructs as extrachromosomal arrays were isolated at 25 °C and observed for GFP fluorescence under a Zeiss Axiovert 200 microscope.

Sequence analysis and phylogenetic studies

To identify metalloprotease genes in the genome of C. elegans, we used representative vertebrate and insect proteins, or their conserved domains according to the PFAM [26] and PRINTS [27] databases, as queries for BLAST searches [28,29] of WormBase [24]. For astacin genes the astacin domain, the zinc binding motif or the Met-turn sequences, as listed by PRINTS, were used to repeatedly screen the whole C. elegans genomic sequence, available from WormBase. DNA sequences of all astacin genes were further analyzed using the HUSAR package [30], and the predicted gene structures were compared to the Genefinder predictions as annotated in WormBase and to the alternative GenieGene open reading frame predictions of Kent and Zahler [31]. The splicing patterns were subsequently refined using the EST/OST sequences available in the latest WormBase release (WormBase97, 7 March 2003) and the cDNA sequences resulting from this work.
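The idea of screening sequences with a zinc binding motif and a Met-turn signature can be illustrated with a simple pattern scan on protein sequences. This is only a sketch: the two patterns used here (HExxHxxGxxH for the zinc binding site and SxMHY for the Met-turn) are simplified consensus forms assumed from the general metzincin and astacin literature, not the PRINTS fingerprints used in this study, and the test sequence is an invented toy string rather than a real astacin.

```python
import re

# Assumed, simplified consensus patterns ("." matches any residue).
# The actual PRINTS fingerprints used in the paper are more elaborate.
ZINC_BINDING = re.compile(r"HE..H..G..H")
MET_TURN = re.compile(r"S.MHY")


def scan_protein(sequence):
    """Return 0-based start positions of both signatures in a protein sequence."""
    sequence = sequence.upper()
    return {
        "zinc_binding": [m.start() for m in ZINC_BINDING.finditer(sequence)],
        "met_turn": [m.start() for m in MET_TURN.finditer(sequence)],
    }


def looks_astacin_like(sequence):
    """True if a zinc binding match is followed downstream by a Met-turn match."""
    hits = scan_protein(sequence)
    return any(
        zinc < turn
        for zinc in hits["zinc_binding"]
        for turn in hits["met_turn"]
    )


# Hypothetical toy sequence built so that each signature occurs exactly once.
toy = "AAA" + "HEAAHAAGAAH" + "EAAAA" + "SAMHY" + "AAA"
print(scan_protein(toy))
print(looks_astacin_like(toy))
```

A genomic screen such as the TBLASTN search described in the text works on translated DNA and tolerates mismatches, so a strict pattern scan like this one would only serve as a fast first filter on predicted proteins.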
Discrepancies between the WormBase and GenieGene predictions and our own cDNA sequences were communicated to those annotating the sequences (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). The corrected cDNA sequences were translated into amino acid sequences using the HUSAR package and aligned using CLUSTAL [32]. Where splicing patterns remained unconfirmed, we used for further analysis those protein predictions that agree with the protein family alignment, i.e. that show no exceptional insertions, deletions or frame shifts (EMBL: ALIGN_000543). For identification and annotation of protein domains and the analysis of domain architectures, the tools of the SMART [33], PFAM [26], ProDom [34] and INTERPRO [35] protein domain databases were used.

For phylogenetic studies the active protease domains, covering the region from Ala-1 to Leu-200 in the prototype crayfish astacin, from the C. elegans astacins and selected other astacin family members were aligned using CLUSTAL [32] and imported into GENEDOC [36] for further manipulation. The alignment is available at the EMBL database with accession number ALIGN_000543. Phylogenetic analyses were carried out using the neighbor-joining method and the Bayesian phylogenetic method. For neighbor-joining analysis the PHYLIP 3.5 software package [37] was used. Distances between the pairs of protein sequences were calculated and corrected for multiple changes according to the PAM001 distance matrix. The reliability of the tree was tested by bootstrap analysis with 100 replications. Bayesian phylogenetic analysis [38,39] was performed with the MRBAYES 3.0 BETA 4 program [40] with the WAG matrix [41], assuming a gamma distribution of substitution rates. Prior probabilities for all trees and amino acid replacement models were equal; the starting trees were random. Metropolis-coupled Markov chain Monte Carlo sampling was performed with one cold and three heated chains that were run for 50 000 generations.
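The sampling figures reported for the Bayesian run in this section (50 000 generations, trees sampled every 10th generation, a burn-in of 3000, posterior probabilities estimated on 2000 trees) can be checked with simple bookkeeping. The sketch below assumes that the burn-in is counted in sampled trees rather than generations and that the starting tree is not counted as a sample; the function name is hypothetical.

```python
def mcmc_sample_counts(generations, sample_every, burn_in_samples):
    """Return (total sampled trees, trees kept after burn-in).

    Assumptions: one tree is stored every `sample_every` generations,
    the initial (generation 0) tree is not counted, and the burn-in is
    expressed as a number of sampled trees.
    """
    total = generations // sample_every
    if burn_in_samples > total:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return total, total - burn_in_samples


total, kept = mcmc_sample_counts(50_000, 10, 3_000)
print(f"sampled trees: {total}, kept after burn-in: {kept}")
```

Under these assumptions the run stores 5000 trees and keeps 2000 after burn-in, which matches the number of trees the authors report for estimating posterior probabilities.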
Trees were sampled every 10th generation. Posterior probabilities were estimated on 2000 trees (burnin = 3000). The tree presented here was visualized using TREEVIEW [42].

Results and discussion

Astacin homologue proteins in C. elegans

During a preliminary data base survey we observed in 1996 that the 959-cell organism C. elegans accommodates a surprising number of gene sequences coding for astacin-like proteins, while for other species with a much larger genome not more than 2–3 astacin genes had been reported (G. Geier and R. Zwilling, unpublished). The complete sequencing of the 97 megabase genome of C. elegans by the C. elegans Sequencing Consortium in 1998 [43] then made a thorough analysis possible. The latest WormBase release (WormBase97, 7 March 2003) now contains 21 437 coding sequences when counting 1891 alternate splice forms. Of these, the MEROPS protease database (latest release 6.11: 20 January 2003) lists 382 protease genes (E.C.3.4), of which 158 genes belong to the group of metalloproteases (E.C.3.4.24). The metalloproteases of C. elegans can be arranged into 11 protein clans and subdivided into 27 protein families, according to the nomenclature of Barrett et al. [44]. Our own BLAST searches in WormBase, using protein family consensus sequences according to the PFAM or PRINTS databases as queries, revised the number of identified genes provisionally listed by MEROPS (see Table 1). BLAST searches based on the whole astacin domain, the zinc binding motif or the Met-turn sequence revealed some more astacin genes in C. elegans in addition to those listed by MEROPS so far, which finally brought the total number of astacin genes in C. elegans to 40 (Tables 1 and 2).

The nomenclature proposed for these 40 C. elegans astacin genes is in accordance with suggestions of the C. elegans Sequencing Consortium. In Table 2 we have numbered these C. elegans astacins (nematode astacins, NAS) from 1 to 40. The two proteins NAS-23 and NAS-40 (located on cosmids D1022 and F54B8, respectively) are not recorded in the WormPep database (predicted proteins from WormBase) but could be detected by a genomic TBLASTN search and the use of the program GENSCAN. However, for NAS-40 GENSCAN did not predict a complete protein but rather an 88 amino acid fragment which is interrupted by two stop codons. Hishida et al. [45] reported that HCH-1 (= F40E10.1, NAS-34) is required for normal hatching and neuroblast migration in C. elegans. For all other astacin genes, no details were known beyond the Genefinder protein prediction in WormBase and the partial transcription analysis by the EST or open reading frame sequence tag (OST) projects. It was therefore indispensable, as a first step, to confirm the existence of expression products for each gene.

Table 1. One hundred and fifty-one genes coding for metalloproteases in C. elegans. Identification of genes was based on data available in MEROPS (The protease database, release 6.11: 20 January 2003, http://merops.sanger.ac.uk) and subsequently corrected by BLAST searches using the genome sequencing data of C. elegans. Nomenclature is according to Barrett et al. [44].

Clan | Protease family | Number of genes
MA(E) | M1 aminopeptidase | 12
MA(E) | M2 peptidyl-dipeptidase | 1
MA(E) | M3A oligopeptidase | 2
MA(E) | M13 neprilysin | 23
MA(E) | M41 E. coli endopeptidase | 3
MA(M) | M8 leishmanolysin | 1
MA(M) | M10A MMP | 6
MA(M) | M12A astacin | 40
MA(M) | M12B/C ADAM | 10
MC | M14A carboxypeptidase A | 9
MC | M14B carboxypeptidase E | 3
ME | M16A pitrilysin | 5
ME | M16B mitochondrial processing peptidase | 3
ME | M16X | 2
MF | M17 leucyl aminopeptidase | 2
MG | M24A methionyl aminopeptidase I | 5
MG | M24B aminopeptidase P | 3
MH | M18 aminopeptidase I | 1
MH | M20A/B glutamate carboxypeptidase | 5
MH | M28B aminopeptidase Y | 2
MH | M28X | 4
MJ | M38 beta-aspartyl dipeptidase | 1
MK | M22 O-sialoglycoprotein endopeptidase | 2
MM | M50 S2P protease | 1
MX | M48A Ste24 endopeptidase | 1
MX | M49 dipeptidylpeptidase | 1
MX | M67 proteasome regulatory subunit RPN11 | 3

Table 2. Denomination of astacin genes in C. elegans, data base entries, approximate genetic map position and matching EST or OST clones (see WormBase release 94, 24 January 2003 [24]). For RT-PCR sequences (fmNAS-x) resulting from this work see http://www.zoo.uni-heidelberg.de/moehrlen. For further explanation see text.

Gene name | Wormpep name | EMBL/GenBank | Genetic map position | EST/OST | RT-PCR sequencing | Comment
NAS-1 | F45G2.1 | Z93382 | III:22.1 | OST | | Aberrant splice, corrected full-length sequence (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm)
NAS-2 | F56A4.1 | AC006645, AC006722 | V:13.99 | | No PCR product | Expression confirmed by microarrays only; translation fits best with GenieGene prediction g-V-409
NAS-3 | K06A4.1 | Z70755 | V:1.98 | | fmNAS-3 | cDNA fits best with GenieGene prediction g-V-1836
NAS-4 | C05D11.6 | U00048 | III:1.33 | OST | fmNAS-4 | cDNAs fit best with GenieGene prediction g-II-1042
NAS-5 | T23H4.3 | Z83240 | I:4.03 | | No PCR product | Expression confirmed by microarrays only
NAS-6 | 4R79.1 | AL031254 | IV:30.16 | | fmNAS-6 | Translation fits best with GenieGene prediction g-IV-3005
NAS-7 | C07D10.4 | U13072 | II:0.41 | | fmNAS-7 | cDNA fits best with GenieGene prediction g-II-1703
NAS-8 | C34D4.9 | U58755 | IV:3.29 | EST | fmNAS-8 |
NAS-9 | C37H5.9b | U88315 | V:6.52 | EST | | Full-length sequence is confirmed by overlapping cDNAs
NAS-10 | K09C8.3 | Z68006 | X:2.51 | EST | |
NAS-11 | K11G12.1 | U23525 | X:2.66 | EST | |
NAS-12 | C24F3.3 | Z81055 | IV:4.54 | | fmNAS-12 |
NAS-13 | F39D8.4 | Z69791 | X:21.46 | | fmNAS-13 | Translation fits best with GenieGene prediction g-X-2412
NAS-14 | F09E8.6 | Z73896 | IV:8.02 | | fmNAS-14 | Translation fits best with GenieGene prediction g-IV-2471
NAS-15 | T04G9.2 | U41274 | X:19.12 | EST | fmNAS-15 | cDNAs and translation fit best with GenieGene prediction g-X-2732
NAS-16 | K03B8.1 | Z74039 | V:3.16 | | No PCR product | Expression confirmed by microarrays only
NAS-17 | K03B8.2 | Z74039 | V:3.16 | | No PCR product | Expression confirmed by microarrays only
NAS-18 | K03B8.3 | Z74039 | V:3.16 | | No PCR product | Expression confirmed by microarrays only
NAS-19 | K03B8.5 | Z74039 | V:3.16 | | fmNAS-19 |
NAS-20 | T11F9.3 | Z74042 | V:3.2 | | fmNAS-20 | cDNA and translation fit best with GenieGene prediction g-V-2325
NAS-21 | T11F9.5 | Z74042 | V:3.2 | | fmNAS-21 | Aberrant splice, corrected sequence (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm)
NAS-22 | T11F9.6 | Z74042 | V:3.2 | | fmNAS-22 | Aberrant splice, corrected sequence (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm); translation fits best with GenieGene prediction g-V-2327
NAS-23 | D1022 unassigned | U23517 | II:0.45 | | fmNAS-23 | Not in WormBase, predicted using GENSCAN (see http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm)
NAS-24 | F20G2.4 | Z79753 | V:5.42 | | fmNAS-24 | Translation fits best with GenieGene prediction g-V-2804
NAS-25 | F46C5.3 | Z54281 | II:0.92 | EST | fmNAS-25 |
NAS-26 | T24A11.3 | Z49072 | III:4.54 | EST | | cDNA fits best with GenieGene prediction g-V-483; toh-1
NAS-27 | T23F4.4 | AF025466 | II:13.27 | | fmNAS-27 |

Transcriptome analysis

Comparing all genomic DNA sequences of astacin genes identified by our BLAST search to the cDNA data of WormBase, it became evident that EST or OST clones [46,47] were already known for 12 of the total of 40 genes (WormBase release 57, 17 December 2001). This confirmed that the 12 genes in question were expressed on the mRNA level.
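The figures quoted for the metalloproteases (151 genes in the caption of Table 1, 11 clans and 27 families in the text) can be cross-checked against the individual rows of Table 1. The sketch below transcribes the gene counts from the table into a nested dictionary; the variable names are illustrative, and MA(E) and MA(M) are counted as separate clans, which is an assumption needed to arrive at 11.

```python
# Gene counts per protease family, transcribed from Table 1.
TABLE_1 = {
    "MA(E)": {"M1": 12, "M2": 1, "M3A": 2, "M13": 23, "M41": 3},
    "MA(M)": {"M8": 1, "M10A": 6, "M12A": 40, "M12B/C": 10},
    "MC": {"M14A": 9, "M14B": 3},
    "ME": {"M16A": 5, "M16B": 3, "M16X": 2},
    "MF": {"M17": 2},
    "MG": {"M24A": 5, "M24B": 3},
    "MH": {"M18": 1, "M20A/B": 5, "M28B": 2, "M28X": 4},
    "MJ": {"M38": 1},
    "MK": {"M22": 2},
    "MM": {"M50": 1},
    "MX": {"M48A": 1, "M49": 1, "M67": 3},
}

clan_count = len(TABLE_1)
family_count = sum(len(families) for families in TABLE_1.values())
total_genes = sum(sum(families.values()) for families in TABLE_1.values())
astacin_share = TABLE_1["MA(M)"]["M12A"] / total_genes

print(f"clans: {clan_count}, families: {family_count}, genes: {total_genes}")
print(f"astacins (M12A) account for {astacin_share:.0%} of the metalloprotease genes")
```

The row sums reproduce the stated totals, and they show that the astacin family alone contributes roughly a quarter of all metalloprotease genes listed for C. elegans.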
The remaining 28 genes were analyzed by RT-PCR followed by sequencing of the DNA fragments in order to demonstrate their transcription activity. For each gene, specific primer pairs were synthesized, the gene fragments amplified by PCR and the products analyzed on agarose gels (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm). In each case the PCR reaction with reverse-transcribed RNA was accompanied by a control reaction with genomic DNA. Introns within the amplified DNA regions gave rise to correspondingly larger DNA fragments when compared to their cDNA fragments. For unambiguous identification and for the correction of erroneous splicing pattern predictions, the PCR products of all DNA fragments were eluted from an agarose gel, blunt end cloned into the vector pUC18 and subsequently sequenced (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm).

In combination with the recently available EST and OST sequences (WormBase release 97, 7 March 2003) we found for 13 genes (Table 2) splicing patterns differing from the Genefinder predictions in WormBase, sometimes markedly. In these cases, the experimental cDNA transcripts were in good accordance with the alternative GenieGene open reading frame predictions of Kent and Zahler [31] (Table 2 and http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). For NAS-1, NAS-21, NAS-22 and NAS-28 we observed splice sites that deviate from both the Genefinder and the GenieGene predictions. The manually corrected cDNA sequences can be found at http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm. All new sequence data including corrected gene structures have been submitted to WormBase and the EMBL/GenBank/DDBJ databases (for accession numbers, see footnote).

The genes NAS-2, NAS-5, NAS-16, NAS-17, NAS-18 and NAS-30 showed no apparent PCR product in our RT-PCR analysis (Table 2, http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). However, the microarray projects of Hill et al. [48,49], Kim et al. [50], or Jiang et al. [51] (for an overview see WormBase) support the expression of these genes. We would like to point out that the microarray technique cannot unambiguously verify either the identity or the splicing pattern of a gene because no sequence data are produced. Nevertheless, in summary it may be stated that, with the exception of the pseudogene NAS-40, transcription activity could be confirmed for all other 39 astacin genes.

Table 2. (Continued).

Gene name | Wormpep name | EMBL/GenBank | Genetic map position | EST/OST | RT-PCR sequencing | Comment
NAS-28 | F42A10.8 | U10414 | III:1.38 | OST | fmNAS-28 | Aberrant splice, corrected full-length sequence is confirmed by overlapping cDNAs (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm)
NAS-29 | F58A6.4 | U53339 | II:1.98 | | fmNAS-29 | Translation fits best with GenieGene prediction g-II-1160
NAS-30 | Y95B8 A1 | AC024877 | I:20.88 | | No PCR product | Expression confirmed by microarrays only
NAS-31 | F58B4.1 | Z74038 | V:2.87 | EST | fmNAS-31 | cDNAs fit best with GenieGene prediction g-V-2200, possible alternative splice site
NAS-32 | T02B11.7 | AF022979 | V:19.07 | EST | fmNAS-32 |
NAS-33 | K04E7.3 | U39666 | X:2.93 | | fmNAS-33 |
NAS-34 | F40E10.1 | D85744, Z69792 | X:19.9925 | EST | | Full-length cDNA confirmed by Hishida et al.; hch-1
NAS-35 | R151.5 | U00036 | III:0.76 | EST | | Full-length sequence is confirmed by overlapping cDNAs; toh-2
NAS-36 | C26C6.3 | Z72503 | I:2.05 | EST | |
NAS-37 | C17G1.6 | Z78415 | X:1.48 | EST | |
NAS-38 | F57C12.1 | U41554 | X:19.47 | EST | |
NAS-39 | F38E9.2 | U46668 | X:23.83 | EST | |
NAS-40 | F54B8 unassigned | Z93383 | V:9.77 | | No PCR product | Not in WormBase; pseudogene

Functional analysis

We made an attempt to analyze the function of selected astacin genes in C.
elegans by investigating the expression pattern of four representative astacin genes of different subgroups (see section on Structural and phylogenetic analysis, Fig. 2) using GFP-fusion constructs. All astacin-GFP fusions were assayed for expression in animals from embryonic stages onwards. At least three independent transgenic lines were generated from at least two independent clones of each of the astacin-GFP fusion constructs to control for PCR-induced sequence errors. The reporter gene fusions NAS-15::GFP and NAS-33::GFP failed to give detectable expression in any life stage.

The fusion protein NAS-4::GFP showed extensive GFP fluorescence throughout the digestive tract in larval stages and in adult worms (Fig. 1A). At higher magnification, we saw GFP staining within pharynx cells of the procorpus, metacorpus, isthmus and terminal bulb, and extracellular staining in the lumen of the terminal bulb (Fig. 1B, arrows). Therefore, NAS-4 is most likely secreted by the pharynx cells into the lumen and is then found in secreted form all the way down the lumen of the gut. We conclude from this expression pattern that NAS-4 is associated with digestive functions. Of special interest is the observation that NAS-4 and the digestive enzyme astacin from crayfish [8] have a similar domain arrangement, both lacking a C-terminal extension (see section on Structural and phylogenetic analysis). They also cluster in the phylogenetic tree (Fig. 3), suggesting that they have similar functions. These considerations might be extended to the whole subgroup I (Fig. 2, NAS-2–6), which shares these features.

By contrast, NAS-7::GFP staining was observed only in the head of adult hermaphrodites, but not within pharynx cells (Fig. 1C). The expressing cells are located outside of the pharynx, around the metacorpus and the terminal bulb, and could include neurons, cells of the excretory system or gland cells of still unknown functions [20].
Reporter gene expression also became detectable in the embryo before hatching (Fig. 1D). While the function of the gene expressed in the adult remains open at this moment, in the embryo it could possibly serve as a hatching enzyme.

To further characterize the function of astacin genes in C. elegans we examined the genome-wide RNAi analyses of Gonczy et al. [52], Fraser et al. [53], Maeda et al. [54], Kamath et al. [55,56], Ashrafi et al. [57], Lee et al. [58] and Pothof et al. [59]. Although nearly all astacin genes have been investigated for gene silencing by RNAi, most of them lack an obvious phenotype, and no function could be deduced from the attempted inactivation. Whether this reflects incomplete dsRNA interference or a redundancy in functions among the high number of expressed astacin genes remains to be established. Strong RNAi phenotypes were observed for NAS-9, -11 and -37 only, revealing these three astacin genes to be essential. Inactivated NAS-9 showed 6% embryonic lethality [54], NAS-11 showed retarded growth [56] and NAS-37 showed long body deviancy and a molt defect [54,56]. As a rule it can be stated that all known astacin gene inactivations had only little, if any, effect.

Fig. 1. GFP expression pattern images for NAS-4 (A, B) and NAS-7 (C, D). (A) Extensive GFP fluorescence throughout the digestive tract in an adult hermaphrodite and an L2 larva for a NAS-4::GFP fusion gene; 100× magnification. (B) Higher magnification of the head of an adult hermaphrodite showing GFP expression for the same construct in pharynx cells and in the lumen of the terminal bulb; 400× magnification. (C) GFP expression of a NAS-7::GFP fusion gene is found in the head of adult hermaphrodites, but not in pharynx cells or in the lumen of the digestive tract; 300× magnification. (D) In embryos NAS-7::GFP reporter gene fluorescence became detectable just before hatching; 400× magnification.
One explanation for this could be that C. elegans astacins have overlapping functions, which is also suggested by structural homologies.

Structural and phylogenetic analysis

All known sequence data of astacin-like proteins are derived from cDNA and genomic sequences, with the exception of crayfish astacin, which in addition had been completely sequenced by Edman degradation [5]. The present analysis is based on protein sequences available from the SwissProt, TrEMBL, EMBL, and GenBank databases. If necessary, open reading frames of DNA sequences were translated into amino acid sequences with the HUSAR package. For C. elegans we used the Genefinder or GenieGene predictions corrected by our cDNA data (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm). Altogether, we found over a hundred complete sequences of astacin-like proteins that are known at present (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). Considering only the eucaryote genomes sequenced completely, six astacin genes are found in human and in mouse, and 12 in Drosophila melanogaster. However, the tiny 959-cell organism C. elegans exhibits the striking number of 40 astacin genes, a number not approached in any other organism studied up to now. With the sole exception of the pseudogene NAS-40, all these genes are expressed and seem to have specific functions.

Fig. 2. Schematic representation of homologues and domain structures in astacin genes in C. elegans. Pre-pro sequences, catalytic domain and presumably regulatory appendices. Diagram scale is related to amino acid length. Presequences, purple shaded boxes; prosequences, grey oval; astacin domain, red box; six cysteines, SXC; EGF-like, yellow oval; CUB domains, CUB; thrombospondin-1 like, TSP1; low complexity sequences, striped boxes; not specified, open boxes.
Therefore, these findings not only allow the study of an extraordinary divergence of a protein family within one single organism, but also shed light on a multiple functional fine modulation evolving from a common structural source.

In the astacins, three basic structural and functional moieties can typically be discerned: a pre-pro portion, the catalytic astacin chain, and long C-terminal extensions, which presumably contain messages for proper function (Fig. 2). Pro-sequences are found in all functional C. elegans astacins, while presequences (signal peptides) are lacking in nine genes (Fig. 2). The absence of signal peptides in these genes may reflect specific intracellular functions of non-secreted proteins. On the other hand, it could also reflect problems with the still unconfirmed 5′-gene predictions of Genefinder or GenieGene, as the sequencing data produced here have been limited to PCR-derived fragments and to the reanalysis of EST and OST fragments. In some rare cases in other organisms prepro structures may be lacking completely, often combined with an N-terminally truncated catalytic domain [Coturnix coturnix (quail) CAM-1, Swissprot P42326; Drosophila melanogaster CG6974, TrEMBL Q9VFD6; Hydra vulgaris FARM-1, TrEMBL Q9U4X9], but in C. elegans (with the exception of the non-expressed pseudogene NAS-40) this feature was never seen.

In the central domain of all C. elegans astacin genes, the amino acid residues that have been identified in crayfish astacin as essential for catalytic activity [6,7,60,61] are preserved without exception. From this fact it may be concluded that all C. elegans astacins potentially have catalytic activity, too.

C. elegans astacins are typically characterized by long, complex C-terminal extensions adjacent to the catalytic domain, which presumably define the time and place of their activity (Fig. 2).
Based on homology criteria, CUB, EGF, SXC, and TSP-1 domains can be discerned within these appendices, while other sequences must be classified as 'non specific' or as having 'low compositional complexity' (LC). LC regions are often Ser/Thr-rich, are found in many astacins and could serve as sites for O-glycosylation. EGF domains are epidermal growth factor-like modules (PFAM accession number: PF00008). CUB domains (SMART accession number: SM0042) are named after their occurrence in complement components C1r/C1s, embryonic sea urchin protein Uegf, and BMP-1 [62]. These domains may be involved in calcium binding and in protein-protein or enzyme–substrate interactions [63]. The SXC (six-cysteine) motif was observed in several hypothetical C. elegans proteins [64,65], but was originally described in metridin, a toxin from sea anemone, and is also called the ShK toxin domain (SMART accession number: SM0254). TSP-1-like domains are thrombospondin type 1 repeats (SMART accession number: SM0209), which are present in several families of metalloproteases, namely in the ADAM-TS proteases (ADAM-TS, a disintegrin-like and metalloproteinase with thrombospondin type I motifs; family M12B/C, see Table 1). TSP-1 domains are reported here for the first time for astacins. According to the structural differences in their C-terminal extensions, we arranged all 40 C. elegans genes into subgroups I–VI (Fig. 2). Subgroup I comprises five genes with no C-terminal extension (NAS-1) or with short, unspecific extensions, which probably cannot accommodate specific signals. Subgroup II, with 10 genes, exhibits exclusively the SXC domain, while other domain types are completely lacking. The SXC domain appears in a single, double or triple arrangement, and the domains may be attached directly to the catalytic chain or separated from it and from each other by short, unspecific sequences.
A tandem-like arrangement can only be seen with these SXC domains, while other domain types are represented only once in a regulatory chain (for an exception see subgroup VI). Subgroup III combines 15 genes that typically have an EGF-like domain directly attached to the catalytic chain, followed by a CUB domain. In gene NAS-18 the CUB domain and in gene NAS-21 the EGF-like domain is missing. In subgroup IV (two genes) an SXC domain and in subgroup V (six genes) a TSP-1 domain is added to the EGF and CUB domains, which show the same arrangement as in subgroup III.
Fig. 3. Phylogenetic relationship of the astacins, including all C. elegans astacin proteins (shaded yellow) and selected examples from other organisms. The tree was deduced by Bayesian and neighbor-joining analysis based on the alignment of the amino acid sequences of the catalytic chain. At branching points, Bayesian posterior probabilities and bootstrap values greater than 50 of 100 replications (values in parentheses) are given as an indication of the confidence of the tree presented. The scale bar represents a distance of 0.1 accepted point mutations per site (PAM). Evolutionary subgroups of the astacin protein family are indicated on the right side. The schematic representation of the protein domains (colored bars) corresponds to that in Fig. 2. Meprin domains: MAM domain, MAM; MATH domain, MATH; I-domain, I; intervening sequence, inter; transmembrane domain, TM; cytoplasmic domain, c. For an overview, see [66]. Abbreviations and Swissprot/TREMBL/PIR accession numbers of the astacins: AA Astacin, Astacus astacus (crayfish) astacin (P07584); AC TBL-1, Aplysia californica TBL-1 (P91972); AJ EHE-4, Anguilla japonica (fish) EHE-4 (Q90Y89); CC Nephrosin, Cyprinus carpio (fish) Nephrosin (O42326); DM Tolloid and Tolkin, Drosophila melanogaster Tolloid (P25723) and Tolkin (Q23995); FM Flavast, Flavobacterium meningosepticum Flavastacin (Q47899); HS BMP-1, Homo sapiens bone morphogenetic protein 1 (Q14874); HS Meprin A and B, Homo sapiens Meprin α (Q16819) and β (Q16820); HS TLL and TLL-2, Homo sapiens Tolloid like 1 (Q9NQS4) and 2 (Q9UQ00); HV HMP-2, Hydra vulgaris (Cnidaria) Metalloprotease 2 (Q9XZG0); MM BMP-1, Mus musculus BMP-1 (I49540); MM Meprin A and B, Mus musculus Meprin α (P28825) and β (Q61847); OL LCE and HCE-1, Oryzias latipes (fish) low choriolytic enzyme (P31581) and high choriolytic enzyme 1 (EMBL:M96170); PC PMP-1, Podocoryne carnea (Cnidaria) Metalloprotease 1 (O62558); PL BP-10, Paracentrotus lividus (sea urchin) blastula protease 10 (P42674); SP BMP-H, Strongylocentrotus purpuratus (sea urchin) BMP-1 homolog (P98069); SP SPAN, Strongylocentrotus purpuratus (sea urchin) SPAN (P98069); TR MP, Takifugu rubripes (fish) HCE-1 (AAL40376); XL BMP-1, Xenopus laevis BMP-1 (P98070).
Subgroup VI is a special case: its only entry, NAS-39, shows a striking similarity to the human bone-inducing factor BMP-1. A comparison of the two proteins reveals a sequence identity of 74% between the catalytic chains, whereas for other nematode astacins this value reaches only 40% on average. Xolloid (Xenopus), tolloid and tolkin (Drosophila) and TBL-1 (Aplysia) also have corresponding structures. The number and arrangement of CUB and EGF domains are identical in these genes. NAS-39 is by far the longest of all C. elegans astacin genes.
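The identity values quoted for NAS-39 and BMP-1 can be illustrated with a minimal sketch of a pairwise percent-identity calculation on two pre-aligned sequences. The text does not state how identity was computed, so the convention used here (columns with a gap in either sequence are ignored) is an assumption, and the two short sequences are invented toy strings, not real NAS-39 or BMP-1 sequence.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumed convention for illustration only; other denominators
    (e.g. full alignment length) are also in common use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (toy data, not taken from any database).
toy_a = "MKTAYIAK-QR"
toy_b = "MKTAHIAKDQK"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")  # prints "80.0% identity"
```

Here 10 ungapped columns are compared and 8 of them match. Applied to the aligned catalytic chains, the same calculation would give a figure comparable to those quoted, provided the same gap convention is used.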
It will be interesting to see what physiological role a factor almost identical to human BMP-1 might perform in C. elegans; this could also give some insight into the primordial functions from which human BMP-1 has evolved. The distinctive and complex pattern that appears in subgroups I–VI seems to provide a specific function for each C. elegans astacin gene. Members of the same subgroup might have similar or identical functions. We constructed a phylogenetic tree comprising all 39 expressed C. elegans astacins and, in addition, selected astacin proteins from a variety of other organisms (Fig. 3). The tree is based on a multiple alignment of the amino acid sequences of the active protease domain, covering the region from Ala1 to Leu200 in the prototype, crayfish astacin. Results were corrected with the help of the known secondary structures and conserved regions of crayfish astacin. The alignment has been submitted to the EMBL databank with accession number ALIGN_000543. Phylogenetic relationships were initially established by the neighbor-joining method using the PHYLIP program package. As outgroup we used the phylogenetically most remote sequence, flavastacin from bacteria. However, an isolated occurrence of an astacin sequence in a single bacterial species could be due to lateral gene transfer, which would render this sequence unsuitable as an outgroup. Because at least one more astacin-like protein has recently been detected in bacteria (http://www.zoo.uni-heidelberg.de/moehrlen), lateral gene transfer is most unlikely. Moreover, we also tried the phylogenetically remote Cnidaria astacins (HMP-2 and PMP-1) as an outgroup, which gave exactly the same phylogenetic tree. For statistical verification a consensus tree including 100 sequences was calculated and bootstrap values were established for each point of divergence.
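The resampling step behind bootstrap values can be sketched as follows. This is only an illustration of the general principle (alignment columns drawn with replacement to form pseudo-replicates, each of which would then be used to build a tree), not the procedure actually run in PHYLIP for this study. The function name, the toy alignment and the choice of 100 replicates (taken from the Fig. 3 caption) are assumptions.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one pseudo-replicate: alignment columns drawn with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns, with replacement.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column sample to every sequence so columns stay intact.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment (invented sequences, for illustration only).
toy = {"seq1": "MKTAYIAK", "seq2": "MKTAHIAK", "seq3": "MRTGHLAK"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), "replicates of", len(replicates[0]["seq1"]), "columns")
```

The bootstrap value of a node is then the number of replicate trees, out of 100, in which that grouping reappears. Tree building itself is left out of the sketch.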
However, the phylogenetic tree based on the neighbor-joining method showed rather low bootstrap values (< 50) for the most ancestral nodes (Fig. 3). Pro-sequences could not be used additionally to strengthen these branching points because they differ extremely in length, change rapidly or are lacking completely. A similar consideration applies to the C-terminal extensions. The robustness of the tree was therefore verified additionally by the Bayesian phylogenetic method. With this analysis the confidence of the tree increased significantly, resulting in high posterior probabilities. The evolutionary tree presented in Fig. 3 summarizes all the above-mentioned approaches and is therefore the most reliable. From this analysis it becomes evident that similar sequences of the catalytic chain tend to have similar C-terminal extensions (Fig. 3). All 39 complete NAS proteins can be subdivided into two different types: one having CUB domains in the regulatory moiety, and another in which these are lacking completely (see also Fig. 2). This pattern is clearly reflected in the amino acid sequence-based phylogenetic tree, where all NAS proteins exhibiting a CUB domain come closely together in one cluster (Fig. 3). The CUB domain is almost always preceded by an EGF domain (exception NAS-21). To these either no further segments are attached (subgroup III), or an SXC domain (subgroup IV) or a TSP-1 domain (subgroup V) might follow. The second cluster comprises the NAS-1 to NAS-15 proteins, characterized by having no distinct extensions (subgroup I) or by showing one, two or three SXC domains (subgroup II). NAS-39 (subgroup VI) is strikingly different from all other C. elegans astacins, but fits perfectly into the BMP-1/Tolloid group, both on the basis of the sequence homologies and on that of the complex but identical arrangement of the five CUB and two EGF segments (Figs 2 and 3).
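The subgroup rules summarized above can be written down as a small decision function. This is a sketch of the classification as described in the text, not a tool used by the authors. The function name and the domain labels ("EGF", "CUB", "SXC", "TSP1", following the Fig. 2 legend) are assumptions, domain order is ignored, and unspecified or low-complexity stretches are treated as carrying no subgroup information.

```python
from collections import Counter


def nas_subgroup(extension_domains: list[str]) -> str:
    """Assign subgroup I-VI from the domain names found in the C-terminal extension."""
    counts = Counter(extension_domains)
    if counts["CUB"] == 5 and counts["EGF"] == 2:
        return "VI"  # BMP-1/tolloid-like arrangement (NAS-39 in the text)
    if counts["CUB"] > 0 or counts["EGF"] > 0:
        # EGF/CUB cluster; one of the two may be missing (NAS-18, NAS-21).
        if counts["TSP1"] > 0:
            return "V"
        if counts["SXC"] > 0:
            return "IV"
        return "III"
    if counts["SXC"] > 0:
        return "II"  # SXC domains only, in single, double or triple arrangement
    return "I"  # no extension, or only short unspecific sequence


# Illustrative inputs; only the cases named in the text are tied to real genes.
print(nas_subgroup([]))                           # NAS-1-like: no extension
print(nas_subgroup(["SXC", "SXC"]))               # SXC only
print(nas_subgroup(["EGF"]))                      # NAS-18-like: CUB missing
print(nas_subgroup(["EGF", "CUB", "TSP1"]))       # EGF, CUB plus TSP-1
print(nas_subgroup(["CUB"] * 5 + ["EGF"] * 2))    # NAS-39-like domain counts
```

Because the function looks only at domain counts, it would not distinguish proteins that differ merely in domain order. That is sufficient for the six subgroups as defined here.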
One might wonder about the expression of such a large number of related but different astacin genes in a 959-cell organism. Potentially all these genes could have different functions, as each shows at least clear, and in some cases marked, structural differences. However, much of this divergence seems to be due to relatively recent gene duplications. In the closely related species Caenorhabditis briggsae the genes NAS-16, -18, -19, -22, -24 and the pseudogene NAS-40 are missing. C. elegans and C. briggsae share, however, the neighboring genes NAS-17, -20, and -21. In addition, these genes show a tandem-like arrangement in clusters and are all located on chromosome V, where NAS-16, -17, -18, -19 form one cluster and, separated by various other genes, a second cluster comprising NAS-20, -21, -22 can be found. These notions are also supported by the position of these genes in the evolutionary tree (see Table 2, and Figs 2 and 3). It therefore seems reasonable to assume that these genes, comprising one half of subgroup III, resulted from recent gene duplications, which implies that they might have more or less similar functions. If one extends this kind of reasoning with some caution to all of the analyzed C. elegans astacins, one could conclude that only the six established subgroups actually represent major functional differences, as these are based on marked differences in their regulatory units. This would reduce the number of functionally different gene types to six, a number that comes close to that found for astacins in other organisms. Nevertheless, the fact remains that each NAS gene is expressed and structurally distinct from the others. This constitutes a favorable starting point for the rapid acquisition of new functions, a capacity that might be a prerequisite for the ubiquitous occurrence of C. elegans in nearly all soil types. However, most NAS genes are dispersed over all six chromosomes of C. elegans, which indicates a long evolutionary history of the astacin protein family in the nematodes. The identical and complex arrangement of the seven regulatory domains in NAS-39 and BMP-1 furthermore suggests that this distinct structure has been retained unchanged for long periods and was already present in the common ancestor of nematodes and vertebrates.
Acknowledgements
This study was supported by a grant from the Deutsche Forschungsgemeinschaft, Bonn, to RZ (Zw 17/14–2). We also wish to thank Thorsten Burmester, University of Mainz, Germany, for supporting the Bayesian phylogenetic analysis.
References
1. Pfleiderer, G., Zwilling, R. & Sonneborn, H.H. (1967) On the evolution of endopeptidases, 3: a protease of molecular weight 11,000 and a trypsin-like fraction from Astacus fluviatilis Fabr. Hoppe-Seyler's Z. Physiol. Chem. 348, 1319–1331.
2. Sonneborn, H.H., Zwilling, R. & Pfleiderer, G. (1969) Evolution of endopeptidases. X. Cleavage specificity of low molecular weight protease from Astacus leptodactylus Esch. Hoppe-Seyler's Z. Physiol. Chem. 350, 1097–1102.
[...]
... of other distinct bone-inducing factors. Proc. Natl Acad. Sci. USA 85, 9484–9488.
14. Tarentino, A.L., Quinones, G., Grimwood, B.G., Hauer, C.R. & Plummer, T.H., Jr (1995) Molecular cloning and sequence analysis of flavastacin: an O-glycosylated prokaryotic zinc metalloendopeptidase. Arch. Biochem. Biophys. 319, 281–285.
15. Zwilling, R. & Stöcker, W. (1997) The Astacins: Structure and Function of a New Protein Family. ...
the C. elegans genome against mutations by genome-wide RNAi. Genes Dev. 17, 443–448.
Stöcker, W., Gomis-Rüth, F., Bode, W. & Zwilling, R. (1993) Implications of the three-dimensional structure of astacin for the structure and function of the astacin family of zinc-endopeptidases. Eur. J. Biochem. 214, 215–231.
Yiallouros, I., Grosse-Berkhoff, E. & Stöcker, W. (2000) The roles of Glu93 and Tyr149 in astacin-like ...
3. Krauhs, E., Dörsam, H., Little, M., Zwilling, R. & Ponstingl, H. (1982) A protease from Astacus fluviatilis as an aid in protein sequencing. Anal. Biochem. 119, 153–157.
4. Zwilling, R., Dörsam, H., Torff, H.-J. & Rödl, J. (1981) Low molecular mass protease: evidence for a new family of proteolytic enzymes. FEBS ...
... of the zinc-endopeptidase astacin. Arch. Biochem. Biophys. 337, 300–307.
10. Möhrlen, F., Baus, S., Gruber, A., Rackwitz, H.R., Schnölzer, M., Vogt, G. & Zwilling, R. (2001) Activation of pro-astacin: immunological and model peptide studies on the processing of immature astacin, a zinc-endopeptidase from the crayfish Astacus astacus. Eur. J. Biochem. 268, 2540–2546.
11. Yiallouros, I., Kappelhoff, R., Schilling, ...
... Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755.
Whelan, S. & Goldman, N. (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699.
TreeView 1.6.6; http://taxonomy.zoology.gla.ac.uk/rod/rod.html
C. elegans Sequencing Consortium (1998) The C. elegans Sequencing Consortium genome sequence of the ...
dichroism, binding to procollagen type I, and computer modeling. Biochemistry 39, 3231–3239.
Gems, D., Ferguson, C.J., Robertson, B.D., Nieves, R., Page, A.P., Blaxter, M.L. & Maizels, R.M. (1995) An abundant, trans-spliced mRNA from Toxocara canis infective larvae encodes a 26-kDa protein with homology to phosphatidylethanolamine-binding proteins. J. Biol. Chem. 270, 18517–18522.
Blaxter, M. (1998) Caenorhabditis elegans ...
... Zahler, A.M. (2000) The intronerator: exploring introns and alternative splicing in Caenorhabditis elegans. Nucleic Acids Res. 28, 91–93.
Higgins, D.G., Thompson, J.D. & Gibson, T.J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266, 383–402.
SMART, Version 3.5, 21 March 2001; http://smart.embl-heidelberg.de
ProDom, release.1/CG67; http://www.toulouse.inra.fr/prodom.html
InterPro database, ...
... Geier, G. & Zwilling, R. (1998) Cloning and characterization of a cDNA coding for Astacus embryonic astacin, a member of the astacin family of metalloproteases from the crayfish Astacus astacus. Eur. J. Biochem. 253, 796–803.
17. Hutter, H., Vogel, B.E., Plenefisch, J.D., Norris, C.R., Proenca, R.B., Spieth, J., Guo, C., Mastwal, S., Zhu, X., Scheel, J. & Hedgecock, E.M. (2000) Conservation and novelty in the evolution ...
... systematic RNA interference. Nature 408, 325–330.
Maeda, I., Kohara, Y., Yamamoto, M. & Sugimoto, A. (2001) Large-scale analysis of gene function in Caenorhabditis elegans by high-throughput RNAi. Curr. Biol. 11, 171–176.
Kamath, R.S., Martinez-Campos, M., Zipperlen, P., Fraser, A.G. & Ahringer, J. (2001) Effectiveness of specific RNA-mediated interference through ingested double-stranded RNA in Caenorhabditis elegans. ...
Kurpiewski, M.R., Ashcom, J.D., Jen-Jacobson, L. & Jacobson, L.A. (1988) Proteases of the nematode Caenorhabditis elegans. Arch. Biochem. Biophys. 261, 80–90.
19. Geier, G., Banaj, H.J., Heid, H., Bini, L., Pallini, V. & Zwilling, R. (1999) Aspartyl proteases in Caenorhabditis elegans. Isolation, identification and characterization by a combined use of affinity chromatography, two-dimensional gel electrophoresis, ...
... protease domains, covering the region from Ala-1 to Leu-200 in the prototype crayfish astacin, from the C. elegans astacins and selected other astacin family. For astacin genes the astacin domain, the zinc binding motif or the Met-turn sequences, as listed by PRINTS, were used to repeatedly screen the whole C. elegans ...
