Abstract: Expressed sequence tag data were generated from complementary DNA libraries created from cephalothorax, eyestalk, and pleopod tissue of the black tiger shrimp (Penaeus monodon). Significant database matches were found for 48 of 83 nuclear genes sequenced from the cephalothorax library, 22 of 55 nuclear genes from the eyestalk library, and 6 of 13 nuclear genes from the pleopod library. The putative identities of these genes reflected the expected tissue specificity. For example, genes for digestive enzymes were identified from the cephalothorax library and genes involved in the visual and neuroendocrine system from the eyestalk library. A few sequences matched anonymous EST or genomic sequences, and others contained minisatellite or microsatellite repeat sequences. The remainder, 31 from the cephalothorax library, 25 from the eyestalk library, and 5 from the pleopod library, were sequences of high nucleotide complexity with no matches in any database searched and thus may represent novel genes.
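The counts quoted in the abstract can be tallied to see how many nuclear sequences in each library fall outside the two named categories (significant database match, or high-complexity sequence with no match). The following is a minimal sketch in Python that uses only the numbers stated above; the dictionary layout and the function name `tally` are illustrative assumptions and are not part of the original analysis.

```python
# Counts taken from the abstract, per library:
# (nuclear genes sequenced, with a significant database match, with no match)
LIBRARIES = {
    "cephalothorax": (83, 48, 31),
    "eyestalk": (55, 22, 25),
    "pleopod": (13, 6, 5),
}


def tally(total, matched, unmatched):
    """Return (percent matched, percent unmatched, count in neither category)."""
    other = total - matched - unmatched
    pct_matched = round(100 * matched / total, 1)
    pct_unmatched = round(100 * unmatched / total, 1)
    return pct_matched, pct_unmatched, other


if __name__ == "__main__":
    for name, counts in LIBRARIES.items():
        pct_matched, pct_unmatched, other = tally(*counts)
        print(f"{name}: {pct_matched}% matched, "
              f"{pct_unmatched}% unmatched, {other} other")
```

The leftover counts (4, 8, and 2 sequences) would then correspond to the sequences that the abstract describes as matching anonymous EST or genomic sequences or as containing repeat sequences.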
Mar Biotechnol 1, 465–476, 1999
© 1999 Springer-Verlag New York Inc.

Tissue-Specific Expressed Sequence Tags from the Black Tiger Shrimp Penaeus monodon

Sigrid A. Lehnert,1,* Kate J. Wilson,2 Keren Byrne,1 and Stephen S. Moore1

1 CSIRO Tropical Agriculture, Molecular Animal Genetics Centre, Level 3, Gehrmann Laboratories, University of Queensland, St. Lucia, Qld 4072, Australia
2 Australian Institute of Marine Science, PMB No. 3, Townsville Mail Centre, Townsville, Qld 4810, Australia

Received February 12, 1999; accepted April 13, 1999.
*Corresponding author; telephone +61-7-3214-2445; fax +61-7-3214-2480; e-mail Sigrid.Lehnert@tag.csiro.au

Key words: Penaeus monodon, shrimp, cDNA libraries, ESTs, gene expression, genome.

INTRODUCTION

Penaeus monodon, the black tiger shrimp, is the most important shrimp aquaculture species in the Southern Indo-Pacific region. In 1997, approximately 400,000 tonnes were produced worldwide with a value of more than US$3 billion (Rosenberry, 1997). This industry is now threatened by problems of disease and depletion of the wild broodstock that are used to stock commercial hatcheries (Browdy, 1998). Research on this species is therefore focusing on domestication to overcome broodstock shortages and to allow genetic selection for improved strains, and on identification of the causative agents of major prawn diseases. Molecular tools are being applied to both problems (Benzie, 1998).

At present, there is little baseline information on the molecular biology of any crustacean species. The GenBank release of July 1, 1998, contained 846 crustacean sequences. Excluding sequences for ribosomal RNA genes, highly repetitive DNA, and mitochondrial genes, there were only 254 entries, including many redundancies. Of these, only 133 were from decapod crustaceans, the group to which most commercially important species belong.

To address this constraint for the commercially important penaeid species, we have initiated a project to characterize expressed sequence tags (ESTs) in P. monodon. ESTs are generated by single-pass DNA sequencing of clones obtained from complementary DNA (cDNA) libraries and are a powerful tool in the genetic characterization of organisms, owing in large part to the speed and affordability of generating these sequences. Comparison of sequences obtained with those available in public sequence databases allows putative identification of many genes (Marra et al., 1998). ESTs can be used to characterize patterns of gene expression as they represent genes that are actively expressed in the tissues from which cDNA libraries are prepared (Fields, 1994). Expressed sequence data are also an important component of gene mapping studies, as anchor loci to allow the transfer of gene map information between species (Khan et al., 1992; Ruyter-Spira et al., 1996).

As the first step in this project we have sequenced a limited number of clones from three different shrimp cDNA libraries to obtain information about
representation, sequence homologies to other species, and gene expression that will inform further work on shrimp ESTs. Shrimp ESTs will act as a valuable resource for the study of crustacean physiology, genetics, and molecular evolution.

MATERIALS AND METHODS

Whole Cephalothorax Library

For the “whole cephalothorax” cDNA library, mRNA was prepared from five adult P. monodon obtained from a North Queensland prawn farm. The cephalothorax was severed from the abdomen, the carapace was carefully removed, and the legs and other appendages were cut off. Eyes and eyestalks were included in the preparation. Total RNA was extracted from tissues snap frozen in liquid nitrogen, disrupted with a hammer, and homogenized in Trizol LS (Gibco) using a glass homogenizer. Messenger RNA was prepared using oligo(dT)-cellulose spin columns (Pharmacia mRNA purification kit), and cDNA was prepared and directionally cloned into the UniZAP vector (Stratagene) using the ZAP-cDNA synthesis kit (Stratagene). The resulting library contained 1.6 × 10^6 independent phages and was amplified to × 10^10 pfu/ml. For dot blot analysis, 500 ng of plasmid DNA was spotted onto Hybond N+ membrane (Amersham) and hybridized to a 32P-labeled probe using standard protocols (Sambrook et al., 1989).

Eyestalk and Pleopod cDNA Libraries

RNA was prepared from four eyestalks, one each from four separate broodstock females provided by a Queensland hatchery, and from 10 pleopods, each from two tank-reared males, using the “one-step” total RNA preparation method (Chomczynski and Sacchi, 1987). Messenger RNA was prepared using oligo(dT)-cellulose spin columns (Promega mRNA purification kit), and cDNA was prepared and directionally cloned into the ZAP Express vector (Stratagene) using the ZAP Express cDNA synthesis kit (Stratagene). The resulting libraries contained × 10^5 independent phage and were amplified to 10^8 pfu/ml.

EST Sequencing and Analysis

In vivo excision using ExAssist helper phage was performed on a small aliquot of
each library. Plasmid DNA from the resulting pBluescript SK phagemid clones (cephalothorax library) or pBK-CMV phagemid clones (eyestalk and pleopod libraries) was prepared using spin column miniprep kits. Automated cycle sequencing was performed on 300 ng of plasmid DNA using the T3 sequencing primer and ABI reagents and equipment. Sequence analysis and BLASTN/BLASTX homology searches were carried out using MacVector 6.0 (Oxford Molecular Group) software and the Australian National Genome Information Service (ANGIS). BLASTX searches were performed for the top strand only on the ANGIS nonredundant protein database using the PAM120 matrix and with prefiltering of sequences using the programs SEG followed by XNU to eliminate regions of low complexity.

RESULTS AND DISCUSSION

Complementary DNA Libraries

A key prerequisite for an EST project is the availability of high-quality cDNA libraries. Ideally, a library should be unidirectionally cloned and comprise a high proportion of long or full-length clones, representative of the entire mRNA population from the tissue used to generate the library and free of contaminating mitochondrial, ribosomal RNA, and genomic sequences (Adams et al., 1995). These parameters can be evaluated by initial sequencing of a small number of randomly selected clones from each library.

In the present study, unidirectional cDNA libraries were constructed from mRNA isolated from the cephalothorax, pleopods (swimming legs), or eyestalks of the black tiger shrimp P. monodon. The average insert size of the whole cephalothorax library was kb, of the pleopod library 1.6 kb, and of the eyestalk library 1.2 kb, which exceeds the minimum size of kb recommended by Adams et al. (1995). To assess the sequence content, phagemid inserts were excised in vivo from small aliquots of the libraries, and a number of randomly selected clones were sequenced from the 5′ region of the clones. GenBank and EMBL databases
(without EST sequences) were searched for homologies with shrimp ESTs using the BLASTN program (Altschul et al., 1990). Protein databases were also searched with conceptual translations of the EST sequences using the program BLASTX (Gish and States, 1993). Probabilities […] 70% A+T; K. Wilson, unpublished data), which may lead to mitochondrial RNA molecules binding to the oligo(dT) columns used to isolate poly(A)-containing mRNA. The differing prevalence of contaminating mitochondrial genes in the three libraries probably reflects the abundance of mitochondria in the different tissues, as the eyestalk and pleopod libraries were made in parallel using exactly the same techniques. Moreover, the cephalothorax library, although made separately, used very similar protocols. Considerable variation in the prevalence of mitochondrial genes in different cDNA libraries was also observed by Adams et al. (1995).

Another useful criterion for evaluating cDNA libraries is “distinctness,” or the proportion of sequences that are clearly different from each other (Adams et al., 1995). Of the nuclear genes, 59.8% were distinct in the cephalothorax library, and 81.8% in the eyestalk library (Table 1). The figure for the cephalothorax library is heavily affected by the extreme abundance of hemocyanin clones. In general, such highly abundant clones can be avoided by prescreening the library prior to further sequencing (Adams et al., 1995). In the present study, after the first five hemocyanin and five mitochondrial clones had been identified by sequencing, the cephalothorax library was prescreened with a hemocyanin probe and a P. monodon 10-kb mitochondrial probe containing both the large and small subunit mitochondrial rRNAs as well as a number of other mitochondrial genes (K. Wilson, unpublished data). This identified a further 19 hemocyanin clones and mitochondrial clones. Following prescreening, there were 77.2% distinct nuclear clones in the cephalothorax library (Table 1).

Table 1. Complementary DNA Library Statistics

                                                        Cephalothorax                     Eyestalk         Pleopod
Composition of cDNA library*
  Nuclear genes                                         102/112 (91.0%)                   55/56 (98.2%)    13/32 (40.6%)
  Nuclear rRNA
  mt rRNA                                               9/112 (8.0%)                      1/56 (1.8%)      14/32 (43.8%)
  Other mt genes                                        1/112 (0.9%)                      0                5/32 (15.6%)
Results of database searches†
  Nuclear genes with matches to database sequences      48/83 (57.8%)                     22/55 (40.0%)    6/13 (46.2%)
  Nuclear genes with matches to anonymous
    database sequences                                  3/83 (3.6%)                       2/55 (3.6%)      0.0%
  Nuclear genes of high sequence complexity
    with no database matches                            31/83 (37.3%)                     25/55 (45.5%)    5/13 (38.5%)
  cDNAs with microsatellite sequences                   1/83 (1.2%)                       6/55 (10.9%)     2/13 (15.4%)
Distinct nuclear genes‡                                 61/102 (59.8%) or 61/79 (77.2%)   48/55 (81.8%)    12/13 (91.7%)

*Data for the cephalothorax library are based on a combination of sequence and hybridization data: 88 clones were sequenced, and a further 19 hemocyanin clones and mitochondrial clones were identified by dot blot hybridization.
†Percentages for the cephalothorax library are based on the total number of sequenced nuclear clones, not the total number of nuclear clones.
‡Two figures are given for the cephalothorax library. The first is based on the total number of identified nuclear clones, and the second is the figure that would result if all hemocyanin clones were eliminated by prescreening.

Identification of Shrimp Homologues for Well-Characterized Sequences

On the basis of homology searches, 48 of 83 nuclear sequences analyzed from the cephalothorax cDNA library, 22 of 55 from the eyestalk library, and 6 of 13 from the pleopod library could be assigned a possible identity (Tables 1 and 2). Most of the identified sequences were represented only once. The most prominent exception was the hemocyanin gene, as discussed above. In addition, homologues of arrestin, GTP binding proteins, members of the thrombospondin family, as well as a number of muscle-related RNAs, arginine kinase, myosin light and heavy chains, tropomyosin and actin, were all found up to four times (Table 2).

Unidentified Sequences

The sequences that did not show significant homologies in the initial BLAST searches were compared with the GenBank EST databases. This search found another four significant homologies, three from the cephalothorax library and one from the eyestalk library (Table 2). In addition, one of the eyestalk sequences showed homology to a hypothetical protein of unknown function identified from the Caenorhabditis elegans (nematode) genomic sequencing project.

The remaining “unidentified” EST sequences were aligned with each other at the nucleic acid level using MacVector 6.0 to look for families of related sequences. Only scores higher than 1000 were considered significant. The shrimp cephalothorax cDNA library contained a family of 11 highly related sequences that did not match any sequences in the published databases or the ESTs from the eyestalk or pleopod cDNA libraries. One further pair of unidentified related sequences was also discovered in the cephalothorax library. The eyestalk library contained one family of four highly related unknown sequences that did not overlap with sequences from the other libraries. One of the “unknown” sequences from the pleopod library was related to one from the eyestalk library.

Table 2. Homologies to Shrimp ESTs Identified in Database Searches

Clone ID Sequence length (bp) Library Redundancy Data base Database accession Species with closest homology Probability Acidic ribosomal phosphoprotein Elongation factor 1α Elongation factor Heat shock protein hsp 60 Ribosomal protein L18 Ribosomal protein L27a Ribosomal protein L7a Ribosomal protein S1a Ribosomal protein S2 Ribosomal protein S7 M17885 Human 1.00E-39 394 X77689 M86959 X99341 Zebra fish Nematode Fruit fly 3.60E-78 5.10E-84 1.40E-13 1036 449 283 L04128 U66358 X62640 X57322 U01334 L20096 1.80E-24 9.10E-58 1.30E-71 1.10E-65 2.50E-120 9.80E-33 412 782 597 913 1386 442 S57432 Mouse Fruit fly Chicken Frog Fruit fly Tobacco hornworm Frog 1.10E-47 474 U18973 Fruit fly 5.60E-41 604 X69422 Wild oat 2.00E-152 1957 Lobster: H vulgaris Lobster: H.vulgaris Fish Fruit fly Fruit fly Fruit fly 5.70E-109 1404 1.10E-36 513 4.90E-96 1.90E-41 4.00E-16 4.20E-54 748 624 316 756 Gene C GB AIMS-P.mon66 SAL078 SAL010 290 767 295 P C C GB GB GB AIMS-P.mon19 SAL012 SAL111 SAL039 SAL084 SAL063 302 538 857 505 857 767 E C C C C C GB GB GB GB GB GB AIMS-P.mon30 379 E GB AIMS-P.mon52 432 E GB
AIMS-P.mon26 569 E GB Ribosomal small subunit protein (40S) Protein disulfide isomerase Ubiquitin Muscle-related proteins AIMS-P.mon44 389 E GB Arginine kinase X68703 SAL048 677 C GB Arginine kinase X68703 SAL104 SAL053 AIMS-P.mon21 AIMS-P.mon4 836 817 296 413 C C E E GB GB GB GB Actin (muscle) Muscle LIM protein Muscle LIM protein 84B Myosin alkali light chain D87740 X81192 X91245 L08052 2 Protein synthesis and processing SAL109 812 Score Clone ID Sequence length (bp) Library AIMS-P.mon58 203 AIMS-P.mon1 SAL033 SAL096 SAL088 Gene Database accession Species with closest homology Probability SP Myosin alkali light chain Q24755 Fruit fly 1.60E-10 GB GB GB GB Myosin (fast) heavy chain Myosin heavy chain Myosin light chain Sarco/endoplasmic reticulum Ca2+ ATPase Sarcoplasmic calcium binding protein Sarcoplasmic calcium binding protein Tropomyosin U03091 M61229 L08051 AF025848 4.00E-43 3.00E-108 8.30E-51 2.20E-107 637 1427 737 1417 9.10E-56 184 AF014951 Lobster: H americanus Fruit fly Fruit fly Crayfish: P clarkii Shrimp: Penaeus sp Fruit fly 5.60E-09 234 AF034954 Lobster: H americanus 1.40E-117 1540 Y07894 Q06185 M11259 Fruit fly Mouse Fruit fly 3.20E-34 8.10E-07 4.7 E-67 535 100 903 NUFM_BOVIN Cow 1.10E-42 235 S67973 Human 3.80E-84 1102 Q62425 Mouse 1.90E-16 145 X85127 X15800 X86369 Shrimp: P.vannamei Rat Shrimp: P.vannamei 5.50E-163 1.10E-11 2.80E-122 1274 266 875 Data base P 271 684 867 832 E C C C SAL054 977 C SP SAL101 854 C GB SAL097 881 C Energy generation SAL042 AIMS-P.mon68 AIMS-P.mon57 and metabolism 632 C 365 P 378 P Redundancy GB GB SP GB SAL106 737 C SP AIMS-P.mon8 395 E GB AIMS-P.mon59 210 P SP Digestive enzymes SAL099 SAL028 SAL107 741 707 809 C C C GB GB GB ATP synthetase alpha subunit ATP synthase E chain Glyceraldehyde-3-phosphate dehydrogenase NADH ubiquinone oxidoreductase 13 kDa subunit NADH:ubiquinone oxidoreductase flavoprotein subunit NADH-ubiquinone oxidoreductase MLRQ subunit Cathepsin
L-like cysteine proteinase Pyruvate kinase Trypsin SCPB_PENSP Score 83 Clone ID Sequence length (bp) Library Neurosensory/endocrine system AIMS-P.mon10 486 E AIMS-P.mon31 599 E SAL040 509 C SAL068 674 C SAL117 772 C AIMS-P.mon3 409 E Redundancy 2 Data base Gene GB GB GB GB GB SP Beta-arrestin Arrestin Assembly protein 180 Cysteine string protein GTP binding protein GTP binding protein C E GB GB AIMS-P.mon7 SAL087 448 857 E C GB GB Opsin Phenylalanine/tryptophan hydroxylase Phospholipase C Ubiquitin carboxyl-terminal hydrolase Other SAL006 SAL114 SAL027 AIMS-P.mon39 AIMS-P.mon13 300 734 519 402 403 C C C E E GB GB GB GB SP Actin (cytoplasmic) Aldose reductase Cartilage oligomeric matrix protein Clathrin heavy chain Dynactin AIMS-P.mon49 SAL046 AIMS-P.mon20 AIMS-P.mon25 SAL083 SAL060 357 628 411 566 810 467 E C E E C C PIR GB SP GB GB SP Exoskeletal protein Hemocyanin Hemocyte transglutaminase Histone H1 Low density lipoprotein receptor Metallothionin-1 SAL043 671 C GB SAL005 484 C GB Putative transcriptional regulator (CON7) Receptor for activated kinase (RACK1) 24 Species with closest homology Probability Score M33601 M30140 X68878 S81917 M33141 GBGB HUMAN X71665 M32802 Cow Fruit fly Rat Rat Cow Human 1.20E-40 9.30E-46 9.30E-16 7.80E-42 8.60E-86 2.00E-08 451 675 313 296 1106 113 Mantis Fruit fly 1.90E-54 2.90E-46 780 672 J03138 M30496 Fruit fly Human 2.50E-84 4.10E-15 1111 132 U09635 M59754 X74326 U60803 DYNC_HUM AN S77934 X82502 Q05187 D87065 M11501 MT1_HOMA M AF015771 Sea urchin Cow Cow Human Human 3.00E-70 5.00E-55 1.90E-05 1.20E-89 6.40E-22 775 787 144 1175 154 Lobster: H americanus Shrimp: P vannamei Horseshoe crab: T tridentatus Wheat Rabbit Lobster: H americanus 3.4E-18 2.40E-117 5.6E-32 1.70E-17 1.10E-03 3.00E-36 169 1148 242 335 170 299 Fungus 3.10E-08 224 Fish 9.70E-25 194 AF025331 SAL026 679 AIMS-P.mon17 400 Database accession
Clone ID Sequence length (bp) Library SAL072 SAL082 SAL008 AIMS-P.mon63 791 754 311 249 C C C P Matches to anonymous database sequences SAL065 665 C SAL075 807 C SAL100 738 C AIMS-P.mon2 427 E AIMS-P.mon16 310 E Redundancy Gene Database accession Species with closest homology Probability GB GB GB GB Thrombospondin-1 Thrombospondin-4 U2 snRNP-specific A′ protein Voltage-dependent anion-selective channel U76994 Z19585 X13482 U70314 Chicken Human Human Fruit fly 1.40E-13 1.50E-37 8.80E-36 7.9E-27 289 369 539 434 GB (EST) GB (EST) GB (EST) GB (EST) TREMBL EST-mouse skin EST-fruit fly head EST-fruit fly head EST-human fetal heart Genomic sequence-nematode AA512039 AA699176 AI062868 AA009415 E276071 Mouse Fruit fly Fruit fly Human Nematode 1.30E-10 5.00E-05 1.70E-06 5.70E-14 86 252 185 203 289 86 Data base Score

*When multiple significant similarities were found for a single cDNA, only the highest scoring hit is included in the table. When more than one clone from a library matched the same gene, only the highest scoring clone from each library is listed and the total number of “hits” (i.e., including the listed one) is indicated in the “redundancy” column.
†Abbreviations for cDNA libraries: C indicates cephalothorax; E, eyestalk; P, pleopod.
‡All shrimp EST sequences described in this article, including those without database matches, have been submitted to GenBank (ESTs). The accession numbers for AIMS-P.mon1–AIMS-P.mon55 (eyestalk library) are AI253798–AI253852, for AIMS-P.mon56–AIMS-P.mon68 (pleopod library) are AI253853–AI253865, and for SAL001–SAL118 (cephalothorax library) are AI254886–253953.
Databases: GB indicates GenBank (no ESTs); GB(EST), GenBank dbEST; PIR, Protein Information Resource; SP, SwissPROT; TREMBL, translated from EMBL.

Table 3. Microsatellite Sequences from Shrimp EST Sequences

Clone          Type of microsatellite                   Nucleotide repeat unit           Microsatellite sequence
SAL001*        Perfect                                  Dinucleotide                     (GA)56
SAL015         Compound imperfect                       Di- and tetranucleotide          (GA)25GT(GA)2GTGAAA(GA)2CA(GA)2GTGA(GAAA)4 AAACA(GA)4GAAA(GA)3
AIMS-P.mon12   Compound perfect and compound imperfect  Di- and tetranucleotide          (ATAG)5(AG)17 and (ATGT)5ACGT(ATGT)2, (AGAT)4(TGAT)(AGAT)(AT)4(AG)4
AIMS-P.mon27   Imperfect                                Trinucleotide                    (AAT)9AGT(AAT)5
AIMS-P.mon35   Compound perfect                         Pentanucleotide/mononucleotide   (TTTTC)5(T)11
AIMS-P.mon45   Imperfect                                Trinucleotide                    (AAT)7AGT(AAT)13
AIMS-P.mon47   Perfect                                  Dinucleotide                     (AG)8
AIMS-P.mon56   Imperfect                                Trinucleotide                    (ATG)4ACG(ATG)6 and (ATG)6(ACG)2(ATG)4
AIMS-P.mon60   Imperfect/perfect                        Trinucleotide                    (CCA)3TCA(CCA)3 and (GCC)6

*This clone appears to be a chimeric clone, containing mitochondrial 16S rRNA sequences associated with microsatellite sequence at the extreme 5′ end of the clone. Sequencing the complete mitochondrial genome of P. monodon failed to detect this microsatellite (K. Wilson, unpublished data). The 16S rRNA-like sequence in SAL001 is clearly that of a decapod crustacean, based on BLASTN analysis.

Complementary DNAs Containing Sequences of Low Complexity

A number of sequenced cDNA clones contained microsatellite sequences (Table 3). Microsatellites are commonly believed to occur primarily in noncoding DNA. However, surveys of other cDNA libraries have indicated that up to 8% of clones may contain microsatellites (Khan et al., 1992; Depeiges et al., 1995; Ruyter-Spira et al., 1996). A high proportion of the P. monodon cDNA microsatellites identified in this study were imperfect and compound repeats, an observation
consistent with the results of other groups studying penaeid microsatellites (Tassanakajon et al., 1998). It is also noteworthy that three of the nine clones with microsatellites (AIMS-P.mon12, AIMS-P.mon56, and AIMS-P.mon60) contained two independent microsatellites, further emphasizing the complexity of microsatellite structure in decapod crustaceans.

When subjected to BLASTX searches without prefiltering to remove sequences of low complexity, almost all these sequences produced statistically significant database matches. For example, AIMS-P.mon45 showed homology to phosphatidylinositol 3-kinase from the slime mold Dictyostelium discoideum with a score of 101 and p value of 1.50E-14. However, in each case these matches are almost entirely due to long runs of a single amino acid (asparagine in the example of AIMS-P.mon45), and the significance is therefore difficult to evaluate.

Two further eyestalk sequences (AIMS-P.mon32 and AIMS-P.mon43) and one cephalothorax sequence (SAL079) showed long stretches consisting only of A and G residues, but without the regular short repeat unit that would allow them to be classed as microsatellites. In AIMS-P.mon32, three out of the four blocks of A and G residues contain a repeat unit of 31 residues with only a single nucleotide difference between them, which perhaps could be classed as a minisatellite. Five of the cephalothorax sequences (SAL024, 071, 091, 095, and 106) showed stretches of around 15 A or T residues, which had to be filtered out in order to perform database searches. Only one of the ESTs containing poly(T) or poly(A) (SAL106) matched a database entry.

A further group of sequences did not contain obvious nucleotide repeat units, but conceptual translation of the sequence revealed significant repetition of amino acid residues, leading to apparent database matches when the BLASTX searches were run without prefiltering. There are three striking examples: AIMS-P.mon9 gives rise to a proline-rich
sequence with homology to proteins such as a putative proline-rich protein from the nematode Caenorhabditis elegans (TREMBL Q20001); AIMS-P.mon11 potentially encodes an arginine-rich protein with homology to a human brain protein (TREMBL D1026392) and a mouse protein kinase homologue (GenPept AF033663); and AIMS-P.mon36 may encode a glycine-rich protein with homology to the glycine-rich cell wall structural proteins of plants (e.g., SwissPROT GRP1 PETHY). Similarly, SAL007 may encode a proline- or glycine-rich protein, and SAL041 could give rise to a very arginine-rich sequence with homologies to sperm protamines (e.g., SwissPROT HSP1 TACAC).

Tissue-Specific Expression of Shrimp Sequences

Shrimp ESTs that were assigned putative identities based on homology searches are listed in Table 2. The ESTs are broadly grouped according to functional category, to highlight the distribution of these initial ESTs between the different cDNA libraries and, hence, tissue types.

The shrimp cephalothorax contains all the major organ systems (digestive system, nervous system, hepatopancreas, gills, gonads, endocrine system, and heart); therefore, the cephalothorax library would be expected to contain a large diversity of transcripts. The most abundant transcript identified from the cephalothorax was hemocyanin (24 of 112 clones when assayed by dot blot hybridization). As the hepatopancreas is the main site of synthesis of hemocyanin (Rainer and Brouwer, 1993), and as hemocyanin is a highly abundant protein, the prevalence of transcripts of this gene would be predicted. The protein encoded by the metallothionein homologue may be linked with hemocyanin activity as it is probably involved in reactivation of copper-depleted hemocyanin (Brouwer et al., 1989). A number of sequences showed homology to genes involved in general cellular maintenance, such as ribosomal proteins and polypeptide elongation factors. A preponderance of transcripts from muscle-specific genes, such as actin, myosin, and the
sarcoplasmic calcium-binding protein, and of digestive enzymes such as trypsin, is to be expected in the cephalothorax library.

Penaeus monodon homologues of the thrombospondin family of genes were identified three times during the BLAST analysis (SAL027, SAL072, and SAL082). Thrombospondins and the related cartilage oligomeric matrix protein constitute a family of glycoproteins that appear to be involved in cell-to-cell and cell-to-matrix adhesion. Alignment of SAL027, SAL072, and SAL082 with thrombospondin peptide sequences showed that the ESTs aligned with the COOH-terminal region and the type calcium binding repeats of the thrombospondin molecule. The three ESTs therefore appear to code for truncated crustacean homologues of the thrombospondin family of molecules.

Other transcripts potentially encode proteins related to activity of the shrimp nervous system; e.g., homologues of the cysteine string protein have been associated with neural activity in Drosophila, the marine ray, and rats (Braun and Scheller, 1995). Expression of the clathrin assembly protein 180 is restricted to neuronal tissue in mammals (Morris et al., 1993). A transcript encoding a homologue of the retinal pigment opsin was detected most likely because eyestalks were included in the cephalothorax tissue used to make this library.

The eyestalk of P. monodon contains both optical and endocrine organs as well as associated tissues such as muscle and the vascular system. At the terminus of the eyestalk lies the compound eye, consisting of photoreceptor and pigment cells, from which the optic ganglion runs through the eyestalk to the brain. At the base of the eyestalk lies the medulla terminalis X-organ, the site of synthesis of a number of hormones, many of which are stored in the sinus gland, a neurohemal organ located around the midpoint of the eyestalk. The eyestalk also contains the organ of Bellonci, which is believed to be sensory or secretory in nature (Fingerman, 1992).

In the eyestalk, homologues of a
number of genes involved in general cellular metabolism were identified (e.g., ubiquitin, histone, and ribosomal proteins), as well as a group involved in muscle action (e.g., arginine kinase and the myosin molecules). The remaining genes appeared to be related to specific cell types within the eyestalk.

The most common sequence homology identified among the eyestalk ESTs was to arrestin (AIMS-P.mon10, AIMS-P.mon18, AIMS-P.mon31, AIMS-P.mon37). Arrestins are a family of proteins involved in desensitization of various G-protein-coupled receptors. In vertebrates, there are two broad classes: the visual arrestins, which inactivate rhodopsin, and the β-arrestins, which inactivate the β-adrenergic receptor. Two different arrestin molecules have been identified in the Drosophila visual system (Matsumoto and Yamada, 1991), and two different classes have also been identified in insect antennae (Raming et al., 1993). Three of the four shrimp clones appear to contain the 5′ end of the gene and hence can be aligned with each other. This indicates clear differences between the sequence encoded by AIMS-P.mon10 and that encoded by AIMS-P.mon18 and AIMS-P.mon38, suggesting that two different types of arrestins may also have been identified in the shrimp. However, the functional significance of these different arrestin sequences in invertebrates remains unclear.

Another gene that may be involved in vision is the phospholipase C (PLC) homologue. In Drosophila, the PLC gene is specifically expressed in the retina, and mutations in this gene render the flies blind (Bloomquist et al., 1988).

Three of the other eyestalk ESTs with database identifications encode homologues of genes that could play a role in the neuroendocrine system. Clathrin is required for receptor-mediated endocytosis via coated pits, and molecules taken up by this pathway in mammals include growth factors such as epidermal growth factor. Protein disulfide isomerase is
generally required to assist in formation of disulfide bonds during passage of secreted proteins through the endoplasmic reticulum, and hence might be expected to be relatively abundant in neurosecretory tissue. Phenylalanine/tryptophan hydroxylase activities are encoded by the same locus in Drosophila, with the different activities thought to be regulated by different posttranslational modifications. Expression of this gene is seen in Drosophila neural tissue, most likely because tryptophan and phenylalanine hydroxylase activity are required for biosynthesis of the biogenic amines serotonin and dopamine, respectively (Neckameyer and White, 1992).

Only very limited EST analysis was undertaken of the pleopod library, owing to the extremely high number of contaminating mitochondrial sequences. However, of the six sequences that were identified, all are homologous to sequences that would be expected to be abundant in muscle tissue: four are involved in energy generation and mitochondrial function, one is a muscle structural protein, and one is involved in protein synthesis.

It is likely that the sequences for which no matches were found in the databases also included some genes with tissue specificity. Several of these unidentified ESTs were highly abundant in the libraries from which they were isolated (i.e., appeared more than once in a small sample of random sequences) and yet were not found in the other libraries. The only genes that were found in common between different libraries were the muscle-specific transcripts arginine kinase, myosin light and heavy chains, and one of the muscle LIM genes, and a GTP-binding protein (Table 2). In addition, separate nuclear-encoded subunits of the multimeric mitochondrial enzymes ATP synthase and NADH dehydrogenase were found in different libraries, and genes involved in protein synthesis, namely elongation factors and ribosomal proteins, were found in all libraries.

CONCLUSIONS

The sequences presented in this article represent
tagging of at least 60 new genes with putative database matches from P. monodon, 49 of which have not previously been identified in crustaceans. In addition, at least 42 distinct sequences with no database matches were detected, representing either completely novel functions or genes whose sequences are too divergent from genes of known sequence and similar function in other organisms to enable database matching. Hence, the cephalothorax and eyestalk cDNA libraries proved suitable for EST analysis.

The domestication of P. monodon has progressed to a point where it is now possible to construct gene maps with a view to mapping traits of economic importance. Shrimp ESTs will contribute type I anchor loci to the current gene-mapping projects and will assist in the construction of cross-species genetic maps. Options for mapping ESTs include intron-length polymorphisms (Palumbi and Baker, 1994), single-strand conformation polymorphisms (SSCPs) (Brady et al., 1997), and those microsatellite repeats within cDNAs that are suitable for development into polymorphic markers (Khan et al., 1992). Mapping of ESTs will enable cross-species comparison of shrimp genome maps, as different species of penaeid prawns (P. vannamei, P. japonicus, and P. monodon) have proved too divergent to allow transfer of type II markers such as microsatellite loci (Moore et al., 1999).

To further study the physiological and developmental significance of the sequences that have no known function in shrimp, we are also in the process of characterizing the spatial and temporal expression patterns of these sequences.

ACKNOWLEDGMENTS

We thank Zahra Fayazi for assistance with producing some of the sequences presented in this article. This is contribution number 957 from the Australian Institute of Marine Science.

REFERENCES

Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., and Venter, J.C. (1991) Complementary DNA sequencing:
expressed sequence tags and the human genome project. Science 252:1651–1656.
Adams, M.D., Kerlavage, A.R., Fleischmann, R.D., Fuldner, R.A., et al. (1995) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377(Suppl):3–174.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) Basic local alignment search tool. J Mol Biol 215:403–410.
Anderson, I., and Brass, A. (1998) Searching DNA databases for similarities to DNA sequences: when is a match significant? Bioinformatics 14:349–356.
Benzie, J.A.H. (1998) Penaeid genetics and biotechnology. Aquaculture 164:23–47.
Bloomquist, B.T., Shortridge, R.D., Schneuwly, S., Perdew, M., Montell, C., Steller, H., Rubin, G., and Pak, W.L. (1988) Isolation of a putative phospholipase C gene of Drosophila, norpA, and its role in phototransduction. Cell 54:723–733.
Brady, K.P., Rowe, L.B., Her, H., Stevens, T.J., Eppig, J., Sussman, D.J., Sikela, J., and Beier, D.R. (1997) Genetic mapping of 262 loci derived from expressed sequences in a murine interspecific cross using single-strand conformational polymorphism analysis. Genome Res 7:1085–1093.
Braun, J.E., and Scheller, R.H. (1995) Cysteine string protein, a DnaJ family member, is present on diverse secretory vesicles. Neuropharmacology 34:1361–1369.
Brouwer, M., Winge, D.R., and Gray, W.R. (1989) Structural and functional diversity of copper-metallothioneins from the American lobster Homarus vulgaris. J Inorg Biochem 35:289–303.
Browdy, C. (1998) Recent developments in penaeid broodstock and seed production technologies: improving the outlook for superior captive stocks. Aquaculture 164:3–21.
Chomczynski, P., and Sacchi, N. (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162:156–159.
Depeiges, A., Goubely, C., Lenoir, A., Cocherel, S., Picard, G., Raynal, M., Grellet, F., and Delseny, M. (1995) Identification of the most represented repeated motifs in
Arabidopsis thaliana microsatellite loci. Theor Appl Genet 91:160–168.
Fields, C. (1994) Analysis of gene expression by tissue and developmental stage. Curr Opin Biotechnol 5:595–598.
Fingerman, M. (1992) Glands and secretion. In: Harrison, F.W., and Humes, A.G. (eds.), Microscopic Anatomy of Invertebrates: Volume 10, Decapod Crustacea. New York: Wiley-Liss, 345–394.
Gish, W., and States, D.J. (1993) Identification of protein coding regions by database similarity search. Nature Genet 3:266–272.
Khan, A.S., Wilcox, A.S., Polymeropoulos, M.H., Hopkins, J.A., Stevens, T.J., Robinson, M., Orpana, A.K., and Sikela, J.M. (1992) Single pass sequencing and physical and genetic mapping of human brain cDNAs. Nature Genet 2:180–185.
Marra, M.A., Hillier, L., and Waterston, R.H. (1998) Expressed sequence tags—ESTablishing bridges between genomes. Trends Genet 14:4–7.
Matsumoto, H., and Yamada, T. (1991) Phosrestins I and II: arrestin homologs which undergo differential light-induced phosphorylation in the Drosophila photoreceptor in vivo. Biochem Biophys Res Commun 177:1306–1312.
Moore, S.S., Whan, V., Davis, G.P., Byrne, K., Hetzel, D.J.S., and Preston, N.P. (1999) The development and application of genetic markers for the Kuruma prawn Penaeus japonicus. Aquaculture 173:19–32.
Morris, S.A., Schroder, S., Plessman, U., Weber, K., and Ungewickell, E. (1993) Clathrin assembly protein AP180: primary structure, domain organization and identification of a clathrin binding site. EMBO J 12:667–675.
Neckameyer, W.S., and White, K. (1992) A single locus encodes both phenylalanine hydroxylase and tryptophan hydroxylase activities in Drosophila. J Biol Chem 267:4199–4206.
Palumbi, S.R., and Baker, C.S. (1994) Contrasting population structure from nuclear intron sequences and mtDNA of humpback whales. Mol Biol Evol 11:426–435.
Rainer, J., and Brouwer, M. (1993) Hemocyanin synthesis in the blue crab Callinectes sapidus. Comp Biochem Physiol B 104:69–73.
Raming, K., Freitag, J., Krieger, J., and Breer, H. (1993) Arrestin-subtypes
in insect antennae. Cell Signal 5:69–80.
Rosenberry, B. (1997) World Shrimp Farming 1997. San Diego, Calif.: Shrimp News International.
Ruyter-Spira, C.P., Crooijmans, R.A., Dijkhof, R.M., Van Oers, P.M., Strijk, J.A., Van der Poel, J., and Groenen, M.M. (1996) Development and mapping of polymorphic microsatellite markers derived from a chicken brain cDNA library. Anim Genet 27:229–234.
Sambrook, J., Fritsch, E.F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed. New York: Cold Spring Harbor Laboratory Press.
Tassanakajon, A., Tiptawonnukul, A., Supungul, P., Rimphanitchayakit, V., Cook, D., Jarayabhand, P., Klinbunga, S., and Boonsaeng, V. (1998) Isolation and characterization of microsatellite markers in the black tiger prawn Penaeus monodon. Mol Mar Biol Biotechnol 7:55–61.