The transposable elements of the Drosophila melanogaster euchromatin – a genomics perspective

Joshua S Kaminker1,8, Casey M Bergman2,8, Brent Kronmiller2,7, Joseph Carlson2, Robert Svirskas3, Sandeep Patel2, Erwin Frise2, David A Wheeler5, Suzanna Lewis1, Gerald M Rubin1,2,4, Michael Ashburner6,9 and Susan E Celniker2

Department of Molecular and Cellular Biology, University of California, Berkeley, CA 94720; 2Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720; 3Amersham Biosciences, 2100 East Elliot Rd., Tempe, AZ 85284; 4Howard Hughes Medical Institute; 5Human Genome Sequencing Center and Department of Molecular and Cell Biology, Baylor College of Medicine, Houston, TX 77030; 6Department of Genetics, University of Cambridge, England, CB2 3EH.
Current address: Department of Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011.
These authors contributed equally to this work.
Corresponding author: Michael Ashburner, Department of Genetics, Downing Street, Cambridge, England CB2 3EH; phone: +44 1223-333969; fax: +44 1223-333992; email: ma11@gen.cam.ac.uk

Running title: Transposable elements in the Drosophila euchromatin

Background
Transposable elements are found in the genomes of nearly all eukaryotes. The recent completion of the Release genomic sequence of Drosophila melanogaster by the Berkeley Drosophila Genome Project has provided precise sequence for the repetitive elements in the Drosophila euchromatin. We have used this genomic sequence to describe the euchromatic transposable elements in the sequenced strain of this species.

Results
We identified 85 known and eight novel families of transposable elements in the Release euchromatic sequences; these vary in copy number between one and 147. Three more families are known in the heterochromatin (unpublished). A total of 1,572 transposable elements were identified, comprising 3.86% of the Release sequences. More than two-thirds of the transposable elements identified are partial. The density of transposable elements is higher on chromosome than on the major chromosome arms, while the density on the X chromosome is similar to that on the major autosome arms. The density of the three major classes of transposable elements (LTR, LINE-like and TIR) is markedly higher in the proximal Mb of each chromosome arm, reflecting the transition from euchromatin to heterochromatin; the high density on chromosome is due only to LINE-like and TIR elements. Transposable elements are preferentially found outside genes: only 436 of the 1,572 transposable elements are contained within the 61.4 Mb of sequence that is annotated as being transcribed. A large proportion of transposable elements are found nested within other elements of the same or different classes. Analysis of structural variation among elements from different families reveals distinct patterns of deletion for each class. LTR elements are the most abundant class and the class with the highest proportion of complete elements; in addition, the low level of sequence diversity in LTR families suggests that, on average, members of LTR families share a more recent common ancestor than members of LINE-like or TIR families.

Conclusions
This analysis represents the first complete characterization of the transposable elements in the Release euchromatic genomic sequence of Drosophila melanogaster, and it provides a data set, freely available from the BDGP, on which future analyses can be based.

Introduction
Transposable element sequences are abundant yet poorly understood components of almost all eukaryotic genomes (Craig et al. 2002). As a result, biologists from many fields are interested in descriptions of the transposable elements in completely sequenced eukaryotic genomes. The evolutionary biologist wants to understand the origin of transposable elements, how they are lost and gained by a species, and the role they play in genome evolution; the population geneticist wants to know the factors that determine the frequency and distribution of elements within and between populations; the developmental geneticist wants to know what roles these elements may play in normal developmental processes or in the response of the organism to external conditions; and the molecular geneticist wants to know the mechanisms that regulate the life cycle of these elements and how they interact with the cellular machinery of the host. For all of these reasons and more, a description of the transposable elements in the recently completed Release genomic sequence of D. melanogaster is desirable.

Our understanding of transposable elements owes much to research in Drosophila. Over 75 years ago, Milislav Demerec discovered highly mutable alleles of two genes in D. virilis, miniature and magenta (Demerec 1926; 1927; reviewed in Demerec 1935; Green 1976). Both genes were mutable in the soma and the germ line, and for the miniature-3alpha alleles Demerec also isolated dominant enhancers of mutability. In retrospect, it seems clear that the mutability of these alleles was the result of transposition of mobile elements. The dominant enhancers may have been particularly active elements or mutations in host genes that affect transposability (see below). There matters stood until McClintock's analysis of the Ac and Ds factors in maize, which led to the discovery of transposition (McClintock 1950), and the discovery of insertion elements in the gal operon of Escherichia coli (see Starlinger 1977). Green (1977) synthesized the available evidence to make a strong case for insertion as a mechanism of mutagenesis in Drosophila. Concurrently, Hogness' group had begun a molecular characterization of two elements in D. melanogaster, 412 and copia (Rubin et al. 1976; Finnegan et al. 1978), and provided evidence that they were transposable (Ilyin et al. 1978; Strobel et al. 1979; Young 1979). Glover (1977) unknowingly characterized the first eukaryotic transposable element at the molecular level: the insertion sequences of the 28S rRNA-encoding genes. The discovery of male recombination (Hiraizumi 1971) and of two systems of hybrid dysgenesis in D. melanogaster (see Kidwell 1979) bridged the gap between genetic and molecular analyses. The discovery of the transposable elements that cause hybrid dysgenesis, the P element (Bingham, Kidwell and Rubin 1981) and the I element (Bucheton et al. 1984), led to the first genomic analyses of transposable elements in a eukaryote.

The publication of the Release genomic sequence in March 2000 (Adams et al. 2000) and the Release genomic sequence in October 2000 encouraged several studies of the genomic distribution and abundance of transposable elements in D. melanogaster (Berezikov et al. 2000; Jurka 2000; Bowen and McDonald 2001; Rizzon et al. 2002; Bartolome et al. 2002). Unfortunately, neither release was suitable for rigorous analysis of its transposable elements. In the whole-genome shotgun assembly process, repetitive sequences (including transposable elements) were masked by the SCREENER algorithm and remained as gaps between unitigs (Myers et al. 2000). During the repeat-resolution phase of the whole-genome assembly, an attempt was made to fill these gaps. However, comparisons of small regions sequenced by the clone-by-clone approach with the same regions from the whole-genome shotgun assembly show that this process did not produce accurate sequences for transposable elements (Myers et al. 2000; Benos et al. 2001). These results demonstrate that rigorous analysis of transposable elements, or of any other repetitive sequence, requires a sequence of higher quality, now publicly available as Release (Celniker et al. 2002). For the first time, the nature, number and location of the transposable elements in the euchromatin of D. melanogaster can be reliably analyzed.

Results and Discussion

Identification of known and novel transposable elements
Eukaryotic transposable elements are divided between those that transpose via an RNA intermediate, the retrotransposons (class I elements), and those that transpose by DNA excision and repair, the non-retrotransposons (class II elements; Craig et al. 2002). Within the retrotransposons, the major division is between those that possess long terminal repeats (LTR elements) and those that do not [LINE-like elements and SINE elements (Deininger 1989)]. Among the non-retrotransposons, the majority transpose via a DNA intermediate, encode their own transposase and are flanked by relatively short terminal inverted repeats (TIR elements). Foldback elements, which are characterized by their property of reannealing after denaturation with zero-order kinetics, are quite distinct from prototypical class I or II elements, and have been included in our analyses (Truett et al. 1981). Other classes of repetitive elements that are structurally distinct from all other classes, such as DINE-1 (Locke et al. 1999a; Locke et al. 1999b; Wilder and Hollocher 2001), have not been included in this study.

While the classification of transposable elements by structural class is relatively straightforward, the taxonomy of transposable element families is somewhat arbitrary (Table 1). We used a criterion of greater than 90% identity over more than 100 bp of sequence to assign individual elements to families (see Methods). Subsequently, to ensure that elements were included in the appropriate families, we generated multiple alignments for all families of transposable elements represented by multiple copies. This allowed us to identify and remove spurious hits to highly repetitive regions of the genome, and it also enabled us to distinguish sequences of closely related families that share extensive regions of similarity.

A summary by class of the total number of transposable elements, and of the number of complete elements, in the Release Drosophila euchromatic sequence is presented in Table 2, and detailed results for individual families of transposable elements are listed in Table. Including those described here, there are 96 known families of transposable elements in D. melanogaster: 49 LTR families, 27 LINE-like families, 19 TIR families and the FB family. We have identified 1,572 full or partial elements from 93 of these 96 families (Table 2). In total, 3.86% (4.5 Mb) of the Release sequence is composed of transposable elements. Previous analyses of both the euchromatic and heterochromatic sequences suggested that 9% of the Drosophila genome is composed of repetitive elements (Spradling and Rubin 1981). One reason for this difference may be that the proportion of transposable element sequence in heterochromatic regions is higher than the genomic average (Bartolome et al. 2002; Dimitri et al. 2002).

As shown in Table 2, the classes differ in their contribution to the Drosophila euchromatin, both in amount of sequence and in number of elements. LTR elements make up the largest proportion of the euchromatin (2.65%), more sequence than all other classes combined (LINE-like elements 0.87%, TIR elements 0.31% and FB elements 0.04%). LTR elements are also the most numerous class of transposable element in the euchromatic sequences (683), followed by LINE-like (485), TIR (372) and FB (32) elements. Thus, LTR elements are the most abundant class in the Drosophila euchromatin, both in number (Rizzon et al. 2002; Bartolome et al. 2002) and in amount of sequence.

The average size of all transposable elements in our study is 2.9 kb, smaller than the 5.6 kb average length of middle repetitive DNA estimated from reassociation kinetics (Manning, Schmid and Davidson 1975). This difference is in part a consequence of the fact that LTR element sequences, the most abundant class, are on average significantly longer than either LINE-like or TIR elements (Figure 1). The average lengths of genomic LTR, LINE-like and TIR element sequences are 4.5, 2.1 and 0.9 kb, respectively. The greater average length of LTR elements in the genome is in part because the average length of canonical LTR sequences (6.5 kb) is longer than that of canonical LINE-like (4.7 kb) or TIR (2.1 kb) element sequences (Table 3).

The numbers of transposable elements in the largest families are more than an order of magnitude greater than those in the smallest families (Table 3). The largest family of LTR elements is roo (147 copies), the largest family of LINE-like elements is jockey (69 copies), and the largest family of TIR elements is 1360 (105 copies). The mean (median) number of elements per family for the LTR, LINE-like and TIR classes is 13.9 (9), 18.0 (10) and 17.7 (7), respectively. Over all classes of element, the mean number of elements per family is 16.0, and the median is. Based on a definition of fewer than eight elements per family, 39 of the 96 (40.1%) transposable element families are low copy number (Table 3). Three of these 42 families were not found in the Release sequence, although they have been reported in D. melanogaster. It is not surprising that we did not find any P elements, since the sequenced strain was selected to be free of them. We did not find R2 or ZAM elements in the euchromatin, but we identified them in unmapped scaffolds that presumably derive from the heterochromatin. The R2 element has previously been shown to occur only within the 28S rDNA locus and in heterochromatin (Jakubczak et al. 1991). Strains of D. melanogaster are known in which ZAM elements occur at low copy number in heterochromatic sequences (Baldrich et al. 1997). The absence of the telomere-associated HeT-A and TART elements from all chromosomes except chromosome is not unexpected, as the

| Arm | Sequence (bp) | % of arm | Elements | Full length | % full length | Elements per Mb | Elements per Mb (proximal Mb) |

All classes
| 2R | 870,914 | 4.29% | 313 | 84 | 26.84% | 15.42 | 89 |
| 3L | 938,947 | 4.02% | 288 | 100 | 34.72% | 12.33 | 66.5 |
| 3R | 866,971 | 3.11% | 288 | 103 | 35.76% | 10.33 | 24.5 |
|  | 127,874 | 10.33% | 102 |  | 8.82% | 82.40 | - |
| Total/average | 4,513,296 | 3.86% | 1,572 | 480 | 30.53% | 13.46 | 57.70 |

LTR
| X | 629,601 | 2.89% | 135 | 55 | 40.74% | 6.2 | 19.00 |
| 2L | 603,536 | 2.72% | 127 | 67 | 52.76% | 5.72 | 20.00 |
| 2R | 573,034 | 2.82% | 140 | 54 | 38.57% | 6.9 | 30.50 |
| 3L | 618,441 | 2.65% | 117 | 58 | 49.57% | 5.01 | 24.00 |
| 3R | 621,272 | 2.23% | 154 | 68 | 44.16% | 5.52 | 7.50 |
|  | 44,121 | 3.56% | 10 |  | 40.00% | 8.08 | - |
| Total/average | 3,090,005 | 2.65% | 683 | 306 | 44.80% | 5.85 | 20.20 |

LINE
| X | 137,420 | 0.63% | 70 | 18 | 25.71% | 3.21 | 14.50 |
| 2L | 185,499 | 0.83% | 98 | 20 | 20.41% | 4.41 | 18.50 |
| 2R | 225,984 | 1.11% | 109 | 18 | 16.51% | 5.37 | 37.50 |
| 3L | 251,077 | 1.08% | 106 | 27 | 25.47% | 4.54 | 24.00 |
| 3R | 176,355 | 0.63% | 70 | 19 | 27.14% | 2.51 | 4.50 |
|  | 37,399 | 3.02% | 32 |  | 3.12% | 25.85 | - |
| Total/average | 1,013,734 | 0.87% | 485 | 103 | 21.24% | 4.15 | 19.80 |

TIR
| X | 45,324 | 0.21% | 59 |  | 11.86% | 2.71 | 14.50 |
| 2L | 82,761 | 0.37% | 76 | 11 | 14.47% | 3.42 | 18.00 |
| 2R | 69,291 | 0.34% | 62 | 11 | 17.74% | 3.05 | 20.50 |
| 3L | 52,743 | 0.23% | 57 | 12 | 21.05% | 2.44 | 16.50 |
| 3R | 63,359 | 0.23% | 60 | 14 | 23.33% | 2.15 | 12.50 |
|  | 44,195 | 3.57% | 58 |  | 5.17% | 46.85 | - |
| Total/average | 357,673 | 0.31% | 372 | 61 | 15.59% | 3.19 | 16.40 |

FB
| X | 17,774 | 0.08% | 12 | 4 | 33.33% | 0.55 | 2.00 |
| 2L | 6,675 | 0.03% |  |  | 50.00% | 0.18 | 2.00 |
| 2R | 2,605 | 0.01% |  |  | 50.00% | 0.1 | 0.50 |
| 3L | 16,686 | 0.07% |  |  | 37.50% | 0.34 | 2.00 |
| 3R | 5,985 | 0.02% |  |  | 50.00% | 0.14 | 0.00 |
|  | 2,159 | 0.17% |  |  | 50.00% | 1.62 | - |
| Total/average | 51,884 | 0.04% | 32 | 13 | 40.62% | 0.27 | 1.30 |

Table. The transposable elements of D. melanogaster. The canonical length of each element (in bp) is shown in column 3, the total numbers of each family on each chromosome arm in columns 4-9, the grand totals for each family in column 10, and the numbers that are full length, partial, and in the most proximal Mb of the major chromosome arms in columns 11-13. Partial elements are defined as those whose length is less than 97% of the canonical element. The average pairwise distance within each family is shown in column 14, and the existence of EST sequences is indicated in column 15 (see text for details). The Cr1a family could not be reliably aligned, and therefore its average pairwise distance was not computed (N.C.).
Family (canonical length, bp):
LTR: 17.6 (7439), 1731 (4648), 297 (6995), 3S18 (6126), 412 (6897), accord (7404), aurora (4263), blastopia (5034), blood (7410), Burdock (6411), Circe (6356), copia (5143), diver (6112), diver2 (4917), Dm88 (4558), frogger (2483), GATE (8507), gtwin (7411), gypsy (7469), gypsy2 (6841), gypsy3 (6973), gypsy4 (6852), gypsy5 (7369), gypsy6 (7826), HMS-Beagle (7062), Idefix (7411), invader1 (4032), invader2 (5124), invader3 (5484), invader4 (3105), invader5 (4038), McClintock (6450), mdg1 (7480), mdg3 (5519), micropia (5457), opus (7521), qbert (7650), Quasimodo (7387), roo (9092), rooA (7621), rover (7318), springer (7546), Stalker (7256), Stalker2 (8119), Stalker4 (7379), Tabor (7345), Tirant (8526), Transpac (5249), ZAM (8435).
LINE: baggins (5453), BS (5142), Cr1a (4470), Doc (4725), Doc2 (4789), Doc3 (4740), F (4708), G (4346), G2 (3102), G3 (4605), G4 (3856), G5 (4856), G6 (2042), Helena (1317), HeT-A (6083), I (5371), Ivk (5402), jockey (5020), jockey2 (3428), Juan (4236), R1 (5356), R2 (3607), Rt1a (5108), Rt1b (5183), Rt1c (5443), TART (10654), X (4740).
TIR: 1360 (1177), Bari1 (1728), Bari2 (1064), HB (1653), H (2959), hopper (1435), hopper2 (1680), looper1 (1881), mariner2 (912), NOF (4347), P (2907), pogo (2121), S (1736), S2 (1735), Tc1 (1666), transib1 (2167), transib2 (2844), transib3 (2883), transib4 (2656).
FB: FB (1492).

Table. A comparison of the numbers of euchromatic transposable elements in the Release sequence with those estimated from natural populations and laboratory stocks by in situ hybridization. These data are illustrative of the published literature, not an exhaustive survey. The estimated range of copy number per family, the range midpoint for each source, and the midpoint average across all sources are shown in columns 4, 5 and 6, respectively.

Columns: Class | Family | # in Release | Range | Midpoint | Midpoint avg | Source | Reference
Families compared: LTR: 17.6, 1731, 297, 3S18, 412, blood, copia, gypsy, HMS-Beagle, Idefix, mdg1, mdg3, opus, roo, Stalker, Tirant, ZAM; LINE: Doc, F, I, jockey; TIR: 1360, Bari1, H, NOF, S; FB: FB.
Sources: 7-10 isogenic lines, Beltsville, MD; 20 isogenic lines, Raleigh, NC; 182 inbred lines; 18 inbred strains, Azerbaidjan; 17 inbred stocks, Azerbaidjan; 10 wild-type stocks; 20 lab stocks; 46 lab stocks and natural populations; 10 lab stocks and natural populations; lab stocks; lab strains; natural populations; natural populations, Greece.
References: Baldrich et al. (1997); Belyaeva et al. (1984); Berghella and Dimitri (1996); Biemont and Gautier (1988); Biemont et al. (1988); Bucheton et al. (1984); Caggese et al. (1995); Charlesworth et al. (1994); Desset et al. (1999); di Nocera et al. (1983); Dominguez and Albornoz (1996); Georgiev et al. (1990); Harden (1989); Hey and Eanes (unpublished), cited in Biemont and Cizeron (1999); Kholodilov et al. (1988); LeBlanc et al. (1997); Merriman et al. (1995); Molto et al. (1996); Montgomery et al. (1987); Strobel et al. (1979); Vaury et al. (1994); Vieira and Biemont (1996); Viggiano et al. (1997); Whalen et al. (1998); Zabalou et al. (1994).

... and Hartl 1998). Transposable elements are far more abundant in the genome of A. thaliana than in the euchromatic genomes of C. elegans or D. melanogaster. In Arabidopsis, over 5,500 transposable elements...

... coordinate of any HSP in the same span. A master list was then generated that contained all spans for all elements on a particular arm. Any spans (for the same or different elements) that had overlapping...
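The family-assignment rule given in Results (greater than 90% identity over more than 100 bp of sequence) can be written as a simple filter over similarity hits. The Python sketch below is illustrative only: the `Hit` record and its fields are hypothetical and do not reproduce the BDGP pipeline or any alignment program's output format. Only the two thresholds come from the text, and both are treated as strict inequalities because the text says "greater than" and "more than".

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds taken from the text: >90% identity over >100 bp.
MIN_IDENTITY_PERCENT = 90.0  # must be strictly exceeded
MIN_ALIGNED_BP = 100         # must be strictly exceeded


@dataclass(frozen=True)
class Hit:
    """Hypothetical similarity hit between a genomic region and a canonical element."""
    family: str
    identity: float  # percent identity over the aligned region
    aligned_bp: int  # length of the aligned region in bp


def passes_family_criterion(hit: Hit) -> bool:
    """Return True if the hit meets the >90% identity over >100 bp rule."""
    return hit.identity > MIN_IDENTITY_PERCENT and hit.aligned_bp > MIN_ALIGNED_BP


def assign_families(hits: list[Hit]) -> list[str]:
    """Return families supported by at least one qualifying hit, in first-seen order."""
    families: list[str] = []
    for hit in hits:
        if passes_family_criterion(hit) and hit.family not in families:
            families.append(hit.family)
    return families


if __name__ == "__main__":
    hits = [
        Hit("roo", 98.5, 2400),    # qualifies
        Hit("copia", 90.0, 5000),  # exactly 90% identity: rejected (not > 90)
        Hit("jockey", 95.0, 100),  # exactly 100 bp aligned: rejected (not > 100)
    ]
    print(assign_families(hits))  # ['roo']
```

As described in Results, a first-pass assignment of this kind was followed by multiple alignments of each multi-copy family to remove spurious hits and to separate closely related families; the sketch does not attempt that step.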
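The definition of a partial element in the family table (length less than 97% of the canonical element) reduces to one comparison per element. In the sketch below, the canonical lengths for copia, roo and jockey are taken from the family table; the function name and the assumption that each element's length is already available as a single number are illustrative, since the text does not say how lengths were measured for elements with internal deletions or nested insertions.

```python
# Canonical lengths (bp) for a few families, as listed in the family table.
CANONICAL_LENGTH_BP = {
    "copia": 5143,
    "roo": 9092,
    "jockey": 5020,
}

# From the table caption: partial = shorter than 97% of the canonical element.
FULL_LENGTH_THRESHOLD_PERCENT = 97


def is_partial(family: str, length_bp: int) -> bool:
    """Hypothetical helper: True if an element is shorter than 97% of its canonical length.

    Comparing integers (length * 100 versus 97 * canonical) avoids floating-point
    rounding exactly at the threshold. Elements at or above the threshold, including
    any longer than the canonical sequence, count as full length.
    """
    canonical = CANONICAL_LENGTH_BP[family]
    return length_bp * 100 < FULL_LENGTH_THRESHOLD_PERCENT * canonical


if __name__ == "__main__":
    # 97% of 5143 bp is 4988.71 bp, so 4988 bp is partial and 4989 bp is full length.
    for family, length in [("copia", 4988), ("copia", 4989), ("roo", 9092), ("jockey", 1200)]:
        status = "partial" if is_partial(family, length) else "full length"
        print(f"{family} {length} bp: {status}")
```

The integer comparison is a design choice for this sketch; a floating-point ratio test would give the same answers for these examples but can misclassify lengths that fall exactly on the boundary.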
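Several figures quoted in Results can be recomputed from the class totals in the per-arm summary table: the average element length is total sequence divided by number of elements, and the percentage of full-length elements is the full-length count divided by the total count. The sketch below is a minimal consistency check, assuming the table's total rows are the intended inputs; the `ClassTotals` record is a hypothetical container, not part of any published data format. It reproduces the 4.5 kb and 2.1 kb average lengths for LTR and LINE-like elements, the 2.9 kb average over all classes, and the corresponding full-length percentages in the table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClassTotals:
    """Hypothetical container for one class's totals, copied from the summary table."""
    name: str
    total_bp: int       # bp of euchromatic sequence annotated as this class
    n_elements: int     # number of elements (full length plus partial)
    n_full_length: int  # number of full-length elements


def average_length_kb(t: ClassTotals) -> float:
    """Mean element length in kb: total sequence / number of elements."""
    return t.total_bp / t.n_elements / 1000


def percent_full_length(t: ClassTotals) -> float:
    """Percentage of elements that are full length."""
    return 100 * t.n_full_length / t.n_elements


TOTALS = [
    ClassTotals("LTR", 3_090_005, 683, 306),
    ClassTotals("LINE-like", 1_013_734, 485, 103),
    ClassTotals("All classes", 4_513_296, 1_572, 480),
]

if __name__ == "__main__":
    for t in TOTALS:
        print(f"{t.name}: {average_length_kb(t):.1f} kb average, "
              f"{percent_full_length(t):.2f}% full length")
    # LTR: 4.5 kb average, 44.80% full length
    # LINE-like: 2.1 kb average, 21.24% full length
    # All classes: 2.9 kb average, 30.53% full length
```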
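The Methods excerpt describes building a master list of all spans for all elements on a chromosome arm and then examining spans, from the same or different elements, that overlap; the excerpt breaks off before saying how such overlaps were resolved. The sketch below therefore only shows how overlapping spans can be found in such a master list with a sort-and-sweep, and implies no resolution step. The `Span` record, its half-open coordinate convention and the example coordinates are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span of one element on one arm, half-open: [start, end)."""
    element: str
    arm: str
    start: int
    end: int


def overlapping_pairs(spans: list[Span]) -> list[tuple[Span, Span]]:
    """Return every pair of spans on the same arm whose intervals overlap.

    After sorting by (arm, start, end), each span only needs to be compared with
    the spans that follow it on the same arm and start before it ends; as soon as
    one does not, no later span can overlap it either.
    """
    ordered = sorted(spans, key=lambda s: (s.arm, s.start, s.end))
    pairs: list[tuple[Span, Span]] = []
    for i, a in enumerate(ordered):
        j = i + 1
        while j < len(ordered) and ordered[j].arm == a.arm and ordered[j].start < a.end:
            pairs.append((a, ordered[j]))
            j += 1
    return pairs


if __name__ == "__main__":
    spans = [
        Span("roo", "2L", 100, 9200),
        Span("jockey", "2L", 5000, 7000),   # lies inside the roo span
        Span("copia", "2L", 9200, 14343),   # abuts roo: no overlap with half-open spans
        Span("1360", "3R", 100, 1277),      # different arm
    ]
    print([(a.element, b.element) for a, b in overlapping_pairs(spans)])
    # [('roo', 'jockey')]
```

In this toy example the only overlap is a jockey span lying entirely within a roo span, the kind of nested arrangement the Results mention; whether such pairs were merged, split or reassigned in the original analysis is not stated in the excerpt.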