The transposable elements of Drosophila melanogaster – a genomics perspective.

version 16.0: 2002-08-26: ma

Joshua S Kaminker1,8, Casey M Bergman2,8, Brent Kronmiller2, Joseph Carlson2, Robert Svirskas3, Sandeep Patel2, Erwin Frise2, David A Wheeler5, Suzanna Lewis1, Gerald M Rubin1,2,4, Michael Ashburner6,7 and Susan E Celniker2

1Department of Molecular and Cellular Biology, University of California, Berkeley, CA 94720; 2Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720; 3Amersham Biosciences, 2100 East Elliot Rd., Tempe, AZ 85284; 4Howard Hughes Medical Institute; 5Human Genome Sequencing Center and Department of Molecular and Cell Biology, Baylor College of Medicine, Houston, TX 77030; 6Department of Genetics, University of Cambridge, England, CB2 3EH

7Corresponding author
8These authors contributed equally to this work

For correspondence: Michael Ashburner, Department of Genetics, Downing Street, Cambridge, England CB2 3EH
phone: +44 1223-333969
fax: +44 1223-333992
email: ma11@gen.cam.ac.uk

Running Title: Transposable elements in the Drosophila euchromatin

Abstract

Transposable elements are found in the genomes of nearly all eukaryotes. We have analyzed the Release 3 genomic sequence of Drosophila melanogaster to describe the euchromatic transposable elements in the sequenced strain of this species. We identified 85 known and 8 novel families of transposable element in the Release 3 sequences; these vary in copy number between 1 and 146. A total of 1,573 transposable elements were identified, which comprise 3.93% of the Release 3 sequences. The density of transposable elements is higher on chromosome 4 relative to the major chromosome arms, and transposable element abundance on the X chromosome is similar to that on the major autosome arms. The abundances of the three major classes of transposable elements (LTR, LINE-like, and TIR) are markedly higher in the proximal 2 Mb of each chromosome arm, reflecting the transition from euchromatin to
heterochromatin, whereas the high abundance on chromosome 4 is due only to LINE-like and TIR elements. More than two-thirds of the transposable elements identified in Release 3 are partial. Analysis of structural variation of elements from different families reveals distinct patterns of deletion for each class. A large proportion of transposable elements are found "nested" within other elements of the same or different classes. Transposable elements are preferentially found outside genes; only 436 of 1,573 transposable elements are contained within the 61.4 Mb of sequence annotated as being transcribed. The high abundance, high proportion of complete elements and low levels of sequence diversity in LTR families suggest that individual LTR elements are more likely to be recent insertions into the D. melanogaster genome, relative to LINE-like or TIR elements. This work provides a starting point for future genomic analysis of transposable elements in Drosophila.

Introduction

Transposable element sequences are abundant yet poorly understood components of almost all eukaryotic genomes (but see Arkhipova and Meselson 2000). As a result, many biologists have an interest in the description of transposable elements in completely sequenced eukaryotic genomes. The evolutionary biologist wants to understand the origin of transposable elements, how they are lost and gained by a species, and the role they play in the processes of genome evolution; the population geneticist wants to know the factors that determine the frequency and distribution of elements within and between populations; the developmental geneticist wants to know what roles these elements may play in either normal developmental processes or in the response of the organism to external conditions; finally, the molecular geneticist wants to know the mechanisms that regulate the life cycle of these elements and how they interact with the cellular machinery of the host. It is for all of these reasons and more that a
description of the transposable elements in the recently completed Release 3 genomic sequence of D. melanogaster is desirable.

The contribution of Drosophila to our understanding of transposable elements is long and glorious. Over 75 years ago, Milislav Demerec discovered highly mutable alleles of two genes in D. virilis, miniature and magenta (Demerec 1926; 1927; reviewed in Demerec 1935; Green 1976). Both genes were mutable in soma and germ-line and, for the miniature3alpha alleles, dominant enhancers of mutability were isolated by Demerec. In retrospect, it seems clear that the mutability of these alleles was the result of transposition of mobile elements; the dominant enhancers may have been particularly active elements or mutations in host genes affecting transposability (see below). There matters essentially stood until McClintock's remarkable discovery of mutable alleles in maize and their basis, transposition of the Ac and Ds factors (McClintock 1950), and the discovery, some 20 years later, of insertion elements in the gal operon of Escherichia coli (see Starlinger 1977). Green (1977) synthesized the evidence then at hand to make a strong case for insertion as a mechanism of mutagenesis in Drosophila. Within a year or so Hogness' group had begun a molecular characterization of two elements in D. melanogaster, 412 and copia (Rubin et al. 1976; Finnegan et al. 1978), and evidence that they were transposable was soon available (Ilyin et al. 1978; Strobel et al. 1979; Young 1979). In fact, the Hogness group had already, but unknowingly, molecularly characterized the first eukaryotic transposable element, the insertion sequences of 28S rRNA encoding genes (see Glover 1977). The discovery of male recombination (Hiraizumi 1971), and of two systems of hybrid dysgenesis in D. melanogaster (see Kidwell 1979), allowed the gap, then wide, between genetic and molecular analyses to be bridged. The discovery of the causal transposable elements, the P-element (Bingham, Kidwell and Rubin 1981)
and the I-element (Bucheton et al. 1984), led to the first genomic analyses of transposable elements in a eukaryote.

The publication of the Release 1 genomic sequence in March 2000 (Adams et al. 2000) and the Release 2 genomic sequence in October 2000 encouraged several studies on the genomic distribution and abundance of transposable elements in D. melanogaster (Berezikov, Bucheton and Busseau 2000; Jurka 2000; Bowen and McDonald 2001; Rizzon et al. 2002; Bartolome, Maside and Charlesworth 2002). Unfortunately, neither release was suitable for rigorous analysis of its transposable elements, since sequences corresponding to known transposable elements, along with other sequences known to be repetitive in the genome, were masked by the SCREENER algorithm and remained as gaps between unitigs (Myers et al. 2000). During the repeat resolution phase of the whole genome assembly, an attempt was made to fill these gaps. However, comparisons of small regions sequenced by the clone-by-clone approach versus the whole genome shotgun method show that this was not a very accurate process (Myers et al. 2000; Benos et al. 2001). It was clear that any rigorous analysis of the transposable elements, or of any other repeat, required a sequence of higher quality. This has now been achieved by the finishing efforts of the Berkeley Drosophila Genome Project. This sequence, Release 3, is now publicly available (Celniker et al. 2002). For the first time, a reliable analysis can be performed of the nature, number and location of the transposable elements in the euchromatin of D. melanogaster.

Results and Discussion

Identification of known and novel transposable elements

Eukaryotic transposable elements are divided between those that transpose via an RNA intermediate (class I), the retrotransposons, and those that transpose by DNA excision and repair (class II), the non-retrotransposons (Craig et al. 2002). Within the retrotransposons, the major division is between those that possess long terminal repeats (LTR elements)
and those that do not (LINE-like elements and SINE elements (Deininger 1989)). Among the non-retrotransposons, the majority transpose via a DNA intermediate, encode their own transposase and are flanked by relatively short terminally inverted repeat structures (TIR elements). Foldback elements, which are characterized by their property of reannealing after denaturation with zero-order kinetics, are quite distinct from prototypical class I or II elements, and have been included in our analyses (Truett et al. 1981). Other classes of repetitive elements, such as INE-1 (Locke et al. 1999a; Locke et al. 1999b; Wilder and Hollocher 2001), which are structurally distinct from all other classes of elements, have not been included in this study.

While the classification of transposable elements by structural class is relatively easy, the taxonomy of transposable element families is somewhat arbitrary (Table 1). We used a criterion of greater than 90% identity over greater than 100 bp of sequence to assign individual elements to families (see Methods). Subsequently, to ensure proper inclusion of elements in appropriate families, we generated multiple alignments for all families of transposable elements represented by multiple copies. This allowed the identification and removal of spurious hits to highly repetitive regions of the genome, and it also enabled us to distinguish sequences of closely related families that share extensive regions of similarity.

A summary by class of the total number and number of complete transposable elements in the Release 3 Drosophila euchromatic sequence is presented in Table 2, and detailed results for individual families of transposable elements are listed in Table 3. Including those described here, there are 96 known families of transposable elements in D. melanogaster: 49 LTR families, 27 LINE-like families, 19 TIR families and the FB family. We have identified 1,573 full or partial elements from 93 of these 96 families (Table 2). In total,
3.88% (4.5 Mb) of the Release 3 sequence is composed of transposable elements. Previous analyses of both the euchromatic and heterochromatic sequences have suggested that 9% of the Drosophila genome is composed of repetitive elements (Spradling and Rubin 1981). One likely reason for this difference is that the proportion of transposable element sequences in heterochromatic regions is higher than the genomic average (Bartolome et al. 2002; Dimitri et al. 2002).

As shown in Table 2, the different classes vary in their contribution to the Drosophila euchromatin, both in amount of sequence and in number of elements. LTR elements make up the largest proportion of the euchromatin (2.65%), more sequence than the sum of all other classes of elements (LINE-like elements 0.87%, TIR elements 0.31%, and FB elements 0.04%). LTR elements are also the most numerous class of transposable elements in the euchromatic sequences (686), followed by LINE-like (482), TIR (373), and FB (32) elements. Thus, LTR elements are the most abundant class in the Drosophila euchromatin, both in terms of number (Rizzon et al. 2002; Bartolome et al. 2002) and amount of sequence.

The average size of all transposable elements in our study is 2.9 kb, smaller than the 5.6 kb average length of middle repetitive DNA estimated from reassociation kinetics (Manning, Schmid and Davidson 1975). LTR element sequences are on average significantly longer than either LINE-like or TIR elements (Figure 1). The average lengths of genomic LTR, LINE-like, and TIR element sequences are 4.5, 2.1, and 0.9 kb, respectively. This is in part because the average length of canonical LTR sequences (6.5 kb) is longer than that of LINE-like (4.7 kb) or TIR (2.1 kb) element sequences (Table 3).

While families with only one or two copies can be found within each class, the numbers of transposable elements in the largest families are more than an order of magnitude greater (Table 3). The largest family of LTR elements is roo (145 copies), the largest family of
LINE-like elements is jockey (69 copies), and the largest family of TIR elements is 1360 (107 copies). The mean (median) number of elements per family for the LTR, LINE-like, and TIR classes of elements is 14.0 (9), 17.9 (10), and 19.6 (12), respectively. Over all classes of element, the mean number of elements per family is 16.4, and the median is [...].

Based on a definition of fewer than eight elements per family, 42 of the 96 transposable element families (43.8%) are low copy number (Table 3). Three of these 42 families were not found in the Release 3 sequence, though they have been reported in D. melanogaster. It is not surprising that we did not find any P-elements, since the sequenced strain was selected to be free of them. We did not find R2 and ZAM elements in the euchromatin, but we identified these elements in unmapped scaffolds. Consistent with this, the R2 element has previously been shown to be found only within the 28S rDNA locus and heterochromatin (Jakubczak et al. 1991). Strains of D. melanogaster are known to exist in which ZAM elements occur in low copy number in heterochromatic sequences (Baldrich et al. 1997). The absence of the telomere-associated HeT-A and TART elements from all chromosomes except chromosome 4 is not unexpected, as tandemly repetitive telomeric sequences (Pardue and DeBaryshe 1999; Biessmann et al. 2002) are inherently difficult to assemble and are under-represented in Release 3.

We discovered eight new families of transposable elements within the Release 3 sequences. Six are members of the LTR class: frogger (EMBL:AF492763), rover (EMBL:AF492764), cruiser (a.k.a. Quasimodo) (EMBL:AF364550), McClintock (EMBL:??), qbert (EMBL:??), and Stalker4 (EMBL:??). Two are members of the TIR class: Bari2 (EMBL:??) and hopper2 (EMBL:??).
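The two numerical criteria used above, assignment to a family at greater than 90% identity over more than 100 bp, and "low copy number" meaning fewer than eight elements per family, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in this study: the `Hit` record, the function names and the toy hit values are hypothetical, and each passing hit is counted as one element, which ignores the multiple-alignment curation step described in the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; everything else in this sketch is hypothetical.
MIN_IDENTITY = 90.0    # percent identity; a hit must exceed this value
MIN_LENGTH = 100       # aligned length in bp; a hit must exceed this value
LOW_COPY_CUTOFF = 8    # families with fewer elements count as low copy number


@dataclass(frozen=True)
class Hit:
    """One similarity hit of a genomic sequence to a canonical element (toy record)."""
    family: str
    percent_identity: float
    aligned_length: int


def passes_family_criterion(hit):
    """True if the hit is >90% identical over >100 bp."""
    return hit.percent_identity > MIN_IDENTITY and hit.aligned_length > MIN_LENGTH


def copy_numbers(hits):
    """Count passing hits per family (simplification: one hit = one element)."""
    counts = {}
    for hit in hits:
        if passes_family_criterion(hit):
            counts[hit.family] = counts.get(hit.family, 0) + 1
    return counts


def low_copy_families(counts):
    """Names of families with fewer than LOW_COPY_CUTOFF elements, sorted."""
    return sorted(name for name, n in counts.items() if n < LOW_COPY_CUTOFF)


# Toy data, invented for illustration only.
toy_hits = (
    [Hit("roo", 98.5, 9000)] * 9
    + [
        Hit("frogger", 95.0, 2400),  # passes both thresholds
        Hit("frogger", 88.0, 500),   # fails the identity threshold
        Hit("rover", 92.0, 90),      # fails the length threshold
    ]
)
toy_counts = copy_numbers(toy_hits)
print(toy_counts)
print(low_copy_families(toy_counts))
```

Both thresholds are strict inequalities, following the wording "greater than 90% identity over greater than 100 bp"; a hit at exactly 90% identity is rejected in this sketch.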
The frogger element (one partial copy) was identified on the basis of its LTRs and a protein-coding open reading frame (ORF) which is 73% similar at the amino acid level to that of the Dm88 family. The rover family (six copies) was identified in a BLAST search for repetitive elements in the genome; it is most closely related to the 17.6 element (71% amino acid identity). The cruiser family (14 copies) was identified during the finishing project by virtue of its LTRs, and is most closely related in sequence to the Idefix family, sharing 60% amino acid identity. Bari2 (four copies) was identified by querying the D. melanogaster genome using a Bari1-like element isolated from D. erecta (EMBL:Y13853). The hopper2 (three copies) and Stalker4 (two copies) families were identified by an analysis of the multiple alignments of the hopper and Stalker families, respectively. These alignments exhibited evidence of distinct subfamilies on the basis of both nucleotide divergence and structural rearrangements over large regions of their alignment. The qbert family was identified by searching for regions of the genome that share similarity with protein-coding ORFs represented in our transposable element data set. The qbert family (one copy) is most highly related to the accord family and shares regions of similarity that are 66% identical at the amino acid level. The McClintock family (two copies), identified by its presence in a repeat region near the centromere of chromosome 4, is most closely related to the 17.6 family. Over its terminal 1,400 bp, McClintock shares 86% amino acid identity with 17.6. However, elsewhere these elements are quite divergent, sharing less than 50% amino acid identity over the proximal 5,000 bp of the McClintock element.

We have also discovered several other sequences with high sequence similarity to the protein-coding regions of transposable elements, but they are not associated with repeats (see also Berezikov et al. 2000). These elements cannot easily be
classified to particular families. Although we have not included them in this analysis, they have been included in the Release 3 annotation of the genome (http://www.fruitfly.org/annot/).

Chromosomal distribution of elements

The fraction of each chromosome arm composed of transposable elements varies between 3.13% and 4.27%, with the exception of chromosome 4, for which it is over 11%. This indicates an average transposable element density of ~10-15/Mb for the major chromosome arms, and over 82/Mb for chromosome 4. These densities of transposable elements are greater than the estimate of 5/Mb derived from lower resolution cytological methods (Charlesworth and Langley 1988), presumably because of clustered elements and partial elements that may give very weak in situ hybridization signals. In contrast with the theoretical prediction that the X chromosome should have the lowest density of transposable elements if insertions are partially recessive and have deleterious effects in hemizygous males (Montgomery et al. 1987), we found no evidence for a reduction in density of transposable elements on the X chromosome relative to the major autosome arms (Table 2). In contrast to previous findings (Bartolome et al. 2002), this result suggests that the deleterious effects of transposable element insertions may not be the primary force controlling their distribution and abundance in the sequenced strain (Rizzon et al. 2002).

The densities of LINE-like elements and TIR elements on chromosome 4 are 25.04/Mb and 47.66/Mb, respectively, much higher than their densities on the major chromosome arms (2.5-5.3/Mb and 2.1-3.4/Mb, respectively) (Table 2). By contrast, the density of LTR elements on chromosome 4 is only slightly higher (8.08/Mb) than on the five major chromosome arms (5.1-6.9/Mb). Moreover, the fraction of chromosome 4 that is composed of LTR elements is only slightly higher than that for the major chromosome arms (3.56% versus 2.25-2.93%). Thus, the difference in density of transposable elements on
chromosome 4 is predominantly due to the order-of-magnitude increase in the number of LINE-like and TIR elements.

Transposable element density is also known to vary along the major chromosome arms (Adams et al. 2000; Rizzon et al. 2002; Bartolome et al. 2002). As shown in Figure 2, the density of transposable elements increases in the proximal euchromatin, here defined as the proximal 2 Mb of the assembly of each of the five major chromosome arms (i.e., about 10% of the euchromatic sequence analysed here). On the major chromosome arms, 36.6% (576/1573) of the elements are located in centromere-proximal euchromatin, which represents only 10% of the euchromatic sequence. This result is also consistent with previous observations that the density of transposable elements is higher in heterochromatin than within euchromatin (Charlesworth et al. 1994; Pimpinelli et al. 1995; Carmena and Gonzalez 1995; Dimitri 1997; Junakovic et al. 1998); these centromere-proximal sequences represent the transition between euchromatin and heterochromatin.

There are 14 families located exclusively within the centromere-proximal 2 Mb of the major chromosome arms; 12 of these are low copy number families (defined as fewer than eight copies per family). Although families found exclusively in the proximal euchromatin are mostly low copy number, elements belonging to low copy number families show no general tendency to be located in these regions. In fact, only a minority (27/142) of elements that belong to low copy number families are localized in the proximal 2 Mb regions of the chromosome arms.

Finally, although the densities of transposable elements in the proximal euchromatin and on chromosome 4 are both elevated with respect to the euchromatic average (58 and 82 elements/Mb, respectively), the composition of the elements in these regions is quite different. The increase in transposable element density in the proximal regions of the chromosome arms is due to a larger number of elements belonging to all classes
(Figure 2), while the increase in elements on chromosome 4 is due almost exclusively to LINE-like and TIR elements.

Analysis of structural variation

It has been recognized since McClintock's definition of the Ac and Ds elements of maize that many elements can be autonomous or defective with respect to transposition. Defective elements often exhibit deletions in ORFs or terminal repeats which are necessary for transposition. Assuming that canonical elements represent full-length active copies, we defined any element less than 97% of the length of the canonical member of its family as partial. Based on this criterion, more than two-thirds (1087/1573) of the elements in the Release 3 sequence are partial (Table 2). The proportion of partial elements is reasonably uniform among the major chromosome arms (64-73%), with the notable exception of chromosome 4 (91%). While more than half (576/1087) of the partial elements on the major chromosome arms lie within the proximal 2 Mb, nearly 90% of the transposable elements on chromosome 4 are partial. Since LINE-like and TIR elements make up 88% of the elements on chromosome 4, these data indicate differences in the proportions of partial elements between classes. In fact, 78% of LINE-like elements and 84% of TIR elements are partial, whereas only 55% of LTR elements are partial (Table 2).

Analysis of the distribution of transposable element lengths scaled relative to the length of their canonical sequence shows that while all three classes have bimodal distributions of scaled element lengths, they differ significantly from one another (Figure 3). The bimodal shape of these distributions presumably reflects the boundary states of the dynamic process of deletion, excision and transposition. Only a very small number of LINE-like (9) and TIR (22) elements exceed their canonical length, indicating that rates of insertion into transposable elements are low relative to rates of deletion in D. melanogaster transposable elements (Petrov and Hartl 1998). A higher
number of LTR elements (162) exceed their canonical length, but on average these elements are less than 2% longer than their canonical sequence; only 27 of the 162 LTR elements are more than 5% longer. Of these, 25/27 belong to the 412 family; we have subsequently determined that the canonical sequence used in this analysis was not full length.

We characterized the distribution of structural variation for a representative element from each of the three major classes by determining the proportion of sequences represented in multiple alignments at a given nucleotide site (Figure 4). The resulting plot for the LINE-like jockey family approximates a negative exponential distribution starting from the 3' end (Figure 4a). LINE-like elements are known to become deleted preferentially at their 5' ends, as a consequence of the mechanism of their transposition (Finnegan 1997). The TIR element pogo shows a very different pattern; deletions predominantly occur internally, leaving the inverted repeat termini intact (see Tudor et al. 1992) (Figure 4b). By analogy with patterns of deletion in P elements (see Engels 1989), these deleted elements will be non-autonomous with respect to transposition; they presumably arise when double-stranded gap repair is interrupted (see Engels et al. 1990; Hsia and Schnable 1996). By contrast, for the representative LTR element, roo, there is a relatively uniform pattern of structural variation across the element, with the exception of two apparent deletion hotspots, both of which occur in regions that are expected to be coding (Figure 4c).

Twenty-four of the 93 families (25.8%) represented in Release 3 are composed entirely of partial elements. An additional 19 families have only one full-length element. Fifteen of the 24 partial-only families are low copy number (fewer than eight copies per family) and nine are high copy number. The majority of elements (125/181) in these 24 families are found in the proximal 2 Mb or on chromosome 4; all elements for 16 of
these 24 families are found exclusively in these regions of the genome.

One class of defective LTR elements, solo LTR sequences, has been known for some time in Drosophila (Carbonara and Gehring 1985) and other species (S. cerevisiae, Boeke 1989; Schizosaccharomyces pombe, Wood et al. 2002; C. elegans, Ganko et al. 2001). These presumably arise by exchange between the two LTRs flanking an element, with the loss of the reciprocal product, a small circular molecule. In S. cerevisiae, 85% of all LTR element insertions are solo LTRs (Kim et al. 1998). We screened for solo LTRs of each family of element, using a criterion of 80% identity to the canonical LTR sequence of each family. Only 58 solo LTRs were identified, of which 14 are roo LTR elements.

Analysis of expressed transposable element sequences

Transcription is an essential process in the life cycle of transposable elements. Moreover, many transposable elements are known to be transcribed in developmentally regulated patterns (Ding and Lipshitz 1994; Danilevskaya et al. 1994; Kerber et al. 1996; Filatov, Morozova and Pasyukova 1998; Filatov, Nuzhdin and Pasyukova 1998). For these reasons it is important to identify transcripts produced by transposable elements. We identified which transposable elements are represented in the BDGP/LBNL EST projects (Table 3) (Rubin et al. 2000; Stapleton et al. 2002). All LTR families and 88.9% (24/27) of LINE-like families have BLAST hits in the EST database, in contrast to only 63% of TIR families, which transpose through a DNA intermediate. (The single P-element EST was from the CK cDNA library, which was from a different strain of flies.)
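Two of the measurements from the structural variation analysis above lend themselves to a short worked sketch: the length-based definition of a partial element (less than 97% of the canonical length) and the per-site proportion of sequences represented in a multiple alignment, the quantity plotted in Figure 4. The Python below is a minimal illustration under stated assumptions, not the code used in this study: the function names, the gap character `-`, and the toy lengths and alignment are hypothetical.

```python
PARTIAL_THRESHOLD = 0.97  # from the text: < 97% of canonical length means partial


def scaled_length(element_length, canonical_length):
    """Element length expressed as a fraction of its family's canonical length."""
    if canonical_length <= 0:
        raise ValueError("canonical_length must be positive")
    return element_length / canonical_length


def is_partial(element_length, canonical_length):
    """True if the element is shorter than 97% of the canonical length."""
    return scaled_length(element_length, canonical_length) < PARTIAL_THRESHOLD


def site_coverage(aligned_rows, gap="-"):
    """Fraction of aligned sequences with a non-gap character at each column.

    aligned_rows: equal-length strings taken from one multiple alignment.
    The gap symbol is an assumption; alignment formats differ.
    """
    if not aligned_rows:
        return []
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")
    n = len(aligned_rows)
    return [sum(row[i] != gap for row in aligned_rows) / n for i in range(width)]


# Toy alignment, invented for illustration: copies truncated at the 5' end,
# the pattern the text describes for LINE-like elements such as jockey.
toy_alignment = [
    "ACGTAC",
    "--GTAC",
    "----AC",
    "ACGTAC",
]
toy_coverage = site_coverage(toy_alignment)
print(toy_coverage)
print(is_partial(4500, 5000), is_partial(4900, 5000))
```

In the toy alignment, coverage rises toward the 3' end because the truncated copies are missing 5' sequence; an internally deleted TIR element would instead show a dip in the middle with both termini fully covered.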
Families which are composed of only partial elements may represent inactive families, or families for which no active copy exists in the Release 3 sequences. Of the 24 families which contain only partial elements, the majority (19/24) have hits in the EST database, suggesting that these families are at least transcribed and may be active. This inference is supported by the fact that, of the 10 families which have no hits in the EST database, [...] families have either no or only one full-length copy in the Release 3 sequence. It is possible that families which have only one full-length copy may also be inactive, if the canonical sequence is itself not a full-length functional copy. Taken at face value, these data suggest that the majority of transposable element families in Drosophila have actively expressing members.

Analysis of sequence variation within families

There is evidence that individual point mutations can affect transposable elements. The high quality sequence of transposable elements in Release 3 allows such data to be reliably analyzed for the first time. Point mutations in coding regions of the gypsy family of retrotransposons correlate with both transposition frequency and copy number (Lyubomirskaya et al. 2001). We identified only one full-length gypsy element (FBti:0011990). Sequence comparison of this element's ORF2 with that of the mutator strain shows the two ORFs to be identical, and suggests that the single full-length gypsy element is "active" in the sequenced strain. Other families of elements have also been found to be polymorphic with respect to their coding potential in Drosophila. Kalmykova, Maisonhaute and Gvozdev (1999) found that while most 1731 elements have the +1 frameshift between their gag and pol gene regions typical of LTR elements, some do not and instead express a gag-pol fusion protein. The single full-length 1731 element (FBti:0020323) is of the latter type.

Sequence variation within families of elements was estimated by analysing the
average pairwise distance within each family after multiple alignment (Table 3). These data show that intra-family variation ranges from complete identity to 26.8% average pairwise distance (S2), but with only [...] families having greater than 10% average pairwise distance. The average sequence divergence for the LTR class of elements is only 2.7%; for the LINE-like and TIR classes it is 5.4% and 7.6%, respectively. These estimates of within-family variation are remarkably similar to those of Wensink (1978) who, by studying the kinetics of DNA reassociation of cloned middle repetitive sequences, showed that families of middle repetitive sequence exhibited on the order of 3-7% sequence divergence. Analysis of the distribution of intra-family average pairwise distances by functional class shows that LTR families have lower levels of average pairwise distance relative to LINE-like or TIR families (Figure 4). These results suggest that elements in the LTR class are on average younger than elements in either the TIR or LINE-like classes.

Nesting and clustering of transposable elements

The nesting of transposable elements is common in plant genomes (SanMiguel et al. 1996; SanMiguel et al. 1998; Tikhonov et al. 1999; Fu et al. 2001). For our analysis, transposable elements that have inserted within another element are termed nests, while groups of transposable elements located within 10 kb of each other are defined as clusters. We found 64 nests or clusters of transposable elements, composed of 328 full or partial elements. This indicates that about 21% of all transposable elements in D. melanogaster are either inserted in another element or are positioned adjacent to another element. The number of nested or clustered elements per arm ranges from 1.4 to 3.6/Mb. The density of such elements is much higher in the proximal regions of the euchromatic arms; of the 64 nests or clusters, 25 are within the proximal 2 Mb regions of the major chromosome arms (see also O'Hare et al. 2002). A large proportion (89%) of
the elements belonging to nests or clusters are partial, in contrast to the proportion seen for all elements (69%). Nesting and clustering appear to be somewhat more frequent for LTR elements (29.3% nested or clustered) than for either LINE-like elements (12.0% nested or clustered) or TIR elements (15.8% nested or clustered). This is presumably due to the larger proportion of LTR elements present in the Drosophila euchromatin.

Foldback elements often contain non-FB DNA (Truett et al. 1981; see Hoffman-Liebermann, Liebermann and Cohen 1989 and Caceres et al. 2001). Both NOF and HB elements have been found flanked by FB arms (Truett et al. 1981; Harden and Ashburner 1990). We identified two HB elements immediately adjacent to FB elements and four examples of NOF elements inserted into FB elements.

Patterns of element nesting can be very complex, as has been observed in other species (Tikhonov et al. 1999). As pointed out by Walbot and Petrov (2001), the insertion of a transposable element may trigger a runaway process, since it will provide a target into which other elements may insert without deleterious consequences. The largest euchromatic [...]

[...] average denaturation temperature across all sequences was determined using a 3-bp window size. In each panel, the light gray line represents the average denaturation temperature of random genomic sequence and the black line represents the average denaturation temperature of the experimental set of sequences. The x-axis represents the distance (bp) from the insertion site and the y-axis represents the temperature (°C). The sequences flanking the roo (a) and pogo (b) elements have opposite characteristics; the roo sequences have a higher than average denaturation temperature while the pogo sequences have a lower than average denaturation temperature. The average denaturation temperature of the sequence flanking the jockey elements does not differ from that of the random sequence.

Table 1. A classification of transposable elements in Drosophila,
after the following authors: Berezikov, Bucheton and Busseau (2000) (non-LTR retrotransposons), Bowen and McDonald (2001) and Terzian, Pelisson and Bucheton (2000) (LTR retrotransposons) (see also Eickbush and Malik 2002), Kanamori et al. (1998) (aurora-element), Pinsker et al. (2001) (P elements), Volf et al. (2001) and Lyozin et al. (2001) (Uri-like endonuclease elements), Gentile et al. (2001) (R1 elements), Frame et al. (2001) (roo clade), Moschetti et al. (1998) (Bari elements; see also Plasterk, Izsvak and Ivics 1999), Shao and Tu (2001) (DDE/DDD transposons) and Kapitonov and Jurka (2002a). See Robertson (2002) for a detailed classification of TIR elements. Clades whose names are within square brackets are not, as yet, known in Drosophilidae. Some commonly used synonyms are enclosed by curly braces (see FlyBase for a complete list of synonyms). This table is not to be taken as a rigorous taxonomy of transposable elements; its purpose is to be illustrative. Not included in this table are: NOF, hopper {= M4} and hopper2.

Class I – retrotransposons
  Subclass – LTR retrotransposons
    Superfamily – ORF order: PR-RT-RH-IN
      Group – tRNA priming
        Clade – gypsy
          Family – gypsy {mdg4}
          Family – gypsy2
          Family – gypsy3
          Family – gypsy4
          Family – gypsy5
          Family – gypsy6
          Family – springer
          Family – gtwin {hamilton}
          Family – Burdock
        Clade – roo
          Family – 3S18 {BEL}
          Family – roo {B104}
          Family – rooA
          Family – GATE
          Family – diver {Tinker; mazi; Nuria}
          Family – diver2
          Family – aurora
            Subfamily – aurora-element
            Subfamily – Dsim\ninja
        Clade – 412 – short ORFs in 5' leader
          Family – 412
          Family – blood
          Family – mdg1
          Family – Tabor {Pilgrim; wolfman}
          Family – Stalker
          Family – Stalker2
          Family – Stalker4
        Clade – micropia
          Family – blastopia
          Family – mdg3
          Family – micropia
          Family – invader1
          Family – invader2
          Family – invader3
          Family – invader4
          Family – invader5
        Clade – osvaldo
          Family – Dbuz\osvaldo
          Family – Dvir\Ulysses
          Family – Circe
        Clade – nomad
          Family – HMS Beagle {midline; midline-jumper}
            Subfamily – HMS Beagle group
            Subfamily – HMS Beagle group
          Family – opus {yoyo; nomad}
        Clade – ZAM
          Family – Tirant
          Family – accord
          Family – qbert
          Family – ZAM
        Clade – Quasimodo
          Family – Quasimodo {cruiser; antonia}
          Family – Idefix
          Family – Dvir\Tv1
        Clade – 17.6
          Family – 17.6
          Family – 297
          Family – Transpac
          Family – McClintock
          Family – rover
          Family – Dana\Tom
      Group – no tRNA priming
        Family – [Tf1]
    Superfamily – ORF order: PR-IN-RT-RH
      Clade – Ty1/copia
        Family – 1731
        Family – copia
        Family – Dm88 {copia2}
        Family – frogger
  Subclass – non-LTR retrotransposons
    Superfamily – RT {LINE-like-element}
      Clade – jockey
        Family – jockey {sancho; wallaby}
        Family – jockey2
        Family – BS
          Subfamily – BS
          Subfamily – Dvir\Helena
          Subfamily – X-element {BS2}
        Family – Doc
        Family – Doc2
        Family – Doc3
        Family – F-element
        Family – G-element
        Family – G2
        Family – G3
        Family – G4
        Family – G5
        Family – G6
        Family – TART
        Family – Het-A
        Family – Cr1a
        Family – Juan {Strider}
      Clade – R1
        Family – Rt1a {Waldo-B; pilger}
        Family – R1
          Subfamily – R1A
          Subfamily – R1B
        Family – Rt1b {Waldo-A}
        Family – Rt1c
      Clade – R2
        Family – R2
      Clade – Loa
        Family – Dsil\Loa
        Family – Dmir\TRIM
        Family – Dsub\Bilbo
      Clade – I
        Family – I-element
        Family – Ivk {You}
      Clade – [CRE]
      Clade – [R4]
      Clade – [L1]
      Clade – [RTE]
      Clade – [Tad1]
      Clade – CR1
        Family – Dkop\Q-like
        Family – Q
    Superfamily – Uri-like domain endonuclease
      Clade – Penelope
        Family – Dvir\Penelope
    Superfamily – lack RT {SINE-element}
      Clade – [Alu]
    Superfamily – SINE-LIKE
      Clade – mini-me
        Family – INE-1 {Dr.D; narep1; mini-me}
Class II – DNA/DNA transposons
  Subclass – DDE/DDD transposase
    Superfamily – [DD39D] – DD39D motif
    Superfamily – [ITmD37E] – DD37E motif
    Superfamily – [DD37D] – DD37D motif
    Superfamily – mariner – DD34D motif
      Clade – mariner
        Family – Dmau\mariner
    Superfamily – Tc1 – DD34E motif
      Clade – Bari1 – short IR termini
        Family – transib1
        Family – transib2
        Family – transib3
        Family – transib4
        Family – Bari1
        Family – Tc1
        Family – Tc3-like
        Family – Dhet\Uhu
        Family – HB
      Clade – Tc3 – long IR termini, with DR
        Family – Bari2
        Family – Ddip\Bari
        Family – S-element
        Family – S2
        Family – Dvir\Paris
        Family – Dhyd\Minos
    Superfamily – pogo – DDxD motif
      Clade – pogo
        Family – pogo
    Superfamily – [DDxE] – DDxE motif
  Subclass – no DDE/DDD transposase
    Superfamily
      Clade – [IS1]
    Superfamily
      Clade – hAT
        Family – H-element {hobo}
        Family – Dkop\Gandalf
      Clade – P
        Family – 1360 {protop; hoppel}
        Family – P element
          Subfamily – Canonical P element
          Subfamily – M-type P element
          Subfamily – O-type P element
          Subfamily – T-type P element
      Clade – [MuDR]
      Clade – [CACTA]
      Clade – [Tx]
  Subclass – TTAA insertion site specific
    Family – [piggyBac]
    Family – looper1
  Subclass – MITES
    Family – Dsub\SGM
Class III – unknown mechanism
  Superfamily – long terminal repeats
    Clade – FB
      Family – FB
      Family – Dbuz\Galileo
      Family – Dbuz\Newton
      Family – Dbuz\Kepler

Abbreviations: IR, inverted repeat; DR, direct repeat; RT, reverse transcriptase; IN, integrase; PR, protease; RH, RNAse H

Table 2. An overview of the numbers of transposable elements in the euchromatic genome of D. melanogaster. For each class of element, the total numbers of each family of element, together with the numbers (and percentage of elements) that are full length, are given for each chromosome arm. Further columns give the total base pairs contained within transposable elements, the percentage of each chromosome arm composed of transposable element sequences, the number of elements per megabase, and the numbers of elements within the most proximal Mb of each of the five major chromosome arms.

Class         Arm            bp in TEs   % full length   # TE / Mb (genome)   # TE / Mb (prox Mb)
All Families  X                839,569   30.69%          12.72                50.50
              2L               882,141   33.44%          13.73                58.50
              2R               866,578   27.33%          15.32                88.00
              3L               936,742   34.26%          12.38                67.00
              3R               873,564   35.99%          10.36                24.00
              4                136,835   10.78%          82.4                 -
              Total/Average  4,535,429   30.90%          13.47                57.60
LTR           X                639,051   41.18%          6.24
              2L               604,030   53.54%          5.72
              2R               566,113   38.85%          6.85
              3L               616,236   48.31%
              3R               627,550   44.23%
              4                 44,121   40.00%
              Total          3,097,101
5.05 5.59 8.08 19.50 20.00 30.00 24.50 7.50 - Total bp of TE % of Arm Total # TEs # Full Length % Full Length 3.85% 3.97% 4.27% 4.01% 3.13% 11.05% 277 305 311 289 289 102 85 102 85 99 104 11 1,573 486 3.88% 2.93% 2.72% 2.79% 2.64% 2.25% 3.56% 136 127 139 118 156 10 56 68 54 57 69 686 308 44 Average LINE-like TIR FB 2.65% X 2L 2R 3L 3R 137,420 185,499 226,974 251,077 176,355 43,534 Total Average 1,020,859 X 2L 2R 3L 3R 45,324 85,937 70,886 52,743 63,674 47,021 Total Average 365,585 X 2L 2R 3L 3R 17,774 6,675 2,605 16,686 5,985 2,159 0.63% 0.83% 1.12% 1.08% 0.63% 3.52% 70 98 107 106 70 31 18 20 18 27 19 482 104 0.87% 0.21% 0.39% 0.35% 0.23% 0.23% 3.80% 59 76 63 57 59 59 12 12 12 14 373 61 0.31% 0.08% 0.03% 0.01% 0.07% 0.02% 0.17% 12 4 45 44.90% 5.87 20.30 25.71% 20.41% 16.82% 25.47% 27.14% 6.45% 3.21 4.41 5.27 4.54 2.51 25.04 14.50 18.50 36.50 24.00 4.50 - 21.58% 4.13 19.60 11.86% 15.79% 19.05% 21.05% 23.73% 6.78% 2.71 3.42 3.1 2.44 2.12 47.66 14.50 18.00 21.00 16.50 12.00 - 16.35% 3.19 16.40 33.33% 50.00% 50.00% 37.50% 50.00% 50.00% 0.55 0.18 0.1 0.34 0.14 1.62 2.00 2.00 0.50 2.00 0.00 n.d Total Average 51,884 32 13 0.04% 40.62% 46 0.27 1.30

Table The transposable elements of D. melanogaster. The canonical length of each element (in bp) is shown in column 3, the total numbers of each family on each chromosome arm in columns 4-9, the grand totals for each family in column 10, and the numbers that are full length and partial in columns 11-12. Partial elements are defined as those whose length is less than 97% of the canonical element. The average pairwise distance within each family is shown in column 13 (the Cr1a elements were too diverse for this to be calculated) and the existence of EST sequences is indicated in column 14 (see text for details).

Class Family LTR 17.6 1731 297 3S18 412 accord aurora-element blastopia blood Burdock Circe copia diver diver2 Dm88 frogger GATE gtwin gypsy gypsy2 gypsy3 gypsy4 gypsy5 gypsy6 HMS-Beagle Idefix invader1 invader2 Canonical
Length X 2L 2R 3L 3R 7439 4648 6995 6126 6897 7404 4263 5034 7410 6411 6356 5143 6112 4917 4558 2483 8507 7411 7469 6841 6973 6852 7369 7826 7062 7411 4032 5124 22 10 0 0 0 1 0 12 0 0 11 13 0 1 7 4 3 16 0 6 11 1 0 0 2 1 0 3 10 0 30 0 0 0 0 18 # Total # Full Length 0 0 0 0 0 0 0 0 0 0 0 0 14 56 33 17 22 13 30 9 32 20 3 2 13 26 10 17 27 13 22 26 0 1 1 # # Avg Pairwise Partial Prox Mb Distance EST 39 6 32 20 2 1 25 12 17 3 10 0.038 0.000 0.033 0.075 0.022 0.074 0.016 0.001 0.002 0.057 0.002 0.002 0.032 0.015 0.077 0.041 0.000 0.067 0.038 0.041 0.005 0.054 0.022 0.023 0.053 + + + + + + + + + + + + + + + + + + + + + + + + + + + + invader3 invader4 invader5 McClintock mdg1 mdg3 micropia opus qbert Quasimodo roo rooA rover springer Stalker Stalker2 Stalker4 Tabor Tirant Transpac ZAM LINE-like baggins BS Cr1a Doc Doc2 Doc3 F-element G-element G2 G3 G4 G5 G6 Helena 5484 3105 4038 6450 7480 5519 5457 7521 7650 7387 9092 7621 7318 7546 7256 8119 7379 7345 8526 5249 8435 3 34 3 1 4 22 0 1 0 2 0 31 4 0 0 1 6 31 1 0 0 0 4 27 3 1 0 1 0 0 0 0 0 0 17 25 16 24 14 145 11 12 13 20 2 13 16 58 2 15 13 12 8 87 9 0 5 21 3 0 0 0.042 0.068 0.068 0.002 0.012 0.009 0.010 0.003 0.016 0.012 0.045 0.035 0.061 0.014 0.015 0.001 0.001 0.001 0.000 - + + + + + + + + + + + + + + + + + + + + + 5453 5142 4470 4725 4789 4740 4708 4346 3102 4605 3856 4856 2042 1317 0 1 0 2 16 5 1 10 15 10 0 21 19 10 2 1 10 0 11 2 10 0 0 1 0 14 29 54 55 42 14 11 30 0 16 0 14 23 53 25 26 12 11 12 40 11 6 0.076 0.028 0.006 0.065 0.019 0.059 0.036 0.029 0.100 0.063 0.006 0.097 + + + + + + + + + + + + - HeT-A I-element Ivk jockey jockey2 Juan R1-element R2-element Rt1a Rt1b Rt1c TART-element X-element 6083 5371 5402 5020 3428 4236 5356 3607 5108 5183 5443 10654 4740 16 0 11 1 0 15 14 13 14 0 0 2 0 1 28 69 10 13 37 17 25 12 5 20 57 8 32 16 19 4 22 14 12 0.018 0.037 0.070 0.003 0.113 0.001 0.049 0.053 0.095 0.177 0.049 + + + + + + + + + + + + TIR 1360 Bari1 Bari2 HB H-element hopper hopper2 looper1 mariner2 NOF P-element pogo 
S-element S2 Tc1 transib1 transib2 transib3 transib4 1177 1728 1064 1653 2959 1435 1680 1881 912 4347 2907 2121 1736 1735 1666 2167 2844 2883 2656 10 0 5 5 9 18 11 0 12 1 17 0 2 11 2 20 0 2 2 11 11 1 11 12 2 0 31 1 0 0 0 11 0 107 32 24 15 17 44 51 13 21 12 12 5 11 0 14 0 95 27 23 4 13 39 37 13 20 12 49 21 13 27 4 0.088 0.002 0.043 0.114 0.069 0.048 0.042 0.041 0.144 0.013 0.020 0.079 0.268 0.085 0.000 0.082 0.088 0.152 + + + + + + + + + + + + - FB FB 1492 12 32 13 19 13 0.066 +

Table A comparison of the numbers of euchromatic transposable elements in the Release sequence with those estimated from natural populations by in situ hybridization. These data are illustrative of the published literature, not an exhaustive survey. The range of element numbers (and, where appropriate, rounded means) in the sources indicated is shown.

Class Family # in Release Range LTR 17.6 1731 297 14 56 3S18 412 33 blood copia 22 30 gypsy HMS-Beagle Idefix mdg1 13 25 Midpoint Midpoint avg 8–17 8–15 18–35 20–30 23 32 25 12 18–38 26–32 21 31 20 14 20–43 12–25 23 12.5 11.5 26.5 25 23 32 25 12 28 29 21 31 20 14 31.5 18.5 23 12.5 11.5 26.6 11 4–20 11–23 15–27 11 12 17 21 11.0 12.0 20.3 18.5 25.8 14.0 24.3 3.0 Source 7-10 isogenic lines; Beltsville, MD 7-10 isogenic lines; Beltsville, MD 7-10 isogenic lines; Beltsville, MD lab stocks 20 isogenic lines; Raleigh, NC 182 inbred lines lab stocks lab stocks 7-10 isogenic lines; Beltsville, MD lab stocks 20 isogenic lines; Raleigh, NC nat populations 182 inbred lines 182 inbred lines lab stocks 18 inbred strains; Azerbaidjan nat populations nat populations 182 inbred lines lab stocks 7-10 isogenic lines; Beltsville, MD 20 lab stocks Reference Charlesworth et al (1994) Charlesworth et al (1994) Charlesworth et al (1994) Strobel et al (1979) Montgomery et al (1987) Dominguez and Albornoz (1996) Georgiev et al (1990) Bucheton et al (1984) Charlesworth et al (1994) Strobel et al (1979) Montgomery et al (1987) Viera and Biemont (1996) Dominguez and Albornoz
(1996) Dominguez and Albornoz (1996) Strobel et al (1979) Biemont and Gautier (1988) Viera and Biemont (1996) Viera and Biemont (1996) Dominguez and Albornoz (1996) Hey and Eanes (unpublished), cited in Biemont and Cizeron (1999) Desset et al (1999) Charlesworth et al (1994) Belyaeva et al (1984) LINE-like TIR FB mdg3 opus 16 24 roo 145 Stalker Tirant 12 20 ZAM Doc F-element 55 42 14–22 25 5–18 12–23 10–15 55–75 61 2–6 3–13 6–16 0–15 1–15 18 25 11.5 17.5 12.5 65 61 11 7.5 20–30 25–30 25 27.5 41 13–21 0–15 32–40 22 41 17 7.5 36 22 29 15.5 34 26 21 57.5 25 I-element 28 jockey 69 1360 Bari1 107 H-element 24 NOF S-element 51 19–39 2–29 8–60 25–27 21 0–2 24–91 FB 32 20–30 17 inbred stocks; Azerbaidjan 18 inbred strains; Azerbaidjan 20 lab stocks 7-10 isogenic lines; Beltsville, MD lab stocks 7-10 isogenic lines; Beltsville, MD 20 isogenic lines; Raleigh, NC lab stocks 10 wild-type stocks lab stocks lab stocks lab stocks Biemont et al (1988) Biemont and Gautier (1988) Belyaeva et al (1984) Charlesworth et al (1994) Whalen et al (1998) Charlesworth et al (1994) Montgomery et al (1987) Georgiev et al (1990) Molto et al (1996) Viggiano et al (1997) LeBlanc et al (1997) Baldrich et al (1997) 25.0 34.3 lab stocks lab stock 12.3 18 inbred strains; Azerbaidjan lab stocks 7-10 isogenic lines; Beltsville, MD 182 inbred lines Vaurey et al (1994) di Nocera et al (1983) Hey and Eanes (unpublished), cited in Biemont and Cizeron (1999) Biemont and Gautier (1988) Bucheton et al (1984) Charlesworth et al (1994) Dominguez and Albornoz (1996) 1.0 57.5 lab strains lab stocks 46 lab stocks and nat populations 17 inbred stocks; Azerbaidjan nat populations; Greece 182 inbred lines lab strains 10 lab stocks & nat populations Kholodilov et al (1988) Berghell and Dimitri (1996) Caggesse et al (1995) Biemont et al (1988) Zabalou et al (1994) Dominguez and Albornoz (1996) Harden (1989) Merriman et al (1995) 22.5 lab strains Harden (1989) 11.5 15.0 63.0 4.0 9.5 7.8 29.0 29 9.8 27.0 20 20 182 inbred 
lines Dominguez and Albornoz (1996)

... al 1996) and D. melanogaster (Petrov and Hartl 1998). Transposable elements are far more abundant in the genome of A. thaliana than in the euchromatic genomes of C. elegans or D. melanogaster. In Arabidopsis,
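
The 97% rule given in the legend of the table of element families, and the percentage of full-length elements reported in the overview table, can be illustrated with a short sketch. This is not the authors' pipeline: the function names and the example copy lengths are hypothetical. Only the copia canonical length (5143 bp), the 97% threshold, and the totals (486 full-length elements out of 1,573) are taken from the tables above.

```python
def is_full_length(observed_bp: int, canonical_bp: int, threshold: float = 0.97) -> bool:
    """Return True when a copy spans at least `threshold` of the canonical length.

    The legend defines partial elements as those shorter than 97% of the
    canonical element, so anything at or above that cutoff counts as full length.
    """
    return observed_bp >= threshold * canonical_bp


def percent_full_length(n_full: int, n_total: int) -> float:
    """Percentage of elements that are full length, rounded to two decimals."""
    return round(100.0 * n_full / n_total, 2)


# Canonical length of copia as listed in the table of element families.
COPIA_CANONICAL_BP = 5143

# Hypothetical copy lengths, chosen only to fall on both sides of the cutoff
# (0.97 * 5143 is about 4988.7 bp).
example_copies_bp = [5143, 5000, 4900, 276]

labels = [
    "full length" if is_full_length(bp, COPIA_CANONICAL_BP) else "partial"
    for bp in example_copies_bp
]

# Genome-wide totals from the overview table: 486 full-length of 1,573 elements.
genome_percent_full = percent_full_length(486, 1573)

if __name__ == "__main__":
    for bp, label in zip(example_copies_bp, labels):
        print(f"{bp} bp: {label}")
    print(f"Full length, all families: {genome_percent_full}%")
```

With the totals from the overview table, the computed percentage agrees with the 30.90% reported there for all families.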
