
Medical report: "High resolution transcriptome maps for wild-type and nonsense-mediated decay-defective Caenorhabditis elegans"


Open Access
Research
Ramani et al., 2009, Volume 10, Issue 9, Article R101

High resolution transcriptome maps for wild-type and nonsense-mediated decay-defective Caenorhabditis elegans

Arun K Ramani¤*, Andrew C Nelson¤, Philipp Kapranov§¶, Ian Bell§, Thomas R Gingeras§¥ and Andrew G Fraser*†

Addresses: *Donnelly CCBR, College Street, University of Toronto, Toronto, M5S 3E1, Canada. †The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK. ‡Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, CB2 3DY, UK. §Affymetrix, Inc., Central Expressway, Santa Clara, CA 95051, USA. ¶Helicos Biosciences Corporation, Cambridge, MA 02139, USA. ¥Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, NY 11724, USA.

¤ These authors contributed equally to this work.

Correspondence: Thomas R Gingeras. Email: gingeras@cshl.edu. Andrew G Fraser. Email: andyfraser.utoronto@gmail.com

Published: 24 September 2009
Genome Biology 2009, 10:R101 (doi:10.1186/gb-2009-10-9-r101)
Received: June 2009
Revised: 11 August 2009
Accepted: 24 September 2009

The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2009/10/9/R101

© 2009 Ramani et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The C. elegans NMD transcriptome: The high-resolution transcriptome of wild-type and nonsense-mediated decay (NMD) defective C. elegans during development reveals insights into the NMD pathway and its role in development.

Abstract

Background: While many genome sequences are complete, transcriptomes are less well characterized. We used both genome-scale tiling arrays and massively parallel sequencing to map the Caenorhabditis elegans transcriptome across development. We utilized this framework to identify transcriptome changes in animals lacking the nonsense-mediated decay (NMD) pathway.

Results: We find that while the majority of detectable transcripts map to known gene structures, >5% of transcribed regions fall outside current gene annotations. We show that >40% of these are novel exons. Using both technologies to assess isoform complexity, we estimate that >17% of genes change isoform across development. Next we examined how the transcriptome is perturbed in animals lacking NMD. NMD prevents expression of truncated proteins by degrading transcripts containing premature termination codons. We find that approximately 20% of genes produce transcripts that appear to be NMD targets. While most of these arise from splicing errors, NMD targets are enriched for transcripts containing open reading frames upstream of the predicted translational start (uORFs). We identify a relationship between the Kozak consensus surrounding the true start codon and the degree to which uORF-containing transcripts are targeted by NMD, and speculate that translational efficiency may be coupled to transcript turnover via the NMD pathway for some transcripts.

Conclusions: We generated a high-resolution transcriptome map for C. elegans and used it to identify endogenous targets of NMD. We find that these transcripts arise principally through splicing errors, strengthening the prevailing view that splicing and NMD are highly interlinked processes.

Background

Identifying genes whose mRNA expression is perturbed in a mutant can yield great insight into a wide range of biological problems. For example, comparing gene expression in wild-type
organisms with that seen in mutants can be used to identify the targets of transcription factors or signaling pathways [1], to organize genes into modules [2-7], and to order genes in pathways [8,9].

Recently, genome-scale tiling arrays and massively parallel sequence analysis of transcriptomes have emerged as powerful new tools for transcriptome analysis [10-14]. Both rely on the availability of high quality genome sequence, and both offer the promise of transcriptome analysis at unprecedented depth and efficiency. Each technology has different strengths. In the case of tiling arrays, the entire transcriptome can be queried at the same depth in a single hybridization, making it a very cost-effective way to achieve excellent coverage. However, the resolution with which any transcript can be mapped is limited by the resolution of the array (which for most complex genomes is not at single base-pair resolution) and, furthermore, while one can rapidly identify the regions of the genome that correspond to mature transcripts, the arrays contain no implicit information about how these are connected. Deep sequencing of the transcriptome, on the other hand, generates data at single-base resolution. While the sequence reads from all current technologies are short (typically 35 to 70 bp), it is possible to assemble these into longer contiguous reads and to link these contigs together. However, since the range of gene expression extends over many orders of magnitude, achieving good coverage for a complex transcriptome is still costly, and assembly of the data is still computationally intensive.

Since tiling arrays and sequencing have complementary benefits for transcriptome analysis, we decided to use both technologies to examine the Caenorhabditis elegans transcriptome across a series of developmental stages. The C. elegans genome is completely sequenced and, while it contains a similar number of genes to the human genome, it is much more compact - around 27% [15] of the worm genome is
coding compared with 1.5% [16] in humans. Genome annotation is generally of high quality in the worm; the genome is relatively small (approximately 100 Mb compared with approximately 3 Gb in human) and unrepetitive, making both tiling- and sequence-based approaches comparatively straightforward in the worm. Both technologies allow examination not only of levels of gene expression but also of splice changes across development; they also allow identification of novel transcripts that do not lie in annotated gene structures. They thus provide an unbiased and rich view of the changing transcriptome across development. Our immediate goal was to map the wild-type transcriptome at good coverage and resolution and, thus, to provide a framework to analyze perturbations of the transcriptome in mutants.

In addition to mapping the wild-type transcriptome across several developmental stages, we wished to assess the usefulness of these data for examining how the transcriptome is perturbed in mutant animals. To this end, we used both tiling arrays and sequencing to examine the transcriptome of worms defective for nonsense-mediated decay (NMD), identified in animals by Hodgkin et al. [17] and reviewed in [18,19]. The central cellular role of the NMD pathway is to prevent the expression of prematurely truncated proteins, which are likely to have deleterious consequences. The NMD pathway recognizes transcripts containing premature termination codons (PTCs) and targets them for degradation, thus eliminating them from the cell [20]. The role of the NMD pathway in eliminating PTC-containing transcripts is highly conserved and indeed many of the components are shared from yeast to human (see [21,22] for reviews), including the core components SMG-2, SMG-3 and SMG-4 (Upf1-3 in Saccharomyces cerevisiae).

The PTC-containing transcripts that are targets for NMD recognition and degradation arise from three principal sources [21,23-27]. The first occurs
from transcripts deriving from genes containing nonsense mutations, whether inherited or somatic. However, nonsense mutations play a clear role in many human genetic diseases, and in several of these, NMD has been shown to affect the severity of the disease phenotype and, thus, this class of target, though rare, has key medical importance [28]. The second class comprises transcripts that contain PTCs that arise during alternative splicing - either retention of introns or errors in splice site selection [29-32]. Finally, transcripts can be targeted by NMD despite having no PTCs in the principal open reading frame (ORF); instead, these transcripts contain a short ORF upstream of the true start ATG, known as an upstream ORF (uORF). The stop codon of this uORF is recognized as a premature stop codon and the transcript is thus recognized as an NMD target.

Recently, genome-scale studies using standard expression microarrays have identified endogenous transcripts that are targets of NMD in yeast [33,34], Drosophila [35,36], and humans [37-39]. In all three organisms examined, approximately 10% of genes give rise to a transcript that is targeted for degradation via the NMD pathway, a surprisingly large number [40]. Much is thus known already about NMD: the molecular components of the NMD pathway are well-characterized, many of the molecular features that cause a specific transcript to be degraded via the NMD pathway are known, and many endogenous transcripts are found to be affected by NMD. However, other than in yeast [32], no genome-scale studies have examined the effect of NMD on wild-type transcriptomes with the resolution that either tiling arrays or transcriptome sequencing can provide. We thus set out to compare the transcriptomes of wild-type animals with that of worms that are defective for NMD using both tiling arrays and deep sequencing to determine whether the increased resolution of such analyses can provide new insight into the effect of NMD on the transcriptome of normal developing worms.

Results and discussion

Outline of approach and overview of data

Both genome-scale tiling arrays and deep sequencing approaches were used to generate a high resolution, high coverage 'reference transcriptome' for C. elegans to use as a tool to guide analysis of perturbed transcriptomes such as those of mutant animals. At the time of initiating these studies, there was a great difference in the cost to analyze any specific RNA sample by tiling arrays or by deep sequencing. We thus chose to use tiling arrays as our primary method to map the transcriptome across multiple developmental stages, and deep sequencing to validate the tiling data and to refine the resolution of the transcript mapping at a more limited number of developmental stages. Using the data together in this way combines the cost-effectiveness of tiling with the higher resolution of sequencing to generate a high quality transcriptome map.

For our tiling analysis, we purified total RNA from wild-type N2 animals at four different stages of the C. elegans life-cycle (larval stages L3 and L4, young adults, and gravid adults). For each developmental stage, RNA samples were prepared in triplicate and hybridized individually to genome-scale tiling arrays - these have a 35 bp resolution and allow an unbiased view of the majority (70%) of the genome.

We initially examined these data to assess coverage and to compare data quality between tiling and sequencing. At any single developmental stage, we detect expression of around a third of all predicted genes on tiling arrays (see Materials and methods; Table S1 in Additional data file 1); across all examined stages, we detect approximately 50% of genes (9,515 out of 19,169 annotated genes in the WS150 release of Wormbase [41]). This is comparable to the detection sensitivity of conventional microarrays. We find that approximately 95% of transcribed features (so-called
'transfrags', the individual contiguous regions of the genome that are transcribed; see Materials and methods and [11,13] for definition) map to currently predicted transcripts (Table S2 in Additional data file 1), a far higher proportion than that observed in either Drosophila [42] or human [10]. We note that while the proportion of novel transfrags is far lower in the worm, this is in keeping with previous results [12] and is broadly as expected for the worm genome given the far higher proportion of predicted coding sequence relative to that found in many other animal genomes. In addition, this low proportion of novel transcribed regions indicates that genome annotation and gene prediction in the worm is of generally excellent quality.

To validate our tiling data, we used Illumina sequencing to directly sequence the transcriptome of two developmental stages (L4 and young adults) along with mixed stage worms and compared these sequence data to the tiling data. We generated approximately 225 million individual reads, of which approximately 217 million (85%) could be uniquely assigned to the genome using Mapping and Assembly with Qualities (MAQ; Table S3 in Additional data file 1) [43]. Of these uniquely mapped reads, approximately 94% map entirely within known transcripts (approximately 85% of all reads map entirely within known exons and approximately 9% span exon-exon junctions), a number that corresponds closely to that seen by tiling (95% of transcribed regions map to known transcripts by tiling). We compared gene intensities deriving from both tiling and sequence data and find very tight correspondence between these measurements (Figure 1a), suggesting that both methods give accurate estimates of levels of mRNA expression. Finally, we compared the sets of genes whose expression is detected by tiling and by sequence analysis and find that approximately 90% of genes that have detectable
expression by tiling can also be detected by sequence (Figure 1b). We thus show that the two technologies provide accurate and complementary surveys of the transcriptome, allowing direct comparison with the transcriptomes of mutant animals.

Novel transcribed regions of the C. elegans genome

As described above, we find that approximately 95% of the transcribed regions detected by either tiling array or sequence map to predicted transcripts (Additional data file 2) - a representative region of the genome is shown in Figure 1c. We next examined whether both technologies detected the same novel transcribed regions and whether these novel regions are likely to represent entirely new stand-alone transcripts (that is, from potentially new genes) or rather are novel exons of previously annotated genes.

We first identified all novel transfrags identified on tiling arrays at any developmental stage by comparing the tiling data to WS150 gene models and asked what proportion of these can be confirmed by sequence reads (Figure 2a; Table S4 in Additional data file 1). Of the novel transfrags found by tiling, approximately 60% can be detected by sequence - since both technologies are very different, these confirmed novel transfrags are likely to be real. Note that while tiling arrays were used to analyze total RNA, only poly-adenylated transcripts were sequenced to avoid redundant reads of rRNA, and this is thus a lower estimate of true novel transcripts identified by tiling. We note that two other studies have appeared that also used sequencing to analyze the C. elegans transcriptome, and we thus compared our data with that produced by Hillier et al. [44], who also used deep sequencing to examine the transcriptome at several developmental stages (rather than Shin et al. [45], who examined only L1, a stage we did not look at). We find that the overlap is very significant at the gene level (Table S5 in Additional data file 1) and at the level of transfrags (Table S6 in Additional data file 1),
confirming the accuracy of all datasets.

Novel transfrags can either arise from entirely new transcripts that have not been predicted or they could alternatively be novel exons of previously predicted genes. In the latter case, it should be possible to connect these novel transfrags to known gene annotations. To examine this, we used Illumina paired-end sequencing on poly-A+ RNA derived from mixed stage populations of worms. We identified reads mapping to novel transfrags and asked whether the paired sequence read mapped to a known gene structure. In approximately 60% of cases (Figure 2a), we could unambiguously connect a novel transfrag confirmed by both sequence and tiling to a previously predicted transcript, suggesting that these are novel exons (Figure 2b, c). Of such novel exons, 65% (20% 5' and 45% 3') are either 5' or 3' to the coding region of the gene, consistent with a view that terminal exons are more variable; therefore, predicting transcript ends is considerably harder and more error prone than predicting internal coding exons.

Figure 1. Tiling array data and sequence-based data give similar views of the transcriptome. (a) Gene intensities (left) and exon intensities (right) from the tiling data (tiling log2(intensity)) were binned at 0.1 increments of gene intensity (log2 scale) and compared with the intensities deriving from sequence data (sequencing log2(counts)) for N2 L4 and N2 YA; there is a strong correlation (R2 = 0.95) between gene intensities derived from both technologies. YA, young adult. (b) Approximately 90% of the genes expressed based on tiling are also expressed in the sequence data in both stages sequenced. (c) Sample screenshot from Affymetrix Integrated Genome Browser illustrating how tiling array data and sequence data correspond to predicted gene structures. Tiling array data from four developmental stages (L3 and L4 larvae, YA and gravid adults (GA)) are shown in shades of blue. The predicted exons of unc-52 (ZC101.2) are shown at the top of the plot. Exons that are differentially spliced across development based on tiling data are shown in yellow. Regions corresponding to transfrags that do not overlap predicted exon structures are highlighted with purple bars. Sequence data for a single developmental stage (YA) is shown in red at the bottom of the figure; note that the regions identified as transcribed by sequencing correspond closely to those identified by tiling. Non-adjacent exon boundaries spanned by sequence reads are shown as green bars and the exons removed by the alternative splice shown in red below; the height of the green bar corresponds to the frequency with which the alternative splice events were detected.

Finally, to further investigate the novel transfrags identified by tiling, we compared our tiling data to multiple other gene models in C. elegans. First, we examined the proportion of our novel transfrags (based on gene models in version WS150 of Wormbase) that were still novel in later sets of gene models (WS160, WS170, WS180, and WS190) (Figure 2d). We find that approximately 30% (670 of 2,229) of transfrags that were outside gene models in WS150 have since been incorporated in newer gene models; 90% of those that are novel exons are now confirmed in gene models. Of the remaining transfrags (approximately 70%; 1,530 of 2,229), we note that many map to alternative gene models outside the canonical C. elegans gene models - for example, approximately 15% (204 of 1,530) overlap with Twinscan models. We believe that many of these novel transfrags are likely to represent errors in standard gene models, since other gene models predict many of them relatively well. Thus, our data, like previous work [44,45], may contribute to refining de novo gene models.

Figure 2. Novel transfrag annotation. (a) Using stage specific sequence data we were able to show that approximately 60% of the novel transfrags have sequenced reads mapping to them. We also show that 60% of these transfrags with sequence information can be connected to known gene annotation using paired-end sequence reads, where one read of the pair is anchored on the transfrag while the other overlaps a gene annotation. Examples of novel regions identified from our analysis show (b) a new 5' exon and (c) a novel transcript. (d) Transfrags identified as novel in our tiling data based on WS150 of the genome annotation were compared against WS160, WS170, WS180 and WS190 models. We see that >30% of the transfrags that were novel based on WS150 are predicted to be exonic in later annotations (grey bars). Almost 50% of novel transfrags that also have sequence reads overlapping them are predicted to be exonic in later annotations (black bars). We can show annotation overlap for a further 15% (tiling alone, grey) or 25% (tiling with sequence data, black) when we compare the transfrags to TwinScan models.

In total, then, we identified 10,073 (the unique non-overlapping set from the four stages) novel transcribed regions relative to gene models in WS150 using tiling arrays. Most of these (Table S4 in Additional data
file 1) could be confirmed by sequence and, of those identified by both technologies, approximately 30% appear to be novel exons of previously annotated transcripts.

Alternative splicing detection using tiling arrays and transcriptome sequencing

Tiling arrays can be used to measure gene expression at the level of mRNA. However, unlike conventional expression arrays, tiling arrays can also be used to examine the expression of individual exons and their relative inclusion into transcripts deriving from any gene. Changes in the relative inclusion of an exon across development indicate changes in splicing, and we thus investigated the extent to which we could identify splice changes across C. elegans development using our tiling data.

For each exon, we computed its normalized intensity at each developmental stage based on tiling data. The normalized intensity (NI) of any exon is the expression level of the exon relative to the expression level of the gene that includes it. An NI of approximately 1 for an exon indicates that essentially all transcripts deriving from that gene include that exon; an NI of approximately 0 indicates that this exon is skipped from almost all transcripts deriving from that gene. We note that just as gene expression levels measured by tiling and sequence correlate very highly, this is also the case for levels of expression of each individual exon (Figure 1a).

Prior to examining how the NI of each exon changes across development, we compared exon inclusion as estimated by NI from tiling with direct measurements of splicing from our sequence data. We identified sequence reads that span exon-exon junctions - we searched for these both between adjacent exons and between non-adjacent exons (see Materials and methods). Note that reads spanning exon-exon junctions are far rarer than those mapping internally to exons since the effective target is smaller, and achieving high coverage of exon junctions thus requires substantially more sequence depth than that required simply to detect gene expression; furthermore, identification of such exon spanning reads is highly sensitive to correct exon junction predictions.

To examine the extent to which NI measured from tiling data gives a verifiable measure of splice variation, we identified all exon triplets that appear 'cassette-like' from our tiling data (see Figure 3a for schematic and Figure 3b for an example), that is, triplets where exons A and C have NI of approximately 1 and the middle exon B has an NI of
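
The normalized intensity and the 'cassette-like' triplet search described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of a simple linear-scale ratio, and the cut-offs `high` and `low` are all assumptions made for illustration, since the excerpt does not state the thresholds actually used.

```python
def normalized_intensity(exon_intensity, gene_intensity):
    """NI = exon expression relative to the expression of its gene.

    Assumption: both values are on a linear (not log2) scale.
    """
    if gene_intensity <= 0:
        raise ValueError("gene intensity must be positive")
    return exon_intensity / gene_intensity


def cassette_like_triplets(exon_nis, high=0.9, low=0.5):
    """Return (A, B, C) exon-name triplets that look cassette-like.

    exon_nis: list of (exon_name, NI) tuples in genomic order.
    high, low: hypothetical cut-offs; flanking exons A and C must have
    NI >= high (included in nearly all transcripts) and the middle
    exon B must have NI <= low (frequently skipped).
    """
    triplets = []
    # Slide a window of three consecutive exons along the gene.
    for a, b, c in zip(exon_nis, exon_nis[1:], exon_nis[2:]):
        if a[1] >= high and c[1] >= high and b[1] <= low:
            triplets.append((a[0], b[0], c[0]))
    return triplets


# Toy gene with four exons; the intensities are made up for illustration.
gene_level = 100.0
exon_levels = [("e1", 100.0), ("e2", 30.0), ("e3", 95.0), ("e4", 97.0)]
nis = [(name, normalized_intensity(x, gene_level)) for name, x in exon_levels]
candidates = cassette_like_triplets(nis)
```

With this toy input, only the window centred on `e2` satisfies the cut-offs. Changes in the NI of the middle exon across developmental stages could then be compared against junction-spanning sequence reads, as the text goes on to describe.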
