Biochemistry, 4th Edition

Chapter 29 Transcription and the Regulation of Gene Expression

29.3 How Are Genes Transcribed in Eukaryotes?

Covalent Modification of Histones

Chromatin is also remodeled through the action of enzymes that covalently modify side chains on histones within the core octamer. These modifications either diminish DNA:histone associations through disruption of electrostatic interactions or introduce substitutions that can recruit binding of new protein participants through protein–protein interactions.

Initial events in transcriptional activation include acetyl-CoA–dependent acetylation of ε-amino groups on lysine residues in histone tails by histone acetyltransferases (HATs) (Figure 29.30). The histone transacetylases responsible are essential components of several megadalton-size complexes known to be required for transcription co-activation (co-activation in the sense that they are required along with RNA polymerase II and other components of the transcriptional apparatus). Examples of such complexes include TFIID (some of whose TAFIIs have HAT activity), the SAGA complex (which also contains TAFIIs), and the ADA complex. N-Acetylation suppresses the positive charge in histone tails, diminishing their interaction with the negatively charged DNA.

Phosphorylation of Ser residues and methylation of Lys residues in histone tails also contribute to transcription regulation (Figure 29.30). Ubiquitination and sumoylation (see Chapter 31), which attach small proteins to histone C-terminal lysine residues, are two additional forms of covalent modification found in nucleosomes. Collectively, these modifications create binding sites for proteins that modulate chromatin structure, such as the chromatin-remodeling complexes with bromodomains that interact specifically with acetylated lysine residues and chromodomains that bind to methylated lysine residues. A “histone code” has emerged.
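The charge argument above (N-acetylation suppresses the positive charge of histone tails) can be illustrated with a rough calculation. The sketch below is only an approximation under stated assumptions: the 20-residue sequence is the commonly cited N-terminal tail of histone H4, which is not given in this section, and the function name `net_charge` and the simple +1/−1 counting rule are illustrative choices, not a rigorous electrostatic model.

```python
# Toy estimate of how lysine acetylation lowers the net positive charge
# of a histone tail.
# ASSUMPTION: first 20 residues of histone H4 (not stated in the text).
H4_TAIL = "SGRGKGGKGLGKGGAKRHRK"


def net_charge(sequence, acetylated_positions=()):
    """Approximate net charge near neutral pH.

    Counts +1 for each Lys/Arg and -1 for each Asp/Glu; His and the chain
    termini are ignored. Acetylated lysines (1-based positions) count as 0.
    """
    acetylated = set(acetylated_positions)
    for position in acetylated:
        if sequence[position - 1] != "K":
            raise ValueError(f"position {position} is not a lysine")
    charge = 0
    for position, residue in enumerate(sequence, start=1):
        if residue == "K" and position not in acetylated:
            charge += 1
        elif residue == "R":
            charge += 1
        elif residue in "DE":
            charge -= 1
    return charge


unmodified = net_charge(H4_TAIL)
# ASSUMPTION: K5, K8, K12, and K16 chosen as the acetylated sites.
acetylated = net_charge(H4_TAIL, acetylated_positions=(5, 8, 12, 16))
print(unmodified, acetylated)
```

Under these assumptions, acetylating four lysines halves the net positive charge of the tail segment, which is the sense in which acetylation weakens the electrostatic attraction to the DNA backbone.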
Covalent Modification of Histones Forms the Basis of the Histone Code

A code based on histone-tail covalent modifications determines gene expression through selective recruitment of proteins. Proteins that cause chromatin compaction (heterochromatin formation) lead to repression; proteins giving easier access to DNA through relaxation of histone:DNA interactions favor the possibility of gene expression.

FIGURE 29.30 A schematic diagram of the nucleosome illustrating the various covalent modifications on the N-terminal tails of histones. acK = acetylated lysine residue; meK = methylated lysine residue; meR = methylated arginine residue; PS = phosphorylated serine residue. The numbers indicate the positions of the amino acids in the amino acid sequences. Note the prevalence of modifiable sites, particularly acetylatable lysines, on the N-terminal tails of histones H2B, H3, and H4.

The prominent forms of histone covalent modification are lysine acetylation, lysine methylation, serine phosphorylation, lysine ubiquitination, and lysine sumoylation. The lysine residue at position 9 (K9) in the histone H3 amino acid sequence is methylated in heterochromatin, the compacted, repressed state of chromatin. In contrast, lysine 4 (K4) of histone H3 typically is methylated in chromatin where gene expression is active. Different proteins are recruited to these two methylated forms of histone H3. Methylated K9 recruits heterochromatin protein 1 (HP1), which binds via its chromodomain. On the other hand, methylated K4 binds CHD1, a chromatin remodeling protein with two chromodomains. (CHD1 is an acronym for chromodomain, helicase, DNA-binding.)
Ubiquitination of Lys 120 in the C-terminal tail of H2B favors methylation (and thus transcription activation), whereas ubiquitination of Lys 119 in the C-terminal tail of H2A favors repression. Sumoylation of Lys residues tends to repress transcription; apparently, sumoylation antagonizes acetylation.

Methylation and Phosphorylation Act as a Binary Switch in the Histone Code

As cells enter mitosis, the chromatin becomes condensed and histone H3 is not only methylated at K9 but also phosphorylated at the adjacent serine residue, S10. S10 phosphorylation triggers the dissociation of HP1 from the heterochromatin. Thus, phosphorylation of the residue neighboring K9 trumps HP1 binding. Similarly, phosphorylation of the threonine residue (T3) neighboring K4 in the histone H3 tail evicts CHD1 from its site on the methylated K4. Apparently, lysine methylation is the “on” position for the binary switch that recruits specific proteins to histone tails, and phosphorylation at a neighboring residue turns the switch to the “off” position by ejecting the bound proteins. There are at least 16 instances of serines or threonines immediately flanking lysine residues in the four histones that constitute the histone core octamer of the nucleosome. The methylation-phosphorylation binary switch may be a general phenomenon in the regulation of chromatin dynamics.

Chromatin Deacetylation Leads to Transcription Repression

Deacetylation of histones is a biologically relevant matter, and enzyme complexes that carry out such reactions have been characterized. Known as histone deacetylase complexes, or HDACs, they catalyze the removal of acetyl groups from lysine residues along the histone tails, restoring the chromatin to a repressed state. Beyond these effects on transcription, histone modifications determine whether significant cellular events involving DNA allocation through mitosis and meiosis may occur.
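The methylation-phosphorylation binary switch described above lends itself to a small logical model. The sketch below is a deliberately simplified illustration, not a quantitative description of chromatin: the table `READERS` and the function `bound_readers` are hypothetical names introduced here, and the model encodes only the two histone H3 examples given in the text (K9/S10 with HP1, and K4/T3 with CHD1).

```python
# Toy model of the methylation-phosphorylation binary switch on the H3 tail.
# Each methylated lysine recruits a reader protein ("on") unless the
# neighboring Ser/Thr is phosphorylated ("off").
READERS = {
    "K9": {"reader": "HP1", "switch": "S10"},
    "K4": {"reader": "CHD1", "switch": "T3"},
}


def bound_readers(methylated, phosphorylated):
    """Return the reader proteins predicted to stay bound to the H3 tail.

    methylated     -- set of methylated lysines, e.g. {"K9"}
    phosphorylated -- set of phosphorylated residues, e.g. {"S10"}
    """
    bound = []
    for lysine, info in READERS.items():
        switch_on = lysine in methylated
        switch_off = info["switch"] in phosphorylated
        if switch_on and not switch_off:
            bound.append(info["reader"])
    return sorted(bound)


# Heterochromatin: K9 methylated, S10 unmodified, so HP1 is bound.
print(bound_readers({"K9"}, set()))
# Entry into mitosis: S10 phosphorylation ejects HP1.
print(bound_readers({"K9"}, {"S10"}))
```

The same two-input rule would apply to any other lysine flanked by a serine or threonine, which is why the text suggests the switch may be a general mechanism.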
Nucleosome Alteration and Interaction of RNA Polymerase II with the Promoter Are Essential Features in Eukaryotic Gene Activation

Gene activation (the initiation of transcription) can thus be viewed as a process requiring two principal steps: (1) alterations in nucleosomes (and thus, chromatin) that relieve the general repressed state imposed by chromatin structure, followed by (2) the interaction of RNA polymerase II and the GTFs with the promoter. Transcription activators (proteins that bind to enhancers and response elements) initiate the process by recruiting chromatin-altering proteins (the chromatin-remodeling complexes and histone-modifying enzymes described previously). Once these alterations have occurred, promoter DNA is accessible to TBP:TFIID, the other GTFs, and RNA polymerase II. Transcription activation, however, requires communication between RNA polymerase II and the transcription activator for transcription to take place. Mediator (or Srb/Med) fulfills this function. Mediator interacts with both the transcription activator and the CTD of RNA polymerase II. This Mediator bridge provides an essential interface for communication between enhancers and promoters, triggering RNA polymerase II to begin transcription. A general model for transcription initiation is shown in Figure 29.31. Once transcription begins, Mediator is replaced by another complex called Elongator. Elongator has HAT subunits whose activity remodels downstream nucleosomes as RNA polymerase II progresses along the chromatin-associated DNA.

The interactions described thus far emphasize regulation of gene expression at the level of RNA polymerase II recruitment to promoters. However, whole-genome analyses show that, for many genes, RNA polymerase II is already situated at promoters and appears to be paused there, awaiting signals that will activate the elongation phase of transcription.
Thus, the expression of many genes may be regulated at the level of transcription elongation.

Beyond these considerations, various mechanisms regulate gene expression through events that take place subsequent to transcription. Post-transcriptional gene regulation mediated by microRNAs, such as RNAi (see Chapter 12) and gene silencing (see Chapter 10), as well as alternative RNA splicing and nucleotide changes introduced through RNA editing (as described in this chapter), are mechanisms targeting transcripts. Post-translational modifications of proteins also play a major role in the regulation of gene expression, as assessed at the level of biological activity (see Chapter 30).

A SINE of the Times

An interesting twist on transcription regulation comes from the discovery that certain noncoding RNAs (ncRNAs) act as transcription factors through direct binding to RNA polymerase II. For example, ncRNA B2 in mouse and Alu RNA in humans are RNAs encoded within short interspersed elements (SINEs). SINEs are abundant within animal DNA and were once considered “junk” DNA because they lack protein-coding properties. Alu RNA or ncRNA B2 blocks promoter-bound RNA polymerase II by interfering with transcription initiation.

29.4 How Do Gene Regulatory Proteins Recognize Specific DNA Sequences?

Proteins that recognize nucleic acids do so by the basic rule of macromolecular recognition. That is, such proteins present a three-dimensional shape or contour that is structurally and chemically complementary to the surface of a DNA sequence. When the two molecules come into contact, the numerous atomic interactions that underlie recognition and binding can take place. Nucleotide sequence–specific recognition by the protein involves a set of atomic contacts with the bases and the sugar–phosphate backbone. Hydrogen bonding is critical for recognition, with amino acid side chains providing most of the critical contacts with DNA.
Protein contacts with the bases of DNA usually, but not always, occur within the major groove. Protein contacts with the DNA backbone involve both H bonds and salt bridges with electronegative oxygen atoms of the phosphodiester linkages.

FIGURE 29.31 A model for the transcriptional regulation of eukaryotic genes. The DNA is a green ribbon wrapped around disclike nucleosomes. A specific transcription factor (TF, pink) is bound to a regulatory element (either an enhancer or silencer). RNA polymerase II and its associated GTFs (blue) are bound at the promoter. The N-terminal tails of histones are shown as wavy lines (blue) emanating from the nucleosome discs. A specific transcription factor that is a transcription activator stimulates transcription through interaction with a co-activator whose HAT activity renders the DNA more accessible and through interactions with the Mediator complex associated with RNA polymerase II. A specific transcription factor that is a repressor interacts with a co-repressor that has HDAC activity that deacetylates histones, restructuring the nucleosomes into a repressed state. (From Figure 1 in Kornberg, R. D., 1999. Eukaryotic transcription control. Trends in Biochemical Sciences 24:M46–M49.)

Structural studies on regulatory proteins that bind to specific DNA sequences have revealed that roughly 80% of such proteins can be assigned to one of three principal classes based on their possession of one of three kinds of small, distinctive structural motifs: the helix-turn-helix (or HTH), the zinc finger (or Zn-finger), and the leucine zipper-basic region (or bZIP). The latter two motifs are found only in DNA-binding proteins from eukaryotic organisms.
In addition to their DNA-binding domains, these proteins commonly possess other structural domains that function in protein:protein recognitions essential to oligomerization (for example, dimer formation), DNA looping, transcriptional activation, and signal reception (for example, effector binding).

α-Helices Fit Snugly into the Major Groove of B-DNA

A recurring structural feature in DNA-binding proteins is the presence of α-helical segments that fit directly into the major groove of B-form DNA. The diameter of an α-helix (including its side chains) is about 1.2 nm. The dimensions of the major groove in B-DNA are 1.2 nm wide by 0.6 to 0.8 nm deep. Thus, one side of an α-helix can fit snugly into the major groove. Although examples of β-sheet DNA recognition elements in proteins are known, the α-helix and B-form DNA are the predominant structures involved in protein:DNA interactions. Significantly, proteins can recognize specific sites in “normal” B-DNA; the DNA need not assume any unusual, alternative conformation (such as Z-DNA).

Proteins with the Helix-Turn-Helix Motif Use One Helix to Recognize DNA

The HTH motif is a protein structural domain consisting of two successive α-helices separated by a sharp β-turn (Figure 29.32). Within this domain, the α-helix situated more toward the C-terminal end of the protein, the so-called helix 3, is the DNA recognition helix; it fits nicely into the major groove, with several of its side chains touching DNA base pairs. Helix 2, the helix at the beginning of the HTH motif, creates a stable structural domain through hydrophobic interactions with helix 3 that locks helix 3 into its DNA interface. Proteins with HTH motifs bind to DNA as dimers. In the dimer, the two helix 3 cylinders are antiparallel to each other, such that their N→C orientations match the inverted relationship of nucleotide sequence in the dyad-symmetric DNA-binding site. An example is Antp.
Antp is a member of a family of eukaryotic proteins involved in the regulation of early embryonic development that have in common an amino acid sequence element known as the homeobox domain (footnote 6). The homeobox is a DNA motif that encodes a related 60–amino acid sequence (the homeobox domain) found among proteins of virtually every eukaryote, from yeast to man. Embedded within the homeobox domain is an HTH motif. Homeobox domain proteins act as sequence-specific transcription factors. Typically, the homeobox portion comprises only 10% or so of the protein’s mass, with the remainder of the protein serving in protein:protein interactions essential to transcription regulation. Other DNA-binding proteins with HTH motifs are lac repressor, trp repressor, and the C-terminal domain of CAP.

Footnote 6: Homeo derives from homeotic genes, a set of genes originally discovered in the fruit fly Drosophila melanogaster through their involvement in the specification of body parts during development.

FIGURE 29.32 An HTH motif protein: Antp monomer bound to DNA. Helix 3 (yellow) is locked into the major groove of the DNA by helix 2 (magenta) (pdb id = 9ANT).

HUMAN BIOCHEMISTRY

Storage of Long-Term Memory Depends on Gene Expression Activated by CREB-Type Transcription Factors

Learning can be defined as the process whereby new information is acquired and memory as the process by which this information is retained. Short-term memory (which lasts minutes or hours) requires only the covalent modification of preexisting proteins, but long-term memory (which lasts days, weeks, or a lifetime) depends on gene expression, protein synthesis, and the establishment of new neuronal connections. The macromolecular synthesis underlying long-term memory storage requires cAMP-response element-binding (CREB) protein–related transcription factors and the activation of cAMP-dependent gene expression. Serotonin (5-hydroxytryptamine, or 5-HT, a hormone implicated in learning and memory) acting on neurons promotes cAMP synthesis, which in turn stimulates protein kinase A to phosphorylate CREB protein–related transcription factors that activate transcription of cAMP-inducible genes. These genes are characterized by the presence of CRE (cAMP response element) consensus sequences containing the 8-bp TGACGTCA palindrome. CREB transcription factors are bZIP-type proteins (see later discussion). These exciting findings opened a new arena in molecular biology, the molecular biology of cognition. Eric Kandel was awarded the 2000 Nobel Prize in Physiology or Medicine for, among other things, his discovery of the role of CREB-type transcription factors in long-term memory storage.

Cognition is the act or process of knowing; the acquisition of knowledge.

How Does the Recognition Helix Recognize Its Specific DNA-Binding Site?

The edges of base pairs in dsDNA present a pattern of hydrogen-bond donor and acceptor groups within the major and minor grooves, but only the pattern displayed on the major-groove side is distinctive for each of the four base pairs A:T, T:A, C:G, and G:C. (You can get an idea of this by inspecting the structures of the base pairs in Figure 11.6.) Thus, the base-pair edges in the major groove act as a recognition matrix identifiable through H bonding with a specific protein, so it is not necessary to melt the base pairs to read the base sequence. Although formation of such H bonds is very important in DNA:protein recognition, other interactions also play a significant role. For example, the C-5-methyl groups unique to thymine residues are nonpolar “knobs” projecting into the major groove.
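The claim that only the major-groove edge distinguishes all four base pairs can be checked with a short lookup. The sketch below is illustrative only: the four-letter codes (A = H-bond acceptor, D = H-bond donor, H = nonpolar hydrogen, M = methyl group) are the commonly used shorthand for the groups exposed along each groove edge. They are supplied here as an assumption and are not taken from this section or from Figure 11.6. The names `MAJOR_GROOVE`, `MINOR_GROOVE`, and `all_distinct` are hypothetical.

```python
# Groups exposed along each groove edge, read across the base pair.
# A = acceptor, D = donor, H = nonpolar hydrogen, M = thymine methyl "knob".
# ASSUMPTION: standard shorthand patterns, not given in the text above.
MAJOR_GROOVE = {"A:T": "ADAM", "T:A": "MADA", "G:C": "AADH", "C:G": "HDAA"}
MINOR_GROOVE = {"A:T": "AHA", "T:A": "AHA", "G:C": "ADA", "C:G": "ADA"}


def all_distinct(patterns):
    """True if every base pair presents a pattern shared by no other pair."""
    return len(set(patterns.values())) == len(patterns)


def indistinguishable_groups(patterns):
    """Group base pairs that present the same pattern in a given groove."""
    groups = {}
    for base_pair, pattern in patterns.items():
        groups.setdefault(pattern, []).append(base_pair)
    return [sorted(pairs) for pairs in groups.values() if len(pairs) > 1]


print("major groove distinguishes all four:", all_distinct(MAJOR_GROOVE))
print("minor groove distinguishes all four:", all_distinct(MINOR_GROOVE))
print("ambiguous in minor groove:", indistinguishable_groups(MINOR_GROOVE))
```

In this shorthand, a minor-groove reader can tell A:T-type pairs from G:C-type pairs but cannot tell A:T from T:A or G:C from C:G, whereas every major-groove pattern is unique. The M in the A:T and T:A patterns corresponds to the thymine methyl "knob" mentioned above.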
Proteins Also Recognize DNA via “Indirect Readout”

Indirect readout is the term for the ability of a protein to indirectly recognize a particular nucleotide sequence by recognizing local conformational variations resulting from the effects that base sequence has on DNA structure. Superficially, the B-form structure of DNA appears to be a uniform cylinder. Nevertheless, the conformation of DNA over a short distance along its circumference varies subtly according to local base sequence. That is, base sequences generate unique contours that proteins can recognize. Because these contours arise from the base sequence, the DNA-binding protein “indirectly reads out” the base sequence through interactions with the DNA backbone. In the E. coli Trp repressor:trp operator DNA complex, the Trp repressor engages in 30 specific hydrogen bonds to the DNA: 28 involve phosphate groups in the backbone; only 2 are to bases. Thus, some sequence-specific DNA-binding proteins are able to recognize an overall DNA conformation caused by the specific DNA sequence.

Some Proteins Bind to DNA via Zn-Finger Motifs

There are many classes of Zn-finger motifs. The prototype Zn-finger is a structural feature formed by a pair of Cys residues separated by 2 residues, then a run of 12 amino acids, and finally a pair of His residues separated by 3 residues (Cys-x2-Cys-x12-His-x3-His). This motif may be repeated as many as 13 times over the primary structure of a Zn-finger protein. Each repeat coordinates a zinc ion via its 2 Cys and 2 His residues (Figure 29.33). The 12 or so residues separating the Cys and His coordination sites are looped out and form a distinct DNA interaction module, the so-called Zn-finger. When Zn-finger proteins associate with DNA, each Zn-finger binds in the major groove and interacts with about five nucleotides, adjacent fingers interacting with contiguous stretches of DNA. Many DNA-binding proteins with this motif have been identified. In all cases, the finger motif is repeated at least two times, with at least a 7- to 8-amino acid linker between Cys/Cys and His/His sites. Proteins with this general pattern are assigned to the C2H2 class of Zn-finger proteins to distinguish them from proteins bearing another kind of Zn-finger, the Cx type, which includes the C4 and C6 Zn-finger proteins. The Cx proteins have a variable number of Cys residues available for Zn chelation. For example, the vertebrate steroid receptors have two sets of Cys residues, one with four conserved cysteines (C4) and the other with five (C5).

FIGURE 29.33 The Zn-finger motif of the C2H2 type showing (a) the coordination of Cys and His residues to Zn and (b) the secondary structure. (c) Structure of a classic C2H2 zinc finger protein (zif268) with three zinc fingers bound to DNA (pdb id = 1ZAA).

Some DNA-Binding Proteins Use a Basic Region-Leucine Zipper (bZIP) Motif

bZIP is a structural motif characterizing the third major class of sequence-specific, DNA-binding proteins. This motif was first recognized by Steve McKnight in C/EBP, a heat-stable, DNA-binding protein isolated from rat liver nuclei that binds to both CCAAT promoter elements and certain enhancer core elements (footnote 7). The DNA-binding domain of C/EBP was localized to the C-terminal region of the protein. This region shows a notable absence of Pro residues, suggesting it might be arrayed in an α-helix. Within this region are two clusters of basic residues: A and B. Further along is a 28-residue sequence. When this latter region is displayed end-to-end down the axis of a hypothetical α-helix, beginning at Leu315, an amphipathic cylinder is generated, similar to the one shown in Figure 6.22.
One side of this amphipathic helix consists principally of hydrophobic residues (particularly leucines), whereas the other side has an array of negatively and positively charged side chains (Asp, Glu, Arg, and Lys), as well as many uncharged polar side chains (glutamines, threonines, and serines).

The Zipper Motif of bZIP Proteins Operates Through Intersubunit Interaction of Leucine Side Chains

The leucine zipper motif arises from the periodic repetition of leucine residues within this helical region. The periodicity causes the Leu side chains to protrude from the same side of the helical cylinder, where they can enter into hydrophobic interactions with a similar set of Leu side chains extending from a matching helix in a second polypeptide. These hydrophobic interactions establish a stable noncovalent linkage, fostering dimerization of the two polypeptides (as shown in Figure 29.34). The leucine zipper is not a DNA-binding domain. Instead, it functions in protein dimerization. Leucine zippers have been found in other mammalian transcriptional regulatory proteins, including Myc, Fos, and Jun.

The Basic Region of bZIP Proteins Provides the DNA-Binding Motif

The actual DNA contact surface of bZIP proteins is contributed by a 16-residue segment that ends exactly 7 residues before the first Leu residue of the Leu zipper. This DNA contact region is rich in basic residues and hence is referred to as the basic region. Two bZIP polypeptides join via a Leu zipper to form a Y-shaped molecule in which the stem of the Y corresponds to a coiled pair of α-helices held by the leucine zipper. The arms of the Y are the respective basic regions of each polypeptide; they act as a linked set of DNA contact surfaces (Figure 29.34). The dimer interacts with a DNA target site by situating the fork of the Y at the center of the dyad-symmetric DNA sequence.
The two arms of the Y can then track along the major groove of the DNA in opposite directions, reading the specific recognition sequence (Figure 29.35). An interesting aspect of bZIP proteins is that the two polypeptides need not be identical (Figure 29.35). Heterodimers can form, provided both polypeptides possess a leucine zipper region. An important consequence of heterodimer formation is that the DNA target site need not be a palindromic sequence. The respective basic regions of the two different bZIP polypeptides (for example, Fos and Jun) can track along the major groove reading two different base sequences. Heterodimer formation expands enormously the DNA recognition and regulatory possibilities of this set of proteins.

FIGURE 29.34 Model for a dimeric bZIP protein. Two bZIP polypeptides dimerize to form a Y-shaped molecule. The stem of the Y is the Leu zipper (dimerization motif), and it holds the two polypeptides together. Each arm of the Y is the basic region (DNA contact surface) from one polypeptide. Each arm is composed of two α-helical segments: BR-A and BR-B (basic regions A and B).

Footnote 7: The acronym C/EBP designates this protein as a “CCAAT and enhancer-binding protein.”

Chelation is from the Greek word chele, meaning “claw”; it refers to the binding of a metal ion to two or more nonmetallic atoms in the same molecule.

29.5 How Are Eukaryotic Transcripts Processed and Delivered to the Ribosomes for Translation?

Transcription and translation are concomitant processes in prokaryotes, but in eukaryotes, the two processes are spatially separated (see Chapter 10). Transcription occurs on DNA in the nucleus, and translation occurs on ribosomes in the cytoplasm. Consequently, transcripts must be transported from the nucleus to the cytosol to be translated.
On the way, these transcripts undergo processing: alterations that convert the newly synthesized RNAs, or primary transcripts, into mature messenger RNAs. Also, unlike prokaryotes, in which many mRNAs encode more than one polypeptide (that is, they are polycistronic), eukaryotic mRNAs encode only one polypeptide (that is, they are exclusively monocistronic).

FIGURE 29.35 Model for the heterodimeric bZIP transcription factor c-Fos:c-Jun bound to a DNA oligomer containing the AP-1 consensus target sequence TGACTCA (pdb id = 1FOS).

Eukaryotic Genes Are Split Genes

Most genes in higher eukaryotes are split into coding regions, called exons (footnote 8), and noncoding regions, called introns (Figure 29.36; see also Figure 10.20). Introns are the intervening nucleotide sequences that are removed from the primary transcript when it is processed into a mature RNA. Gene expression in eukaryotes entails not only transcription but also the processing of primary transcripts to yield the mature RNA molecules we classify as mRNAs, tRNAs, rRNAs, and so forth.

FIGURE 29.36 The organization of split eukaryotic genes. [Diagram labels: promoter/enhancer sequences; exons 1 to 4 separated by introns on the DNA coding strand; transcription yields the mRNA transcript with a 5′-untranslated region, a poly(A) addition signal, and a 3′-untranslated region (variable length since transcription termination is imprecise); processing (capping, methylation, poly(A) addition, splicing) yields the mature mRNA with a 7-mG cap, exons 1 to 4, and a poly(A) tail of 100–200 residues.]

Footnote 8: Although the term exon is commonly used to refer to the protein-coding regions of an interrupted or split gene, a more precise definition would specify exons as sequences that are represented in mature RNA molecules. This definition encompasses not only protein-coding genes but also the genes for various RNAs (such as tRNAs or rRNAs) from which intervening sequences must be excised in order to generate the mature gene product.

The Organization of Exons and Introns in Split Genes Is Both Diverse and Conserved

Split genes occur in an incredible variety of interruptions and sizes. The yeast actin gene is a simple example, having only a single 309-bp intron that separates the nucleotides encoding the first 3 amino acids from those encoding the remaining 350 or so amino acids in the protein. The chicken ovalbumin gene is composed of 8 exons and 7 introns. The two vitellogenin genes of the African clawed toad Xenopus laevis are both spread over more than 21 kbp of DNA; their primary transcripts consist of just 6 kb of message that is punctuated by 33 introns. The chicken pro α-2 collagen gene has a length of about 40 kbp; the coding regions constitute only 5 kb distributed over 51 exons within the primary transcript. The exons are quite small, ranging from 45 to 249 bases in size.

Clearly, the mechanism by which introns are removed and multiple exons are spliced together to generate a continuous, translatable mRNA must be both precise and complex. If one base too many or too few is excised during splicing, the coding sequence in the mRNA will be disrupted. The mammalian DHFR (dihydrofolate reductase) gene is split into 6 exons spread over more than 31 kbp of DNA. The 6 exons are spliced together to give a 6-kb mRNA (Figure 29.37). Note that, in three different mammalian species, the size and position of the exons are essentially the same but that the lengths of the corresponding introns vary considerably. Indeed, the lengths of introns in vertebrate genes range from a minimum of about 60 bases to more than 10,000 bp. Many introns have nonsense codons in all three reading frames and thus are untranslatable. Introns are found in the genes of mitochondria and chloroplasts as well as in nuclear genes. Although introns have been observed in archaea and even bacteriophage T4, none are known in the genomes of bacteria.
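The point that excising one base too many disrupts the coding sequence can be made concrete with a toy transcript. The sketch below is a minimal illustration under stated assumptions: the exon and intron sequences are invented for this example (the intron merely begins with GU and ends with AG), the helper names `splice` and `translate` are hypothetical, and translation uses the standard genetic code starting at the first base.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def splice(pre_mrna, intron_start, intron_end):
    """Remove pre_mrna[intron_start:intron_end] and join the flanking exons."""
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]


def translate(mrna):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# ASSUMPTION: invented toy sequences, not taken from any real gene.
EXON1 = "AUGGCC"
INTRON = "GUAAGUUACUAACUUUCAG"
EXON2 = "GAUCUGAAAUAA"
PRE_MRNA = EXON1 + INTRON + EXON2

start = len(EXON1)
end = start + len(INTRON)

precise = splice(PRE_MRNA, start, end)       # exact intron excision
off_by_one = splice(PRE_MRNA, start, end + 1)  # one exon base also removed

print(translate(precise))     # MADLK
print(translate(off_by_one))  # MAI
```

Here the one-base error shifts the reading frame of the second exon, changes the next amino acid, and exposes a premature stop codon, so the product is truncated as well as altered.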
Post-Transcriptional Processing of Messenger RNA Precursors Involves Capping, Methylation, Polyadenylylation, and Splicing

Capping and Methylation of Eukaryotic mRNAs

The protein-coding genes of eukaryotes are transcribed by RNA polymerase II to form primary transcripts or pre-mRNAs that serve as precursors to mRNA. As a population, these RNA molecules are very large and their nucleotide sequences are very heterogeneous because they represent the transcripts of many different genes, hence the designation heterogeneous nuclear RNA, or hnRNA. Shortly after transcription of hnRNA is initiated, the 5′-end of the growing transcript is capped by addition of a guanylyl residue. This reaction is catalyzed by the nuclear enzyme guanylyl transferase using GTP as substrate (Figure 29.38). The cap structure is methylated at the 7-position of the G residue. Additional methylations may occur at the 2′-O positions of the two nucleosides following the 7-methyl-G cap and at the 6-amino group of a first base adenine (Figure 29.39).

3′-Polyadenylylation of Eukaryotic mRNAs

Transcription by RNA polymerase II typically continues past the 3′-end of the mature messenger RNA. Primary transcripts show heterogeneity in sequence at their 3′-ends, indicating that the precise point where termination occurs is nonspecific. However, termination does not normally occur until RNA polymerase II has transcribed past a consensus AAUAAA sequence known as the polyadenylylation signal.

Most eukaryotic mRNAs have 100 to 200 adenine residues attached at their 3′-end, the poly(A) tail. [Histone mRNAs are the only common mRNAs that lack poly(A) tails.] These A residues are not encoded in the DNA but are added post-transcriptionally by the enzyme poly(A) polymerase, using ATP as a substrate. The consensus AAUAAA is not itself the poly(A) addition site; instead it defines the position where poly(A) addition occurs (Figure 29.40). The consensus AAUAAA is found 10 to 35 nucleotides upstream from where the nascent primary transcript is cleaved by an endonuclease to generate a new 3′-OH end. This end is where the poly(A) tail is added. The processing events of mRNA capping, poly(A) addition, and splicing of the primary transcript create the mature mRNA. Interestingly, both the guanylyl transferase that adds the 5′-cap structure and the enzymes that process the 3′-end of the transcript and add the poly(A) tail are anchored to RNA polymerase II via interactions with its RPB1 CTD.

FIGURE 29.37 The organization of the mammalian DHFR gene in three representative species (Chinese hamster, mouse, and human; scale 0 to 30 kb). Note that the exons are much shorter than the introns. Note also that the exon pattern is more highly conserved than the intron pattern.

FIGURE 29.38 The capping of eukaryotic pre-mRNAs. Guanylyl transferase catalyzes the addition of a guanylyl residue (Gp) derived from GTP to the 5′-end of the growing transcript, which has a 5′-triphosphate group already there. In the process, pyrophosphate (pp) is liberated from GTP and the terminal phosphate (p) is removed from the transcript. Gppp + pppApNpNpNp → GpppApNpNpNp + pp + p (A is often the initial nucleotide in the primary transcript).

FIGURE 29.39 Methylation of several specific sites located at the 5′-end of eukaryotic pre-mRNAs is an essential step in mRNA maturation. A cap bearing only a single -CH3 on the guanyl is termed cap 0. This methylation occurs in all eukaryotic mRNAs. If a methyl is also added to the 2′-O position of the first nucleotide after the cap, a cap 1 structure is generated. This is the predominant cap form in all multicellular eukaryotes. Some species add a third -CH3 to the 2′-O position of the second nucleotide after the cap, giving a cap 2 structure. Also, if the first base after the cap is an adenine, it may be methylated on its 6-NH2. In addition, approximately 0.1% of the adenine bases throughout the mRNA of higher eukaryotes carry methylation on their 6-NH2 groups.

Nuclear Pre-mRNA Splicing

Within the nucleus, hnRNA forms ribonucleoprotein particles (RNPs) through association with a characteristic set of nuclear proteins. These proteins interact with the nascent RNA chain as it is synthesized, maintaining the hnRNA in an untangled, accessible conformation. The substrate for splicing, that is, intron excision and exon ligation, is the capped primary transcript emerging from the RNA polymerase II transcriptional apparatus, in the form of an RNP complex. Splicing occurs exclusively in the nucleus. The mature mRNA that results is then exported to the cytoplasm to be translated. Splicing requires precise cleavage at the 5′- and 3′-ends of introns and the accurate joining of the two ends. Consensus sequences define the exon/intron junctions in eukaryotic mRNA precursors, as indicated from an analysis of the splice sites in vertebrate genes (Figure 29.41). Note that the sequences GU and AG are found at the 5′- and 3′-ends, respectively, of introns in pre-mRNAs from higher eukaryotes. In addition to the splice junctions, a conserved sequence within the intron, the branch site, is also essential to pre-mRNA splicing.
The site lies 18 to 40 nucleotides upstream from the 3′-splice site and is represented in higher eukaryotes by the consensus sequence YNYRAY, where Y is any pyrimidine, R is any purine, and N is any nucleotide.

The Splicing Reaction Proceeds via Formation of a Lariat Intermediate
The mechanism for splicing nuclear mRNA precursors is shown in Figure 29.42. A covalently closed loop of RNA, the lariat, is formed by attachment of the 5′-phosphate group of the intron's invariant 5′-G to the 2′-OH at the invariant branch site A to form a 2′-5′ phosphodiester bond. Note that lariat formation creates an unusual branched nucleic acid. The lariat structure is excised when the 3′-OH of the consensus G at the 3′-end of the 5′ exon (Exon 1, Figure 29.42) covalently joins with the 5′-phosphate at the 5′-end of the 3′ exon (Exon 2). The reactions that occur are transesterification reactions in which an OH group reacts with a phosphoester bond, displacing an –OH to form a new phosphoester link. Because the reactions lead to no net change in the number of phosphodiester linkages, no energy input is required.

FIGURE 29.40 Poly(A) addition to the 3′-ends of transcripts occurs 10 to 35 nucleotides downstream from a consensus AAUAAA sequence, defined as the polyadenylylation signal. CPSF (cleavage and polyadenylylation specificity factor) binds to this signal sequence and mediates looping of the 3′-end of the transcript through interactions with a G/U-rich sequence even further downstream. Cleavage factors (CFs) then bind and bring about the endonucleolytic cleavage of the transcript to create a new 3′-end 10 to 35 nucleotides downstream from the polyadenylylation signal. Poly(A) polymerase (PAP) then successively adds 200 to 250 adenylyl residues to the new 3′-end. (RNA polymerase II is also a significant part of the polyadenylylation complex at the 3′-end of the transcript, but for simplicity in illustration, its presence is not shown in the lower part of the figure.)

FIGURE 29.41 Consensus sequences at the splice sites in vertebrate genes. 5′-splice site consensus: (exon) AG : GUAAGU (intron). 3′-splice site consensus: (intron) PyPyPyPyPyPyPyPy–CAG : G (exon).
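
The positional rules stated in this section (a cleavage site 10 to 35 nucleotides downstream of AAUAAA, and a YNYRAY branch site 18 to 40 nucleotides upstream of the 3′-splice site) can be illustrated with a short sequence scan. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, distances to the cleavage site are measured from the 3′ end of the AAUAAA signal, and branch-site distances are measured from the branch A to the first nucleotide of the downstream exon. The text does not specify these conventions, and real signal recognition depends on protein factors such as CPSF rather than on sequence matching alone.

```python
import re

# Symbols used in the text: Y = any pyrimidine, R = any purine, N = any nucleotide.
IUPAC = {"Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}


def consensus_to_regex(consensus):
    """Translate a consensus such as 'YNYRAY' into a regular expression string."""
    return "".join(IUPAC.get(base, base) for base in consensus)


def cleavage_windows(rna, signal="AAUAAA", min_gap=10, max_gap=35):
    """Locate each polyadenylylation signal and its candidate cleavage window.

    Returns (signal_start, window_start, window_end) tuples of 0-based indices.
    Assumption: the 10-35 nt distance is counted from the 3' end of the signal.
    """
    windows = []
    # A lookahead lets overlapping occurrences of the signal be reported.
    for match in re.finditer("(?=" + re.escape(signal) + ")", rna):
        signal_end = match.start() + len(signal)  # index just past the signal
        window_start = signal_end + min_gap
        window_end = min(signal_end + max_gap, len(rna))
        if window_start <= len(rna):  # skip signals too close to the 3' end
            windows.append((match.start(), window_start, window_end))
    return windows


def branch_site_candidates(rna, acceptor, consensus="YNYRAY",
                           min_up=18, max_up=40):
    """Find consensus matches whose branch A lies min_up..max_up nt upstream.

    `acceptor` is the 0-based index of the first nucleotide of the 3' exon.
    Returns (branch_A_index, distance_upstream) tuples.
    Assumption: the branch A is the first literal 'A' in the consensus string.
    """
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    a_offset = consensus.index("A")
    hits = []
    for match in pattern.finditer(rna):
        a_pos = match.start() + a_offset
        distance = acceptor - a_pos
        if min_up <= distance <= max_up:
            hits.append((a_pos, distance))
    return hits
```

Calling `cleavage_windows(transcript)` on an RNA string written with the letters A, C, G, and U lists where cleavage could fall relative to each AAUAAA, and `branch_site_candidates(transcript, acceptor)` lists candidate branch adenines for a given 3′-splice site. Both functions report candidates only; they do not predict which site is actually used.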
