VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF TECHNOLOGY

Nguyen Trung Thong

MEDSOFT, DECIPHERING PRINCIPLES OF TRANSCRIPTION REGULATION IN EUKARYOTIC GENOMES

Major: Information Technology
Speciality: Computer science
Code: 1.01.10

MASTER THESIS

Advisor: Assoc. Prof. Hoang Xuan Huan

Hanoi - 2008

Contents

Abstract
Declaration
Acknowledgment
List of Figures
Glossary and abbreviations
Chapter 1 Introduction
  1.1 Motivation
  1.2 Thesis works and structure
Chapter 2 Transcription regulation in eukaryotic genomes
  2.1 Introduction
    2.1.1 Gene activation
    2.1.2 Gene deactivation
  2.2 Core promoter and basal transcription machinery
    2.2.1 Structure of core promoter
    2.2.2 Basal transcription machinery
  2.3 Regulatory sequences
    2.3.1 Enhancers and regulatory promoters
    2.3.2 Activators
    2.3.3 Repressors and corepressors
Chapter 3 Methods to derive principles of transcription regulation
  3.1 Principles of transcription regulation
  3.2 Typical methods to derive principles of transcription regulation
    3.2.1 Bayesian network based method
    3.2.2 Motif Expression Decomposition method
    3.2.3 A comparison between two methods
Chapter 4 An application of MED method
  4.1 MEDSoft workflow
  4.2 Properties of MEDSoft
  4.3 Experimental results
Chapter 5 Conclusions and Future work
Bibliography
Appendix

List of Figures

Figure 1.1 Central dogma
Figure 2.1 Gene activation model
Figure 2.2 Sequence elements of core promoter
Figure 3.1 Gene regulatory network
Figure 3.2 Sequence elements that determine the regulation of a set of genes involved in transcription
Figure 3.3 The Motif-Expression Decomposition Formalism (MED)
Figure 3.4 An illustration of the concept of the gene ensemble
Figure 3.5 Verification model of regulatory principles
Figure 3.6 The
distribution of correlation coefficients
Figure 3.7 RRPE and PAC relationship case study
Figure 4.1 MEDSoft layout
Figure 4.2 MEDSoft workflow
Figure 4.3 Genes and motifs query
Figure 4.4 Single motif analyzing
Figure 4.5 Pair of motifs analyzing
Figure 4.1 Transcriptional regulatory principle of RPN4 motif (short-range)
Figure 4.2 Transcriptional regulatory principle of MERE4 motif (short-range)
Figure 4.3 Transcriptional regulatory principle of GCR1 motif (middle-range)
Figure 4.4 Transcriptional regulatory principle of HAP234 motif (middle-range)
Figure 4.5 Transcriptional regulatory principle of BAS1 motif (long-range)
Figure 4.6 Transcriptional regulatory principle of GAL motif (long-range)
Figure 4.7 Transcriptional regulatory principle of PROTEOL1 motif (orientation-dependence)
Figure 4.8 Transcriptional regulatory principle of STRE’ motif (orientation-dependence)
Figure 4.9 Transcriptional regulatory principle of MIG1 motif (super-long-range)
Figure 4.10 Transcriptional regulatory principle of MERE17 motif (super-long-range)
Figure 4.11 Transcriptional regulatory principle of RPE11 motif (spread-out)

Glossary and abbreviations

Activator: protein product of a regulatory gene that induces expression of one or more target genes, usually by binding to the activation sequence of the gene or by interacting with transcription factors
Basal transcription: transcription in in vitro systems consisting of RNA polymerase, the basal transcription factors and a naked DNA template; also used to describe in vivo transcription observed in the absence of known activators
Chromatin: the packaged eukaryotic chromosome in which the DNA is highly organized into chromatosomes; see higher-order structure
Footprinting: technique to identify the position of DNA sequences bound by particular proteins
Higher-order structure: nucleosomal organization of the chromatin; DNA, organized into nucleosomes joined by linker DNA and associated histone H1,
i.e., chromatosome, is further condensed into a fibre of diameter 30 nm, which itself is folded in some manner
Histones: small group of highly conserved basic proteins that bind DNA and form the core of a nucleosome
K-mer: denotes K units in a molecule; e.g., a 12-mer oligonucleotide is a molecule with 12 nucleotides
MED: Motif Expression Decomposition
Pre-initiation complex: complex of general transcription factors, i.e., TFIID, TFIIB, TFIIF and TFIIE, and RNA polymerase II assembled at the promoter and sufficient for basal transcription; the complex can support a low level of transcription without activators
TBP: TATA-box binding protein
TF: Transcription Factor, a protein that participates in gene transcription, often by binding to a specific DNA sequence, e.g., TFIID
TFII: transcription factor for Pol II

Chapter 1 Introduction

1.1 Motivation

Transcription is the first step (from DNA to RNA) in the universal flow of biological information from genome to proteome (Figure 1.1). As a result, transcriptional regulation plays a vital role in the complexity, variety, and development of all organisms [7, 30]. Transcription can be regulated at various levels, but one level, identified by Jacob and Monod [22], has attracted particular attention. At this level, the transcriptional output of a given gene is controlled by the set of motifs (also known as binding sites) present in the gene's promoter region and by the associated transcription factors (TFs) present in the cell. TFs are proteins that bind to specific parts of DNA, and proteins are themselves gene products. Transcription of a gene is therefore fundamentally regulated by the set of motifs in the gene's promoter.

Figure 1.1 Central dogma

Recently, various methods developed for finding motifs and TFs have used the yeast Saccharomyces cerevisiae as the model organism, owing to the availability of multiple yeast genomes and high-quality mRNA data [see 4, 5, 44, 53]. Nevertheless,
the effects of motifs on gene expression as a function of promoter context remain poorly investigated. The methods of Pilpel et al. [42] and Sudarsanam et al. [52] studied the impact of motif co-occurrence within the set of genes holding the motif combinations of interest. Although these studies captured the combinatorial impact of motif-motif interactions on gene expression, they did not address how that impact is governed by other factors, such as geometric features of the promoter context.

A more recent method by Beer and Tavazoie [1] did take geometric features into consideration, using a Bayesian network of yeast expression profiles to discover the effect of motif position and orientation on gene expression. More specifically, their method uses a probabilistic model: after finding sets of co-expressed genes, it identifies the DNA sequence (motif) features responsible for their regulation. They applied a clustering method to a set of microarray expression data to find sets of genes that are co-expressed across a set of conditions. Each of these sets describes an expression pattern across experimental conditions. They then identified a large set of putative motifs that are overrepresented in each expression pattern [28]. A Bayesian network is then used to derive the mapping between these motifs and the expression patterns. The network takes each motif and its related features, such as position and orientation, as input variables and estimates the probability of a particular expression pattern.

A drawback of this method is that it does not consider the individual expression pattern of each gene but instead analyzes the expression profiles of gene clusters, a process that might cause loss of information. Moreover, even though the metrics used in these works to measure the degree of gene expression, namely expression coherence [42, 52] and average pairwise correlation [1], can capture the impact of motifs on gene expression
quite well, these metrics cannot provide a quantitative measure of motif influence on gene expression.

Compared with the previous methods, the Motif Expression Decomposition (MED) method of Nguyen and D’haeseleer has more advanced features: (a) it operates on all genes, at the single-gene level; (b) it makes no assumptions about gene cluster or module membership and requires no manual tuning of parameters; (c) it is based on a deterministic mathematical strategy that is biologically intuitive and simple.

1.2 Thesis works and structure

In this thesis, we focus on the MED method in order to develop an efficient, flexible, reliable and user-friendly software package, MEDSoft, that gives the worldwide biology community a way of analyzing and studying motif data. MEDSoft is a website built on Microsoft ASP.NET technology. It therefore not only inherits the advanced properties of the MED method but is also made more powerful by its new implementation.

Apart from the introduction and conclusion, the thesis is organized into three chapters. The second chapter sketches the basic concepts of transcriptional regulation in eukaryotes. The third chapter presents the typical methods for deriving principles of transcriptional regulation. The fourth chapter describes the main outcome of the thesis, MEDSoft: it shows the workflow and features of MEDSoft and discusses some interesting results derived from it.

Chapter 2 Transcription regulation in eukaryotic genomes

2.1 Introduction

Transcription regulation is an extremely complicated problem in molecular biology. It has been investigated by a great many researchers worldwide, yet much about it remains a mystery. One of the main aims of gene expression research is to understand how a living organism regulates the transcription of thousands of genes in the proper spatial and temporal patterns. Knowledge of how transcription factors function throughout gene expression can be applied to
fundamental issues in the fields of biology and medicine. To decipher these mechanisms, we need to understand the large number of processes that influence transcription and to develop technical and strategic approaches for tackling them. This chapter introduces the basic aspects of transcriptional regulation.

In eukaryotic genomes, DNA is assembled into chromatin, which keeps genes in an inactive state by restricting access to RNA polymerase and its accessory factors. Chromatin is built from DNA wrapped around histones, forming structures called nucleosomes. Nucleosomes are themselves assembled into higher-order structures with different properties depending on the regulatory context.

Throughout development, genes are turned on and off in a pre-programmed manner controlled by TFs, which bind to specific DNA sites near the genes they control. However, a dedicated TF is not assigned to each regulatory event. Instead, a mechanism called combinatorial control is used, in which different combinations of regulatory proteins turn genes on (activate) and off (deactivate) in different regulatory contexts [3].

2.1.1 Gene activation

In a typical gene, the core promoter is a DNA sequence located immediately upstream of the gene. The core promoter binds RNA polymerase II (Pol II) and its accessory factors (the basal transcription machinery) and guides Pol II to begin transcribing at the proper start site. In vivo, in the absence of regulatory proteins, the core promoter is normally inactive and fails to interact with the basal machinery. Immediately upstream of the core promoter is a regulatory promoter, and further away, either upstream or downstream, are enhancer sequences (Figure 2.1 A). Regulatory promoters and enhancers bind proteins termed activators, which are responsible for activating transcription of the gene. Gene activation commonly occurs when the activators interact with the basal machinery. Some activators are ubiquitously
expressed, whilst others are restricted to certain cell types, where they regulate genes necessary for a particular cell function.

Figure 2.1 Gene activation model. (A) Model of a typical gene and the components involved in gene activation and inactivation. (B) Activation of a gene and assembly of the Pol II pre-initiation complex (copyright Cell Press).

When a gene is activated, the chromatin enclosing that gene and its control regions must be remodeled to allow transcription. Higher-order chromatin structures comprising networks of attached nucleosomes must be decondensed, nucleosomes over gene-specific enhancers and promoters must be made accessible to cell-specific activators, and, eventually, nucleosomes inside the gene itself must be remodeled to allow passage of the transcribing RNA polymerases (see Figure 2.1 B). Various types of enzymes are involved in chromatin remodeling, and these are guided by a set of activators. The enzymes fall into two broad categories: ATP-dependent remodeling enzymes and histone acetylases. Generally speaking, when these enzymes bind to a gene, they remodel the chromatin so that activators and the basal machinery can bind. The mechanisms of remodeling include changes in the structure of chromatin and modifications of histones that in some way increase accessibility to TFs.

Transcription of a gene can be stimulated once its enhancers are accessible. Nevertheless, because enhancers can activate transcription even when located far from a gene, they could inadvertently activate neighbouring genes unless suitably regulated. It is also known that once the enhancer and promoter are accessible they bind combinations of activators. Binding of activators is commonly cooperative: a single protein binds only weakly, but multiple activators engage in protein-protein interactions that enhance each of their affinities for the regulatory region. The nucleoprotein structures comprising these combinatorial arrays of
activators are termed enhanceosomes (Figure 2.1 B). The enhanceosome interacts with the basal transcription machinery and recruits it to a core promoter to form the "pre-initiation complex". Together, the enhanceosome, the basal machinery, and the core promoter form a network of protein-protein and protein-DNA interactions that controls the rate of transcription initiation. The interactions between the enhanceosome and components of the basal machinery are rarely direct; they are bridged by proteins called coactivators.

2.1.2 Gene deactivation

In many situations, genes are activated only transiently and are later turned off. In these cases, the sequence of events comprises inactivation of the pre-initiation complex and construction of a repressive chromatin environment over the gene and its regulatory regions. This construction involves two kinds of enzymes: ATP-dependent remodeling enzymes and histone deacetylases. The ways in which a gene is inactivated vary, but most involve the binding of sequence-specific repressors to silencer elements. Genes are often methylated to maintain the inactive state. Methylation also leads to recruitment of histone deacetylases.

The remainder of this chapter is organized as follows. In section 2.2, we summarize the fundamental mechanics of transcription, including an overview of core promoter structure and the composition of the basal machinery. The basal machinery consists of the TFs and Pol II, which are vital for the catalytic process of transcription. The machinery also contains coactivators and corepressors, which allow activators and repressors to communicate with the TFs and chromatin. In section 2.3, we discuss regulatory DNA sequences, including enhancers and silencers, and regulatory proteins, including activators and repressors.

2.2 Core promoter and basal transcription machinery

The core promoter is known as the "heart" of transcription regulation and generally includes DNA sequence elements that can
extend approximately 35 bp upstream and/or downstream of the transcription start site. Most core promoter elements interact directly with components of the basal transcription machinery. The basal machinery is composed of the factors, including Pol II itself, that are vital for transcription in vitro from an isolated core promoter. Many studies of the basal machinery have been performed with promoters containing a TATA box as a crucial core element. A pre-initiation complex can form in vitro on TATA-dependent core promoters by association of the basal factors in this order: TFIID/TFIIA, TFIIB, Pol II/TFIIF, TFIIE, and then TFIIH. The features of the basal factors and the mechanisms by which they stimulate transcription from TATA-dependent promoters have been the topic of recent works [see 8, 15, 26, 29, 36, 39, 43, and 57]. The mechanisms by which sequence-specific transcription factors and coregulators influence the frequency of transcription initiation have also been discussed [6, 37, 43].

2.2.1 Structure of core promoter

Figure 2.2 sketches some of the sequence elements that can contribute to basal transcription from a typical core promoter. Each of these sequence motifs is found in only a subset of core promoters. The TATA motif can function without the TFIIB recognition element (BRE), initiator element (Inr), and downstream core promoter element (DPE) motifs. By contrast, the DPE motif requires the presence of an Inr. The BRE is located immediately upstream of a subset of TATA box motifs. The DPE consensus was determined with Drosophila core promoters. The Inr consensus is shown for both mammals and Drosophila.

Figure 2.2 Sequence elements of core promoter

TATA motif

This element, with the consensus TATAAA, was discovered by David Hogness and is also called the Hogness box. It is positioned 25-30 bp (base pairs) upstream of the transcription start site. The TATA box is able to direct basal transcription by Pol II independently on naked DNA templates in vitro.
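As a small illustration of the positional statement above, the following sketch scans a promoter string for the TATAAA consensus and reports occurrences whose start lies 25-30 bp upstream of the transcription start site. The function name `find_tata_boxes`, the convention that the sequence is read 5' to 3' on the coding strand, the choice to measure the distance from the first base of the motif, and the toy sequences are all assumptions made for this example; they are not part of MEDSoft.

```python
import re

TATA_CONSENSUS = "TATAAA"  # consensus quoted in the text


def find_tata_boxes(promoter, tss_index, window=(-30, -25)):
    """Return start offsets of the consensus, relative to the TSS, inside window.

    promoter  : DNA string read 5'->3' on the coding strand (assumption)
    tss_index : 0-based index of the transcription start site in promoter
    window    : inclusive (min, max) offsets; negative means upstream
    """
    seq = promoter.upper()
    lowest, highest = window
    hits = []
    # A zero-width lookahead also reports overlapping occurrences.
    for match in re.finditer("(?=" + TATA_CONSENSUS + ")", seq):
        offset = match.start() - tss_index
        if lowest <= offset <= highest:
            hits.append(offset)
    return hits


# Toy sequences (made up): 40 bp of upstream sequence followed by the TSS.
in_window = "G" * 10 + "TATAAA" + "G" * 24 + "ACGT"   # motif starts at -30
far_upstream = "TATAAA" + "G" * 34 + "ACGT"           # motif starts at -40

print(find_tata_boxes(in_window, tss_index=40))       # [-30]
print(find_tata_boxes(far_upstream, tss_index=40))    # []
```

A different window can be passed through the `window` argument when the element is expected at another distance from the start site.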
The box is also sufficient to direct activated transcription when an activator protein binds to a nearby regulatory element. In Saccharomyces cerevisiae, TATA boxes were also found to be essential for transcription initiation, but in this organism the element is located 40-120 bp from the start site [51].

Initiator element (Inr)

The initiator element (Inr) is a discrete core promoter element that is functionally similar to the TATA box. It was shown to act independently of a TATA box in an analysis of the lymphocyte-specific terminal transferase (TdT) promoter [46, 47]. Transcription from this promoter commences at a single start site, yet the region between -25 and -30 is G/C-rich and is unimportant for promoter activity. An extensive mutant analysis showed that the sequence between -3 and +5 is necessary and sufficient for accurate transcription in vitro and in vivo [23, 46]. By itself, the TdT Inr supports a very low level of specific initiation by Pol II. In nuclear extracts, its activity is comparable to that of an isolated TATA box without an Inr at the start site [46, 48]. Interestingly, when an Inr is inserted into a synthetic promoter downstream of six binding sites for transcription factor Sp1 (without a TATA box), the Inr supports high levels of transcription that commence at a specific start site within the Inr. When the Inr is inserted at a different location relative to the Sp1 sites, RNA synthesis consistently begins at the nucleotide dictated by the Inr. In the absence of the Inr, transcription begins at heterogeneous start sites and at much lower frequencies. Inr activity relies on a loose consensus of approximately PyPyA(+1)N(T/A)PyPy.

Downstream core promoter element (DPE)

The DPE is a seven-nucleotide sequence originally discovered in Drosophila, the organism in which it has been studied in the greatest detail. It bears the consensus sequence RG(A/T)CGTG and is centered about 30 bp downstream of the Inr site. The DPE is found in TATA-less promoters and acts in
conjunction with the Inr element to direct specific initiation of transcription.

TFIIB recognition element (BRE)

The BRE was discovered by Ebright et al. [27], who identified the potential for DNA binding by TFIIB from the position of TFIIB relative to the major groove in the crystal structure of the TBP-TFIIB-TATA ternary complex (TBP stands for TATA-box binding protein). Binding-site-selection experiments revealed that TFIIB binds specifically to a sequence with the consensus G/C G/C G/AGGCC located from -32 to -38, just upstream of the TATA box. The BRE is found in the majority of eukaryotic promoters. Interestingly, however, it is missing in yeast and plants, which suggests that the BRE may not contribute to gene regulation in these organisms.

2.2.2 Basal transcription machinery

Eukaryotic gene regulation involves a complicated interplay among activators, repressors, the basal transcription machinery, and chromatin. The basal transcription machinery consists of Pol II, the TFs TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH [39], and a complex of coactivators termed the mediator. Pol II is a large enzyme. An interesting feature of Pol II is the heptapeptide (seven-residue) repeat that constitutes the carboxyl (-COOH) terminus of its largest subunit. This carboxyl-terminal domain is involved in transcription regulation. Biochemical studies indicate that the TFs support basal transcription and perform many of the catalytic functions required for initiation. Coactivators, together with the mediator, are thought to link the activators and the TFs. Analysis of genome-wide transcription using DNA microarray technology indicates that coactivators are required only for transcription from subsets of genes [20]. Coactivators and TFs are both part of a complex called the holoenzyme. We consider the coactivators that reside in the holoenzyme to be components of the basal machinery.

The subunits of all the TFs have been cloned, and a basic knowledge of their function and
mechanism has emerged from studies in most eukaryotic organisms. Research in yeast has provided insightful data on basal factor mechanisms and has been essential for evaluating the validity and ramifications of biochemical and functional studies performed in mammalian systems [15]. Here we discuss two questions: (i) how the basal factors themselves assemble into transcription complexes; (ii) how they associate with coactivators and mediators in the form of the holoenzyme.

2.2.2.1 Basal transcription complex assembly

Purified TFs and Pol II mediate basal transcription on the core promoter in vitro but are unable to support activated transcription without coactivators. In the first studies of transcription complex assembly, TBP was used instead of TFIID because TBP is small and is able to sustain basal transcription in the presence of the other TFs. Moreover, TFIID had not been sufficiently purified at that time to allow analysis of basal complexes containing it. Early studies indicated that purified TFs and TBP assemble into a transcription pre-initiation complex on the DNA in a stepwise manner. The complex is nucleated by the binding of TBP to the TATA box, assisted by TFIIA or TFIIB, which can bind in either order [39]. The crystal structures of TBP and of the TBP-TFIIB and TBP-TFIIA complexes with DNA have been solved, providing insights into the process of promoter recognition. Both TFIIA and TFIIB contact DNA and TBP and can increase the stability of TBP binding. Once TFIIB has bound to TBP, a complex of TFIIF in association with Pol II is recruited, followed by sequential binding of TFIIE and TFIIH.

2.2.2.2 Holoenzyme and mediators

Conceptually, the idea that basal factors assemble into transcription complexes in a stepwise fashion was attractive, because different steps could, in principle, be regulated by activators and repressors. Such a mechanism would help to explain the diversity in gene expression patterns. Nevertheless, the significance
of this possibility depends on TFs being differentially limiting at promoters. Differential binding of TFs has not yet emerged as a major regulatory theme, even though there are cases where TFs have different affinities for core promoters (e.g., TBP binding to consensus and nonconsensus TATAs; TFIIB binding to a consensus vs a degenerate BRE; TFIID binding to an Inr-containing vs an Inr-less promoter). Instead, studies have concentrated on recruitment of a single large TF-containing complex termed the holoenzyme. In contrast to the complexity of the stepwise pathway, the holoenzyme provides a single target through which activators bound to an enhancer or promoter can recruit the basal machinery in a concerted fashion [35].

2.3 Regulatory sequences

Transcription regulation is governed by the binding of sequence-specific DNA-binding proteins to regulatory promoters and enhancers. In this section, we describe the features of activators/repressors and enhancers/silencers.

2.3.1 Enhancers and regulatory promoters

Conceptually, the regulatory promoter is the region near the core promoter and within a few hundred base pairs of the transcription start site, whereas an enhancer is a control region found at a greater distance from the transcription start site, either upstream or downstream of the gene or within an intron. Because regulatory elements in an enhancer can also function in the context of a promoter, the distinction between promoters and enhancers has become blurred. Conversely, promoter elements can confer enhancer activity if multimers of the element are inserted at a distant position. A current compilation of promoters for which the transcription start sites have been mapped is accessible in the eukaryotic promoter database (http://www.epd.isb-sib.ch/).

Enhancers are thought to bind activators and other sequence-specific proteins involved in chromatin remodeling. Once bound, these activators loop out the intervening DNA to interact with proteins bound to the regulatory and
core promoters (i.e., other activators and the basal machinery). These interactions are known to stabilize transcription complex assembly. The looping model is important for the following two reasons. First, the energetics of DNA looping have been studied widely in model systems, by ligation of large DNA molecules and by cooperative interactions between distantly bound proteins [56]. Second, in a looping model, chromatin could play a positive architectural role by condensing the DNA that lies between enhancer and promoter, aiding long-range interactions.

2.3.2 Activators

Activators are modular proteins with distinct domains for DNA binding and for activating the transcriptional process [25, 55]. It is widely believed that the DNA-binding domain targets the activator to a specific site and may be connected to cooperativity domains that allow combinatorial interactions with other activators. The activation domain, on the other hand, interacts with the basal machinery to recruit it to the promoter. In some cases, these domains form part of the same polypeptide (e.g., the yeast GAL4 and GCN4 proteins), whilst in others the domains are located on separate subunits of a multiprotein complex. This multisubunit organization offers more opportunities for combinatorial control and regulatory diversity. Here we discuss the two important domains in more detail.

DNA-binding Domains

Regulatory proteins are often grouped into classes according to the sequence and structure of their DNA-binding domains. The targeting function of the DNA-binding domain determines the site of activator action and the contribution of an activator to gene regulation. As a result, how activators bind specific sites and distinguish between related sites has become a key focus of interest in the study of gene expression. Many classes of DNA-binding domains have been described in eukaryotes [41]. Some DNA-binding proteins do not fit into any of the defined classes, whilst in other cases the classes have been further subdivided. Certainly, members of some
protein classes bind to similar DNA sequences. In other classes, however, there is little similarity between the recognition sites of the different class members, because the key recognition amino acids vary greatly among class members.

Activation Domains

The phrase "activation domain" refers loosely to a broad variety of protein domains that interact either with components of the basal transcription machinery or with coactivators. An activation domain is commonly defined as a region of a protein that stimulates transcription when attached to a heterologous DNA-binding domain. However, there are cases in which the residues necessary for activation are interlocked with the DNA-binding domain, even though the majority of activators are modular in structure. In addition, most activators contain multiple activation domains. For instance, GAL4 contains one domain at the amino terminus, near the DNA-binding domain, and another at the carboxyl terminus [12]. The organization within a domain is substantially flexible. For example, deletion analysis of GCN4 shows functional redundancy: removal of one or the other segment has a negligible effect on activation, and removal of the entire domain is needed to abrogate activity [21]. Other works [9, 19] indicate that the activation domains within a regulatory protein contribute additively or synergistically to activation potential.

2.3.3 Repressors and corepressors

Repressors and corepressors are known to play a key role in regulating gene expression. However, repression mechanisms are still less well understood than activation mechanisms. Generally, transcriptional repression can be divided into three broad groups. In the first group, repression occurs through inactivation of an activator, accomplished by one of several distinct mechanisms: (i) posttranslational modification of the activator [34], (ii) dimerization of the activator with a nonfunctional partner [2], (iii) competition for the binding site of the activator, or an
interaction between repressor and activator that results in masking of the activator’s function [33]. In the second group, repression is mediated by proteins that associate strongly with TFs and thus inhibit the formation of a pre-initiation complex. In the final group, repression is mediated by a specific DNA element and a DNA-binding protein, which function dominantly to repress both activated and basal transcription of a gene. Some studies indicate that in these situations interactions with the basal machinery [17] or with chromatin can cause gene inactivation.

Chapter 3 Methods to derive principles of transcription regulation

3.1 Principles of transcription regulation

As described in the previous chapter, transcription regulation in eukaryotes is an intricate field. In general, the regulatory system has two main components: activators/repressors and enhancers/silencers. The former are proteins that bind to DNA sequences, and the latter are DNA sequences. Transcription regulation is therefore commonly controlled by two associated components: DNA sequences and the proteins that bind to them. Moreover, as illustrated in Figure 1.1, because proteins are the products of genes (DNA sequences) through the two processes of the central dogma (transcription and translation), transcription is fundamentally controlled by DNA sequences called motifs.

For the scope of this chapter, we consider only a simpler model of transcription regulation (see Figure 3.1). As can be seen from Figure 3.1, transcription is activated in the presence of input signals (e.g., signal A and signal B). Receptor proteins are responsible for receiving the signals. Through a series of complex processes, the transcription factors (TFs) that bind to cis-regulatory DNA sequence elements (CREs) become active and stimulate transcription. In this model, the transcription output (e_gc) of a given gene is governed by two components: the CREs (motifs) (e.g., M_gi, M_gj)
occurring in the promoter region of that gene, and the transcription factors (TFs, e.g., A_ic, A_jc) present in the cellular environment. Because TFs are gene products, their production is in principle also controlled by motifs. Accordingly, transcription of a given gene is primarily regulated by the motifs present in that gene’s promoter, which operate as the gene’s condition-independent signal receivers. The set of functions describing how motif binding strength (the quantitative level of a motif’s influence on gene expression) depends on promoter context forms the set of principles of transcription regulation. We therefore define the principles of transcriptional regulation as a set of condition-independent rules that cis-regulatory elements, or motifs, obey in order to regulate the expression of the genes they control. Such rules can be a function of promoter context such as the ...
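To make the simplified model of Section 3.1 concrete, the sketch below computes a transcription output e_gc for each gene g and condition c from motif strengths M_gi and TF activities A_ic. The section names these quantities but does not give the functional form at this point, so the additive form e_gc = sum over i of M_gi * A_ic used here is an assumption chosen for illustration. The function name `predict_expression` and the toy numbers are likewise hypothetical.

```python
def predict_expression(M, A):
    """Combine motif strengths and TF activities into expression values.

    M : list of rows, M[g][i] = strength of motif i in the promoter of gene g
    A : list of rows, A[i][c] = activity of the TF for motif i in condition c
    Returns E with E[g][c] = sum_i M[g][i] * A[i][c] (assumed additive form).
    """
    n_motifs = len(A)
    n_conditions = len(A[0])
    for row in M:
        if len(row) != n_motifs:
            raise ValueError("each gene needs one strength per motif")
    return [
        [sum(row[i] * A[i][c] for i in range(n_motifs)) for c in range(n_conditions)]
        for row in M
    ]


# Toy data (made up): 2 genes x 2 motifs, and 2 motifs x 2 conditions.
M = [[1.0, 0.0],
     [0.5, 2.0]]
A = [[2.0, -1.0],
     [0.5, 1.0]]

E = predict_expression(M, A)
for gene_index, row in enumerate(E):
    print("gene", gene_index, row)
# gene 0 [2.0, -1.0]
# gene 1 [2.0, 1.5]
```

In this form M depends only on the promoter and A only on the condition, which mirrors the description of motifs as condition-independent signal receivers and of TFs as the condition-dependent part of the cellular environment.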