Seed-based systematic discovery of specific transcription factor target genes

Ralf Mrowka1,2,3, Nils Blüthgen4 and Michael Fähling1,3

Paul-Ehrlich-Zentrum für Experimentelle Medizin, Berlin, Germany
AG Systems Biology – Computational Physiology, Berlin, Germany
Johannes-Müller-Institut für Physiologie, Charité-Universitätsmedizin Berlin, Germany
School of Chemical Engineering and Analytical Sciences, Manchester Interdisciplinary Biocentre, University of Manchester, UK

Keywords
feedback; glaucoma; NF-κB; optineurin; transcription factor target prediction

Correspondence
R. Mrowka, Paul-Ehrlich-Zentrum für Experimentelle Medizin, AG Systems Biology – Computational Physiology, Tucholskystr. 2, D-10117 Berlin, Germany
Fax: +49 30 450528972
Tel: +49 30 450528218
E-mail: ralf.mrowka@charite.de

(Received 26 February 2008, revised April 2008, accepted 16 April 2008)

doi:10.1111/j.1742-4658.2008.06471.x

Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, because DNA-binding motifs are short and degenerate. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes that are expressed similarly to known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-κB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We
show experimentally that the optineurin gene and other predicted genes are targets of NF-κB. Thus, our data provide a missing link in the signalling of NF-κB and the damping function of optineurin in signalling feedback of NF-κB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today.

Abbreviations
CASP4, caspase 4; ChIP, chromatin immunoprecipitation; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; HEK, human embryonic kidney; HIF-1, hypoxia-inducible factor 1; HNF4, hepatocyte nuclear factor 4; IKK, IκB kinase; NEMO, nuclear factor kappaB essential modulator; NF-κB, nuclear factor kappaB; OPTN, optineurin; RGA, reporter gene analysis; STAT5A, signal transducer and activator of transcription 5A; TNF-α, tumor necrosis factor-α

FEBS Journal 275 (2008) 3178–3192 © 2008 The Authors Journal compilation © 2008 FEBS

The prediction and analysis of the regulatory networks underlying gene expression is a central challenge in systems biology and functional genomics [1,2]. Regulation of transcription is the initial mechanism for controlling the expression of genes. Key regulators of transcription are transcription factors, which bind to DNA motifs in noncoding regions that control gene transcription. Therefore, the identification of transcription factor target genes is one major element in the understanding and reconstruction of the regulatory network. Although many DNA motifs for transcription factor binding are known and are contained as consensus sequences and binding matrices in databases such as TRANSFAC [3] and JASPAR [4], their direct use for genome-wide matching in promoter sequences of higher organisms is greatly limited [5]. Current methods that use sequence data give results that are dominated by false predictions [5]. The issue of a high proportion of false positives in pure sequence-based methods has been known for a long time [6], and also
applies to the transcription factors analysed in this study. The major problem is the short length and high degeneracy of the DNA-binding motifs, which give rise to one predicted binding site per 1000–10 000 bp by sheer chance. Therefore, other resources, such as phylogenetic footprinting, have been explored to further restrict and 'purify' potential targets to more likely candidates [7,8]. Such methods decrease the number of false predictions by about one order of magnitude, which is still not good enough for genome-wide predictions. Because the potential list of targets is too large, further information needs to be exploited to concentrate the analysis on the genes that have a higher probability of being true target genes. Gene ontology, as a controlled and computer-readable way to annotate genes, has been used extensively to characterize clusters of genes from microarray data [9,10] and also to validate microarray data [11]. Despite the enormous number of false-positive predictions for transcription factor targets with current methods, significant correlations with gene ontology terms have been found that can be used to enhance prediction quality [12,13]. In addition, statistical methods have been developed to associate genes with disease [14], and seed-based computational procedures have been applied to identify brain cancer-related genes [15].

Currently, experience and knowledge of pathways and an educated literature search may help us to focus on possible candidates. The inclusion of information from expression experiments conducted under different experimental conditions may hint at potential candidates for further evaluation, as these data reflect the relevant biological functions of transcription factors, which directly influence mRNA concentrations in the cell. Well-designed, small-scale expression profile experiments have been successfully used to identify transcription factors involved in certain pathways [16,17].
Especially when applied to time-series data, seed-based clustering methods have been very successful in identifying novel targets by comparing expression kinetics with known targets for p53 and for picking up genes regulated in different cell-cycle phases [18,19]. However, these approaches require dedicated microarray experiments. We addressed the question as to whether it is feasible to explore the large body of expression information that is already stored in public databases. These datasets might contain information about expression at different time points for different cell lines that might be only marginally related to the transcription factor under investigation, and we wondered whether these datasets would allow us to extract the relevant information about the action of transcription factors on their targets.

In recent years, several microarray techniques have been developed to measure mRNA concentration on a genome-wide scale [20]. In addition, efforts have been made to store individual microarray experiments in databases. Microarray expression data have been used in recent times to improve transcription factor target prediction [21]. In this work, we developed a method to exploit a dataset of approximately 1200 microarray experiments in conjunction with a seed group of known transcription factor target genes and show that the information available in the databases is sufficient to increase the accuracy of prediction drastically.

We elucidate and exemplify our seed-distribution-distance method for predicting novel nuclear factor kappaB (NF-κB) targets. NF-κB is involved in pathways important for both physiological processes and disease conditions. It plays an important role in the control of immune function, differentiation, inflammation, stress response, apoptosis, cell survival, processes of development, and progression of cancers [22]. Thus, NF-κB has become one of the most widely studied transcription factors. Five NF-κB genes
(NFKB1, NFKB2, RELA, c-REL and RELB) belong to the NF-κB gene family, and the resulting proteins are able to form homodimers or heterodimers [23]. Prior to activation, NF-κB is localized in the cytoplasm and is tightly associated with its inhibitors (IκB proteins) and p100 proteins. Multiple stimuli, such as tumor necrosis factor-α (TNF-α), UV radiation and free radicals, activate NF-κB signalling through activation of IκB kinases (IKKs), which phosphorylate IκBs and p100 proteins, subsequently leading to their polyubiquitination and degradation [24].

Results

The seed-distribution-distance method

We started by defining a 'seed' group of known NF-κB targets by collecting the targets mentioned in an NF-κB review paper [25] that match ensembl entries, resulting in 91 genes. Intersecting these 91 target genes with the genes in the microarray set left 81 genes, which were used as the seed. We obtained the large-scale microarray expression data [26] (detailed description of data in supplementary Doc S1) from the Stanford microarray database [27]. The set contains genome-wide data from 1202 hybridization experiments from human tissues and cell lines. Subsequently, we ranked each gene x according to its similarity L(x) of expression to the seed group (detailed results given in supplementary Doc S2). We defined similarity L(x) for a gene x by taking the median correlation of gene x to the seed and subtracting its median correlation to all genes (typical distributions of correlations of genes to the seed group are shown in supplementary Fig. S1). Thus, if L(x) showed high values, the particular gene was regulated similarly to the seed gene group. In contrast, if the absolute value of the similarity measure was low, it indicated that the median of the gene's correlation distribution to the seed was close to that of its correlation distribution to a randomly selected group.
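The definition of L(x) above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the text does not state which correlation coefficient was used, so Pearson correlation is an assumption here, as are the function names (`pearson`, `similarity_L`, `rank_genes`), the choice to skip a gene's correlation with itself, and the toy expression profiles.

```python
from math import sqrt
from statistics import median


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles (assumed measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a flat profile carries no correlation signal
    return cov / sqrt(var_a * var_b)


def similarity_L(gene, profiles, seed):
    """L(x): median correlation to the seed minus median correlation to all genes."""
    x = profiles[gene]
    to_seed = [pearson(x, profiles[s]) for s in seed if s != gene]
    to_all = [pearson(x, profiles[g]) for g in profiles if g != gene]
    return median(to_seed) - median(to_all)


def rank_genes(profiles, seed):
    """Rank all non-seed genes by decreasing L(x); seed genes are excluded."""
    candidates = [g for g in profiles if g not in seed]
    return sorted(candidates,
                  key=lambda g: similarity_L(g, profiles, seed),
                  reverse=True)


# Toy data (hypothetical): five "experiments" per gene.
profiles = {
    "seed1": [1, 2, 3, 4, 5],
    "seed2": [2, 4, 6, 8, 11],
    "geneA": [1, 3, 5, 7, 10],   # follows the seed
    "geneB": [5, 4, 3, 2, 1],    # anti-correlated with the seed
    "geneC": [3, 1, 4, 1, 5],    # only loosely related
}
seed = {"seed1", "seed2"}
ranking = rank_genes(profiles, seed)
print(ranking)
```

In this toy example the gene that tracks the seed profiles is placed first. With real data, `profiles` would hold one vector per gene across the roughly 1200 hybridization experiments, and subtracting the median correlation to all genes keeps broadly co-expressed genes from dominating the ranking.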
Using the similarity measure L, we then sorted all remaining human genes and thereby obtained a ranking of the genes according to their similarity to the seed group. To avoid a circular argument, we would like to stress that for all statistical analyses and characterization of rank, the seed group was excluded. A schematic representation of this procedure is given in Fig. 1. The essence of the method is that if a gene's correlation to those in the seed set (represented by the median) is larger than the median of the correlation to all genes, then it is more likely to be related to the seed set, the members of which are then more likely to be targets of the transcription factor. This method requires that at least the initial seed set of true targets is known, and that other targets are correlated to several genes in the seed set. Furthermore, the method is based on the assumption that there is a relationship between gene coexpression and gene coregulation. The ranking can also be done using scores other than the median correlation. For instance, we have ranked the genes using a one-sided P-value derived from a computationally more expensive Mann–Whitney rank-sum test, and found performance similar to that with L(x) (see supplementary Fig. S3).

Fig. 1. Schematic diagram of the workflow in this study. Expression profiles of a gene g are compared to the expression profiles of the seed genes and randomly selected genes. A distance score L(x) is calculated that quantifies specific expression similarity to the seed. The genes are then ranked on the basis of L(x), searched for putative binding sites in their promoter region, and subjected to a reporter gene assay.

Top members in the rank show typical NF-κB functions

We next analysed the top members of the obtained rank with regard to their gene ontology classification. For the top 600 genes, we examined whether any
gene ontology classification is significantly enriched using rigorous statistics [12]. It turns out that the list of significant gene functions of the top 600 genes, as shown in supplementary Table S1, is congruent with the functions of NF-κB described in the literature. We further analysed the occurrences of NF-κB typical functions within the rank. We found that there was a steep increase of the density of genes involved in 'immune response', starting at approximately rank 700 when moving from lowest to highest ranks. The probability of a gene being involved in the immune response is therefore greatly increased for the top members in the rank, as seen in Fig. 2.

[Fig. 2 plot: 'Genes involved in immune response'; y-axis, density of occurrence; x-axis, position, running from "high rank" to "low rank".]

High density of putative NF-κB DNA-binding sites in promoters in the top group of the rank

As the overrepresentation of typical NF-κB-related biological functions might be due to coexpression mediated by different transcription factors, we decided to analyse the sequences of putative promoter regions of the high-ranking genes. We predicted binding sites for all vertebrate transcription factors contained in the TRANSFAC database in the 500 bp putative promoter region of all genes in the ranking. We derived the 500 bp sequences upstream of the transcriptional start site from the ensembl database. We chose to limit our search to 500 bp, because we and others observed earlier that the majority of promoter sequences fall within this region [12,28]. To illustrate our method, we chose to search for consensus sequences from the TRANSFAC database in the putative promoter regions, as this method does not require an additional parameter, unlike more sophisticated weight-matrix methods, which typically require a cut-off score (see also supplementary Table S5). We analysed the distribution of occurrence of all predicted factor-binding sites in the promoters of genes along the rank. For each predicted
binding motif, we calculated the ratio of the number of occurrences in the upper 5% of the rank to the expected occurrence in the top 5% (given by 0.05 times the total number of occurrences). A list of the motifs sorted by this ratio has NF-κB-binding motifs in the top ranks, namely NFKAPPAB65 (P = 0.0028) and NFKAPPAB50 (P = 0.0239) (P-values from the binomial test; see Experimental procedures). In addition, this list includes motifs of the transcription factors BACH2 (P = 0.0025), signal transducer and activator of transcription 5A (STAT5A) (P = 0.0036), and VBP (P = 0.0106), which are enriched on average in the top group. A graphical representation is given in Fig. 3 (see also supplementary Table S4).

[Fig. 2 plot, x-axis: position of gene in the ranking.]

Fig. 2. Density of occurrences of genes annotated with the term 'immune response' in the ranking after applying the seed-distribution-distance method. Immune response genes are highly enriched in the top members of the rank (P < 0.0001, two-sided Mann–Whitney rank-sum test). Red, individual occurrences of immune response genes; black line, density of genes that are annotated with the term. Inset: density for all genes in the rank.

Robustness of seed-distribution-distance method

The original seed group contained 81 known NF-κB targets (supplementary Table S2). As, for most transcription factors, fewer targets are known, we investigated whether the seed-distribution-distance method might also give reliable results if the seed was substantially smaller. We applied a cross-validation strategy by randomly dividing the original 81 targets into two groups, one group being the seed, and the remaining genes constituting the other group, named the test group, t. Several sizes of the seed were used (1, 10, 20
and 50 are shown in Fig. 4; cumulative representations of the distributions are provided in supplementary Fig. S2). After rank construction using the reduced seed, the test group was then analysed regarding its position in the rank. This procedure was repeated 100 times. It turned out that the test group members were strongly present in the top positions of the rank, and this was preserved even if a considerable part of the original targets was not used for the seed. Even if one used, for example, only 10 of 81 members of the seed, the remaining 71 genes in the test group were highly enriched in the top ranks, as shown in Fig. 4.

[Fig. 3 plot: 'Enrichment of putative transcription factor binding sites in top group'. Labels: BACH2, NFκB 65, STAT5a, VBP1, NFκB 50; 'Sites enriched'; 'Sites not enriched'; 'Binding sites for 234 other vertebrate transcription factors'. Axis: occurrence. Legend: enriched P < 0.025; depleted P < 0.025.]

Fig. 3. Distribution of enrichment of putative transcription factor-binding motifs in the ranking after applying the seed-distribution-distance method. The seed-distribution-distance method enriches genes with putative NF-κB-binding sites in the respective promoter. The top gene group of the seed rank was analysed regarding transcription factor-binding motif enrichment within the −500 bp promoter region. The binding motifs for NF-κB 50 and NF-κB 65 are among the transcription factor-binding sites that are most strongly enriched. Note that the initial seed group was not contained in this analysis.

[Fig. 4 plot: 'Histogram of recovery test'. Series: original seed n = 81; seed n = 50; seed n = 20; seed n = 10; one further seed size. y-axis, relative occurrence.]

Moreover, we addressed the question of whether the seed-distribution-distance method is also effective in enriching targets for other transcription factors. We chose E2F [29,30], ETS1 [31,32], hypoxia-inducible factor 1 (HIF-1) [33], hepatocyte nuclear factor 4 (HNF4), and c-Myc [34], and collected seed groups for these factors (supplementary Tables S2 and S3). We applied our method to these seed groups in a jackknife manner (i.e. we iteratively left one seed member out and
determined its position in the rank). For all of these additional transcription factors, the seed members left out were strongly enriched in the top of the rank (Fig. 5).

[Fig. 4 plot, x-axis: recovered position in gradient.]

Fig. 4. Recovery of target genes in a cross-validation test: the original seed was divided into two parts: (a) a group of members for rank construction; and (b) a test group with the remaining members of the original seed. Histograms of the recovery position of the test group are shown for the newly constructed ranks using the seed without the test group (medians are marked by symbols in the plot). If, for example, 10 genes are used as a seed (71 in the test group), the relative occurrence of the recovered positions is still very high, i.e. the enrichment capability of the seed-distribution-distance method is still highly preserved. For comparison, the relative occurrence of members of the original seed in the corresponding rank is given. The error bars indicate the 5th and 95th percentiles of the distribution. Corresponding cumulative histograms are given in supplementary Fig. S2.

Moreover, the top members of the rank were strongly enriched with typical gene ontology terms of the factors for E2F and HNF4. For ETS1, HIF-1 and c-Myc, this ontology enrichment is not as clear as for the other three tested factors. One reason could be the considerably lower number of gene ontology annotated genes for the specific terms and, in the case of c-Myc, the broad-spectrum ontologies [34]. The results of this jackknife procedure also provide an estimate of how many of the true positives will lie in the upper 5%: about 18–39% of all targets would be in the upper 5% of genes of the rank (26% for NF-κB, 39% for E2F, 29% for ETS1, 18% for HIF-1, 36% for HNF4, and 20% for c-Myc). Thus, applying the seed-distribution-distance method will enrich the true targets in the top 5% of the rank by a factor of 4–8.
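The jackknife estimate and the resulting enrichment factor can be illustrated with a short Python sketch. The recovered fractions are the ones quoted in the text; everything else is a labelled assumption. `rank_fn` stands for any hypothetical function that builds a ranking of non-seed genes from a reduced seed (best candidate first), and the enrichment factor is taken to be the recovered fraction divided by the 5% expected by chance.

```python
TOP_FRACTION = 0.05  # "upper 5%" of the rank, as used in the text


def jackknife_positions(seed, rank_fn):
    """Leave each seed member out in turn and record its 0-based rank position.

    rank_fn is a hypothetical callable: it takes the reduced seed (a list)
    and returns all non-seed genes ordered from best to worst.
    """
    positions = {}
    for left_out in seed:
        reduced = [g for g in seed if g != left_out]
        ranking = rank_fn(reduced)
        positions[left_out] = ranking.index(left_out)
    return positions


def enrichment_factor(recovered_fraction, top_fraction=TOP_FRACTION):
    """Fold enrichment over chance: a random ranking would place a fraction
    top_fraction of the true targets in the top group."""
    return recovered_fraction / top_fraction


# Fractions of left-out seed members found in the upper 5% (values from the text).
recovered_in_top = {
    "NF-kB": 0.26, "E2F": 0.39, "ETS1": 0.29,
    "HIF-1": 0.18, "HNF4": 0.36, "c-Myc": 0.20,
}
factors = {tf: enrichment_factor(f) for tf, f in recovered_in_top.items()}
for tf, factor in sorted(factors.items(), key=lambda kv: kv[1]):
    print(f"{tf}: {factor:.1f}-fold")
```

Under this reading the factors run from about 3.6-fold (HIF-1) to about 7.8-fold (E2F), which is consistent with the rounded "factor of 4–8" stated in the text.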
Taken together, these results suggest that the seed-distribution-distance method is applicable to other transcription factors as well, and might be used for much smaller seed sizes than the 81 genes used in the NF-κB seed.

Table 1. Potential NF-κB targets identified by the seed-distribution-distance method that are in the top group of the rank and have predicted NF-κB-binding motifs within their −500 bp upstream promoter region. Interestingly, eight of the 16 identified new targets are known targets of NF-κB. Note that none of the potential new targets was in the initial seed group, so the otherwise known targets constitute a good validation of our method. The third column contains additional information about the results of the analysis of the ChIP assays and the reporter gene analysis (RGA), followed by a + or − in case of a positive or negative result, respectively.

ENSEMBL ID        Description                                              Reference for evidence as an NF-κB target
ENSG00000100906   NF-κB inhibitor alpha (NFKBIA)                           Sun et al. [58], this article, ChIP+, RGA+ (positive control)
ENSG00000197635   Dipeptidyl peptidase 4 (DPP4)
ENSG00000142539   Transcription factor Spi-B (SPI-B)
ENSG00000123240   Optineurin (OPTN)
ENSG00000173432   Serum amyloid A protein precursor (SAA1)
ENSG00000163739   Growth-regulated protein α precursor (CXCL1)
ENSG00000081041   Macrophage inflammatory protein 2-α precursor (CXCL2)
ENSG00000169245   Small inducible cytokine B10 precursor (CXCL10)
ENSG00000117151   Di-N-acetylchitobiase precursor (CTBS)
ENSG00000135604   Syntaxin-11 (STX11)
ENSG00000023445   Baculoviral IAP repeat-containing protein 3 (BIRC3)
ENSG00000196954   Caspase-4 precursor (EC 3.4.22.-) (CASP4)
ENSG00000166718   Hypothetical protein
ENSG00000077150   Nuclear factor NF-κB p100 subunit (NFKB2)
ENSG00000158714   SLAM family member 8 precursor (SLAMF8)
ENSG00000163435   E74-like factor 3 (ELF3)

The list of predicted NF-κB targets and experimental verification

We assembled a list of
predicted NF-κB target genes by selecting all genes that showed a putative NF-κB-binding site (a match of a TRANSFAC consensus motif of NF-κB) in the 500 bp upstream of the transcription start site and were members of the upper 5% in the rank. The resulting list is shown in Table 1. Eight of the 16 predicted targets have already been reported in the literature to be direct targets of NF-κB, but were not in the seed. We decided to validate three of the novel predicted targets by performing luciferase reporter assays. We focused on optineurin (OPTN), SPI-B and caspase 4 (CASP4), and chose NFKBIA as a positive control and DARS from the bottom of our rank as a negative control. We cloned their human promoters in a luciferase reporter plasmid and generated identical plasmids in which the predicted consensus sequence of the NF-κB-binding site was deleted. A widely used method to induce NF-κB is stimulation by means of TNF-α. Human HEK293 cells were transiently transfected with the reporter plasmids, and TNF-α stimulation (1.25–20 ng·mL⁻¹) was applied. For all three unmodified promoters, luciferase activity was strongly induced in a concentration-dependent manner under TNF-α stimulation in the undeleted plasmid, very similar to our positive control NFKBIA. In contrast, in the experiment with the plasmids in which we had deleted the putative NF-κB sites, the concentration-dependent stimulation effect was not seen for OPTN and CASP4 promoters, and was strongly reduced for the Spi-B promoter (Fig. 6), indicating that the NF-κB action was blocked in the deleted mutant. The negative control (DARS) did not show any significant dose-dependent change in expression.

Table 1 (continued), column 'Reference for evidence as an NF-κB target', entries in source order; their alignment with the table rows is not preserved in this text: This paper, ChIP+, RGA+; This paper, ChIP+, RGA+; Edbrooke et al. [59]; O'Donnell et al. [60]; Guitart et al. [61]; O'Donnell et al. [60], suggested; Hosokawa et al. [62]; This article, RGA+, ChIP−; Lombardi et al. [63]; Grall et al. [64].

Furthermore, we applied the chromatin immunoprecipitation (ChIP) analysis in order to verify NF-κB
interaction with the predicted NF-κB-binding sites. A positive ChIP signal was obtained for OPTN and SPI-B as well as for NFKBIA in stimulated cells (Fig. 6). NF-κB-dependent activation of the CASP4 promoter was not indicated by ChIP analysis in HEK293 cells (Fig. 6Be). This correlates well with a very low basal promoter activity, and therefore may be attributed to a silenced CASP4 promoter in the cellular model used.

Discussion

We have described the seed-distribution-distance method for the identification of specific transcription factor target genes. This strategy extracts relevant information about gene regulation from large-scale microarray experiments to generate a distribution-distance-derived target prediction based on a seed set of known target genes of a specific transcription factor. The target prediction is based on a combination of

[Fig. 5 panels: one row each for E2F, ETS1, HIF-1, HNF4, NFkB and c-Myc. Left column, 'Cross-validation': number of genes versus position in rank. Right column, 'Gene ontology': density (%) versus position in rank. Ontology terms shown: cell cycle, extracellular matrix, response to hypoxia, angiogenesis, liver development, blood coagulation, lipid metabolic process, immune response, cell proliferation.]

Fig. 5. Left column: cross-validation of the seed distribution method for six different transcription factors. By means of a jackknife method, the recovery position of the gene left out in the rank was calculated for each transcription factor seed group. There is a clear and high enrichment in the top ranks for each transcription factor tested. Right column: we applied the seed distribution method to rank genes. We calculated the gene ontology density
for typical ontologies of the corresponding factor. Enrichment corresponds to an increased density at the top ranks as compared with the density at the bottom ranks.

transcription factor-binding site information and the distribution distance. We took special care to keep our method simple and the number of free parameters as low as possible, so our results do not depend on any parameter fine-tuning. Despite the simplicity of the method, our predictions are very reliable, with 11 of the 16 predictions being true targets, corresponding to an upper bound of the false discovery rate of 33%. On the basis of a jackknife method, we estimate that our seed-based method of ranking genes will enrich true target genes within the top 5% by a factor of 4–8. Thus, incorporating the vast amount of microarray data stored in databases can help to reduce the extraordinarily high number of false positives obtained with purely sequence-based methods [5,7,35]. More sophisticated clustering methods might even improve the prediction quality further. We provide both statistical and biological evidence that the seed-distribution-distance method is robust and applicable to other transcription factors and is hence very useful in predicting specific transcription factor target genes.

Top rank members are involved in typical NF-κB-regulated functions and are enriched with putative NF-κB-binding sites

The distance criterion for generating the rank is a kind of expression profile similarity measure with respect to the seed group. It is not a priori clear that similarly regulated genes share the same gene function. The NF-κB analysis, however, reveals that the seed-distribution-distance method highly enriches genes in the top ranks that share typical NF-κB-regulated functions. For instance, the processes immune responses, complement activation, regulation of T-cell differentiation and immune cell activation are
significantly present in the top group (supplementary Table S1). Moreover, we found specific enrichment of predicted binding motifs for NF-κB 50 and NF-κB 65 in the top 5% of the genes, together with three others. We would expect the other factors to be functionally related to NF-κB. This is the case for STAT5A, which has been reported to be involved in severe combined immunodeficiency [36] and is involved in the immune response [37]. Please note that these statistics were obtained without the initial seed group. Therefore, it would have been possible in our example to determine with high certainty from the constructed rank which seed group was used to build up the rank, namely a group with NF-κB targets.

OPTN is a direct NF-κB target

We predict a list of new NF-κB targets that were not in the initial seed (Table 1). Eight of the 16 predicted novel targets have been previously confirmed. Three other predicted NF-κB targets were experimentally investigated in this study, and were identified as direct NF-κB targets. OPTN, Spi-B and CASP4 were in our predicted list of new targets. Deletions in the OPTN gene are causative for adult-onset primary open-angle glaucoma [38]. Glaucoma affects 67 million people worldwide [39], and is the second largest cause of bilateral blindness in the world [40]. It has been suggested that OPTN is involved in the TNF-α signalling pathway [41]; however, the molecular mode of action has been unknown up to now. It has been suggested that OPTN blocks the protective effect of E3-14.7K on TNF-α-mediated cell killing, and hence OPTN may be part of the TNF-α signalling pathway that can shift the equilibrium towards induction of apoptosis [38,41]. Recently, it has been shown that OPTN increases cell survival and translocates to the nucleus upon an apoptotic stimulus that is dependent upon the GTPase activity of Rab8, an interaction partner of OPTN [42]. Interestingly, this protective function of OPTN is lost when the OPTN protein is
changed to the mutated form E50K, which is typical for patients with normal tension glaucoma [42]. We show that a deletion of a putative NF-κB-binding site in the promoter region of OPTN completely abolishes the enhancing action and modulatory effect of NF-κB on OPTN (Fig. 6). Our experiments show clearly that OPTN is a direct target of NF-κB. Recent findings indicated that TNF-α potentiates glutamate neurotoxicity through the blockade of glutamate transporter activity [43,44]. Furthermore, it was shown that OPTN and NF-κB essential modulator (NEMO) are competitive inhibitors of one another [45]. NEMO represents the regulatory subunit of IKK, which is essential for NF-κB activation [46]. Together with our data, this makes it apparent that OPTN is part of a negative feedback system that is important for NF-κB action. Elevated OPTN expression reduces induced NF-κB activation [45], and is therefore protective against induced neuronal cell death, which depends on NF-κB activity. This is in line with findings indicating that the protective function of OPTN is lost upon truncation resulting from the insertion of a premature stop codon, and when the OPTN protein is changed to the mutated form E50K, which is markedly reduced in patients suffering from glaucoma [42]. Our data provide the missing link in the signalling of NF-κB and the damping function of OPTN in signalling feedback of NF-κB. The knowledge about the direct action of NF-κB on OPTN will greatly enhance our understanding of the signalling pathways relevant for antiapoptosis, and will be helpful in designing possible new cell survival strategies in glaucoma patients.

[Fig. 6 panels: A, reporter gene activity; B, ChIP analysis. Panel A(a): luciferase reporter gene construct with the putative NFkB site (position label –409). Panel B(a): control DNA; y-axis, relative values.]

The two other newly identified and verified target genes of the NF-κB transcription factor seem to be
10.000 putative Lucreportergene NFkB site deletion 1.000 0.100 n.s 0.010 0.001 Control TNF- Control alpha Input Anti-rabbit-AB TNFalpha Anti-NFkB-AB (b) 120 1000 100 800 80 600 60 400 40 P = 0.94 200 20 0 NFKBIA promoter P < 4.2*10 1.25 ng·mL–1 2.5 ng·mL–1 ng·mL–1 10 ng·mL–1 20 ng·mL–1 Control TNF- Control alpha Input TNFalpha Anti-rabbit-AB Control TNFalpha Anti-NFkB-AB (c) –26 OPTN 10.000 TNF-alpha 1.8 1.6 1.4 1.2 0.8 0.6 0.4 0.2 Control 1.25 ng·mL–1 2.5 ng·mL–1 –1 ng·mL 10 ng·mL–1 20 ng·mL–1 P < 0.03 1.000 0.100 0.010 Control TNFalpha Control TNFalpha Control TNFalpha Anti-rabbit-AB Input OPTN Anti-NFkB-AB OPTN NFkB del (d) P < 4.2*10 (d) –12 SPIB 10.000 45 TNF-alpha 40 1.25 ng·mL–1 30 Relative values Control 35 2.5 ng·mL–1 ng·mL–1 25 10 ng·mL–1 20 20 ng·mL–1 15 10 1.000 P < 0.01 0.100 Control SPI-B (e) 0.35 1.25 ng·mL 0.3 2.5 ng·mL–1 –1 ng·mL–1 10 ng·mL–1 20 ng·mL–1 0.15 Anti-NFkB-AB 10.000 Control 0.2 Anti-rabbit-AB TNFalpha CASP4 TNF-alpha 0.25 TNF- Control alpha (e) 0.45 0.4 TNF- Control alpha Input SPI-B NFkB del P < 3.2*10–5 Luciferase activity (firefly/renilla) P < 0.003 1.000 0.100 Relative values Luciferase activity (firefly/renilla) Control DARS promoter (c) Luciferase activity (firefly/renilla) NFKBIA 10.000 TNF-alpha Relative values P < 10–15 Relative values 1200 Luciferase activity(rel.values) Luciferase activity (firefly/renilla) (b) 1.000 0.100 n.s 0.010 0.001 0.1 Control 0.05 CASP4 3186 TNF- Control alpha CASP4 NFkB del Input TNF- Control alpha TNFalpha Control TNFalpha Anti-rabbit-AB Anti-NFkB-AB FEBS Journal 275 (2008) 3178–3192 ª 2008 The Authors Journal compilation ª 2008 FEBS R Mrowka et al involved in important physiological processes related to typical known functions of NF-jB It is known that the Spi-B transcription factor is expressed in adult pro-T cells, with Spi-B being maximal in the newly committed cells at the DN3 stage [47] Furthermore, Spi-B can interfere with T-cell development [47] CASP4 can function as an endoplasmic reticulum 
stress-specific caspase in humans, and may be involved in pathogenesis of Alzheimer’s disease [48].

When does the seed-distribution-distance method work?

The major assumption of our method is that genes that are regulated by the same factor show at least some coregulation. We use a genome-wide similarity measure L(x) based on the comparison of the median values of two correlation distributions. For each gene (x) in the genome, we calculate L(x), which is the median correlation of gene x with all the genes within the seed set minus the median correlation of gene x with all the rest of the genes in the genome. Our approach is able to ‘add up’ contributions from all the genes in the seed set, and by the use of the median and not the mean, it can discard a reasonable amount of outliers. Subtracting the median correlation with the rest of the genome corrects for the correlation structure of the expression dataset as a whole. We also tried a more sophisticated scoring scheme by ranking the genes on the basis of a Mann–Whitney rank-sum test, which did not improve the performance of the ranking procedure. The seed-distribution-distance method is extremely robust and produces high enrichment even if a considerable part of the seed is not present. This was shown by the cross-validation procedure and the subsequent recovery test.

The seed-distribution-distance method is expected to produce a biologically meaningful rank if the seed group is homogeneous with respect to its expression correlation. If, for instance, the seed group contains completely unrelated expression clusters that are located in the cluster space in a linearly independent way, the resulting distance measure might not be capable of building up a transcription factor-specific rank. In this case, one would need to cluster the seed group into subseeds and to build up individual cluster-specific ranks. For instance, this might be necessary in the case of transcription factors that
target different genes depending on the splice form of the transcription factor. Interestingly, however, in our analysis, the performance of the method seems not to depend crucially on the homogeneity of the expression of the seed group, as some seed groups that performed well in the cross-validation test had large intraseed variations (supplementary Fig S4).

A second consideration relates to the expression dataset. The seed-distribution-distance method relies on the assumption that the transcription factor of interest shows some biological activity in the data. If, for example, the transcription factor of interest is completely shut down in all experiments, one would not expect to be able to recover the regulation response of that factor. This issue might be of importance for genes that are only active at tight periods during development. One solution to this problem would be to generate expression experiments with artificial expression of that transcription factor or to include native material from that developmental period in the microarray analysis.

The third consideration relates to the size of the seed. One would expect that if the seed is too small to define the target response adequately, the rank will be poorly defined. However, our bootstrapping test showed that 10 seed genes are capable of enriching target genes considerably. At the other extreme, if, for example, the seed size approximates the ‘total number’ of targets, one cannot expect to recover many new targets. We assume that the latter does not apply for most of the transcription factors.

Fig 6. Experimental validation of predicted NF-κB targets by functional analyses and physical NF-κB interaction with the predicted NF-κB-binding sites in the nuclear chromatin context. (A) Reporter gene activity (RGA). HEK293 cells were transfected and treated for 24 h with TNF-α in a dose-dependent manner (n = 4). (a) Schematic illustration of experimental design. RGA was measured with unmodified native promoter constructs (left column) and in constructs where the putative NF-κB-binding sites were deleted (right column, NF-κB del). (b) Promoter activity for NFKBIA, which is known to be a target of NF-κB, and a negative control (DARS). Only the NFKBIA promoter responded in a dose-dependent manner under stimulation with TNF-α. (c, d, e) RGA for the (c) OPTN, (d) SPI-B and (e) CASP4 promoter. All experiments showed a dose-dependent increase in promoter activity under stimulation with TNF-α. Deletion of the putative NF-κB-binding site resulted in significantly attenuated dose-dependent responses. (B) ChIP analysis. HEK293 cells were cultured with TNF-α (10 ng·mL⁻¹) or without (control) for 24 h prior to crosslinking and ChIP using anti-rabbit serum (negative control) or an antibody to NF-κB. Relative values of immunoprecipitated DNA were assessed by real-time PCR (n = 3). (a) Amplification of a coding region part of the intron-less gene encoding GAPDH, which should show no promoter-like activity and contains no potential NF-κB-binding element, served as control DNA. (b–e) Verification of the predicted NF-κB-binding sites was obtained for the (b) positive control NFKBIA as well as (c) OPTN and (d) SPI-B. NF-κB-dependent activation of (e) the CASP4 promoter is not indicated by ChIP analysis in HEK293 cells.

Our aim was to provide a method to incorporate large-scale microarray data to improve the detection of functional binding sites for a given transcription factor. To illustrate this, we decided to use matches of consensus sequences as a simple parameter-free method to detect binding sites in promoter sequences. More sophisticated methods to detect transcription factor-binding sites are available, and will very likely enhance the performance, e.g. by having a strong statistical or physical model for binding based on positional frequency matrices [49–51], or by using knowledge about cooperation among
transcription factors [52]. Also, incorporating additional sequence-based information such as conservation of promoters to related species is likely to improve the analysis. Moreover, a better set of promoters derived from experimentally determined promoters might further improve the analysis [53].

Taken together, our results suggest that the huge body of transcriptome data available in databases can be used to strongly enhance the prediction of transcription factor targets for cases in which targeted microarray experiments are not available or are too cost-intensive. The described systematic genome-wide approach for identification of transcription factor targets is robust and efficient, and systematically identifies new target genes for any given transcription factor. We predict that the exploitation of the expression data stored in public databases with our or similar seed-based methods will improve the search for new target genes of transcription factors.

Experimental procedures

Definition of the rank

For all gene pairs in the expression dataset, we calculated the correlation coefficient in their expression:

$$
r_{x,y} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}
$$

with x_i and y_i being the expression values of gene x and gene y in experiment i, and n the number of experiments in which both genes have an expression value. We omitted experiments in the calculation of the correlation coefficient where one of the genes had no expression value, and discarded correlation coefficients for further analysis if the number of common experiments that could be used to calculate the correlation coefficient was 10 or fewer. Given a seed group, we then calculate a score L(x) for all genes x outside the seed by taking the median correlation to the seed and subtracting the median correlation to all genes (i.e. its random median correlation).

Molecular cloning

The OPTN promoter and a part of the 5′-UTR was cloned into the pGL3-Basic (Cat no E1751; Promega GmbH, Mannheim, Germany) plasmid at the SacI and HindIII sites using the tailed primers opti423F (5′-ACTGAGCTCGGCATTCTCCTCTTTCTGTGG-3′) and opti423R (5′-ACGTAAGCTTGGTGCCTAGGGCTGATGCGC-3′). The predicted NF-κB-binding site, corresponding to ccgggaaattcccc, was deleted from the reporter gene construct by means of a PCR strategy. The following primer inserts were verified by DNA sequencing. The controls DARS and NFKBIA were generated using the MluI/XhoI sites of pGL3-Basic. Inserts were generated by PCR using the MluI/XhoI tailed primers DARSfw (5′-ACTACGCGTAGTCCAAGAGAGGAGAAACC-3′) and DARSrv (5′-ACTCTCGAGCCCGGAGCGCTGGCGGCCGC-3′), and NFKBIAfw (5′-ACTGAGCTCCCGACGACCCCAATTCAAATCG-3′) and NFKBIArv (5′-ACTGAAGCTTTGTGGGCTCTGCAGCGCCGC-3′). The SPI-B and CASP4 constructs were generated using the MluI/XhoI tailed primers SPI-Bfw (5′-ACTGAGCTCGTGAACCCCAGCCCTTCCTCGAT-3′) and SPI-Brv (5′-ACTGAAGCTTGGTGGTGCCGGGCGGGCTGT-3′), and the SacI/HindIII tailed primers CASP4fw (5′-ACTACGCGTAGCAAAGAGTGCTGCCTCCTCCTTCCT-3′) and CASP4rv (5′-ACTCTCGAGTTCCCTGGTACAGAGCACCT-3′). The predicted NF-κB-binding site gggggaatcccc in the CASP4 construct and the predicted NF-κB-binding site ggggatcccc of SPI-B were deleted using a PCR strategy.

Transient cell transfection

HEK293 cells were cultured in 96-well plates (µClear Platte 96K; Greiner BIO-ONE GmbH, Frickenhausen, Germany) in DMEM (high glucose; PAA Laboratories GmbH, Cölbe, Germany), supplemented with 10% heat-inactivated fetal bovine serum, 50 mL⁻¹ penicillin, 50 µg·mL⁻¹ streptomycin, 15 mM Hepes and mmol·L⁻¹ glutamine, at 37 °C in a 5% CO2 atmosphere. Cotransfections were performed with the firefly luciferase pGL3-Basic vector (Promega), as well as its transformed promoter variants, and the Renilla luciferase phRL-TK vector using the RotiFect Reagent (Carl Roth GmbH, Karlsruhe, Germany), according to the
manufacturer’s protocol. After h, the transfection medium was removed, and medium supplemented with TNF-α solvent (controls) or medium supplemented with TNF-α (1.25–20 ng·mL⁻¹, n = 4 each) was added, and cells were incubated for 24 h.

Luciferase assays

Cells were lysed after 24 h of treatment using 30 µL of passive lysis buffer (Cat no E1941; Promega) after medium removal and gentle washing with NaCl/Pi. The assays were performed on a Luminoskan RS (Labsystems Luminoscan RS, Helsinki, Finland) plate-luminometer using the injector system. The firefly luminescence was measured by injecting 100 µL of buffer (470 µM D-luciferin, 270 µM CoA, 33.3 mM dithiothreitol, 530 µM ATP, 2.67 mM MgSO4, 20 mM Tricine, 0.1 mM EDTA), and the Renilla luminescence was measured after injecting 100 µL of buffer (1.1 M NaCl, 2.2 mM Na2EDTA, 0.22 M KxPO4, pH 5.1, 0.44 mg·mL⁻¹ BSA, 1.3 mM NaN3, 1.43 µM coelenterazin, adjusted finally to pH 5.0; all compounds were obtained from PJK, Germany). The Luminoskan RS device was automatically controlled by a PC using customized software (in-house development by R Mrowka). The relative light units of firefly luminescence were divided by the relative light units of Renilla luminescence of each well to obtain normalization with respect to cell number and transfection efficacy.

ChIP

HEK293 cells were cultured for 24 h under control conditions or TNF-α (10 ng·mL⁻¹) treatment as described before. The final cell number was × 10^7 cells per dish. The cells were then treated with formaldehyde (1% final concentration). The crosslinking procedure was stopped after 15 min by adding glycine (125 mM final concentration). For ChIP, the ChIP assay kit [Cat no 17-295; Millipore GmbH (Upstate), Schwalbach/Ts, Germany] was applied according to the manufacturer’s protocol, and an anti-(rabbit serum) (Cat no sc-2317, negative antibody control; Santa Cruz Biotechnology, Inc., Heidelberg,
Germany) and an antibody to NF-κB p65 (A) (Cat no sc-109; Santa Cruz Biotechnology, Inc.) were used. The immunoprecipitated DNA was purified and then quantified by real-time PCR (GeneAmp 5700; Applied Biosystems, Darmstadt, Germany) using SYBR green and the ready-to-use heat-activated ImmoMix (Cat no 25020; Bioline, Luckenwalde, Germany). The following primers, bridging the predicted NF-κB-binding sites, were used for ChIP analysis: NFKBIA forward, 5′-ACCCCAGCTCAGGGTTTAGGCTTCT-3′; NFKBIA reverse, 5′-TGGCTGGGGATTTCTCTGGG-3′; OPTN forward, 5′-ACCCGGGTCCCAGCCTCGAC-3′; OPTN reverse, 5′-GACAGCCAGCCGCTCCCTGC-3′; SPI-B forward, 5′-TCCAGCTCCTGTCCCATCTC-3′; SPI-B reverse, 5′-TGTCACATGGCAGGGATGGC-3′; and CASP4 forward, 5′-GTCTGGCAACCCCTGTTGAAT-3′; CASP4 reverse, 5′-GCCTGCTGGCTCTGAAGAGTATC-3′. Amplification of a coding region part of the intron-less gene encoding glyceraldehyde-3-phosphate dehydrogenase (GAPDH: forward, 5′-CACCATCTTCCAGGAGCGAG-3′; and reverse, 5′-GCAGGAGGCATTGCTGAT-3′) served as control DNA.

Databases

The sequences of the 500 bp upstream regions of all human genes, as well as the annotation with terms from the gene ontology, were obtained from Ensembl [54], using the tool EnsMart [55]. In this analysis, we neglect the problem of multiple promoters, and assume that the beginning of the longest transcript in Ensembl for a gene is the transcription start site. Binding matrix data come from TRANSFAC version 6.1 [56]. Human microarray data were obtained from the Stanford Microarray Database [27], as used in Stuart et al [26], which contains a collection of 1202 experiments from different independent investigations, including expression profiles from cancer samples, cell lines and different tissue samples, and expression data from studies of diverse biological processes, including cell cycle, stress, signalling, and apoptosis. All preprocessing, such as normalization of the microarray data, is described in Stuart et al [26]. This set
contains 13 555 gene entries, and 12 435 genes were matched to Ensembl genes. Those matched genes were used for further analysis and the seed-distribution-distance method. For the gene ontology analysis, we used all 8915 genes of the dataset that had a gene ontology annotation in the Ensembl database. The seed group was defined by collecting all NF-κB target genes mentioned in an NF-κB review paper [25]. These genes were matched to human Ensembl gene entries, resulting in 81 NF-κB seed genes. Joining those genes with available genes in the expression set resulted in 60 NF-κB seed genes. These 60 resulting genes for NF-κB and the other transcription factors analysed in this study are given in supplementary Table S2.

Statistics

Enrichment of putative transcription factor binding sites in the top group

The binomial test was used to test for binding site enrichment in the top group. The two categories for the binomial were: gene having a specific binding site is in the top group, gene is not in the top group. The null hypothesis was that there is no deviation of the observed distribution from the theoretical distribution that would be present if there was no preference. The alternative hypothesis was that there is a deviation in a one-tailed manner (enrichment, depletion). The consensus sequences for vertebrate transcription factors from TRANSFAC version 6.1 [56] were used for prediction, and transcription factors with a minimum genome-wide promoter hit count of 30 were included in the analysis.

Reporter gene activity

For the concentration- and group-dependent analysis, we applied a two-way ANOVA with repeated measurement statistics, and the null hypothesis was rejected at the 0.05 level. Reporter gene activity is presented as mean and standard deviation.

Gene ontology overrepresentation

Genes were annotated with gene ontology annotation using
EnsMart. For each gene ontology term, we then tested whether it is overrepresented in the annotation of the upper 600 genes using a multiple-testing corrected Fisher’s exact test. This test is based on the hypergeometric distribution and calculates a false-discovery rate for each P-value threshold. We selected a maximum expectable false-discovery rate of 0.05 to determine significantly overrepresented terms. Details of the test are described in [57].

Acknowledgements

We would like to thank H Herzel, H.-G Holzütter, C Gille, S Kielbasa, H Scholz, A Patzak and J Siemens for helpful discussions. N Blüthgen acknowledges support from DFG, SFB 618 ‘Theoretical Biology’. MF acknowledges support from DFG FA 84512-1.

References

1 Banerjee N & Zhang MQ (2002) Functional genomics as applied to mapping transcription regulatory networks. Curr Opin Microbiol 5, 313–317.
2 Cohen CD, Klingenhoff A, Boucherot A, Nitsche A, Henger A, Brunner B, Schmid H, Merkle M, Saleem MA, Koller KP et al. (2006) Comparative promoter analysis allows de novo identification of specialized cell junction-associated proteins. Proc Natl Acad Sci USA 103, 5682–5687.
3 Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhauser R et al. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res 29, 281–283.
4 Sandelin A, Alkema W, Engstrom P, Wasserman WW & Lenhard B (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32, D91–D94.
5 Wasserman WW & Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5, 276–287.
6 Bulyk ML (2003) Computational prediction of transcription-factor binding site locations. Genome Biol 5, 201, doi: 10.1186/gb-2003-5-1-201.
7 Wasserman WW, Palumbo M, Thompson W, Fickett JW & Lawrence CE (2000) Human–mouse genome comparisons to locate regulatory sites. Nat Genet 26, 225–228.
8 Dieterich C, Grossmann S, Tanzer A,
Ropcke S, Arndt PF, Stadler PF & Vingron M (2005) Comparative promoter region analysis powered by CORG. BMC Genomics 6, 24, doi: 10.1186/1471-2164-6-24.
9 Eisen MB, Spellman PT, Brown PO & Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95, 14863–14868.
10 Quackenbush J (2001) Computational analysis of microarray data. Nat Rev Genet 2, 418–427.
11 Beissbarth T (2006) Interpreting experimental results using gene ontologies. Methods Enzymol 411, 340–352.
12 Bluthgen N, Kielbasa SM & Herzel H (2005) Inferring combinatorial regulation of transcription in silico. Nucleic Acids Res 33, 272–279.
13 Qian Z, Lu L, Liu X, Cai YD & Li Y (2007) An approach to predict transcription factor DNA binding site specificity based upon gene and transcription factor functional categorization. Bioinformatics 23, 2449–2454.
14 Walker MG, Volkmuth W & Klingler TM (1999) Pharmaceutical target discovery using Guilt-by-Association: schizophrenia and Parkinson’s disease genes. Proc Int Conf Intell Syst Mol Biol, 282–286.
15 Li A & Horvath S (2007) Network neighborhood analysis with the multi-node topological overlap measure. Bioinformatics 23, 222–231.
16 Tullai JW, Schaffer ME, Mullenbrock S, Kasif S & Cooper GM (2004) Identification of transcription factor binding sites upstream of human genes regulated by the phosphatidylinositol 3-kinase and MEK/ERK signaling pathways. J Biol Chem 279, 20167–20177.
17 Levenson JM, Choi S, Lee SY, Cao YA, Ahn HJ, Worley KC, Pizzi M, Liou HC & Sweatt JD (2004) A bioinformatics analysis of memory consolidation reveals involvement of the transcription factor c-rel. J Neurosci 24, 3933–3943.
18 Barenco M, Tomescu D, Brewer D, Callard R, Stark J & Hubank M (2006) Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biol 7, R25.
19 Liu D, Umbach DM, Peddada SD, Li L, Crockett PW & Weinberg CR (2004) A random-periods model for expression of cell-cycle genes. Proc Natl Acad Sci USA 101, 7240–7245.
20 Brown PO & Botstein D (1999)
Exploring the new world of the genome with DNA microarrays. Nat Genet 21, 33–37.
21 Dai X, He J & Zhao X (2007) A new systematic computational approach to predicting target genes of transcription factors. Nucleic Acids Res 35, 4433–4440.
22 Sarkar FH & Li Y (2008) NF-kappaB: a potential target for cancer chemoprevention and therapy. Front Biosci 13, 2950–2959.
23 Carmody RJ & Chen YH (2007) Nuclear factor-kappaB: activation and regulation during toll-like receptor signaling. Cell Mol Immunol 4, 31–41.
24 Hayden MS & Ghosh S (2004) Signaling to NF-kappaB. Genes Dev 18, 2195–2224.
25 Wu JT & Kral JG (2005) The NF-kappaB/IkappaB signaling system: a molecular target in breast cancer therapy. J Surg Res 123, 158–169.
26 Stuart JM, Segal E, Koller D & Kim SK (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255.
27 Demeter J, Beauheim C, Gollub J, Hernandez-Boussard T, Jin H, Maier D, Matese JC, Nitzberg M, Wymore F, Zachariah ZK et al. (2007) The Stanford Microarray Database: implementation of new analysis tools and open source release of software. Nucleic Acids Res 35, D766–D770.
28 Dieterich C, Cusack B, Wang H, Rateitschak K, Krause A & Vingron M (2002) Annotating regulatory DNA based on man–mouse genomic comparison. Bioinformatics 18(Suppl 2), S84–S90.
29 Bracken AP, Ciro M, Cocito A & Helin K (2004) E2F target genes: unraveling the biology. Trends Biochem Sci 29, 409–417.
30 Xu X, Bieda M, Jin VX, Rabinovich A, Oberley MJ, Green R & Farnham PJ (2007) A comprehensive ChIP-chip analysis of E2F1, E2F4, and E2F6 in normal and tumor cells reveals interchangeable roles of E2F family members. Genome Res 17, 1550–1561.
31 Sementchenko VI & Watson DK (2000) Ets target genes: past, present and future. Oncogene 19, 6533–6548.
32 Hollenhorst PC, Shah AA, Hopkins C & Graves BJ (2007) Genome-wide analyses reveal properties of redundant and specific promoter
occupancy within the ETS gene family. Genes Dev 21, 1882–1894.
33 Semenza GL (2003) Targeting HIF-1 for cancer therapy. Nat Rev Cancer 3, 721–732.
34 Fernandez PC, Frank SR, Wang L, Schroeder M, Liu S, Greene J, Cocito A & Amati B (2003) Genomic targets of the human c-Myc protein. Genes Dev 17, 1115–1129.
35 Wasserman WW & Fickett JW (1998) Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol 278, 167–181.
36 Yao Z, Cui Y, Watford WT, Bream JH, Yamaoka K, Hissong BD, Li D, Durum SK, Jiang Q, Bhandoola A et al. (2006) Stat5a/b are essential for normal lymphoid development and differentiation. Proc Natl Acad Sci USA 103, 1000–1005.
37 Moriggl R, Sexl V, Piekorz R, Topham D & Ihle JN (1999) Stat5 activation is uniquely associated with cytokine signaling in peripheral T cells. Immunity 11, 225–230.
38 Rezaie T, Child A, Hitchings R, Brice G, Miller L, Coca-Prados M, Heon E, Krupin T, Ritch R, Kreutzer D et al. (2002) Adult-onset primary open-angle glaucoma caused by mutations in optineurin. Science 295, 1077–1079.
39 Quigley HA (1996) Number of people with glaucoma worldwide. Br J Ophthalmol 80, 389–393.
40 Quigley HA & Vitale S (1997) Models of open-angle glaucoma prevalence and incidence in the United States. Invest Ophthalmol Vis Sci 38, 83–91.
41 Li Y, Kang J & Horwitz MS (1998) Interaction of an adenovirus E3 14.7-kilodalton protein with a novel tumor necrosis factor alpha-inducible cellular protein containing leucine zipper domains. Mol Cell Biol 18, 1601–1610.
42 De Marco N, Buono M, Troise F & Diez-Roux G (2006) Optineurin increases cell survival and translocates to the nucleus in a Rab8-dependent manner upon an apoptotic stimulus. J Biol Chem 281, 16147–16156.
43 Beg AA & Baltimore D (1996) An essential role for NF-kappaB in preventing TNF-alpha-induced cell death. Science 274, 782–784.
44 Zou JY & Crews FT (2005) TNF alpha potentiates glutamate neurotoxicity by inhibiting glutamate
uptake in organotypic brain slice cultures: neuroprotection by NF kappa B inhibition. Brain Res 1034, 11–24.
45 Zhu G, Wu CJ, Zhao Y & Ashwell JD (2007) Optineurin negatively regulates TNFalpha-induced NF-kappaB activation by competing with NEMO for ubiquitinated RIP. Curr Biol 17, 1438–1443.
46 Rudolph D, Yeh WC, Wakeham A, Rudolph B, Nallainathan D, Potter J, Elia AJ & Mak TW (2000) Severe liver degeneration and lack of NF-kappaB activation in NEMO/IKKgamma-deficient mice. Genes Dev 14, 854–862.
47 David-Fung ES, Yui MA, Morales M, Wang H, Taghon T, Diamond RA & Rothenberg EV (2006) Progression of regulatory gene expression states in fetal and adult proT-cell development. Immunol Rev 209, 212–236.
48 Hitomi J, Katayama T, Eguchi Y, Kudo T, Taniguchi M, Koyama Y, Manabe T, Yamagishi S, Bando Y, Imaizumi K et al. (2004) Involvement of caspase-4 in endoplasmic reticulum stress-induced apoptosis and Abeta-induced cell death. J Cell Biol 165, 347–356.
49 Benos PV, Lapedes AS & Stormo GD (2002) Is there a code for protein–DNA recognition?
Probab(ilistical)ly. Bioessays 24, 466–475.
50 Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16, 16–23.
51 Roider HG, Kanhere A, Manke T & Vingron M (2007) Predicting transcription factor affinities to DNA from a biophysical model. Bioinformatics 23, 134–141.
52 Frith MC, Li MC & Weng Z (2003) Cluster-buster: finding dense clusters of motifs in DNA sequences. Nucleic Acids Res 31, 3666–3668.
53 Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y & Hume DA (2007) Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 8, 424–436.
54 Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. (2002) The Ensembl genome database project. Nucleic Acids Res 30, 38–41.
55 Hammond MP & Birney E (2004) Genome information resources – developments at Ensembl. Trends Genet 20, 268–272.
56 Wingender E, Dietze P, Karas H & Knuppel R (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res 24, 238–241.
57 Bluthgen N, Brand K, Cajavec B, Swat M, Herzel H & Beule D (2005) Biological profiling of gene groups utilizing gene ontology. Genome Inform Ser Workshop Genome Inform 16, 106–115.
58 Sun SC, Ganchi PA, Ballard DW & Greene WC (1993) NF-kappa B controls expression of inhibitor I kappaB alpha: evidence for an inducible autoregulatory pathway. Science 259, 1912–1915.
59 Edbrooke MR, Burt DW, Cheshire JK & Woo P (1989) Identification of cis-acting sequences responsible for phorbol ester induction of human serum amyloid A gene expression via a nuclear factor kappaB-like transcription factor. Mol Cell Biol 9, 1908–1916.
60 O’Donnell SM, Holm GH, Pierce JM, Tian B, Watson MJ, Chari RS, Ballard DW, Brasier AR & Dermody TS (2006) Identification of an NF-kappaB-dependent gene network in cells infected by mammalian
reovirus. J Virol 80, 1077–1086.
61 Guitart A, Riezu-Boj JI, Elizalde E, Larrea E, Berasain C, Aldabe R, Civeira MP & Prieto J (2005) Hepatitis C virus infection of primary tupaia hepatocytes leads to selection of quasispecies variants, induction of interferon-stimulated genes and NF-kappaB nuclear translocation. J Gen Virol 86, 3065–3074.
62 Hosokawa Y, Suzuki H, Nakagawa M, Lee TH & Seto M (2005) API2–MALT1 fusion protein induces transcriptional activation of the API2 gene through NF-kappaB binding elements: evidence for a positive feed-back loop pathway resulting in unremitting NF-kappaB activation. Biochem Biophys Res Commun 334, 51–60.
63 Lombardi L, Ciana P, Cappellini C, Trecca D, Guerrini L, Migliazza A, Maiolo AT & Neri A (1995) Structural and functional characterization of the promoter regions of the NFKB2 gene. Nucleic Acids Res 23, 2328–2336.
64 Grall F, Gu X, Tan L, Cho JY, Inan MS, Pettit AR, Thamrongsak U, Choy BK, Manning C, Akbarali Y et al. (2003) Responses to the proinflammatory cytokines interleukin-1 and tumor necrosis factor alpha in cells derived from rheumatoid synovium and other joint tissues involve nuclear factor kappaB-mediated induction of the Ets transcription factor ESE-1. Arthritis Rheum 48, 1249–1260.

Supplementary material

The following supplementary material is available online:
Doc S1. Information about the microarray experiments.
Doc S2. Seed_ranks.zip: the zip archive contains files of the results of the seed distribution method for the different transcription factors described in this article.
Fig S1. Histograms of correlation coefficients from expression data for three individual genes.
Fig S2. Cumulative histograms of the cross-validation analysis with different seed sizes.
Fig S3. Cross-validation of the seed distribution method for six different transcription factors by means of the median-based ranking procedure as used in the article and a ranking procedure based on P-values of Mann–Whitney statistics.
Fig S4. Histograms of correlation
coefficients of expression data for individual seed groups and all possible pairs.
Table S1. Analysis of the overrepresented gene ontology classifications of the top 600 genes in the rank with a false discovery rate of less than 0.001.
Table S2. List of Ensembl gene IDs used as seeds for the seed distribution method of this article.
Table S3. Literature sources of the seed lists.
Table S4. Distribution of enrichment of putative transcription factor-binding motifs (TRANSFAC) in the ranking after applying the seed-distribution-distance method.
Table S5. Sequences of the NF-κB consensus sequences that have been used in the analysis.

This material is available as part of the online article from http://www.blackwell-synergy.com

Please note: Blackwell Publishing are not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
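
Illustrative sketch of the rank definition

The score L(x) from 'Definition of the rank' can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `seed_score`, `rank_candidates`, `toy_expression`) and the toy data are hypothetical, missing expression values are assumed to be encoded as `None`, and the background distribution is assumed to be the correlations of gene x with every other gene in the dataset.

```python
import math
from statistics import median

# The paper discards coefficients computed from 10 or fewer common experiments.
MIN_COMMON_EXPERIMENTS = 11


def pearson(x, y, min_common=MIN_COMMON_EXPERIMENTS):
    """Pearson r over experiments where both genes have a value (None = missing).

    Returns None if too few common experiments remain or a profile is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < min_common:
        return None
    sx = sum(a for a, _ in pairs)
    sy = sum(b for _, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    sxy = sum(a * b for a, b in pairs)
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return None
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y)


def median_correlation(gene, others, expr):
    """Median of the usable correlations between `gene` and each gene in `others`."""
    values = []
    for other in others:
        if other == gene:
            continue
        r = pearson(expr[gene], expr[other])
        if r is not None:
            values.append(r)
    return median(values) if values else None


def seed_score(gene, seed, expr):
    """L(x): median correlation to the seed minus median correlation to all genes."""
    to_seed = median_correlation(gene, seed, expr)
    # Assumption: the background is every other gene in the dataset.
    to_all = median_correlation(gene, expr.keys(), expr)
    if to_seed is None or to_all is None:
        return None
    return to_seed - to_all


def rank_candidates(expr, seed):
    """Score every gene outside the seed and sort by descending L(x)."""
    scored = []
    for gene in expr:
        if gene in seed:
            continue
        score = seed_score(gene, seed, expr)
        if score is not None:
            scored.append((gene, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_expression(n_experiments=12):
    """Hypothetical data: two seed genes, one co-regulated gene 't', four unrelated genes."""
    def wave(k, phase):
        return [math.sin(2 * math.pi * k * i / n_experiments + phase)
                for i in range(n_experiments)]

    base = wave(1, 0.0)
    return {
        "s1": base,
        "s2": [2 * v + 1 for v in base],
        "t": [3 * v - 2 for v in base],
        "n1": wave(1, math.pi / 2),
        "n2": wave(2, 0.0),
        "n3": wave(2, math.pi / 2),
        "n4": wave(3, 0.0),
    }


ranked = rank_candidates(toy_expression(), seed={"s1", "s2"})
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

In the toy data, gene `t` follows the same profile as the seed genes and the remaining profiles are mutually uncorrelated, so `t` is expected at the top of the rank. Using the median rather than the mean mirrors the robustness argument in the text: a few outlying correlations in the seed do not move the score much.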