Talukder et al. BMC Genomics (2021) 22:163
https://doi.org/10.1186/s12864-021-07440-5

RESEARCH ARTICLE Open Access

An intriguing characteristic of enhancer-promoter interactions

Amlan Talukder1, Haiyan Hu1* and Xiaoman Li2*

Abstract
Background: It is still challenging to predict interacting enhancer-promoter pairs (IEPs), partially because of our limited understanding of their characteristics. To understand IEPs better, here we studied the IEPs in nine cell lines and nine primary cell types.
Results: By measuring the bipartite clustering coefficient of the graphs constructed from these experimentally supported IEPs, we observed that one enhancer is likely to interact with either none or all of the target genes of another enhancer. This observation implies that enhancers form clusters, and every enhancer in the same cluster synchronously interacts with almost every member of a set of genes and only this set of genes. We found that an enhancer can be up to two megabase pairs away from other enhancers in the same cluster. We also noticed that although a fraction of these clusters of enhancers overlap with super-enhancers, the majority of the enhancer clusters are different from the known super-enhancers.
Conclusions: Our study showed a new characteristic of IEPs, which may shed new light on distal gene regulation and the identification of IEPs.
Keywords: Enhancers, Promoters, Enhancer clusters, Super-enhancers

Background
Enhancers are short genomic regions that can boost the condition-specific transcription of their target genes [1, 2]. They directly interact with the promoters of their target genes via chromatin looping to control the temporal and spatial expression of target genes [3–7]. Enhancers can be several dozen to a couple of thousand base pairs (bps) long and can be located distally upstream or downstream of their target genes [1]. Although the longest distance between enhancers and their targets validated by low-throughput experiments is about one mega bps
(Mbps) [3, 4], recent high-throughput experiments showed that the distance can be larger than two Mbps in many cases [8, 9]. Because of such long distances, it is still challenging to identify interacting enhancer-promoter pairs (IEPs). In this study, an IEP refers to an enhancer-promoter pair that physically interacts, although such an interaction may or may not have any functional effect observed yet.

*Correspondence: haihu@cs.ucf.edu; xiaoman@mail.ucf.edu
Department of Computer Science, University of Central Florida, FL 32816 Orlando, USA
Burnett School of Biomedical Science, College of Medicine, University of Central Florida, FL 32816 Orlando, USA

© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Many methods are available to identify enhancers. Early experimental studies identified enhancers by “enhancer trap”, which established our rudimentary understanding of enhancers in spite of its low-throughput and time-consuming nature [10, 11]. Early computational methods predicted enhancers through comparative genomics, which is cost-effective but may produce many false positives. With next-generation sequencing (NGS) technologies, enhancers are identified through a variety of experimental methods such as chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq), DNase I hypersensitive sites sequencing (DNase-seq), global run-on sequencing (GRO-seq), cap analysis gene expression (CAGE), etc. [12–17]. In ChIP-seq experiments, genomic regions enriched with H3K4me1 and H3K27ac modifications are widely considered as active enhancers, and those with H3K4me1 and H3K27me3 modifications are taken as repressed enhancers [14]. In DNase-seq, distal open chromatin regions are considered as potential enhancers for gene regulation studies [18–21]. In GRO-seq and CAGE experiments, bidirectional transcripts are employed to identify active enhancers [15, 22, 23]. Correspondingly, computational methods based on NGS data have been developed to predict enhancers on the genome-wide scale [14, 24–26]. These methods range from the early ones that are based solely on H3K4me3 and H3K4me1 ChIP-seq experiments to the later ones that are based on various types of epigenomic and genomic signals, and they have been applied to predict enhancers in different cell lines.

A large number of enhancers have been discovered so far. For instance, about 2900 enhancers from comparative genomics were tested with a mouse transgenic reporter assay and stored in the VISTA database [27]. The Functional Annotation of the Mouse/Mammalian Genome (FANTOM) project identified 32,693 enhancers from balanced bidirectional capped transcripts [15]. This set of enhancers is arguably the largest set of mammalian enhancers with supporting experimental evidence [28]. There are also hundreds of thousands of computationally predicted human enhancers, such as those predicted by ChromHMM and Segway [24, 25]. This set of enhancers represents the most
comprehensive set of computationally predicted human enhancers currently available, although they are much less reliable. In addition to individual enhancers, super-enhancers have been identified, each of which is a group of enhancers in a genomic region that collectively control the expression of genes involved in cell identities [29, 30].

Despite this relatively effortless discovery of enhancers, the identification of IEPs is still nontrivial. Early experimental procedures to identify IEPs are expensive and time-consuming [31, 32]. Recent Hi-C experiments hold great promise to identify IEPs on the genome scale, but they are still not cost-effective for generating high-resolution Hi-C interactions [8, 9, 33]. To date, these experiments have only been carried out on a few cell lines or cell types. Although computational methods, from the early ones defining the closest genes as target genes, to the later ones considering the correlation of epigenomic signals in enhancers and those in promoters, to the current ones based on more sophisticated approaches [15, 19, 34–40], have shown some success in predicting enhancer target genes, they either do not consider or have low performance on cell-specific IEP prediction [36]. Through these experimental and computational studies, megabase-size self-interacting genomic regions called topologically associated domains (TADs) have also been discovered in mammalian genomes, where IEPs usually fall within the TADs instead of crossing different TADs [41].

All existing computational methods almost always consider one enhancer-promoter pair at a time to determine whether they interact. We hypothesized that when two enhancers interact with a common target gene, these two enhancers may be spatially close to each other and may thus interact with all target genes of both enhancers. In other words, if two enhancers share a target gene, they may share all of their target genes as well. If this hypothesis is true, we should consider the
interactions of multiple enhancers and multiple target genes simultaneously to predict IEPs, which may improve the accuracy of the computational prediction of the IEPs, especially that of cell-specific IEPs.

To test this hypothesis, we collected experimentally supported IEPs determined in five previous studies [6, 8, 9, 33, 42] and investigated how different enhancers may share their target genes in different cell lines and cell types (Methods). We considered both experimentally annotated enhancers from FANTOM and computationally predicted enhancers by ChromHMM in different samples [15, 24]. We observed that two enhancers are likely to either share almost all of their target genes or interact with two completely disjoint sets of target genes, in a cell line or a cell type. This observation implies an interesting characteristic of IEPs, which has not been considered by existing studies to predict IEPs. Our study may also shed new light on the underlying principles of chromatin interactions and facilitate the more accurate identification of IEPs.

Results
Two enhancers are likely to interact with either exactly the same set or two completely different sets of genes
In order to study IEPs, we calculated the bipartite clustering coefficient (BCC) of enhancers in each cell line or cell type, with two sets of enhancers and five sets of experimentally supported IEPs (Methods, Fig. 1a). BCC is commonly used to measure how nodes share their neighboring nodes in a bipartite graph. Note that every set of IEPs corresponds to a bipartite graph, where the enhancer set and the gene set correspond to the two disjoint sets of nodes, and their interactions correspond to the edges (Fig. 1b). The neighboring nodes of an enhancer are the target genes of this enhancer. Because our goal was to investigate how different enhancers share their target genes, the BCC is a well-suited measure: it quantifies the fraction of target genes that pairs of enhancers share in a given set of IEPs (Fig. 1b). We observed
that the BCC of enhancers was usually larger than 0.90. This indicates that when any pair of enhancers interact with one common target gene, both enhancers are likely to interact with all target genes of these two enhancers.

First, we studied the IEPs based on the looplists from Rao et al. [9], with the annotated FANTOM enhancers [15] and the GENCODE promoters [43] (Fig. 1a). We noticed that the BCC of enhancers was no smaller than 0.97 in all cell lines with enough IEPs (Table 1 and Supplementary Table S1). We further calculated the average BCC of the enhancers interacting with more than one gene. We found that their average BCC was no smaller than 0.96 in all cell lines, suggesting that two enhancers are likely to interact with either the same set or two disjoint sets of target genes. In other words, the target genes of any pair of enhancers usually are either the same or completely different.

Fig. 1 a The process of generating IEPs using the chromatin interaction data from five studies, enhancer regions from FANTOM and ChromHMM and promoters defined around the GENCODE annotated gene TSSs. b A toy interaction network between three enhancers (e1, e2 and e3) and three promoters (p1, p2 and p3). The average BCC of the enhancers in this example is 0.5

To assess the statistical significance of the above observation, we studied the BCC of enhancers in randomly generated IEPs (Supplementary Table S1). These random IEPs were constructed using the same set of enhancers and promoters but randomized interactions, where we randomly chose promoters to interact with an enhancer so that each enhancer had the same number of interactions as it had originally. We generated five different sets of random IEPs in this way with five different random seeds. With these random IEPs in the eight cell lines, we had barely a handful of enhancers sharing promoters with other enhancers in any cell line, suggesting that it is not by chance that multiple enhancers interact with a common set of target genes in the looplists of Rao et al. (Supplementary Table S1). For the four cell lines in which we could calculate the BCC, the BCC of enhancers was 0.51, 0.37, 0.33 and 0, respectively, which was much smaller than the BCC of enhancers in the above sets of real IEPs (p-value = 0, Supplementary Table S1). When we considered the BCC of enhancers interacting with multiple genes, the BCC values were no larger than 0.34 for random IEPs, while they were no smaller than 0.96 for the real IEPs, also suggesting that the observation of the BCC of enhancers being close to 1 was not by chance (Supplementary Table S1).

Second, we studied IEPs defined by different cutoffs in seven cell lines (Methods). Compared with the IEPs from the above looplists of Rao et al., these IEPs defined by cutoffs were likely to include many more bona fide IEPs and more false positives as well. Under the cutoffs 30, 50 and 100, the BCC of enhancers in all seven cell lines except GM12878 was no smaller than 0.85, 0.89 and 0.92, respectively (Supplementary Table S1). Since GM12878 had a much higher sequencing depth than other cell lines, it was understandable that a stringent cutoff for other cell lines was still loose for GM12878. We thus tried the cutoffs 150, 200, 300, and 400 for GM12878. We noticed that the BCC of enhancers was 0.97 in GM12878 with the cutoff 400. Coincidentally, the number of IEPs in GM12878 defined at this cutoff was similar to that in other cell lines defined at the cutoff 100 (Supplementary Table S1). We thus considered the cutoff 400 in GM12878 and the cutoff 100 for other cell lines. Note that in HMEC, HUVEC, KBM7 and NHEK, the BCC of enhancers was no smaller than 0.92 even under the cutoff 100. Moreover, the BCC of enhancers increased with more stringently defined IEPs, suggesting that the BCC of enhancers is close to 1 if it is not 1 (Supplementary Table S1). In order to assess the statistical significance of the observed BCC of
enhancers in IEPs from different cutoffs, we similarly compared the above BCC of enhancers with that from randomly generated IEPs (Supplementary Table S1). Again, for every cutoff in every cell line, the BCC of enhancers for random IEPs was much smaller than the BCC of enhancers for real IEPs (p-value = 0). For instance, under the cutoff 50, the BCC of enhancers was no larger than 0.78 for random IEPs, while the corresponding number was no smaller than 0.89 for real IEPs. If we considered only enhancers interacting with multiple target genes, the BCC of enhancers for random IEPs was about half of that for real IEPs. For instance, under the cutoff 50, the largest BCC value was 0.40 for random IEPs, while the smallest BCC value for real IEPs was 0.69.

Table 1 The BCC of enhancers and that of promoters are likely to be 1 in a cell line
Columns: Cell line; IEPs; BCC of enhancers (all, multiple); % of total enhancers with multiple promoters and BCC > 0 (E1); % of E1 with BCC ≥ 0.9; BCC of promoters (all, multiple); % of total promoters with multiple enhancers and BCC > 0 (P1); % of P1 with BCC ≥ 0.9
Rows: Rao (GM12878, HELA, HMEC, HUVEC, IMR90, K562, KBM7, NHEK); Jin (IMR90); Li (K562, MCF7)
In the head row, “multiple” means the enhancers (or promoters) with multiple interacting promoters (enhancers). “All” means all enhancers (or promoters). When two numbers are in an entry, the number in parentheses is from the ChromHMM enhancers. The BCC value is labeled “NA” when the number of corresponding enhancers (or promoters) is zero.

Third, to see how this observation might change if we used the data from other labs or other experimental protocols, we studied the IEPs from four additional studies (Fig. 1a, Methods) [6, 8, 33, 42]. When we calculated the BCC of enhancers using the IEPs defined by Jin et al. themselves [33], it was 0.94. When considering the IEPs defined by Jin et al. based on the FANTOM enhancers and the annotated promoters by GENCODE, it was 0.90. In terms of the ChIA-PET datasets [6], it was 0.80 in K562 and 0.89 in MCF7 (Table 1). For the nine cell types from Javierre et al. [8], it was no smaller than 0.96 in all cell types. For the SPRITE data from Quinodoz et al. [42], it was 0.92, 0.92 and 1 for the cutoffs 30, 50 and 100, respectively (Supplementary Table S1). Although the IEPs were from different labs and from different experimental procedures, in all cases the BCC of enhancers was larger than 0.80 and the majority of enhancers interacting with multiple promoters had individual BCCs larger than 0.90, suggesting that the BCC of enhancers is likely to be 1 in these samples. Again, for the corresponding randomly generated IEPs for these datasets, the BCC value was 0.48 on average, much smaller than the corresponding value from the original IEPs, which was 0.96 (p-value = 0, Supplementary Table S1).

Finally, we repeated the above analyses with the ChromHMM enhancers instead of the FANTOM enhancers, because the number of the FANTOM enhancers was relatively small compared with the estimated number of enhancers and
there were many more ChromHMM enhancers than FANTOM enhancers [24]. We had similar observations in all cases (Table 1, Supplementary Table S1). That is, the BCC of enhancers for IEPs in a cell line was close to 1. For instance, for IEPs based on the looplists, it was almost a perfect 1 in all cell lines. For the Hi-C data from Rao et al. under the cutoff 400 for GM12878 and 100 for other cell lines, it was no smaller than 0.93. For the Hi-C data from Jin et al. [33], it was 0.93. For the ChIA-PET data from Li et al. [6], it was 0.86. For the nine cell types from Javierre et al. [8], it was no smaller than 0.97. For the SPRITE data on the GM12878 cell line [42], the BCC values were 0.9, 0.95 and 0.99 for the cutoffs 30, 50 and 100, respectively. In almost all cases, the majority of enhancers with multiple promoters had individual BCCs larger than 0.90.

In summary, the BCC of enhancers was likely to be close to 1 for different sets of IEPs, data from different labs, different experimental protocols, different cell lines and cell types, and different enhancer sets. The analyses based on IEPs from different cutoffs suggest that the BCC of enhancers is quite robust, although it is smaller when more loosely defined IEPs are used. It is close to 1 or becomes 1 when the IEPs are defined more and more stringently (with fewer false positive IEPs). These analyses suggest that what we observed may be an intrinsic property of enhancers. That is, if two enhancers interact with one common gene, they are likely to interact with each of their individual target genes.

Two target genes tend to interact with exactly the same set or two completely different sets of enhancers
We studied the BCC of promoters in each set of the aforementioned IEPs to see whether a similar hypothesis was true for the BCC of promoters. Our data showed that the BCC of promoters was likely to be 1 as well, although this was not as evident as for the BCC of enhancers in certain cases.

First, we studied the BCC of promoters with
IEPs based on the looplists [9]. It was close to 1 no matter whether we used the FANTOM enhancers or the ChromHMM enhancers (Table 1). We also calculated the BCC of promoters in randomly simulated IEP datasets, where we kept the same sets of enhancers and promoters but randomly selected enhancers to interact with promoters so that every promoter had the same number of interacting enhancers as it had in the original set of IEPs. The BCC of promoters was 0.52 at best in any cell line in these random datasets, suggesting that it was not by chance that the BCC of promoters was close to 1 in all cell lines (Supplementary Table S2).

Second, we studied the BCC of promoters based on IEPs defined with different cutoffs [9] (Supplementary Table S2). When we used the FANTOM enhancers, the BCC of promoters was often close to 1. For instance, with the cutoff 400 for GM12878 and the cutoff 100 for other cell lines, the BCC of promoters was no smaller than 0.91 in all the cell lines. For different cutoffs, it was usually no smaller than the BCC of enhancers, which was close to 1 in most cases. When we used the ChromHMM enhancers, however, it was not as large as with the FANTOM enhancers. For instance, with the cutoff 400 for GM12878 and the cutoff 100 for other cell lines, the BCC of promoters varied from 0.64 to 0.91 in different cell lines. The BCC values got smaller with smaller cutoffs, which might be due to the much lower quality of the enhancers predicted by ChromHMM compared with the experimentally defined FANTOM ones. Although the BCC of promoters was not as large as the BCC of enhancers when the ChromHMM enhancers were used, the actual BCC of promoters could also be close to 1. This was because the computationally predicted ChromHMM enhancers might result in predicting false interactions and thus a low BCC of promoters. Moreover, the BCC of promoters always increased with more and more stringently defined IEPs. Although we did not observe
that the BCC of promoters was close to 1 at the cutoff 100 we tried, it was indeed close to 1 when the looplists defined by Rao et al. were considered. In addition, the BCC of promoters for random IEPs in every cell line and under every cutoff was much smaller than that for the real IEPs, indicating that the observed much larger BCC of promoters was not by chance (Supplementary Table S2).

Third, we studied the BCC of promoters based on IEPs from other studies (Fig. 1a, Table 1 and Supplementary Table S2) [6, 8, 33, 42]. For the original IEPs from Jin et al., it was 0.11. However, when the IEPs were defined from the overlap of these original IEPs with the GENCODE promoters and the two types of enhancers, it was 0.77 and 0.68, respectively (Table 1). The low BCC of promoters for the original IEPs may be partially due to the promoters Jin et al. used: 11,313 promoters inferred by Jin et al., compared to the 57,820 promoters annotated by GENCODE [33]. In terms of the ChIA-PET data [6], when we used the FANTOM enhancers, the BCC of promoters was 0.86 in K562 and 0.86 in MCF7; when we used the ChromHMM enhancers [8], it was 0.67 in K562. ChromHMM did not have annotated enhancers in MCF7. For the nine cell types from Javierre et al., it was no smaller than 0.98 and 0.91 when the FANTOM enhancers and the ChromHMM enhancers were used, respectively. For the SPRITE data on the GM12878 cell line [42], the BCC values of promoters were no smaller than 0.89 and 0.71 in the IEPs defined with the FANTOM and ChromHMM enhancers, respectively. Overall, although it was not as large as the BCC of enhancers, the BCC of promoters was likely to be close to 1 as well, given the imperfectness of all these collected IEPs, the fact that the majority of promoters interacting with multiple enhancers had individual BCCs larger than 0.90, and the fact that these values were much larger than the corresponding BCC of promoters for random IEPs (Supplementary Table S2). In other words, a gene usually interacts with all
enhancers of another gene or interacts with a completely different set of enhancers from this second gene.

Enhancers form clusters that have special characteristics
Since the BCC of enhancers is close to 1, we can organize enhancers into clusters, where every enhancer in the same cluster is likely to interact with the same set of target genes. We thus built an enhancer graph by connecting enhancers that share at least one common target. We then grouped enhancers into clusters based on such a graph in each cell line (Methods, Fig. 2). Here we only considered the looplists and the IEPs obtained from the most stringent cutoff (400 in GM12878 and 100 in other cell lines) to obtain enhancer clusters, as they were more reliable than other sets of IEPs.

We obtained to 2134 clusters in different cell lines. The number of clusters in a cell line and across different cell lines varied dramatically, depending on the IEPs and the enhancers used (Supplementary Table S3). When the ChromHMM enhancers were used, there were many more clusters and 67% to 96% of all enhancers were included in clusters. When the FANTOM enhancers were used, fewer clusters were identified and about 16% to 67% of the total enhancers were in clusters. The average number of enhancers in a cluster varied from to in different cell lines. Enhancers in the majority of clusters interacted with only one gene, while on average, enhancers in 18.36% of clusters interacted with at least two different genes.

We studied the distance between the consecutive enhancers in a cluster, the distance between their consecutive targets and the distance between enhancers and their target genes (Fig and Supplementary Table S4). On average, about 84% of the enhancers in a cluster were within 10 kbps. However, there was a small fraction of enhancers in a cluster that were more than 50 kbps away from each other. For instance, when the looplists and the FANTOM enhancers were considered, there were more than 8% enhancers in a cluster that
were more than 50 kbps away from each other in GM12878, HMEC and IMR90. Although enhancers in a cluster were often close to each other, their distances to each other were not significantly smaller than the distances of random enhancer pairs (Supplementary Table S5, almost all p-values > 0.2). In terms of the target genes, the majority of them were within 10 kbps, with a small fraction far from each other. For instance, when the looplists and the FANTOM enhancers were considered, we found that 25.93%, 21.43% and 33.33% of the target genes of an enhancer cluster were more than 50 kbps away from each other in GM12878, HMEC and IMR90, respectively. It is also worth pointing out that the enhancers in a cluster were normally consecutive and active enhancers, while their target genes were normally not consecutive. In all cell lines, on average, more than 90% of the enhancers in a cluster were consecutive active enhancers, while fewer than 17% of the target genes of an enhancer cluster were consecutive.

Since enhancers in a cluster were consecutive in the genome and the majority of enhancers in a cluster were close to each other, they resembled super-enhancers. We thus compared the enhancer clusters with known super-enhancers (Supplementary Table S6). On average, 29.77% of enhancer clusters overlapped with the corresponding super-enhancers in a cell line, while the majority of enhancer clusters did not overlap with the known super-enhancers (Fig. 4a) and may represent new super-enhancers. On the other hand, a large proportion of known super-enhancers did not overlap with the enhancer clusters in the corresponding cell lines (Fig. 4b). Interestingly, when a super-enhancer overlapped an enhancer cluster, more than 80% of the genomic regions that contain all enhancers in this enhancer cluster were within this super-enhancer.

Fig Clusters of enhancers with Hi-C reads. Here all ChromHMM active enhancer clusters in GM12878 are shown within the region Chr1:161,060,000-161,175,000. In total, five clusters belong to this region. The bottom half of the figure shows the five enhancer clusters (grey, yellow, green, purple and brown on the two sides) interacting with the common gene promoter regions (in the middle), arranged from left to right according to their relative genomic locations. The top half of the figure shows the same interactions of the five clusters (same color codes) with Hi-C reads. For example, the yellow cluster of enhancers interacts with the NIT1 and PFDN2 gene promoters with 687 Hi-C reads. The unmarked enhancer (blue) and gene promoter (UFC1) did not belong to any cluster. The locations of the enhancers relative to each other and to the target genes are shown in the middle

We also studied how the enhancers in a cluster were located relative to a TAD (Supplementary Table S7). The enhancers in a cluster were usually within the same TAD, with at least 98.08% of enhancers in a cluster within a TAD in every cell line, independent of the IEPs and enhancers used. In most cell lines, for all clusters, all enhancers in a cluster were within a TAD. The slight deviation from 100% in certain cases may be due to the imperfectness of the IEPs, enhancers, and TADs, mostly due to the computationally predicted enhancers, as the percentage was 100% in almost all the cell lines when the FANTOM enhancers were used.

We studied how the enhancer clusters were shared by different cell lines as well (Supplementary Table S8). That is, for an enhancer cluster in a cell line, how likely was the same cluster to be identified in another cell line? We found that on average no more than 12% of enhancer clusters were identified in two cell lines. Moreover, the percentage was smaller when the looplists were used than when the stringent cutoffs were used to define IEPs, implying that the looplists were too strict to include many bona fide IEPs. The small percentage of shared enhancer clusters suggested that most enhancer clusters were
cell-specific, which is consistent with the properties of super-enhancers [29, 30].

Discussion
We observed that two enhancers either do not share any target gene or share almost all of their target genes. This observation was true when different sets of IEPs, two ...