IDENTIFICATION AND CHARACTERIZATION OF CIS-REGULATORY ELEMENTS FOR VERTEBRATE DEVELOPMENTAL CONTROL GENES

SUMANTRA CHATTERJEE
M.Sc (ZOOLOGY), UNIVERSITY OF CALCUTTA, 2004

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF BIOLOGICAL SCIENCES, NUS & GENOME INSTITUTE OF SINGAPORE
NATIONAL UNIVERSITY OF SINGAPORE
2009

ACKNOWLEDGEMENTS

There are several people I would like to acknowledge, without whose help this project would not have been possible.

First and foremost, my supervisor and mentor, Dr Thomas Lufkin, for being truly a teacher and a guide; for showing me the ropes of doing science properly and teaching me its fundamental tenet, that it is critical to ask the correct question before looking for answers; and for being extremely patient and believing in me. I truly believe he has made me a better scientist and a better human being and, to paraphrase Newton, if I have seen further it is by standing on his shoulders.

My co-supervisor, Dr Zhiyuan Gong. He has been very helpful and was always willing to hear me out and offer critical suggestions. He has helped shape this project to a great extent.

My heartfelt thanks to Dr Guillaume Bourque from the Genome Institute of Singapore (GIS) for his help and his unlimited patience in getting the bioinformatics part of the project off the ground. His cheerfulness and his willingness to go out of his way to help went a long way in ensuring that there was a perfect synergy between the bioinformatics and the biology.

Members of my lab, who ensured that scientifically I never fell flat on my face. I thank them for their constructive criticism and technical help throughout my PhD. More importantly, I thank them for making the lab such a fun and joyful place to work in and for making me a part of their lives. Special thanks to our lab manager, Song Jie, for ensuring that the lab functions without any hitch, and to our fish facility manager, Siva, for being very accommodating of all our demands.

Thanks to the Department of
Biological Sciences, National University of Singapore, for funding my studies through the NUS research scholarship and for taking care of all administrative matters very efficiently. Thanks also to the Genome Institute of Singapore for allowing me to work there.

My biggest thanks are to my mom and dad and my brother, for showing the way at times and for being there always. I wouldn’t be here today without their unconditional love and support and their unwavering faith in my abilities.

Table of Contents

Acknowledgments
Table of contents
Summary
List of tables
List of figures
List of abbreviations

Chapter 1: Introduction
  1.1 Cis-regulatory elements
  1.2 Cis-regulatory elements in disease and development
  1.3 Methods to identify cis-regulatory elements
  1.4 Pre-genomics era methods to locate and validate cis-regulatory elements
  1.5 Genomics era methods to locate and validate cis-regulatory elements
  1.6 In silico predictions of cis-regulatory elements
  1.7 Comparative genomics and conserved non-coding elements
  1.8 Transcription factors
  1.9 Scope of this study

Chapter 2: Materials and methods
  2.1 Zebrafish as a model system
  2.2 Fish maintenance
  2.3 RNA in situ hybridization
  2.4 Conservation tracks on UCSC genome browser
  2.5 Genome alignments and conserved non-coding elements
  2.6 Functional assays
  2.7 BAC targeting construct and modification
  2.8 Electrophoretic mobility shift assays (EMSAs)

Chapter 3: Results for functional assays
  3.1 nkx3.2
  3.2 CNEs around nkx3.2
  3.3 pax9
  3.4 CNEs around pax9
  3.5 otx1b
  3.6 CNEs around otx1b
  3.7 foxa2
  3.8 CNEs around foxa2

Chapter 4: BAC modification and transgenics
  4.1 Results for BAC modification
  4.2 BAC dissection to detect non-conserved enhancers

Chapter 5: Biochemistry and transgenics to determine transcription factor binding sites
  5.1 Results for EMSA on non-conserved enhancer
  5.2 Enhancer binding transcription factor

Chapter 6: Discussion
  6.1 “Conserved” non-coding element
  6.2 Are all CNEs developmental enhancers?
  6.3 Redundant enhancers
  6.4 Zebrafish in the world of conservation
  6.5 Neutral evolution and non-coding DNA
  6.6 Concluding remarks

Bibliography
Appendix
List of Publications

Summary

Identifying DNA sequences (enhancers) that direct the precise spatial and temporal expression of developmental control genes remains a significant challenge in the annotation of vertebrate genomes. Locating these sequences, which in many cases lie at a great distance from the transcription start site, has been a major obstacle in deciphering gene regulation. The completion of a number of vertebrate genome sequences, with their unprecedented coverage and resolution, as well as the concurrent development of genomic alignment, visualization, and analytical bioinformatics tools, has made large-scale genomic sequence comparisons between diverse species not only possible but an increasingly popular approach for the discovery of putative
cis-regulatory elements. Here we present a study that integrates comparative genomics (teleost fish to humans) with a medium-throughput functional assay. We identify a list of conserved non-coding sequences and test them for regulatory activity around important genes that control the specification and development of various tissue types in vertebrates and that are ancient in origin and conserved in evolution. Our results clearly indicate that a high level of functional conservation of genes is not necessarily associated with sequence conservation of their regulatory elements. Moreover, highly conserved non-coding elements may not have a known role in cis-regulation. This study emphasizes the need to look beyond sequence conservation alone and to use multiple approaches to obtain the complete regulatory profile of a gene.

List of Tables

Table 1: CNEs around nkx3.2 which were tested in the functional assay
Table 2: CNEs around pax9 which were tested in the functional assay
Table 3: CNEs around otx1b which were tested in the functional assay
Table 4: CNEs around foxa2 which were tested in the functional assay
Table A1: EST clones used to synthesize RNA probes for in situ hybridization
Table A2: Various terms for conserved elements in the vertebrate genome
Table A3: PCR primer sequences for CNEs around nkx3.2
Table A4: PCR primer sequences for CNEs around pax9
Table A5: PCR primer sequences for CNEs around otx1b
Table A6: PCR primer sequences for CNEs around foxa2
Table A7: 5 bp mutant probes used for EMSA
Table A8: 10 bp mutant probes used for EMSA

List of Figures

Fig 1: A diagrammatic representation of BAC modification
Fig 2: Snapshot of UCSC genome browser showing genomic locus around nkx3.2
Fig 3: In situ hybridization, modified BAC expression and expression of functional CNEs for nkx3.2
Fig 4: Snapshot of UCSC genome browser showing genomic locus around pax9
Fig 5: In situ hybridization, modified BAC expression and expression of functional CNEs for pax9
Fig 6: Snapshot of UCSC genome browser showing genomic locus around otx1b
Fig 7: In situ hybridization, modified BAC expression and expression of functional CNEs for otx1b
Fig 8: Snapshot of UCSC genome browser showing genomic locus around foxa2
Fig 9: In situ hybridization, modified BAC expression and expression of functional CNEs for foxa2
Fig 10: BAC dissection to detect non-conserved enhancer
Fig 11A: A UCSC genome browser snapshot of a 6.4 kb non-conserved enhancer near otx1b
Fig 11B: A UCSC genome browser snapshot of a non-conserved enhancer near otx1b whittled down to 1 kb
Fig 11C: A UCSC genome browser snapshot of a non-conserved enhancer near otx1b whittled down to 200 bp
Fig 12: EMSA to detect core binding site
Fig 13: Elucidation of transcription factor binding to the enhancer
Fig 14: Phylogenetic tree based on neutral evolution
Fig A1: A UCSC genome browser snapshot of a functional CNE bx1
Fig A2: A UCSC genome browser snapshot of a functional CNE px2
Fig A3: A UCSC genome browser snapshot of a functional CNE px4
Fig A4: A UCSC genome browser snapshot of a functional CNE px7
Fig A5: A UCSC genome browser snapshot of a functional CNE ox2
Fig A6: A UCSC genome browser snapshot of a functional CNE ox3
Fig A7: A UCSC genome browser snapshot of a functional CNE ox4
Fig A8: A UCSC genome browser snapshot of a functional CNE fx2
Fig A9: A UCSC genome browser snapshot of a functional CNE fx6
Fig A10: The EGFP reporter construct used in the functional assays
Fig A11:
Recapitulation of data for a forebrain enhancer as positive control

List of Abbreviations

BAC     Bacterial artificial chromosome
bp      Base pair
°C      Degree Celsius
CNE     Conserved non-coding element
dpf     Days post fertilization
DNase   Deoxyribonuclease
EGFP    Enhanced green fluorescent protein
EST     Expressed sequence tag
hpf     Hours post fertilization
kb      Kilobase
Myr     Million years
ORF     Open reading frame
PCR     Polymerase chain reaction
PTU     N-Phenylthiourea
RPM     Revolutions per minute
TF      Transcription factor
TSS     Transcription start site
UCSC    University of California at Santa Cruz

APPENDIX

Gene     EST*                        Polymerase for in vitro transcription
Nkx3.2   FDR306-P00006-BR_A14        SP6
Pax9     FDR103-P00067-DEPE-F_G04    SP6
Otx1b    FDR202-P00029_BR_D17        SP6
Foxa2    FDR107-P00066-BR_B19        SP6

Table A1: The EST clones used to synthesize the in situ hybridization RNA probes for each gene.
* The EST clone numbers are the reference numbers for each clone in the GIS zebrafish EST library.

Fig A1: A UCSC genome browser snapshot of the functional CNE bx1 found near nkx3.2.
Fig A2: A UCSC genome browser snapshot of the functional CNE px2 found near pax9.
Fig A3: A UCSC genome browser snapshot of the functional CNE px4 found near pax9.
Fig A4: A UCSC genome browser snapshot of the functional CNE px7 found near pax9.
Fig A5: A UCSC genome browser snapshot of the functional CNE ox2 found near otx1b.
Fig A6: A UCSC genome browser snapshot of the functional CNE ox3 found near otx1b.
Fig A7: A UCSC genome browser snapshot of the functional CNE ox4 found near otx1b.
Fig A8: A UCSC genome browser snapshot of the functional CNE fx2 found near foxa2.
Fig A9: A UCSC genome browser snapshot of the functional CNE fx6 found near foxa2.

[Figure A10 labels: SmaI, EcoRI, BamHI, β-globin promoter, SmaI, EGFP + poly A]
Fig A10: The EGFP reporter construct which was used in all transgenic assays. The fragment was cut out from the vector using the SmaI restriction site, gel purified and co-injected with the DNA element.

Fig A11: Recapitulation of expression of a transgene driven by the Dlx5/6 intergenic enhancer. The top panel is from reference 104. The bottom panel shows our experiment. The white arrow indicates diencephalic and telencephalic domains.

Abbreviation   Term
UCR            Ultra conserved region
UCE            Ultra conserved element
CNS            Conserved noncoding sequence

Definition
  ≥200 bp, 100% conserved between human and mouse
  Nontranscribed, ≥50 bp, ≥95% conserved between human and mouse, and at least partially aligned to fugu
  ≥100 bp, 100% conserved between human and mouse
  Nongenic, human:mouse, >100 bp and with ≥70% identity

Abbreviation   Term
CNE            Conserved noncoding element
HCNR           Highly conserved noncoding region
HCNE           Highly conserved noncoding element
CNG            Conserved nongenic sequence

Definition
  Nontranscribed, >100 bp human:fugu alignments with MegaBlast
  Visual inspection of mouse:Xenopus:zebrafish alignments
  Windows ≥50 bp that do not overlap coding regions and for which the probability of being under purifying selection, given the conservation score, is ≥95%
  Nontranscribed, human:mouse BLAST with an e-value < 10^-20 and similarity ≥98%

Table A2: Various terminologies used to define conserved non-coding sequences in the vertebrate genome. Adapted from [203].

CNE   Forward primer                 Reverse primer
bx1   5'-ATGTTCAGTGTTTCCCGCGT-3'     5'-TCTGCTCCGATTCAATGAAG-3'
bx2   5'-ATCAATCATTTATGTAAGGA-3'     5'-GTGTGTGTGTGTGTGCACTACCTGCATCAGC-3'
bx3   5'-TGTGTGTGTGTGTGTGTGTC-3'     5'-AACTAAAATGGGTATTATGG-3'
bx4   5'-GAAGGACTTTATTTAGGGTG-3'     5'-CAGAAACACAACTGGGAGAA-3'

Table A3: PCR primer sequences for CNEs around nkx3.2

CNE    Forward primer                 Reverse primer
px1    5'-CCATCATCCTTGTCACCTGG-3'     5'-CTGGTCATAAACCAGCAGAG-3'
px2    5'-CGAAAGAGTCTATTAGGTTT-3'     5'-GAATAAAACAAAGAGCAAAC-3'
px3    5'-AGAAATTAAGGCGAGCAAAA-3'     5'-CCGACAAGATGAATCGGGAT-3'
px4    5'-TCAAACCATTGAGTTCTGTT-3'     5'-TGTATGTGATTTGTATTGTG-3'
px5    5'-AAATTCAAAAAATTCCACAT-3'     5'-CCCTCCTAAAATTAATAAGA-3'
px6    5'-TGCCCCCGTTTTCAAGCTTT-3'     5'-CTGCAGCCTCGCTAAGTTTT-3'
px7    5'-GAGAGGAGCTGCTAAGATAA-3'     5'-AATAGCAGACCTGGAATTGA-3'
px8    5'-TAGTTGATTTTTCATGTGCA-3'     5'-TGTAGAGAGAGATTCCCAAA-3'
px9    5'-ACCTCCAGCATTTCAGCTCA-3'     5'-CATAGATGTCCTTAAACACA-3'
px10   5'-ATGCTGTTATAATAACCCAT-3'     5'-ATTTAACCCTTGCGATTACT-3'
px11   5'-TCCGTCACACCTGTCTTACC-3'     5'-CAGTGACATTCATGCCCAGC-3'

Table A4: PCR primer sequences for CNEs around pax9

CNE    Forward primer                 Reverse primer
ox1    5'-TGCATTTTGCTGGTTTACTC-3'     5'-TCACCAAAGCCCTTCGACCG-3'
ox2    5'-TGGAGGGTAGACTGTGACCA-3'     5'-CAGAAGAAGCTCAGCTCTGT-3'
ox3    5'-TCAAATGCAGCAAGCGAGGG-3'     5'-TATGCCATGACTCCTATTTC-3'
ox4    5'-AAAGCAAATAACTCTGAAGT-3'     5'-TAAAAAGCAAACTCCATTTC-3'
ox5    5'-AGTTTTAATATATTATAAAA-3'     5'-AGCGGGAGTGTGATGGAGAC-3'
ox6    5'-AAACACATTTCCGCTGAAGA-3'     5'-CGTGACAAAAAGAAACTAAT-3'
ox7    5'-ACGCTCAAAATTGATATGTT-3'     5'-ATGTATATACAGTATAGGCT-3'
ox8    5'-TAAAGACTGCTGCAAAAAAA-3'     5'-ACCACTCTGGCTCTCAGGCC-3'
ox9    5'-CCTCGCTAGAATCCCCCTTT-3'     5'-GATGTGAAAAAACTTTAAAC-3'
ox10   5'-AGGCCCAGACAGAAGCACTC-3'     5'-ATACCCTGTTCTACTTCTTC-3'
ox11   5'-GTGATGTTTGACGCATATTT-3'     5'-CTTGACATAGTTATTTTAAT-3'
ox12   5'-TCTAGTGCTTGCTGGGACAA-3'     5'-TATATTTATTTTATTAATCT-3'
ox13   5'-TATCAACACTATGCCACAAA-3'     5'-AGCAGGCTGGTCATTGTCCT-3'
ox14   5'-TTCACATCCTAAAGACTGCT-3'     5'-GGCTAATCATTTTTTAACTG-3'
ox15   5'-AGAATCCCCCTTTCCCTGAG-3'     5'-AATGATTTCTACCAGGATGT-3'
ox16   5'-GTGATGTTTGACGCATATTT-3'     5'-CTTGACATAGTTATTTTAAT-3'

Table A5: PCR primer sequences for CNEs around otx1b

CNE   Forward primer                 Reverse primer
fx1   5'-CTGTCAGGAGGAATTAAAGT-3'     5'-TTTACAAATCATTTTTGTTG-3'
fx2   5'-AGATGTCTTATAGGCTCATG-3'     5'-AGTGAAAGAAGTTTGCCTG-3'
fx3   5'-CATCCAACACTGTTGTCAAA-3'     5'-AATTTAGCATAATTTCCATG-3'
fx4   5'-TTAGCATTAATGTACAATGA-3'     5'-TCAAAAGAACAGCATTCATT-3'
fx5   5'-TTTAAACCTGTTTTCAAGAC-3'     5'-GACTGGCCCTATAATACACA-3'
fx6   5'-TATTATTATTGTTGTTGTTG-3'     5'-GTGGGAGAGAGAGTCTTCAT-3'
fx7   5'-CTGTTCAAGCGTCAAATTAT-3'     5'-TTTCTCCTTCAACTTGAATA-3'
fx8   5'-GAAGCCTGGCCACACTCATA-3'     5'-TGGCATTTCACCAATCATCC-3'
fx9   5'-TGCTTAATTAAGAATCCACA-3'     5'-TGAGTGTGCTTTTCTTTTTA-3'

Table A6: PCR primer sequences for CNEs around foxa2

Name    Sequence                                   Mutated positions
EM4     CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT   none (wild type)
Probe   AGATGTCTCTGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT   1-5
Probe   CTCGTGAGAGGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT   6-10
Probe   CTCGTTCTCTTAAGGTCTCTTTCTTTCTTTCTCTCCCTCT   11-15
Probe   CTCGTTCTCTGCCTTGAGAGTTCTTTCTTTCTCTCCCTCT   16-20
Probe   CTCGTTCTCTGCCTTTCTCTGGAGGTCTTTCTCTCCCTCT   21-25
Probe   CTCGTTCTCTGCCTTTCTCTTTCTTGAGGGCTCTCCCTCT   26-30
Probe   CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTAGAGACCTCT   31-35
Probe   CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTCTCTCAAGAG   36-40

Table A7: Each probe carries a 5 bp mutation with respect to the wild-type sequence in EM4. The mutated nucleotides are shown in red in the original table; their positions are listed in the last column.

Name      Sequence
EM4       CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT
M 1-10    AGATGGAGAGGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT
M 11-20   CTCGTTCTCTTAAGGGAGAGTTCTTTCTTTCTCTCCCTCT
M 21-30   CTCGTTCTCTGCCTTTCTCTGGAGGGAGGGCTCTCCCTCT
M 31-40   CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTAGAGAAAGAG

Table A8: Each probe carries a 10 bp mutation with respect to the wild-type sequence in EM4, at the positions given in its name. The mutated nucleotides are shown in red in the original table.

List of Publications

S Chatterjee, L Min, RK Karuturi, T Lufkin: The role of post-transcriptional RNA processing and plasmid vector sequences on transient transgene expression in zebrafish. Transgenic Res, 2009, 19:299-304.

S Chatterjee, P Kraus, T Lufkin: A Symphony of Inner Ear Developmental Control Genes. BMC Genetics, 2010. Accepted Manuscript.

... locate and validate candidate cis-regulatory elements for developmental control genes that are ancient in origin and conserved in evolution. 1.1 Cis-Regulatory Elements: Cis-regulatory elements ...

... [7] and separates genes with discordant expression patterns [8]. Locus control regions are made up of multiple cis-regulatory elements and are capable of directing expression of one or more genes
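The EMSA probes in Tables A7 and A8 follow a scanning-mutagenesis layout: each mutant replaces one consecutive 5 bp or 10 bp window of the 40 bp wild-type EM4 probe. The following is a minimal Python sketch for checking that layout by comparing each mutant with EM4. The sequences are transcribed from the two tables. The helper names `mutated_positions` and `mutation_window` are illustrative assumptions and do not come from the thesis.

```python
# Wild-type EM4 probe, transcribed from Tables A7 and A8.
EM4 = "CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT"

# 5 bp mutant probes, in the order they are listed in Table A7.
MUTANTS_5BP = [
    "AGATGTCTCTGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT",
    "CTCGTGAGAGGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT",
    "CTCGTTCTCTTAAGGTCTCTTTCTTTCTTTCTCTCCCTCT",
    "CTCGTTCTCTGCCTTGAGAGTTCTTTCTTTCTCTCCCTCT",
    "CTCGTTCTCTGCCTTTCTCTGGAGGTCTTTCTCTCCCTCT",
    "CTCGTTCTCTGCCTTTCTCTTTCTTGAGGGCTCTCCCTCT",
    "CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTAGAGACCTCT",
    "CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTCTCTCAAGAG",
]

# 10 bp mutant probes, keyed by the names used in Table A8.
MUTANTS_10BP = {
    "M 1-10": "AGATGGAGAGGCCTTTCTCTTTCTTTCTTTCTCTCCCTCT",
    "M 11-20": "CTCGTTCTCTTAAGGGAGAGTTCTTTCTTTCTCTCCCTCT",
    "M 21-30": "CTCGTTCTCTGCCTTTCTCTGGAGGGAGGGCTCTCCCTCT",
    "M 31-40": "CTCGTTCTCTGCCTTTCTCTTTCTTTCTTTAGAGAAAGAG",
}


def mutated_positions(wild_type, mutant):
    """Return the 1-based positions at which mutant differs from wild_type."""
    if len(wild_type) != len(mutant):
        raise ValueError("probes must have the same length")
    return [i + 1 for i, (w, m) in enumerate(zip(wild_type, mutant)) if w != m]


def mutation_window(wild_type, mutant):
    """Return (first, last) mutated position, or None if the probes are identical."""
    positions = mutated_positions(wild_type, mutant)
    return (positions[0], positions[-1]) if positions else None


if __name__ == "__main__":
    for seq in MUTANTS_5BP:
        print("5 bp mutant:", mutation_window(EM4, seq))
    for name, seq in MUTANTS_10BP.items():
        print(name, mutation_window(EM4, seq))
```

A check of this kind is a cheap way to catch transcription errors in probe sequences, since any stray substitution outside the intended window widens the reported range.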