Comparative analysis of microbial genomes: architecture and applications


COMPARATIVE ANALYSIS OF MICROBIAL GENOMES: ARCHITECTURE AND APPLICATIONS

KISHORE RAMAJI SAKHARKAR
(M.Tech, Computer Science, BITS, Pilani, India)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF MICROBIOLOGY
YONG LOO LIN SCHOOL OF MEDICINE
NATIONAL UNIVERSITY OF SINGAPORE
2006

DEDICATED TO MY PARENTS “SHEVANTI AND RAMAJI”

ACKNOWLEDGEMENTS

I am very grateful to Professor Vincent Tak Kwong Chow, who introduced me to the fascinating field of microbial genomics. I take this opportunity to thank him profusely for his patience, continued support, guidance and encouragement. My heartfelt thanks go to Professor Pervaiz Shazib, Vice-Dean, Research, Yong Loo Lin School of Medicine, National University of Singapore, for extending all possible help and support throughout my research work. I am indebted to Professor Micheal and Professor Olson of Genome Research Centre, Washington, USA, for providing access to essential genes in Pseudomonas aeruginosa. Special thanks are due to Prof. Stanley Falkow, Stanford University, USA, and Prof. Salama, Fred Hutchinson Cancer Centre, USA, for making available the H. pylori mutagenesis data.

During my doctoral research program, I was blessed to interact with many eminent scientists in the emerging field of bioinformatics. Their optimism and critical comments on the project were invaluable. My thanks also go to Ms. Geetha Sreedhara Warrior, Ms. Siti Maryam Binte Masnor, Ms. Stacy Tan and Ms. Geetha Baskaran for administrative support. Last but not least, my thanks are due to my wife, Dr. Meena Kishore Sakharkar, my daughter Neha and my son Anurag for their endless support and love.

TABLE OF CONTENTS

SUMMARY
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS

CHAPTER 1: INTRODUCTION
1.1 Genes and Genomes
1.2 Genomes of Prokaryotes
1.3 Why microbes?
1.4 Microbial Genome Program
1.5 Comparative genomics as a tool for Microbial Genomics
1.6 Comparative genomics of bacteria
1.7 Genome data format
1.8 Obligatory Intracellular Parasites
1.9 Organization of the thesis

CHAPTER 2: GENE ORDER VISUALISER – GOV
2.1 What is GOV?
2.2 Why GOV?
2.3 Background
2.4 Methodology
2.5 Implementation and Example
2.6 Discussion
2.7 Limitations

CHAPTER 3: PROTEOME PROFILE DATABASE – PPD
3.1 What is PPD?
3.2 Background
3.3 Methodology
3.4 Implementation
3.5 Example
3.6 Conclusion
3.7 Caveats
3.8 Genes lost and Genes retained

CHAPTER 4: GENOME REDUCTION
4.1 Introduction
4.2 Methodology
4.3 Results and Discussion
4.3.1 Genome size and number of genes
4.3.2 Genome decay and host adaptation
4.3.3 Protein length distribution
4.3.4 Metabolic pathway comparison of Glycolysis, TCA and Pentose Phosphate Pathway
4.3.5 Backbone genome and gene essentiality
4.4 Conclusion
4.5 Caveats
4.6 Reduced genomes and Overlapping genes

CHAPTER 5: COMPARATIVE ANALYSES OF OVERLAPPING GENES
5.1 Introduction
5.2 Methodology
5.3 Results and Discussion
5.3.1 Genome and Overlapping genes
5.3.2 Comparative study of overlapping genes
5.4 Conclusion
5.5 Overlapping genes and Gene fusion

CHAPTER 6: GENE FUSION
6.1 Introduction
6.2 Methodology
6.3 Results and Discussion
6.3.1 Non-overlapping genes, Intergenic DNA and gene fusion
6.3.2 Overlapping genes and Gene fusion
6.3.3 Are fusion components in Operon?
6.3.4 Gene Fusion – COG and Domains
6.3.5 Fusion genes as microbial drug target
6.4 Conclusion
6.5 Caveats
6.6 In silico drug target identification

CHAPTER 7: IDENTIFICATION OF DRUG TARGETS
7.1 Introduction
7.2 Methodology
7.3 Results and discussion
7.3.1 Essential genes in P. aeruginosa
7.3.2 Essential genes in Salmonella genome
7.4 Conclusion
7.5 Caveats

CHAPTER 8: CONCLUSION & FUTURE WORK
CONCLUSION
FUTURE WORK

LIST OF PUBLICATIONS
BIBLIOGRAPHY
APPENDIX – I
APPENDIX – II
APPENDIX – III

SUMMARY

The availability of complete genome sequences of many bacterial species is, for the first time, facilitating many computational approaches for understanding bacterial genomes. One of the major incentives behind the genome sequencing of numerous pathogenic bacteria is the desire to better understand their peculiarities and to develop new approaches for controlling human diseases caused by these organisms. This task has become even more urgent with the rapid evolution of antibiotic resistance in many bacterial pathogens. Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. The availability of the complete genome sequences of many pathogenic microbes provides information on every potential drug target and is an invaluable resource in the search for novel compounds.

This thesis is an attempt to computationally analyze the host-specific adaptations in obligatory intracellular bacteria and to develop tools that facilitate genome analysis, both to better understand bacterial genome evolution and to accelerate the computational identification of microbial drug targets. We first developed novel tools, GOV and PPD, for understanding microbial genome design, architecture and evolution. We then used the data derived from these tools to study host-specific adaptations in reduced genomes (obligatory intracellular parasites) and their intimate one-sided associations with eukaryotic cells. We demonstrate that gene loss in these bacteria is differential, function dependent and independent of protein length. We also show that a substantial ‘backbone genome’ is shared by all the obligatory intracellular parasites.
Further filtering of these targets may help us identify a target that is “selective and specific” for all the obligate genomes. A substantial proportion of genes in the “backbone genome” have overlapping gene architecture and are involved in important cellular functions. Certain overlapping genes are also found to be involved in gene fusion events. Genes involved in fusion are identified as essential genes, and these could again be putative drug targets. Our analysis shows that fusion genes have incremental structural and functional architectures and that inter-genic DNA plays a significant role in these enhanced attributes and has contributed to genome evolution.

We have developed an in silico approach for the identification of putative drug targets in microbial genomes and have confirmed our findings by comparison with experimental data. These processes are efficient ways of exploring genomes at the niche and life-style level, enriching potential target genes, and identifying those that are critical for normal cell function. The comprehensive essential gene lists generated will allow an accelerated genetic dissection of traits such as metabolic flexibility and inherent drug resistance. Such a strategy will enable us to locate critical pathways and steps in pathogenesis, to target these steps by designing new drugs, and to inhibit the infectious agent of interest with new antimicrobial agents. These results underscore the utility of large genomic databases for systematic in silico drug target identification in the postgenomic era.

LIST OF TABLES

Table 4.1 List of organisms
Table 4.2 Description of COG category codes
Table 4.3 Pathway alignment table for Glycolysis and Gluconeogenesis
Table 4.4 Pathway alignment table for Pentose Phosphate Pathway
Table 4.5 Pathway alignment table for TCA Cycle
Table 4.6 List of genes in ‘backbone genome’. implies essential by match in essential genes set of M. genitalium.
Table 5.1 Directions for overlapping types
Table 5.2 Number of genes in different directions of overlap in genomes under study
Table 5.3 List of genes present as overlapping genes in Rickettsia prowazekii and Rickettsia conorii with same number of nucleotides in overlap
Table 5.4 List of genes present as overlapping genes in Rickettsia prowazekii and Rickettsia conorii with different number of nucleotides in overlap
Table 5.5 List of genes present as overlapping genes in Rickettsia prowazekii and at zero inter-genic distance in Rickettsia conorii
Table 5.6 List of genes present as overlapping genes in Rickettsia prowazekii and at inter-genic distance of at least 1 bp in Rickettsia conorii
Table 5.7 List of genes present as overlapping genes in Rickettsia conorii and at inter-genic distance of at least 1 bp in Rickettsia prowazekii
Table 6.1 Features of H. pylori J99 and H. pylori 26695
Table 6.2 Cases of fusion in juxtaposed genes
Table 6.3 List of genes present as fusion genes in strain H. pylori J99 and split juxtaposed genes in H. pylori 26695

APPENDIX - I (continued)

Kingdom: A = Archaea, B = Bacteria.

# | Name | Kingdom | Size (Mb) | %GC | Proteins | Genes | Gene density (genes/kb)
177 | Legionella pneumophila subsp. pneumophila str. Philadelphia | B | 3.40 | 38.30 | 2942 | 3002 | 0.88
178 | Nitrobacter winogradskyi Nb-255 | B | 3.40 | 62.00 | 3122 | 3198 | 0.94
179 | Legionella pneumophila str. Lens | B | 3.41 | 38.40 | 2878 | 3001 | 0.88
180 | Nitrosococcus oceani ATCC 19707 | B | 3.52 | 50.30 | 2974 | 3143 | 0.89
181 | Symbiobacterium thermophilum IAM 14863 | B | 3.57 | 68.70 | 3337 | 3481 | 0.98
182 | Geobacillus kaustophilus HTA426 | B | 3.59 | 52.00 | 3498 | 3612 | 1.01
183 | Salinibacter ruber DSM 13855 | B | 3.59 | 66.50 | 2801 | 2865 | 0.80
184 | Acinetobacter sp. ADP1 | B | 3.60 | 40.40 | 3325 | 3425 | 0.95
185 | Oceanobacillus iheyensis HTE831 | B | 3.63 | 35.70 | 3500 | 3594 | 0.99
186 | Legionella pneumophila str. Paris | B | 3.64 | 38.30 | 3027 | 3136 | 0.86
187 | Thermobifida fusca YX | B | 3.64 | 67.50 | 3110 | 3184 | 0.87
188 | Desulfotalea psychrophila LSv54 | B | 3.66 | 46.60 | 3116 | 3204 | 0.88
189 | Pelobacter carbinolicus DSM 2380 | B | 3.66 | 55.10 | 3118 | 3208 | 0.88
190 | Desulfovibrio desulfuricans G20 | B | 3.73 | 60.00 | 3775 | 3865 | 1.04
191 | Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough | B | 3.77 | 60.00 | 3379 | 3480 | 0.92
192 | Bdellovibrio bacteriovorus HD100 | B | 3.78 | 50.60 | 3587 | 3623 | 0.96
193 | Geobacter sulfurreducens PCA | B | 3.81 | 60.90 | 3446 | 3528 | 0.93
194 | Pseudoalteromonas haloplanktis TAC125 | B | 3.85 | 40.10 | 2940 | 3075 | 0.80
195 | Synechocystis sp. PCC 6803 | B | 3.95 | 47.00 | 3167 | 3216 | 0.81
196 | Geobacter metallireducens GS-15 | B | 4.01 | 59.50 | 3519 | 3621 | 0.90
197 | Caulobacter crescentus CB15 | B | 4.02 | 67.00 | 3737 | 3819 | 0.95
198 | Vibrio cholerae O1 biovar eltor str. N16961 | B | 4.03 | 47.00 | 2742 | 2889 | 0.72
199 | Bordetella pertussis Tohama I | B | 4.09 | 68.00 | 3436 | 3867 | 0.95
200 | Methanosarcina mazei Go1 | A | 4.10 | 41.50 | 3370 | 3437 | 0.84
201 | Clostridium acetobutylicum ATCC 824 | B | 4.13 | 37.00 | 3672 | 3844 | 0.93
202 | Bacillus halodurans C-125 | B | 4.20 | 43.00 | 4066 | 4171 | 0.99
203 | Bacillus subtilis subsp. subtilis str. 168 | B | 4.21 | 42.00 | 4105 | 4225 | 1.00
204 | Bacillus licheniformis ATCC 14580 | B | 4.22 | 46.20 | 4152 | 4290 | 1.02
205 | Bacillus licheniformis ATCC 14580 | B | 4.22 | 46.20 | 4196 | 4289 | 1.02
206 | Haloarcula marismortui ATCC 43049 | A | 4.27 | 61.10 | 3131 | 3186 | 0.75
207 | Vibrio fischeri ES114 | B | 4.28 | 38.40 | 2575 | 2716 | 0.63
208 | Bacillus clausii KSM-K16 | B | 4.30 | 44.80 | 4096 | 4204 | 0.98
209 | Mycobacterium bovis AF2122/97 | B | 4.35 | 65.63 | 3920 | 4003 | 0.92
210 | Mycobacterium tuberculosis CDC1551 | B | 4.40 | 65.60 | 4189 | 4293 | 0.98
211 | Mycobacterium tuberculosis H37Rv | B | 4.41 | 65.60 | 3989 | 4048 | 0.92
212 | Rhodospirillum rubrum ATCC 11170 | B | 4.41 | 65.40 | 3791 | 3870 | 0.88
213 | Rhodobacter sphaeroides 2.4.1 | B | 4.45 | 68.80 | 3022 | 3092 | 0.69
214 | Dechloromonas aromatica RCB | B | 4.50 | 59.30 | 4171 | 4283 | 0.95
215 | Shigella dysenteriae Sd197 | B | 4.55 | 50.00 | 4274 | 4664 | 1.03
216 | Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150 | B | 4.59 | 52.20 | 4093 | 4403 | 0.96
217 | Shigella flexneri 2a str. 2457T | B | 4.60 | 50.00 | 4068 | 4577 | 1.00
218 | Silicibacter pomeroyi DSS-3 | B | 4.60 | 68.00 | 3810 | 3901 | 0.85
219 | Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130 | B | 4.63 | 35.00 | 3394 | 3481 | 0.75
220 | Escherichia coli K12 | B | 4.64 | 50.00 | 4237 | 4410 | 0.95
221 | Shigella boydii Sb227 | B | 4.65 | 47.40 | 4136 | 4466 | 0.96
222 | Gloeobacter violaceus PCC 7421 | B | 4.66 | 62.00 | 4430 | 4482 | 0.96
223 | Leptospira interrogans serovar Lai str. 56601 | B | 4.69 | 35.00 | 4360 | 4400 | 0.94
224 | Azoarcus sp. EbN1 | B | 4.73 | 65.10 | 4133 | 4203 | 0.89
225 | Chromobacterium violaceum ATCC 12472 | B | 4.75 | 64.83 | 4407 | 4529 | 0.95
226 | Bordetella parapertussis 12822 | B | 4.77 | 68.00 | 4185 | 4467 | 0.94
227 | Salmonella enterica subsp. enterica serovar Typhi Ty2 | B | 4.79 | 52.00 | 4318 | 4645 | 0.97
228 | Yersinia pestis biovar Medievalis str. 91001 | B | 4.80 | 48.00 | 3895 | 4138 | 0.86
229 | Mycobacterium avium subsp. paratuberculosis K-10 | B | 4.83 | 69.30 | 4350 | 4398 | 0.91
230 | Shigella flexneri 2a str. 301 | B | 4.83 | 50.00 | 4182 | 4566 | 0.95
231 | Yersinia pestis CO92 | B | 4.83 | 48.00 | 3885 | 4113 | 0.85
232 | Yersinia pseudotuberculosis IP 32953 | B | 4.84 | 47.00 | 3901 | 4095 | 0.85
233 | Methanosarcina barkeri str. fusaro | A | 4.87 | 39.20 | 3606 | 3811 | 0.78
234 | Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 | B | 4.94 | 51.50 | 4441 | 4699 | 0.95
235 | Xanthomonas oryzae pv. oryzae KACC10331 | B | 4.94 | 63.70 | 4080 | 4209 | 0.85
236 | Salmonella typhimurium LT2 | B | 4.95 | 52.00 | 4425 | 4622 | 0.93
237 | Magnetospirillum magneticum AMB-1 | B | 4.97 | 65.10 | 4559 | 4563 | 0.92
238 | Shigella sonnei Ss046 | B | 5.04 | 50.80 | 4223 | 4553 | 0.90
239 | Erwinia carotovora subsp. atroseptica SCRI1043 | B | 5.06 | 51.00 | 4472 | 4614 | 0.91
240 | Xanthomonas campestris pv. campestris str. ATCC 33913 | B | 5.08 | 64.00 | 4181 | 4242 | 0.84
241 | Salmonella enterica subsp. enterica serovar Typhi str. CT18 | B | 5.13 | 52.00 | 4395 | 4711 | 0.92
242 | Shewanella oneidensis MR-1 | B | 5.13 | 45.00 | 4324 | 4566 | 0.89
243 | Vibrio vulnificus CMCP6 | B | 5.13 | 46.00 | 2926 | 3049 | 0.59
244 | Xanthomonas campestris pv. campestris str. 8004 | B | 5.15 | 64.00 | 4273 | 4370 | 0.85
245 | Vibrio parahaemolyticus RIMD 2210633 | B | 5.17 | 45.00 | 3080 | 3223 | 0.62
246 | Bacillus anthracis str. Ames | B | 5.23 | 35.00 | 5311 | 5630 | 1.08
247 | Bacillus anthracis str. Sterne | B | 5.23 | 35.40 | 5287 | 5415 | 1.04
248 | Escherichia coli CFT073 | B | 5.23 | 50.00 | 5379 | 5589 | 1.07
249 | Bacteroides fragilis NCTC 9343 | B | 5.24 | 44.00 | 4184 | 4347 | 0.83
250 | Vibrio vulnificus YJ016 | B | 5.26 | 45.00 | 3259 | 3387 | 0.64
251 | Xanthomonas axonopodis pv. citri str. 306 | B | 5.27 | 64.00 | 4312 | 4374 | 0.83
252 | Bacillus thuringiensis serovar konkukian str. 97-27 | B | 5.31 | 35.40 | 5117 | 5261 | 0.99
253 | Bacteroides fragilis YCH46 | B | 5.31 | 33.50 | 4578 | 4670 | 0.88
254 | Bordetella bronchiseptica RB50 | B | 5.34 | 68.00 | 4994 | 5072 | 0.95
255 | Colwellia psychrerythraea 34H | B | 5.37 | 38.00 | 4910 | 5054 | 0.94
256 | Bacillus cereus ATCC 10987 | B | 5.43 | 38.00 | 5603 | 5772 | 1.06
257 | Bacillus cereus ATCC 14579 | B | 5.43 | 38.00 | 5234 | 5476 | 1.01
258 | Rhodopseudomonas palustris CGA009 | B | 5.46 | 65.00 | 4813 | 4891 | 0.90
259 | Bacillus anthracis str. 'Ames Ancestor' | B | 5.50 | 35.20 | 5309 | 5635 | 1.02
260 | Escherichia coli O157:H7 | B | 5.59 | 50.00 | 5253 | 5395 | 0.97
261 | Escherichia coli O157:H7 EDL933 | B | 5.62 | 50.00 | 5324 | 5453 | 0.97
262 | Agrobacterium tumefaciens str. C58 | B | 5.67 | 59.00 | 2785 | 2835 | 0.50
263 | Agrobacterium tumefaciens str. C58 | B | 5.67 | 59.00 | 2715 | 2760 | 0.49
264 | Photorhabdus luminescens subsp. laumondii TTO1 | B | 5.69 | 42.80 | 4683 | 5012 | 0.88
265 | Methanosarcina acetivorans C2A | A | 5.75 | 42.70 | 4540 | 4721 | 0.82
266 | Ralstonia solanacearum GMI1000 | B | 5.81 | 69.00 | 3440 | 3509 | 0.60
267 | Bacillus cereus E33L | B | 5.84 | 35.40 | 5134 | 5269 | 0.90
268 | Burkholderia mallei ATCC 23344 | B | 5.84 | 68.00 | 2029 | 2115 | 0.36
269 | Pseudomonas syringae pv. syringae B728a | B | 6.09 | 58.80 | 5089 | 5220 | 0.86
270 | Pseudomonas syringae pv. phaseolicola 1448A | B | 6.11 | 55.60 | 4983 | 5225 | 0.86
271 | Rhizobium etli CFN 42 | B | 6.16 | 61.20 | 4035 | 4126 | 0.67
272 | Pseudomonas putida KT2440 | B | 6.18 | 61.50 | 5350 | 5516 | 0.89
273 | Pseudomonas aeruginosa PAO1 | B | 6.26 | 67.00 | 5567 | 5647 | 0.90
274 | Bacteroides thetaiotaomicron VPI-5482 | B | 6.29 | 42.00 | 4778 | 4864 | 0.77
275 | Nocardia farcinica IFM 10152 | B | 6.29 | 70.70 | 5683 | 5747 | 0.91
276 | Photobacterium profundum SS9 | B | 6.40 | 41.70 | 3416 | 3603 | 0.56
277 | Pseudomonas fluorescens PfO-1 | B | 6.44 | 60.50 | 5736 | 5833 | 0.91
278 | Pseudomonas syringae pv. tomato str. DC3000 | B | 6.54 | 58.30 | 5470 | 5692 | 0.87
279 | Sinorhizobium meliloti 1021 | B | 6.69 | 63.00 | 3341 | 3412 | 0.51
280 | Burkholderia thailandensis E264 | B | 6.72 | 67.60 | 3276 | 3343 | 0.50
281 | Anabaena variabilis ATCC 29413 | B | 7.07 | 41.40 | 5039 | 5130 | 0.73
282 | Pseudomonas fluorescens Pf-5 | B | 7.07 | 67.00 | 6137 | 6231 | 0.88
283 | Rhodopirellula baltica SH | B | 7.15 | 55.40 | 7325 | 7404 | 1.04
284 | Nostoc sp. PCC 7120 | B | 7.21 | 41.30 | 5366 | 5431 | 0.75
285 | Hahella chejuensis KCTC 2396 | B | 7.22 | 53.90 | 6778 | 6863 | 0.95
286 | Burkholderia pseudomallei K96243 | B | 7.25 | 68.00 | 3399 | 3529 | 0.49
287 | Ralstonia eutropha JMP134 | B | 7.26 | 64.50 | 2407 | 2442 | 0.34
288 | Burkholderia pseudomallei 1710b | B | 7.31 | 68.00 | 3736 | 3799 | 0.52
289 | Mesorhizobium loti MAFF303099 | B | 7.60 | 62.00 | 6743 | 6804 | 0.90
290 | Burkholderia sp. 383 | B | 8.68 | 67.00 | 1209 | 1221 | 0.14
291 | Streptomyces coelicolor A3(2) | B | 9.05 | 72.10 | 7769 | 7912 | 0.87
292 | Bradyrhizobium japonicum USDA 110 | B | 9.11 | 64.00 | 8317 | 8373 | 0.92
293 | Streptomyces avermitilis MA-4680 | B | 9.12 | 72.00 | 7577 | 7666 | 0.84

APPENDIX - II

EC# | Description | Reaction
5.3.1.9 | glucose-6-phosphate isomerase | D-glucose 6-phosphate = D-fructose 6-phosphate
2.7.1.11 | 6-phosphofructokinase | ATP + D-fructose 6-phosphate = ADP + D-fructose 1,6-bisphosphate
4.1.2.13 | Fructose-bisphosphate aldolase | D-fructose 1,6-bisphosphate = glycerone phosphate + D-glyceraldehyde 3-phosphate
1.2.1.12 | glyceraldehyde-3-phosphate dehydrogenase | D-glyceraldehyde 3-phosphate + phosphate + NAD+ = 3-phospho-D-glyceroyl phosphate + NADH + H+
2.7.2.3 | Phosphoglycerate kinase | ATP + 3-phospho-D-glycerate = ADP + 3-phospho-D-glyceroyl phosphate
5.4.2.1 | Phosphoglycerate mutase | 2-phospho-D-glycerate = 3-phospho-D-glycerate
4.2.1.11 | Phosphopyruvate hydratase | 2-phospho-D-glycerate = phosphoenolpyruvate + H2O
2.7.1.40 | pyruvate kinase | ATP + pyruvate = ADP + phosphoenolpyruvate
1.8.1.4 | Dihydrolipoyl dehydrogenase | protein N6-(dihydrolipoyl)lysine + NAD+ = protein N6-(lipoyl)lysine + NADH + H+
5.3.1.1 | triose-phosphate isomerase | D-glyceraldehyde 3-phosphate = glycerone phosphate
2.3.1.12 | dihydrolipoyllysine-residue acetyltransferase | acetyl-CoA + enzyme N6-(dihydrolipoyl)lysine = CoA + enzyme N6-(S-acetyldihydrolipoyl)lysine
1.2.4.1 | Pyruvate dehydrogenase | pyruvate + [dihydrolipoyllysine-residue acetyltransferase] lipoyllysine = [dihydrolipoyllysine-residue acetyltransferase] S-acetyldihydrolipoyllysine + CO2
2.7.1.63 | Polyphosphate-glucose phosphotransferase | (phosphate)n + D-glucose = (phosphate)(n-1) + D-glucose 6-phosphate
1.1.1.1 | Alcohol dehydrogenase | an alcohol + NAD+ = an aldehyde or ketone + NADH + H+
2.7.1.2 | Glucokinase | ATP + D-glucose = ADP + D-glucose 6-phosphate
3.6.1.7 | Acylphosphatase | an acylphosphate + H2O = a carboxylate + phosphate
4.1.1.1 | Pyruvate decarboxylase | a 2-oxo acid = an aldehyde + CO2
1.2.1.3 | Aldehyde dehydrogenase | an aldehyde + NAD+ + H2O = an acid + NADH + H+
1.1.1.2 | Alcohol dehydrogenase (NADP+) | an alcohol + NADP+ = an aldehyde + NADPH + H+
6.2.1.1 | acetate-CoA ligase | ATP + acetate + CoA = AMP + diphosphate + acetyl-CoA
3.2.1.86 | 6-phospho-beta-glucosidase | 6-phospho-beta-D-glucosyl-(1,4)-D-glucose + H2O = D-glucose + D-glucose 6-phosphate
2.7.1.69 | Phosphotransferase | protein N(pi)-phospho-L-histidine + sugar = protein histidine + sugar phosphate
3.1.3.10 | glucose-1-phosphatase | alpha-D-glucose 1-phosphate + H2O = D-glucose + phosphate

List of enzymes, enzyme classification (EC) number and reactions carried out by enzymes involved in GLYCOLYSIS and GLUCONEOGENESIS.

EC# | Description | Reaction
4.1.1.49 | phosphoenolpyruvate carboxykinase (ATP) | ATP + oxaloacetate = ADP + phosphoenolpyruvate + CO2
2.3.3.1 | citrate (Si)-synthase | acetyl-CoA + H2O + oxaloacetate = citrate + CoA
4.2.1.3 | cis-aconitase | citrate = cis-aconitate + H2O; cis-aconitate + H2O = isocitrate
1.1.1.42 | Isocitrate dehydrogenase | isocitrate + NADP+ = 2-oxoglutarate + CO2 + NADPH; oxalosuccinate + NADP+ = 2-oxoglutarate + CO2 + NADPH
1.2.4.2 | Oxoglutarate dehydrogenase | 2-oxoglutarate + [dihydrolipoyllysine-residue succinyltransferase] lipoyllysine = [dihydrolipoyllysine-residue succinyltransferase] S-succinyldihydrolipoyllysine + CO2
1.8.1.4 | Dihydrolipoyl dehydrogenase | protein N6-(dihydrolipoyl)lysine + NAD+ = protein N6-(lipoyl)lysine + NADH + H+
2.3.1.61 | dihydrolipoyllysine-residue succinyltransferase | succinyl-CoA + enzyme N6-(dihydrolipoyl)lysine = CoA + enzyme N6-(S-succinyldihydrolipoyl)lysine
6.2.1.5 | succinyl-CoA synthetase | ATP + succinate + CoA = ADP + phosphate + succinyl-CoA
1.3.99.1 | succinate dehydrogenase | succinate + acceptor = fumarate + reduced acceptor
4.2.1.2 | fumarate hydratase | (S)-malate = fumarate + H2O
1.1.1.37 | malate dehydrogenase | (S)-malate + NAD+ = oxaloacetate + NADH + H+
4.1.3.34 | citryl-CoA lyase | (3S)-citryl-CoA = acetyl-CoA + oxaloacetate
2.8.3.10 | citrate CoA-transferase | acetyl-CoA + citrate = acetate + (3S)-citryl-CoA
4.1.3.6 | Citrase | citrate = acetate + oxaloacetate
6.4.1.1 | pyruvate carboxylase | ATP + pyruvate + HCO3- = ADP + phosphate + oxaloacetate
4.1.1.32 | phosphoenolpyruvate carboxykinase | GTP + oxaloacetate = GDP + phosphoenolpyruvate + CO2

List of enzymes, enzyme classification (EC) number and reactions carried out by enzymes involved in TCA CYCLE.
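The enzyme lists in this appendix can serve as reference sets for a presence/absence comparison of a pathway across genomes, in the spirit of the pathway alignment tables (Tables 4.3 to 4.5). The following is a minimal sketch in Python, assuming each genome's annotation has already been reduced to a set of EC numbers. The reference list uses EC numbers from the TCA cycle table above; the genome names, their EC sets, and the helpers `pathway_alignment` and `completeness` are hypothetical illustrations and are not the procedure or data used in the thesis.

```python
from typing import Dict, List, Set

# Reference EC numbers for the core TCA cycle, taken from the table above.
TCA_REFERENCE: List[str] = [
    "2.3.3.1", "4.2.1.3", "1.1.1.42", "1.2.4.2", "2.3.1.61",
    "1.8.1.4", "6.2.1.5", "1.3.99.1", "4.2.1.2", "1.1.1.37",
]


def pathway_alignment(reference: List[str],
                      genomes: Dict[str, Set[str]]) -> Dict[str, Dict[str, bool]]:
    """For every reference EC number, record which genomes carry it."""
    return {ec: {name: ec in ecs for name, ecs in genomes.items()}
            for ec in reference}


def completeness(reference: List[str], ecs: Set[str]) -> float:
    """Fraction of the reference EC numbers present in one genome."""
    return sum(ec in ecs for ec in reference) / len(reference)


# Hypothetical annotations (placeholders, NOT data from the thesis).
genomes: Dict[str, Set[str]] = {
    "genome_A": set(TCA_REFERENCE),                   # complete pathway
    "genome_B": {"1.8.1.4", "1.1.1.37", "4.2.1.2"},   # reduced genome
}

table = pathway_alignment(TCA_REFERENCE, genomes)

# Print one row per enzyme: '+' if present in the genome, '-' if absent.
for ec in TCA_REFERENCE:
    row = " ".join("+" if table[ec][name] else "-" for name in genomes)
    print(f"{ec:<10} {row}")

# Print the fraction of the pathway retained by each genome.
for name, ecs in genomes.items():
    print(name, f"{completeness(TCA_REFERENCE, ecs):.2f}")
```

Keeping the reference as an ordered list preserves the row order of the printed alignment, while the per-genome sets make each membership test cheap. The same functions would apply unchanged to the glycolysis or pentose phosphate lists.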
EC# | Description | Reaction
2.7.1.45 | 2-dehydro-3-deoxygluconokinase | ATP + 2-dehydro-3-deoxy-D-gluconate = ADP + 6-phospho-2-dehydro-3-deoxy-D-gluconate
4.2.1.12 | Phosphogluconate dehydratase | 6-phospho-D-gluconate = 2-dehydro-3-deoxy-6-phospho-D-gluconate + H2O
4.1.2.14 | 2-dehydro-3-deoxy-phosphogluconate aldolase | 2-dehydro-3-deoxy-D-gluconate 6-phosphate = pyruvate + D-glyceraldehyde 3-phosphate
2.7.1.12 | Gluconokinase | ATP + D-gluconate = ADP + 6-phospho-D-gluconate
1.1.5.2 | Quinoprotein glucose dehydrogenase | D-glucose + ubiquinone = D-glucono-1,5-lactone + ubiquinol
5.3.1.9 | Isomerases | D-glucose 6-phosphate = D-fructose 6-phosphate
1.1.1.49 | glucose-6-phosphate dehydrogenase | D-glucose 6-phosphate + NADP+ = D-glucono-1,5-lactone 6-phosphate + NADPH + H+
3.1.1.31 | 6-phosphogluconolactonase | 6-phospho-D-glucono-1,5-lactone + H2O = 6-phospho-D-gluconate
1.1.1.44 | Phosphogluconate dehydrogenase | 6-phospho-D-gluconate + NADP+ = D-ribulose 5-phosphate + CO2 + NADPH
5.1.3.1 | ribulose-phosphate 3-epimerase | D-ribulose 5-phosphate = D-xylulose 5-phosphate
5.3.1.6 | ribose-5-phosphate isomerase | D-ribose 5-phosphate = D-ribulose 5-phosphate
2.2.1.1 | Transketolase | sedoheptulose 7-phosphate + D-glyceraldehyde 3-phosphate = D-ribose 5-phosphate + D-xylulose 5-phosphate
2.2.1.2 | Transaldolase | sedoheptulose 7-phosphate + D-glyceraldehyde 3-phosphate = D-erythrose 4-phosphate + D-fructose 6-phosphate
3.1.3.11 | fructose-bisphosphatase | D-fructose 1,6-bisphosphate + H2O = D-fructose 6-phosphate + phosphate
2.7.1.11 | 6-phosphofructokinase | ATP + D-fructose 6-phosphate = ADP + D-fructose 1,6-bisphosphate
4.1.2.13 | fructose-bisphosphate aldolase | D-fructose 1,6-bisphosphate = glycerone phosphate + D-glyceraldehyde phosphate
4.1.2.4 | deoxyribose-phosphate aldolase | 2-deoxy-D-ribose 5-phosphate = D-glyceraldehyde 3-phosphate + acetaldehyde
5.4.2.7 | phosphopentomutase | alpha-D-ribose 1-phosphate = D-ribose 5-phosphate
5.4.2.2 | phosphoglucomutase | alpha-D-glucose 1-phosphate = alpha-D-glucose 6-phosphate
2.7.6.1 | ribose-phosphate diphosphokinase | ATP + D-ribose 5-phosphate = AMP + 5-phospho-alpha-D-ribose 1-diphosphate

List of enzymes, enzyme classification (EC) number and reactions carried out by enzymes involved in Pentose Phosphate Pathway.

APPENDIX - III

COMPOSITE PROTEIN H. pylori J99 AND COMPONENTS FROM H. pylori 26695

GI/Locus tag/COG COMPOSITE CDD COMPONENTS CDD
15611107 jhp0036 COG2948 gnl|CDD|8701 pfam03743, TrbI, Bacterial conjugation TrbI-like protein. HP0041 .No hits found! HP0042 gnl|CDD|8701 15611156 jhp0086 - gnl|CDD|8043 pfam01531, Glyco_transf_11, Glycosyl transferase family 11. HP0094 .No hits found! 15611221 jhp0151 COG0642 15611244 jhp0174 COG1479 HP0093 gnl|CDD|17769 gnl|CDD|16521 gnl|CDD|25316 gnl|CDD|22751 gnl|CDD|25917 gnl|CDD|24456 gnl|CDD|11912 gnl|CDD|10512 gnl|CDD|14132 gnl|CDD|14130 gnl|CDD|13499 gnl|CDD|12624 gnl|CDD|7029 gnl|CDD|11193 HP0165 cd00075, HATPase_c, Histidine kinase-like ATPases cd00082, HisKA, His Kinase A. smart00387, HATPase_c, Histidine kinase-like ATPases smart00304, HAMP, HAMP (Histidine kinases. pfam02518, HATPase_c, Histidine kinase-, DNA gyrase B pfam00672, HAMP, HAMP domain. COG2205, KdpD, Osmosensitive K+ channel histidine kinase. HP0164 COG0642, BaeS, Signal transduction histidine kinase COG5002, VicK, Signal transduction histidine kinase. COG5000, NtrY, Signal transduction histidine kinase COG4251, COG4251, Bacteriophytochrome. COG3290, CitA, Signal transduction histidine kinase. pfam03235, DUF262, Protein of unknown function DUF262. COG1479, COG1479, Uncharacterized conserved protein. gnl|CDD|8043 pfam03743, TrbI, Bacterial conjugation TrbI-like protein. pfam01531, Glyco_transf_11, Glycosyl transferase family. .No hits found!
HP0187 gnl|CDD|17769 gnl|CDD|16521 gnl|CDD|25316 gnl|CDD|25917 gnl|CDD|11912 gnl|CDD|10512 gnl|CDD|14132 gnl|CDD|14130 gnl|CDD|13499 gnl|CDD|7029 cd00075, HATPase_c, Histidine kinase-like ATPases cd00082, HisKA, His Kinase A (dimerization/phosphoacceptor) smart00387, HATPase_c, Histidine kinase-like ATPases. pfam02518, HATPase_c, Histidine kinase-, DNA gyrase B. COG2205, KdpD, Osmosensitive K+ channel histidine kinase COG0642, BaeS, Signal transduction histidine kinase. COG5002, VicK, Signal transduction histidine kinase. COG5000, NtrY, Signal transduction histidine kinase. COG4251, COG4251, Bacteriophytochrome. pfam03235, DUF262, Protein of unknown function. HP0186 gnl|CDD|7029 pfam03235, DUF262, Protein of unknown function. 205 15611680 jhp0613 COG4889 gnl|CDD|14813 gnl|CDD|24292 gnl|CDD|24402 gnl|CDD|14023 gnl|CDD|10786 gnl|CDD|10833 gnl|CDD|10384 cd00079, HELICc, Helicase superfamily c-terminal domain. smart00490, HELICc, helicase superfamily c-terminal. pfam00271, Helicase_C, Helicase conserved C-terminal. COG4889, COG4889, Predicted helicase COG1061, SSL2, DNA or RNA helicases of superfamily II. COG1111, MPH1, ERCC4-like helicases [DNA replication. COG0513, SrmB, Superfamily II DNA and RNA helicases HP0668 gnl|CDD|14855 gnl|CDD|17767 gnl|CDD|14813 gnl|CDD|24291 gnl|CDD|24292 gnl|CDD|24402 gnl|CDD|15768 gnl|CDD|25466 gnl|CDD|14023 gnl|CDD|10786 gnl|CDD|13378 gnl|CDD|10384 gnl|CDD|10833 gnl|CDD|10480 gnl|CDD|14023 cd00269, DEXHc, DEXH-box helicases. cd00046, DEXDc, DEAD-like helicases superfamily. cd00079, HELICc, Helicase superfamily c-terminal. smart00487, DEXDc, DEAD-like helicases superfamily. smart00490, HELICc, helicase superfamily c-terminal. pfam00271, Helicase_C, Helicase conserved C-terminal. pfam04471, Mrr_cat, Restriction endonuclease. pfam00270, DEAD, DEAD/DEAH box helicase. COG4889, COG4889, Predicted helicase. COG1061, SSL2, DNA or RNA helicases of superfamily. COG4096, HsdR, Type I site-specific restriction-modifi. 
COG0513, SrmB, Superfamily II DNA and RNA helicases. COG1111, MPH1, ERCC4-like helicases. COG0610, COG0610, Type I site-specific restriction. COG4889, COG4889, Predicted helicase. HP1354 gnl|CDD|14023 gnl|CDD|10160 COG4889, COG4889, Predicted helicase. COG0286, HsdM, Type I restriction-modification system. HP1353 gnl|CDD|14023 COG4889, COG4889, Predicted helicase. HP0688 HP0689 gnl|CDD|15509 gnl|CDD|12632 gnl|CDD|10291 gnl|CDD|12632 HP0712 gnl|CDD|12515 HP0713 HP0733 gnl|CDD|25933 pfam02661, Fic, Fic protein family. gnl|CDD|12515 COG3177, COG3177, Uncharacterized conserved protein. .No hits found! HP0732 .No hits found! HP0836 gnl|CDD|9723 HP0669 15611680 jhp0613 COG4889 gnl|CDD|14813 gnl|CDD|24292 gnl|CDD|24402 gnl|CDD|14023 gnl|CDD|10786 gnl|CDD|10833 gnl|CDD|10384 cd00079, HELICc, Helicase superfamily c-terminal domain. smart00490, HELICc, helicase superfamily c-terminal. pfam00271, Helicase_C, Helicase conserved C-terminal. COG4889, COG4889, Predicted helicase. COG1061, SSL2, DNA or RNA helicases of superfamily II. COG1111, MPH1, ERCC4-like helicases. COG0513, SrmB, Superfamily II DNA and RNA helicases. gnl|CDD|7370 gnl|CDD|15509 gnl|CDD|12632 gnl|CDD|12632 gnl|CDD|10291 smart00486, POLBc, DNA polymerase type-B family. pfam03104, DNA_pol_B_exo, DNA polymerase family B. COG3298, COG3298, Predicted 3'-5' exonuclease. COG3298, COG3298, Predicted 3'-5' exonuclease. COG0417, PolB, DNA polymerase elongation subunit. 15611718 jhp0651 COG3177 gnl|CDD|25933 gnl|CDD|12515 gnl|CDD|11892 pfam02661, Fic, Fic protein family. COG3177, COG3177, Uncharacterized conserved protein. COG2184, Fic, Protein involved in cell division. 15611736 jhp0669 - .No hits found! 15611842 jhp0775 gnl|CDD|9723 15611695 jhp0628 COG3298 pfam04164, DUF400, Protein of unknown function, DUF400. pfam03104, DNA_pol_B_exo, DNA polymerase family B. COG3298, COG3298, Predicted 3'-5' exonuclease. COG0417, PolB, DNA polymerase elongation subunit. COG3298, COG3298, Predicted 3'-5' exonuclease. 
COG3177, COG3177, Uncharacterized conserved protein. pfam04164, DUF400, Protein of unknown function. 206 COG3018 gnl|CDD|9723 15611877 jhp0810 COG1629 gnl|CDD|28191 gnl|CDD|25563 gnl|CDD|11341 gnl|CDD|13909 pfam04164, DUF400, Protein of unknown function, DUF400. cd01347, ligand_gated_channel, TonB dependent/Ligand. pfam00593, TonB_dep_Rec, TonB dependent receptor. COG1629, CirA, Outer membrane receptor proteins. COG4771, FepA, Outer membrane receptor for ferrientero. HP0837 gnl|CDD|9723 pfam04164, DUF400, Protein of unknown function. HP0916 gnl|CDD|28191 gnl|CDD|25563 gnl|CDD|11341 gnl|CDD|13910 gnl|CDD|13909 gnl|CDD|13461 gnl|CDD|28191 gnl|CDD|25563 gnl|CDD|11341 gnl|CDD|28191 gnl|CDD|25563 gnl|CDD|11341 gnl|CDD|13910 gnl|CDD|13909 gnl|CDD|13461 cd01347, ligand_gated_channel, TonB dependent. pfam00593, TonB_dep_Rec, TonB dependent receptor. COG1629, CirA, Outer membrane receptor proteins. COG4772, FecA, Outer membrane receptor for Fe3+. COG4771, FepA, Outer membrane receptor for ferrienter. COG4206, BtuB, Outer membrane cobalamin receptor protein [Coen . cd01347, ligand_gated_channel, TonB dependent/Ligand. pfam00593, TonB_dep_Rec, TonB dependent receptor. COG1629, CirA, Outer membrane receptor proteins. cd01347, ligand_gated_channel, TonB dependent/Ligand. pfam00593, TonB_dep_Rec, TonB dependent receptor. COG1629, CirA, Outer membrane receptor proteins. COG4772, FecA, Outer membrane receptor for Fe3+. COG4771, FepA, Outer membrane receptor for ferrienter. COG4206, BtuB, Outer membrane cobalamin receptor protein [Coen . gnl|CDD|28191 gnl|CDD|25563 gnl|CDD|11341 cd01347, ligand_gated_channel, TonB dependent/Ligand. pfam00593, TonB_dep_Rec, TonB dependent receptor. COG1629, CirA, Outer membrane receptor proteins. HP0915 15611918 jhp0851 COG1629 HP0916 cd01347, ligand_gated_channel, TonB dependent/Ligand. pfam00593, TonB_dep_Rec, TonB dependent receptor. COG1629, CirA, Outer membrane receptor proteins. COG4772, FecA, Outer membrane receptor for Fe3+. 
COG4771, FepA, Outer membrane receptor for ferrienter. COG4771, FepA, Outer membrane receptor for ferrienter. COG4206, BtuB, Outer membrane cobalamin receptor protein [Coen . HP0915 gnl|CDD|28191 gnl|CDD|25563 gnl|CDD|11341 gnl|CDD|13910 gnl|CDD|13909 gnl|CDD|13909 gnl|CDD|13461 15611933 jhp0866 - .No hits found! 15612138 jhp1073 - gnl|CDD|11146 15612337 jhp1272 - .No hits found! COG1432, COG1432, Uncharacterized conserved protein. HP0931 .No hits found! HP0932 .No hits found! HP1146 .No hits found! HP1145 gnl|CDD|11146 COG1432, COG1432, Uncharacterized conserved protein. HP1354 gnl|CDD|14023 gnl|CDD|10160 COG4889, COG4889, Predicted helicase. COG0286, HsdM, Type I restriction-modification system. HP1353 gnl|CDD|14023 COG4889, COG4889, Predicted helicase. CDD domain assignments for the list of genes present as fusion genes in H. pylori strain J99 and split juxtaposed genes in H. pylori 26695. 207 COMPOSITE PROTEIN H. pylori 26695 AND COMPONENTS FROM H. pylori J99 COMPOSITE CDD 15644650 HP0017 COG3451 gnl|CDD|27819 gnl|CDD|27819 gnl|CDD|17296 gnl|CDD|25783 gnl|CDD|12779 cd01127, TrwB, Bacterial conjugation protein TrwB. cd01127, TrwB, Bacterial conjugation protein TrwB. pfam03135, CagE_TrbE_VirB, CagE, TrbE, VirB family pfam01580, FtsK_SpoIIIE, FtsK/SpoIIIE family. FtsK COG3451, VirB4, Type IV secretory pathway, VirB4. COMPONENTS CDD jhp0917 jhp1304 gnl|CDD|27819 gnl|CDD|17296 gnl|CDD|25783 gnl|CDD|2461 gnl|CDD|12779 gnl|CDD|11385 gnl|CDD|12779 gnl|CDD|12779 .No hits found! jhp1305 .No hits found! jhp0053 .No hits found! jhp0054 .No hits found! jhp1304 .No hits found! jhp1305 .No hits found! jhp1301 gnl|CDD|7029 pfam03235, DUF262, Protein of unknown function DUF262. gnl|CDD|11193 COG1479, COG1479, Uncharacterized conserved protein. .No hits found! jhp0918 15644663 HP0030 - .No hits found! 15644690 HP0060 - .No hits found! 
15645052 HP0424 - gnl|CDD|14001 15645054 HP0426 COG1479 gnl|CDD|7029 gnl|CDD|11193 15645054 HP0426 COG1479 gnl|CDD|7029 gnl|CDD|11193 15645069 HP0441 COG3451 gnl|CDD|27819 gnl|CDD|17296 gnl|CDD|17162 gnl|CDD|12779 gnl|CDD|12833 COG4867, COG4867, Uncharacterized protein. pfam03235, DUF262, Protein of unknown function. COG1479, COG1479, Uncharacterized conserved. jhp1302 pfam03235, DUF262, Protein of unknown function. COG1479, COG1479, Uncharacterized conserved. jhp1430 jhp1431 cd01127, TrwB, Bacterial conjugation protein TrwB. pfam03135, CagE_TrbE_VirB, CagE, TrbE, VirB family pfam02534, TRAG, TraG/TraD family. The TraG/TraD. COG3451, VirB4, Type IV secretory pathway, VirB4. COG3505, VirD4, Type IV secretory pathway, VirD4. jhp0917 jhp0918 cd01127, TrwB, Bacterial conjugation protein TrwB. pfam03135, CagE_TrbE_VirB, CagE, TrbE, VirB family. pfam01580, FtsK_SpoIIIE, FtsK/SpoIIIE family. pfam01935, DUF87, Domain of unknown function DUF87. COG3451, VirB4, Type IV secretory pathway, VirB4. COG1674, FtsK, DNA segregation ATPase FtsK/SpoIIIE. COG3451, VirB4, Type IV secretory pathway, VirB4. COG3451, VirB4, Type IV secretory pathway, VirB4. gnl|CDD|7029 gnl|CDD|11193 gnl|CDD|11193 pfam03235, DUF262, Protein of unknown function DUF262. COG1479, COG1479, Uncharacterized conserved protein. COG1479, COG1479, Uncharacterized conserved protein. gnl|CDD|27819 gnl|CDD|17296 gnl|CDD|25783 gnl|CDD|2461 gnl|CDD|12779 gnl|CDD|11385 gnl|CDD|12779 gnl|CDD|12779 cd01127, TrwB, Bacterial conjugation protein TrwB. pfam03135, CagE_TrbE_VirB, CagE, TrbE, VirB family. pfam01580, FtsK_SpoIIIE, FtsK/SpoIIIE family. pfam01935, DUF87, Domain of unknown function DUF87. COG3451, VirB4, Type IV secretory pathway, VirB4. COG1674, FtsK, DNA segregation ATPase FtsK/SpoIIIE. COG3451, VirB4, Type IV secretory pathway, VirB4. COG3451, VirB4, Type IV secretory pathway, VirB4. 
208 15645087 HP0459 COG3451 gnl|CDD|27819 gnl|CDD|17296 gnl|CDD|12779 gnl|CDD|10307 cd01127, TrwB, Bacterial conjugation protein TrwB. jhp0917 pfam03135, CagE_TrbE_VirB, CagE, TrbE, VirB family. COG3451, VirB4, Type IV secretory pathway. COG0433, COG0433, Predicted ATPase. jhp0670 gnl|CDD|27819 gnl|CDD|17296 gnl|CDD|25783 gnl|CDD|2461 gnl|CDD|12779 gnl|CDD|11385 gnl|CDD|12779 gnl|CDD|12779 gnl|CDD|14855 gnl|CDD|17767 gnl|CDD|24291 gnl|CDD|15768 gnl|CDD|25466 gnl|CDD|14023 gnl|CDD|10786 gnl|CDD|13378 gnl|CDD|10833 gnl|CDD|14813 gnl|CDD|24292 gnl|CDD|24402 gnl|CDD|14023 gnl|CDD|10786 gnl|CDD|10384 gnl|CDD|10833 gnl|CDD|12921 jhp0669 .No hits found! pfam03692, UPF0153, Uncharacterised protein family COG0727, COG0727, Predicted Fe-S-cluster oxidored. COG2983, COG2983, Uncharacterized conserved. jhp0702 .No hits found! jhp0701 pfam03692, UPF0153, Uncharacterised protein family COG0727, COG0727, Predicted Fe-S-cluster oxidored. COG2983, COG2983, Uncharacterized conserved. jhp0703 gnl|CDD|9511 pfam03692, UPF0153, Uncharacterised protein family. gnl|CDD|10596 COG0727, COG0727, Predicted Fe-S-cluster oxidoreduct .No hits found! jhp0702 .No hits found! jhp0899 .No hits found! jhp0898 .No hits found! jhp0901 gnl|CDD|915 jhp0900 .No hits found! jhp0918 15645292 HP0668 - gnl|CDD|14855 gnl|CDD|17767 gnl|CDD|14813 gnl|CDD|24291 gnl|CDD|24292 gnl|CDD|24402 gnl|CDD|15768 gnl|CDD|25466 gnl|CDD|14023 gnl|CDD|10786 gnl|CDD|13378 gnl|CDD|10384 gnl|CDD|10833 gnl|CDD|10480 15645353 HP0733 - .No hits found! 15645383 HP0764 COG0727 gnl|CDD|9511 gnl|CDD|10596 gnl|CDD|12327 15645383 HP0764 COG0727 gnl|CDD|9511 gnl|CDD|10596 gnl|CDD|12327 15645580 HP0964 - .No hits found! 15645582 HP0966 COG0699 .No hits found! jhp0612 cd00269, DEXHc, DEXH-box helicases. cd00046, DEXDc, DEAD-like helicases superfamily. cd00079, HELICc, Helicase superfamily c-terminal. smart00487, DEXDc, DEAD-like helicases superfamily. . smart00490, HELICc, helicase superfamily c-termi. 
pfam00271, Helicase_C, Helicase conserved C-termi. pfam04471, Mrr_cat, Restriction endonuclease. pfam00270, DEAD, DEAD/DEAH box helicase. COG4889, COG4889, Predicted helicase. COG1061, SSL2, DNA or RNA helicases of superfamily jhp0613 COG4096, HsdR, Type I site-specific restriction. COG0513, SrmB, Superfamily II DNA and RNA helicase. COG1111, MPH1, ERCC4-like helicases. COG0610, COG0610, Type I site-specific restriction. cd01127, TrwB, Bacterial conjugation protein TrwB. pfam03135, CagE_TrbE_VirB, CagE, TrbE, VirB family. pfam01580, FtsK_SpoIIIE, FtsK/SpoIIIE family. pfam01935, DUF87, Domain of unknown function DUF87. COG3451, VirB4, Type IV secretory pathway, VirB4. COG1674, FtsK, DNA segregation ATPase FtsK/SpoIIIE. COG3451, VirB4, Type IV secretory pathway, VirB4. COG3451, VirB4, Type IV secretory pathway, VirB4. cd00269, DEXHc, DEXH-box helicases. cd00046, DEXDc, DEAD-like helicases superfamily. smart00487, DEXDc, DEAD-like helicases superfamily. pfam04471, Mrr_cat, Restriction endonuclease. pfam00270, DEAD, DEAD/DEAH box helicase. COG4889, COG4889, Predicted helicase. COG1061, SSL2, DNA or RNA helicases of superfamily II. COG4096, HsdR, Type I site-specific restriction-modi. COG1111, MPH1, ERCC4-like helicases [DNA replication, recombin cd00079, HELICc, Helicase superfamily c-terminal. smart00490, HELICc, helicase superfamily c-terminal. pfam00271, Helicase_C, Helicase conserved C-terminal. COG4889, COG4889, Predicted helicase. COG1061, SSL2, DNA or RNA helicases of superfamily II. COG0513, SrmB, Superfamily II DNA and RNA helicases. COG1111, MPH1, ERCC4-like helicases. COG3596, COG3596, Predicted GTPase. pfam00350, Dynamin_N, Dynamin family. 209 15646019 HP1409 COG1479 gnl|CDD|7029 gnl|CDD|11193 15646019 HP1409 COG1479 gnl|CDD|7029 gnl|CDD|11193 15646021 HP1411 COG0223 gnl|CDD|14001 pfam03235, DUF262, Protein of unknown function. COG1479, COG1479, Uncharacterized conserved. jhp1301 jhp1302 pfam03235, DUF262, Protein of unknown function. 
COG1479, COG1479, Uncharacterized conserved. COG4867, COG4867, Uncharacterized protein. jhp1430 gnl|CDD|7029 pfam03235, DUF262, Protein of unknown function. gnl|CDD|11193 COG1479, COG1479, Uncharacterized conserved protein. .No hits found! jhp1431 gnl|CDD|7029 gnl|CDD|11193 gnl|CDD|11193 jhp1304 .No hits found! jhp1305 .No hits found! pfam03235, DUF262, Protein of unknown function DUF262 COG1479, COG1479, Uncharacterized conserved protein COG1479, COG1479, Uncharacterized conserved protein.

CDD domain assignments for the list of genes present as fusion genes in H. pylori strain 26695 and split juxtaposed genes in H. pylori J99.

[...]

... genome size and metabolic and functional diversity, as demonstrated by the size of the genomes of Bacillus and Streptomyces (formation of spores, antibiotic synthesis), rhizobia (symbiotic nitrogen fixation), and Pseudomonas (degradation of a wide range of aromatic compounds).

Figure 1.2: A plot of genome size versus number of proteins

Figure 1.3: A plot of genome size versus number of genes

... 6.4 List of genes present as fusion genes in strain H. pylori 26695 and split juxtaposed genes in H. pylori J99

Table 7.1 Results of computational analysis of P. aeruginosa genome

Table 7.2 Results of the computational analyses of Salmonella genomes

Table 7.3 List of putative drug targets in P. aeruginosa

LIST OF FIGURES

Figure 1.1 Number of Prokaryotic, Archaeal and Eukaryotic genomes ...

... parasites (compared to twenty-one as of Feb, 2006) and 4 species of Mycoplasma genomes (compared to over ten as of Feb, 2006) that were completely sequenced. There were reports on the evolution of these genomes by genome reduction from larger genomes. However, no data were available on comparative analysis of these genomes. Thus, we embarked on a project to study the minimal genomes as a group with specific ...
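The fusion-gene listings above pair a composite gene in one H. pylori strain with "split juxtaposed" component genes in the other. As a minimal sketch of what "juxtaposed" could mean computationally, the snippet below treats two locus tags as neighbours when they share a prefix and their numbers differ by one. This is an illustrative assumption, not the method used in the thesis: consecutive locus-tag numbers usually, but not always, reflect adjacent positions on the chromosome, and the helper names `locus_number` and `are_juxtaposed` are hypothetical.

```python
import re


def locus_number(tag: str) -> tuple[str, int]:
    """Split a locus tag such as 'jhp0917' into its prefix and number."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognised locus tag: {tag!r}")
    return match.group(1), int(match.group(2))


def are_juxtaposed(tag_a: str, tag_b: str) -> bool:
    """Assumption: same prefix and consecutive numbers mean neighbouring genes."""
    prefix_a, number_a = locus_number(tag_a)
    prefix_b, number_b = locus_number(tag_b)
    return prefix_a == prefix_b and abs(number_a - number_b) == 1


# Locus tags taken from the listings above; the pairing is illustrative only.
candidate_pairs = [
    ("jhp0917", "jhp0918"),
    ("jhp1301", "jhp1302"),
    ("jhp0917", "jhp1304"),
]
for first, second in candidate_pairs:
    print(first, second, are_juxtaposed(first, second))
```

A real analysis would confirm adjacency from genome coordinates and strand, and would also require sequence similarity between the composite protein and each component.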
microbes, improve tools for annotation and analysis of sequence data, develop high-throughput methods for determining gene function and gene expression, and develop methods for examining protein-protein and protein-nucleic acid interaction. Figure 1.1 shows the rapid increase in the number of sequenced microbial genomes.

Figure 1.1: Number of Prokaryotic, Archaeal and Eukaryotic genomes sequenced since 1995 ...

... developments as the fruits of the MGP mature. Already, we have become more appreciative of the extent of the microbial world's effect on earth, realizing how little we know about this kingdom and wondering at its potential benefits to our world.

1.5 COMPARATIVE GENOMICS AS A TOOL FOR MICROBIAL GENOMICS

Comparative genomics is the study of the differences and similarities in genome structure and organization in ...

... interactions in cells and organisms as a major player in the complexity of living systems. They made it possible to reveal conserved and variable elements of the genomes and to suppose that tens of thousands of proteins are made of just about 1,500-2,000 discrete structural protein units called domains or modules. Different modular proteins are formed from these modules taken in different combinations, and this shuffling ...

... only 580,000 base pairs of DNA and yet encodes 470 genes. Future studies on this and other minimal genomes will help increase our understanding of more complex genomes.

3) Microbial diversity

Evolution of life

Among the oldest life forms known, the archaea make up one of three phylogenetic or evolutionary domains into which all life is classified. The other two are the eukarya and the bacteria. Archaea ...
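The minimal-genome figures quoted above (580,000 base pairs encoding 470 genes) can be turned into a gene density with a few lines of arithmetic. The sketch below is only an illustration of that calculation; the function name `gene_density` is hypothetical, and the result ignores non-coding regions and overlapping genes.

```python
def gene_density(genome_bp: int, n_genes: int) -> tuple[float, float]:
    """Return (genes per kilobase, mean base pairs of genome per gene)."""
    if genome_bp <= 0 or n_genes <= 0:
        raise ValueError("genome size and gene count must be positive")
    genes_per_kb = n_genes / (genome_bp / 1000)
    bp_per_gene = genome_bp / n_genes
    return genes_per_kb, bp_per_gene


# Figures as quoted in the text: 580,000 bp and 470 genes.
genes_per_kb, bp_per_gene = gene_density(580_000, 470)
print(f"{genes_per_kb:.2f} genes per kb, {bp_per_gene:.0f} bp per gene")
```

For these inputs the density works out to roughly 0.81 genes per kilobase, or about 1,234 base pairs of genome per gene. The same ratio underlies the genome size versus gene number plots referred to in the figure list.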
between GENe and chromosOME and it stood for the complete haploid set of chromosomes and genes (Winkler, 1920). Life as we know it is specified by genomes. Every organism possesses a genome that contains the biological information needed to construct and maintain a living example of that organism. Most genomes, including those of all cellular life forms, are made up of DNA, but a few viruses have RNA genomes ...

... idea of evolvability as a universal feature of living entities, and a very important concept (v) that not only natural selection, but also internal developmental biases can form the basis for evolutionary changes.

1.6 COMPARATIVE GENOMICS OF BACTERIA

As of February, 2006, the website of the National Center for Biotechnology Information listed 293 bacterial genomes (25 Archaea and 268 Eubacteria) whose genomes ...

... picture of organismal evolution. The large number of diverse microbial genomes sequenced recently makes it possible, for the first time, to analyze evolutionary changes at the whole-genome level rather than at the single-gene level. Intraspecies and interspecies comparisons of the sequenced genomes demonstrate that an organism's complexity does not directly correlate with its number of genes and suggest the importance of ...

1.2 GENOMES OF PROKARYOTES

... intracellular bacteria and develop tools that can facilitate genome analysis towards better understanding of bacterial genome evolution and accelerating computational identification of microbial drug

Date posted: 15/09/2015, 17:11
