STATISTICAL STRATEGIES FOR NEXT GENERATION LARGE-SCALE GENETIC STUDIES

WANG XU
(BSc Hons, National University of Singapore)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
SAW SWEE HOCK SCHOOL OF PUBLIC HEALTH
NATIONAL UNIVERSITY OF SINGAPORE
2014

DECLARATION

I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

Wang Xu

ACKNOWLEDGEMENTS

I would like to express my special appreciation and thanks to my supervisor A/P Teo Yik Ying for being such a tremendous mentor for me. Thank you for encouraging my research and for allowing me to grow as a research scientist. You are the most patient supervisor I can ever imagine. Your advice on both my research and my career has been priceless to me and will inspire me throughout my lifetime.

I would also like to thank Prof Chia Kee Seng for bringing me into the field of public health. Thank you for the training and opportunities you offered in the first year of my research; that is where I developed my interest in biostatistics and decided to pursue a PhD.

A special thanks to my colleagues in the NUS Statistical Genetics Group and friends in the School of Public Health. Thank you for all the encouragement, support and friendship you have given me. The thesis and all the work in my PhD course would not have been possible without your help and support.

Last but not least, I would like to express my love and thankfulness to my family. Words cannot describe how grateful I am for your love, caring, tolerance and for all the sacrifices that you have made on my behalf. Your love and prayers for me have sustained me thus far.

TABLE OF CONTENTS

SUMMARY
LIST OF TABLES
LIST OF FIGURES
PUBLICATIONS
CHAPTER 1 - INTRODUCTION
1.1 Genome-Wide Association Study
1.1.1 Linkage Disequilibrium and Indirect Association
1.1.2 Genotyping and Sequencing Technologies
1.2 Genome-wide Meta-analysis
1.2.1 Genetic diversity and biological heterogeneity
1.2.2 Statistical approaches for meta-analysis
1.3 Trans-ethnic Fine-mapping
1.4 Shift from Common to Rare Variants
CHAPTER 2 - AIMS
2.1 Study 1 - Comparing Methods for Performing Trans-Ethnic Meta-Analysis of Genome-wide Association Studies
2.2 Study 2 - A Statistical Method for Region-Based Meta-analysis of Genome-wide Association Studies in Genetically Diverse Populations
2.3 Study 3 - Trans-Ethnic Fine-Mapping Using Population-Specific Reference Panels in Diverse Asian Populations
2.4 Study 4 - Trans-Ethnic Fine-Mapping of Rare Causal Variants
CHAPTER 3 - COMPARING METHODS FOR PERFORMING TRANS-ETHNIC META-ANALYSIS OF GENOME-WIDE ASSOCIATION STUDIES
Introduction
Materials and Methods
Fixed-effects meta-analysis (FE)
Random-effects meta-analysis (RE)
Random-effects meta-analysis by Han and Eskin (RE-HE)
Bayesian approach meta-analysis (MANTRA)
Simulation set-up
Type 2 diabetes GWAS
Results
Power and false positive rates
Application to T2D data
Discussion
CHAPTER 4 - A STATISTICAL METHOD FOR REGION-BASED META-ANALYSIS OF GENOME-WIDE ASSOCIATION STUDIES IN GENETICALLY DIVERSE POPULATIONS
Introduction
Materials and Methods
Region-based analysis
Type 2 diabetes datasets
Software implementation
Results
Power and false positive rates
Application to T2D data
Discussion
Supplementary Material
CHAPTER 5 - TRANS-ETHNIC FINE-MAPPING USING POPULATION-SPECIFIC REFERENCE PANELS IN DIVERSE ASIAN POPULATIONS
Introduction
Materials and Methods
Simulation Setup
GWAS cohorts
Identification of trait-associated loci
Statistical analyses
Results
Rank of the association signals at the causal variant
Trans-ethnic fine-mapping GWAS loci for eye traits and blood lipids
Loci with evidence of multiple association signals
Trans-ethnic fine-mapping narrows associated regions
Population-specific versus 1KGP cosmopolitan reference panel
Discussion
Supplementary Material
Simulation to test for the rank of association signals at causal variant
CHAPTER 6 - TRANS-ETHNIC FINE-MAPPING OF RARE CAUSAL VARIANTS
Introduction
Fine-mapping of causal variants
Trans-ethnic fine-mapping of common causal variants
Trans-ethnic fine-mapping of rare causal variants
Conclusion
CHAPTER 7 - CONCLUSIONS AND DISCUSSIONS
REFERENCES

SUMMARY

In the past 10 years, genome-wide association studies (GWAS) have successfully identified thousands of loci that are associated with complex diseases and human traits. By aggregating samples from multiple populations across the world, a new wave of GWA meta-analyses has increased the statistical power to identify novel findings with smaller effect sizes. However, the amount of phenotypic variation explained by GWAS is much less than the total heritability estimated by twin and family studies. The missing heritability is believed to be caused by the following three reasons: i) classical approaches for meta-analysis are hampered by the presence of effect size and allelic heterogeneity; ii) the causal variants that fundamentally affect the diseases and traits are yet to be discovered; iii) the genetic impact of low-frequency and rare causal variants remains unexplored. To address these problems, we conducted four studies of trans-ethnic meta-analyses and fine-mapping. We began with a systematic review to identify the most powerful statistical approach to accommodate the issue of effect size heterogeneity.
To address the problem of allelic heterogeneity, we designed a novel strategy to assess regional association evidence, which successfully captures the additional phenotypic variation explained by multiple causal variants. In order to locate the causal variants with more accuracy, we evaluated the merit of trans-ethnic fine-mapping and assessed the impact of population-specific reference panels in identifying the functional variants that biologically affect the phenotypes of interest. Last but not least, we explored the feasibility of trans-ethnic fine-mapping for rare causal variants by evaluating whether the conditions that have made the process successful for common variants also hold for rare variants.

LIST OF TABLES

Table 1. False-Positive Rate of FE, RE, RE-HE and MANTRA at thresholds of increasing significance
Table 2. Power comparison of the four methods under different simulation scenarios
Table 3. Summary information of the seven T2D GWAS
Table 4. SNPs exhibiting significant association signals of seven type 2 diabetes genome-wide association studies
Table 5. False positive rates in the meta-analyses
Table 6. Results of the region-based meta-analysis for type 2 diabetes
Table 7. Results of the SNP-based analyses for each of the three discovery populations and also for the meta-analysis
Table 8. Comparison of eigenvalue thresholds in the regional analyses
Table 9. Comparison of over-representation P-value thresholds in the regional analyses
Table 10. List of 56 SNPs from DIAGRAM+ (table extracted and condensed from the DIAGRAM+ publication)
Table 11. Percentage (%) of phenotypic variance explained by the various disease models in the T2D case-control from WTCCC
Table 12. Results of the gene-based meta-analysis for type 2 diabetes
Table 13. Results of the pathway-based meta-analysis for type 2 diabetes
Table 14. Genes that contributed to the region-based association signal at the adherens junction pathway
Table 15. Results of the gene-based analyses of the 41 DIAGRAM+ gene loci in the four population scans in T2D for Singapore and the WTCCC
Table 16. Summary of study-specific quality control, imputation and analysis
Table 17. 176 genetic loci in the NIH GWAS catalogue from GWAS in eye traits and blood lipids
Table 18. 26 loci with significant association evidence in the meta-analysis of the three Asian cohorts
Table 19. Functional proxies for the top ranking SNPs at ABCA1 and CARD10
Table 20. Independent association signals identified from conditional analyses
Table 21. Properties of the 99% credible sets of SNPs at significant loci
Table 22. Comparison between population-specific and 1KGP cosmopolitan reference panels
Table 23. Population genetic characteristics of common and rare variants
Table 24. Comparisons between trans-ethnic fine-mapping of common and rare causal variants

LIST OF FIGURES

Figure 1. Identification of genetic variants by risk allele frequency and strength of genetic effect
Figure 2. Histogram plots of the estimated effect sizes under different simulated scenarios
Figure 3. Comparison of P-value and the Bayes' factor under null hypothesis
Figure 4. Statistical power of different meta-analysis approaches
Figure 5. Comparison of P-value and the Bayes' factor under alternative hypothesis
Figure 6. Comparison of the statistical power of the four meta-analysis methods under different scenarios of effect size heterogeneity and number of populations
Figure 7. Manhattan plots from the FE, RE-HE and MANTRA
Figure 8. Forest plots of the meta-analyses at HNF4A
Figure 9. Different LD patterns between unobserved causal variant and tag SNPs affect meta-analysis results
Figure 10. Pictorial representation of the proposed algorithm for region-based analysis
Figure 11. Linearly interpolating the statistical significance in the Binomial test when the number of significant SNPs is not an integer value
Figure 12. Power comparisons of the different methods for the meta-analysis across all three populations
Figure 13. Power comparisons of the different methods for meta-analysis in the presence of allelic heterogeneity
Figure 14. Power comparisons of the different methods for meta-analysis across all three HapMap populations at relative risk 1.3
Figure 15. Performance of the region-based method using different genotype panels for estimating LD
Figure 16. Power comparisons of the region-based method with different window sizes, as compared to the meta-analysis with only the genotyped SNPs, or with the imputed SNPs common to all three populations
Figure 17. Comparisons between the SNP-based and region-based meta-analysis in genomic regions displaying evidence of LD variations
Figure 18. A pathway map of the cell-cell adherens junctions pathway from the KEGG online resource
Figure 19. Comparison of the statistical evidence for the gene-based analysis of the WTCCC T2D case-control data with different buffer sizes
Figure 20. Histograms and cumulative frequencies on the ranks of the simulated causal variants out of 2,000 rounds of simulations
Figure 21. Regional plots of conditional analysis at the HDL-C locus ABCA1, for the Chinese (SCES), Malays (SiMES), Indians (SINDI)
Figure 22. Regional plots of conditional analysis at the ODA locus CARD10, for the Chinese (SCES), Malays (SiMES), Indians (SINDI)
Figure 23. Regional plots of SNPs at the LDL-C locus CELSR2, for the Chinese (SCES), Malays (SiMES), Indians (SINDI) and the meta-analysis of all three cohorts
Figure 24. Regional plots of SNPs at the LDL-C locus TOMM40-APOE from two trans-ethnic meta-analyses using either the population-specific reference panels or the cosmopolitan reference panel from the 1000 Genomes Project
Figure 25. Trans-ethnic fine-mapping of common and rare causal variants

PUBLICATIONS

Wang X, Liu X, Sim X, Xu H, Khor CC, Ong RT, Tay WT, Suo C, Poh WT, Ng DP, Liu J, Aung T, Chia KS, Wong TY, Tai ES, Teo YY (2012) A statistical method for region-based meta-analysis of genome-wide association studies in genetically diverse populations. Eur J Hum Genet. 20(4):469-75
Wang X, Chua HX, Chen P, Ong RT, Sim X, Zhang W, Takeuchi F, Liu X, Khor CC, Tay WT, Cheng CY, Suo C, Liu J, Aung T, Chia KS, Kooner JS, Chambers JC, Wong TY, Tai ES, Kato N, Teo YY. (2013) Comparing methods for performing trans-ethnic meta-analysis of genome-wide association studies. Hum Mol Genet. 22(11):230
Wang X, Teo YY. (2013) Trans-ethnic fine-mapping of rare causal variants. (In press)
Wang X, Cheng CY, Liao J, Sim XL, Liu JJ, Chia KS, Tai ES, Little P, Khor CC, Aung T, Wong TY, Teo YY (2014) Evaluation of trans-ethnic fine-mapping with population-specific and cosmopolitan imputation reference panels across multiple traits in diverse Asian populations. (Submitted)
Mahajan A, Go MJ, Zhang WH, Below J, Gaulton K, Ferreira T, Horikoshi M, Johnson A, Ng CY, Prokopenko I, Saleheen D, Wang X, Zeggini E…Seielstad M, Teo YY, Boehnke M, Parra E, Chambers J, Tai ES, McCarthy M, Morris A. (2014) Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nature Genetics. 10.1038/ng.2897
Ong RT, Wang X, Liu X, Teo YY. (2013) Efficiency of trans-ethnic genome-wide meta-analysis and fine-mapping. Eur J Hum Genet. 20(12):1300-7
Zakharov S, Wang X, Liu JJ, Teo YY. (2014) Improving power for robust trans-ethnic meta-analysis of rare and low-frequency variants with a partitioning approach. Eur J Hum Genet.
Pillai N, Okada Y, Saw WY, Ong TW, Wang X, Tantoso E …Plummer F, Lee JD, Chia KS, Luo M, de Bakker P, Teo YY. (2014) Predicting HLA alleles from high-resolution SNP data in three Southeast Asian populations. Hum Mol Genet.
23(16):4443-51
Kato N*, Loh M*, Takeuchi F*, Verweij N*, Wang X*, Zhang WH*, Kelly T*, Saleheen D*, Lehne BJ*, Leach IM*, …. McCarthy M, Scott J, Teo YY*, He J*, Elliott P*, Tai ES*, Harst P*, Kooner J*, Chambers J*. (2014) Trans-ethnic genome-wide association study identifies 15 new genetic loci influencing blood pressure traits, and implicates a role for DNA methylation: the International Genetics of Blood Pressure (iGEN-BP) Study. Nature Genetics (Submitted)
* Authors contributed equally to the paper

Table 24. Comparisons between trans-ethnic fine-mapping of common and rare causal variants

Condition for trans-ethnic fine-mapping 1: Presence of a causal variant across populations from different genetic ancestries
    Common causal variants: Likely to be an older mutation, thus present and functional across populations from different genetic ancestries.
    Rare causal variants: Likely to be more recent, thus tend to be ancestry- or population-specific, where the same SNP may be causal in one population but monomorphic or not functional in other populations.

Condition for trans-ethnic fine-mapping 2: Method of discovering and quantifying genetic association
    Common causal variants: Each SNP is typically the unit of analysis, and association testing measures the evidence of each SNP being linked to the phenotype of interest.
    Rare causal variants: While SNP-based analyses are performed as with common variants, the typical unit of association measurement aggregates the allele counts across multiple SNPs in a region to measure genetic burden, thus presenting region-based evidence.

Condition for trans-ethnic fine-mapping 3: Linkage disequilibrium (LD) between a causal variant and neighboring SNPs
    Common causal variants: Likely to be in LD with neighboring SNPs, and these SNPs present evidence of similar magnitude as the causal variant.
    Rare causal variants: Likely to be in weak or impractical strength of LD with neighboring SNPs due to the low frequency of the functional allele.

(i) Presence of a causal variant across populations of different ancestries

The fundamental concept of trans-ethnic analyses assumes that the same genetic unit, whether it is a SNP, a gene exon, or the entire gene itself, is biologically responsible for altering the expression of the phenotype across the different populations that are being jointly analyzed. For common causal variants, this assumption is likely to be valid given that these mutations tend to be older and would have occurred prior to the divergence of these different populations [129]. In contrast, rare SNPs are more likely to be recent mutations and thus ancestry- or even population-specific [129]. This presents a significant challenge in attempts to pool the evidence of phenotypic association at a rare SNP, since the SNP may be polymorphic and functional in one population but monomorphic in the remaining populations, so that the joint analysis attenuates rather than strengthens the statistical evidence [125].

The 1000 Genomes Project (1KGP) provided vital insights into the distribution of polymorphic SNPs across global populations. Through whole-genome sequencing of more than 2,500 individuals from at least 20 population groups around the world, the 1KGP presents an unbiased survey of genetic variation across diverse populations. One crucial finding relevant to the success of trans-ethnic association analyses concerns the specificity of polymorphisms according to MAF.
The 1KGP reported that common variants with MAF exceeding 10% are shared across almost all the populations in Phase I of the project, whereas only 17% of the rare variants are present in populations within the same ancestry group; and 53% of the rare variants with MAF < 0.5% are population-specific [97]. This finding suggests that, while trans-ethnic analyses of rare variants may be realistic for populations from the same ancestry, it is unlikely to be feasible to extend this to multiple populations from diverse ancestries.

(ii) Method of discovering and quantifying genetic associations

A GWAS typically analyses each SNP independently for evidence of phenotypic association. The strength and direction of the association are similarly quantified at the SNP level, measuring the impact of each additional copy of the minor allele in altering the phenotype. This relies on standard statistical procedures such as analysis of variance (ANOVA) or regression analyses, or univariate approaches such as chi-square tests or t-tests of averages. These approaches have proven to be reasonably successful in locating bona fide associations with common variants. However, the statistical ability of these methods to detect evidence of phenotypic association depends on observing a sufficient number of samples carrying each of the two alleles. These approaches are thus poorly powered to measure the evidence at rare variants, where the number of samples carrying the risk allele may be very small. For example, Asimit and Zeggini illustrated, through a series of simulations, that as the causal allele frequency decreases from 5% to 1% to 0.1%, the sample size required to attain a power of 80% to detect an allelic odds ratio of at the accepted genome-wide significance level of P = × 10-8 increases from 2,500 to 12,000 to 117,000 [130].
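One way to read the sample sizes quoted from Asimit and Zeggini is through the expected number of copies of the causal allele in the sample, which is 2 × N × allele frequency for N diploid subjects. The sketch below is only a back-of-the-envelope illustration under that counting assumption; it is not the simulation procedure used in [130], and the function name `expected_allele_copies` is hypothetical.

```python
def expected_allele_copies(sample_size, allele_frequency):
    """Expected copies of an allele among the 2 * sample_size chromosomes sampled."""
    return 2 * sample_size * allele_frequency


# (causal allele frequency, required sample size) pairs as quoted in the text
quoted = [(0.05, 2_500), (0.01, 12_000), (0.001, 117_000)]

# Round to whole copies to avoid floating-point noise in the printed values
copies = [round(expected_allele_copies(n, f)) for f, n in quoted]

for (f, n), c in zip(quoted, copies):
    print(f"frequency {f:.3f}, N = {n:>7,}: about {c} expected copies")
```

Under this assumption the three quoted designs correspond to roughly 230 to 250 expected copies of the causal allele, which suggests why the required sample size grows approximately in inverse proportion to the allele frequency.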
As a result, analyses of rare variants for phenotype association typically aggregate the cumulative impact of multiple SNPs located in a contiguous genomic region, for example by pooling the number of copies of rare alleles within a phenotype stratum. The underlying assumption of genetic burden tests is that the set of rare variants within a region collectively influences disease susceptibility, and the statistical evidence is measured according to whether the rare alleles tend to be more specific to subjects in a phenotype classification. However, methods such as the Cohort Allelic Sum Test (CAST) [131], the Weighted Sum Test (WST) [132], and the collapsing regression method [133] tend to ignore the direction of the effects of the rare alleles, and this tends to lower the power of the aggregated allele counts to correlate with phenotype expression, since rare alleles from different causal variants may be deleterious or beneficial. The Sequence Kernel Association Test (SKAT) [134] accommodates the direction of the effects of rare alleles, and has been shown to possess higher statistical power than most of the collapsing approaches. For a genomic region that genuinely harbors causal variants across multiple populations, pooling the evidence from individual SNPs is unlikely to improve the strength of the statistical association, since the architecture of rare variants suggests that different rare causal variants in the same region are likely to be present across the different populations. However, given that the unit of analysis for rare variants typically interrogates the entire genomic region, trans-ethnic analyses can boost the ability to locate these associated regions by aggregating the statistical evidence of phenotypic association (Figure 25).
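The effect of ignoring the direction of rare-allele effects can be seen in a toy collapsing analysis. The sketch below is a simplified carrier-indicator statistic written in the spirit of the collapsing approaches described above; it is not the published CAST, WST or SKAT procedure, and the function names and all genotype counts are made up for illustration.

```python
def carrier_count(genotypes):
    """Number of subjects carrying at least one rare allele in the region.

    genotypes: one list per subject, holding the rare-allele count (0, 1 or 2)
    at each SNP in the region.
    """
    return sum(1 for subject in genotypes if any(count > 0 for count in subject))


def collapsing_chi_square(cases, controls):
    """Pearson chi-square (1 d.f.) comparing carrier proportions in cases and controls."""
    a = carrier_count(cases)        # carriers among cases
    b = len(cases) - a              # non-carriers among cases
    c = carrier_count(controls)     # carriers among controls
    d = len(controls) - c           # non-carriers among controls
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denominator


def make_group(n_subjects, carriers_snp1, carriers_snp2):
    """Toy group: heterozygous carriers of SNP 1, then of SNP 2, then non-carriers."""
    group = [[1, 0]] * carriers_snp1 + [[0, 1]] * carriers_snp2
    return group + [[0, 0]] * (n_subjects - len(group))


# Scenario A: both rare alleles are enriched in cases (same direction of effect)
same_direction = collapsing_chi_square(make_group(100, 6, 6), make_group(100, 1, 1))

# Scenario B: SNP 1 is enriched in cases, SNP 2 in controls (opposite directions)
opposite_direction = collapsing_chi_square(make_group(100, 6, 1), make_group(100, 1, 6))

print(f"same direction: {same_direction:.2f}, opposite directions: {opposite_direction:.2f}")
```

In the second scenario the collapsed carrier counts are identical in cases and controls, so the statistic is zero even though each SNP is individually associated with the phenotype in these toy counts.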
Identifying the rare causal variants in the emerging genomic region will require interrogating which SNPs contribute to the primary association signal within each population and assessing their functional annotations, a process of fine-mapping that is similarly unlikely to benefit from trans-ethnic strategies.

(iii) Linkage disequilibrium between a causal variant and neighboring SNPs

Causal variants with minor allele frequencies in excess of 5% are often in useful levels of LD with neighboring SNPs, and these neighboring SNPs tend to present similar evidence of phenotypic association as the causal variants. GWAS have relied on such long stretches of high LD in identifying the markers that correlate with phenotype expression. Trans-ethnic fine-mapping of these common causal variants is thus necessary to distinguish the surrogate SNPs from the causal variants. The situation is notably different for rare causal variants, as these tend to be in weak levels of LD with surrounding markers due to their low minor allele counts. From this perspective, there is no need to depend on trans-ethnic fine-mapping to localize rare causal variants, and often the causal variants can be identified by interrogating the evidence within a population, as suggested by Zhu and colleagues who developed the “preferential LD” approach [135]. They suggested that weak levels of LD are present between a rare causal variant and a small set of markers that may be used to locate the genomic region, but such LD is still considerably stronger than that present between the causal variants and other surrounding SNPs. Based on this assumption, the “preferential LD” approach searches for rare variants with unexpectedly higher LD with the discovery variant, which are then more likely candidates for the causal variants.
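The dependence of LD strength on allele frequency can be made concrete with the largest r² that two biallelic SNPs can attain given their minor allele frequencies. The sketch below assumes the standard bound that, when the rarer minor allele always sits on haplotypes carrying the other minor allele, D = p_rare(1 − p_common), so that r² is at most p_rare(1 − p_common) / (p_common(1 − p_rare)). The function name and the example frequencies are illustrative assumptions, and this is not the “preferential LD” method itself.

```python
def max_r_squared(maf_a, maf_b):
    """Upper bound on r^2 between two biallelic SNPs given their minor allele frequencies."""
    if not (0 < maf_a <= 0.5 and 0 < maf_b <= 0.5):
        raise ValueError("minor allele frequencies must lie in (0, 0.5]")
    rare, common = sorted((maf_a, maf_b))
    # D is maximal when every copy of the rarer allele lies on a haplotype
    # that also carries the more common minor allele.
    return rare * (1 - common) / (common * (1 - rare))


rare_vs_common = max_r_squared(0.005, 0.30)    # rare causal variant, common marker
rare_vs_rare = max_r_squared(0.005, 0.005)     # two equally rare variants
common_vs_common = max_r_squared(0.25, 0.30)   # two common variants

print(f"rare vs common:   {rare_vs_common:.4f}")
print(f"rare vs rare:     {rare_vs_rare:.4f}")
print(f"common vs common: {common_vs_common:.4f}")
```

Under this bound a variant at 0.5% frequency cannot reach an r² much above 0.01 with a marker at 30% frequency, whereas it can still be strongly correlated with variants of similar frequency, which is consistent with searching for a small set of variants in unexpectedly high LD with the discovery variant.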
When applied to a range of diseases, this approach successfully confirmed two well-known rare causal variants for Crohn’s disease in the NOD2 gene [136], two nonsynonymous ITPA variants (rs1127354 and rs7270101) that cause ribavirin-induced hemolytic anemia [137], and rare variants in the UGT1A6 gene for bladder cancer [28].

Conclusion

Trans-ethnic fine-mapping has seen remarkable success in disentangling the conundrum of long stretches of high LD to either locate common causal variants, or at least narrow the genomic region where these functional variants at MAF > 5% can be found. However, the genetic architecture of rare variants is considerably different from that of common variants, without the complication introduced by LD. For common causal variants, it appears that existing methods are more than adequate to locate and validate an association signal, and the challenge lies in distinguishing the genuine causal variants from perfect surrogates. For rare variants, the greater challenge appears to lie in locating and validating an associated genomic region, rather than in fine-mapping the causal variants. Indeed, once a genomic region has been systematically confirmed to be associated with a phenotype, fine-mapping the causal variants is unlikely to require more than the careful interrogation of which rare SNPs contributed to the association signal and their functional annotations within one study cohort.

CHAPTER 7 - CONCLUSIONS AND DISCUSSIONS

The last 10 years have been the era of genome-wide association studies. Vast efforts have been expended to look for genetic variants that are associated with complex diseases and human traits, although the amount of phenotypic variation explained remains moderate at best.
This thesis has focused on two primary statistical approaches: i) global meta-analysis, which extends the identification of novel genetic variants through both SNP-based and region-based statistical approaches; ii) trans-ethnic fine-mapping, which localizes the biologically functional variants for the phenotype of interest, targeted at both common and rare variants.

Both meta-analysis and trans-ethnic fine-mapping require the pooling of GWAS from multiple populations, although their preferences over the level of LD diversity are contrary to each other. Meta-analysis requires similar LD structures in multiple populations to increase the sample size without introducing additional study heterogeneity; on the other hand, trans-ethnic fine-mapping fundamentally relies on LD diversity to differentiate the causal variants from the surrogate tagging SNPs. As such, a useful approach is to investigate whether there are any population diversity metrics that will be useful for identifying the populations or genomic regions where trans-ethnic approaches for meta-analysis and fine-mapping are likely to be more efficient. Four metrics have been explored by the author and colleagues in a separate study, and the results suggested that quantifying the average FST of the SNPs in the region or measuring the population specificity of haplotypes in the region is informative for meta-analysis. For fine-mapping of causal variants, assessing the degree of haplotype sharing and the extent of LD variation between populations is more informative [22]. Although this piece of work is not included in this thesis, it is indispensable in the study of trans-ethnic meta-analysis and fine-mapping.

Both meta-analysis and trans-ethnic fine-mapping rely on imputation technology to complement the genotyping microarrays with the denser SNP content of haplotype reference panels. The cosmopolitan panel is used in meta-analysis to harmonize the SNP contents in diverse populations. Population-specific reference panels are advocated for trans-ethnic fine-mapping to better reflect the genetic structure in each population. Although imputation is the most effective approach to estimate the information for untyped SNPs, it is unfortunately constrained by the imputation accuracy. Especially in the process of fine-mapping, small deviations in the genotype estimation can lead to a misidentification of the real causal variant. As the cost of whole-genome sequencing is dropping rapidly, imputation technology will gradually be replaced by direct sequencing of the whole genome with high accuracy.

GWAS are expected to discover more phenotype-associated variants, not only common SNPs, but also rare variants and de novo mutations. The focus of genetic studies should thus shift to the interpretation and utilization of the GWAS discoveries. The two studies of trans-ethnic fine-mapping in this thesis have shown that attempts to use purely statistical methods to identify the exact causal variant are unlikely to succeed. Some scientists thus suggest focusing on defining generically applicable functional assays or workflows for chasing down causal variants within implicated haplotypes. Successful examples include the CRISPR/Cas9 system, where candidate causal variants for a given association are systematically introduced into a uniform genetic background in a relevant cell type to measure the impact on the transcriptional output of nearby genes. More effort needs to be put into the development of functional assays, since the current work is limited by the fact that no standard statistical evidence can be defined and consistently applied. An alternative approach is to combine the GWAS findings with other information measured from the transcriptome, the proteome as well as the metabolome.
For example, recent studies have reported that associations between loci such as FADS1, ELOVL2 or SLC16A9 and lipid concentrations have been explained by GWAS with metabolomics. There are limits on what we can learn from genetics alone. As we are entering an era of ‘personal genomics’ with additional ‘-omics’ 119 data available, we can merge them with the genetic data to achieve our goal to understand the genotype-phenotype relationship for the purpose of improving healthcare. REFERENCES 1. Hartl DL (1980) Principles of population genetics. Sunderland, Mass.: Sinauer Associates. xvi, 488 p. p. 2. Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, et al. (2010) Genome‐wide association studies in diverse populations. Nat Rev Genet 11: 356‐366. 3. International HapMap C (2003) The International HapMap Project. Nature 426: 789‐ 796. 4. Hirschhorn JN, Daly MJ (2005) Genome‐wide association studies for common diseases and complex traits. Nat Rev Genet 6: 95‐108. 5. Barrett JC, Cardon LR (2006) Evaluating coverage of genome‐wide association studies. Nat Genet 38: 659‐662. 6. Durbin RM, Abecasis GR, Altshuler DL, Auton A, Brooks LD, et al. (2010) A map of human genome variation from population‐scale sequencing. Nature 467: 1061‐ 1073. 7. Dhandapany PS, Sadayappan S, Xue Y, Powell GT, Rani DS, et al. (2009) A common MYBPC3 (cardiac myosin binding protein C) variant associated with cardiomyopathies in South Asia. Nat Genet 41: 187‐191. 8. Tang MX, Stern Y, Marder K, Bell K, Gurland B, et al. (1998) The APOE‐epsilon4 allele and the risk of Alzheimer disease among African Americans, whites, and Hispanics. JAMA 279: 751‐755. 9. Florez JC, Jablonski KA, Bayley N, Pollin TI, de Bakker PI, et al. (2006) TCF7L2 polymorphisms and progression to diabetes in the Diabetes Prevention Program. N Engl J Med 355: 241‐250. 10. Han B, Eskin E (2011) Random‐effects model aimed at discovering associations in meta‐analysis of genome‐wide association studies. Am J Hum Genet 88: 586‐598. 
11. Morris AP (2011) Transethnic meta‐analysis of genomewide association studies. Genet Epidemiol 35: 809‐822. 12. Beyene J, Tritchler D, Asimit JL, Hamid JS (2009) Gene‐ or region‐based analysis of genome‐wide association studies. Genet Epidemiol 33 Suppl 1: S105‐110. 13. Sanna S, Li B, Mulas A, Sidore C, Kang HM, et al. (2011) Fine mapping of five loci associated with low‐density lipoprotein cholesterol detects variants that double the explained heritability. PLoS Genet 7: e1002198. 14. Li M, Atmaca‐Sonmez P, Othman M, Branham KE, Khanna R, et al. (2006) CFH haplotypes without the Y402H coding variant show strong association with susceptibility to age‐related macular degeneration. Nat Genet 38: 1049‐1054. 15. Hughes AE, Orr N, Patterson C, Esfandiary H, Hogg R, et al. (2007) Neovascular age‐ related macular degeneration risk based on CFH, LOC387715/HTRA1, and smoking. PLoS Med 4: e355. 16. Howie B, Marchini J, Stephens M (2011) Genotype imputation with thousands of genomes. G3 (Bethesda) 1: 457‐470. 120 17. Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, et al. (2009) Genome‐wide and fine‐resolution association analysis of malaria in West Africa. Nat Genet. 18. Deelen P, Menelaou A, van Leeuwen EM, Kanterakis A, van Dijk F, et al. (2014) Improved imputation quality of low‐frequency and rare variants in European samples using the 'Genome of The Netherlands'. Eur J Hum Genet. 19. Gao X, Haritunians T, Marjoram P, McKean‐Cowdin R, Torres M, et al. (2012) Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels. Front Genet 3: 117. 20. Wong LP, Ong RT, Poh WT, Liu X, Chen P, et al. (2013) Deep whole‐genome sequencing of 100 southeast Asian Malays. Am J Hum Genet 92: 52‐66. 21. Wong LP, Lai JK, Saw WY, Ong RT, Cheng AY, et al. (2014) Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole‐genome sequencing. PLoS Genet 10: e1004377. 22. 
Twee‐Hee Ong R, Wang X, Liu X, Teo YY (2012) Efficiency of trans‐ethnic genome‐wide meta‐analysis and fine‐mapping. Eur J Hum Genet. 23. Teo YY, Ong RT, Sim X, Tai ES, Chia KS (2010) Identifying candidate causal variants via trans‐population fine‐mapping. Genet Epidemiol 34: 653‐664. 24. Hughes T, Kim‐Howard X, Kelly JA, Kaufman KM, Langefeld CD, et al. (2011) Fine‐mapping and transethnic genotyping establish IL2/IL21 genetic association with lupus and localize this genetic effect to IL21. Arthritis Rheum 63: 1689‐1697. 25. Cirulli ET, Goldstein DB (2010) Uncovering the roles of rare variants in common disease through whole‐genome sequencing. Nat Rev Genet 11: 415‐425. 26. Gibson G (2011) Rare and common variants: twenty arguments. Nat Rev Genet 13: 135‐145. 27. Sebat J, Lakshmi B, Malhotra D, Troge J, Lese‐Martin C, et al. (2007) Strong association of de novo copy number mutations with autism. Science 316: 445‐449. 28. Tang W, Fu YP, Figueroa JD, Malats N, Garcia‐Closas M, et al. (2012) Mapping of the UGT1A locus identifies an uncommon coding variant that affects mRNA expression and protects from bladder cancer. Hum Mol Genet 21: 1918‐1930. 29. Nejentsev S, Walker N, Riches D, Egholm M, Todd JA (2009) Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science 324: 387‐389. 30. Pinto D, Pagnamenta AT, Klei L, Anney R, Merico D, et al. (2010) Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466: 368‐372. 31. Wang X, Chua HX, Chen P, Ong RT, Sim X, et al. (2013) Comparing methods for performing trans‐ethnic meta‐analysis of genome‐wide association studies. Hum Mol Genet 22: 2303‐2311. 32. Donnelly P (2008) Progress and challenges in genome‐wide association studies in humans. Nature 456: 728‐731. 33. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, et al. (2008) Genome‐wide association studies for complex traits: consensus, uncertainty and challenges.
Nat Rev Genet 9: 356‐369. 34. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747‐753. 35. Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, et al. (2011) Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478: 103‐109. 36. Voight BF, Scott LJ, Steinthorsdottir V, Morris AP, Dina C, et al. (2010) Twelve type 2 diabetes susceptibility loci identified through large‐scale association analysis. Nat Genet 42: 579‐589. 37. Lindgren CM, Heid IM, Randall JC, Lamina C, Steinthorsdottir V, et al. (2009) Genome‐wide association scan meta‐analysis identifies three Loci influencing adiposity and fat distribution. PLoS Genet 5: e1000508. 38. Prokopenko I, Langenberg C, Florez JC, Saxena R, Soranzo N, et al. (2009) Variants in MTNR1B influence fasting glucose levels. Nat Genet 41: 77‐81. 39. Thye T, Vannberg FO, Wong SH, Owusu‐Dabo E, Osei I, et al. (2010) Genome‐wide association analyses identifies a susceptibility locus for tuberculosis on chromosome 18q11.2. Nat Genet 42: 739‐741. 40. Adeyemo A, Gerry N, Chen G, Herbert A, Doumatey A, et al. (2009) A genome‐wide association study of hypertension and blood pressure in African Americans. PLoS Genet 5: e1000564. 41. Torgerson DG, Ampleford EJ, Chiu GY, Gauderman WJ, Gignoux CR, et al. (2011) Meta‐analysis of genome‐wide association studies of asthma in ethnically diverse North American populations. Nat Genet 43: 887‐892. 42. Kato N, Takeuchi F, Tabara Y, Kelly TN, Go MJ, et al. (2011) Meta‐analysis of genome‐wide association studies identifies common variants associated with blood pressure variation in east Asians. Nat Genet 43: 531‐538. 43. Cho YS, Chen CH, Hu C, Long J, Hee Ong RT, et al. (2011) Meta‐analysis of genome‐wide association studies identifies eight new loci for type 2 diabetes in east Asians. Nat Genet 44: 67‐72. 44.
Takeuchi F, Katsuya T, Chakrewarthy S, Yamamoto K, Fujioka A, et al. (2010) Common variants at the GCK, GCKR, G6PC2‐ABCB11 and MTNR1B loci are associated with fasting glucose in two Asian populations. Diabetologia 53: 299‐308. 45. Kooner JS, Saleheen D, Sim X, Sehmi J, Zhang W, et al. (2011) Genome‐wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci. Nat Genet 43: 984‐989. 46. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851‐861. 47. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol. 48. Florez JC, Jablonski KA, Bayley N, Pollin TI, de Bakker PI, et al. (2006) TCF7L2 polymorphisms and progression to diabetes in the Diabetes Prevention Program. N Engl J Med 355: 241‐250. 49. DerSimonian R, Laird N (1986) Meta‐analysis in clinical trials. Control Clin Trials 7: 177‐188. 50. Hardy RJ, Thompson SG (1996) A likelihood approach to meta‐analysis with random effects. Stat Med 15: 619‐629. 51. Kass RE, Raftery AE (1995) Bayes Factors. Journal of the American Statistical Association 90: 773‐795. 52. Weir BS, Hill WG (2002) Estimating F‐statistics. Annu Rev Genet 36: 721‐750. 53. Spencer CC, Su Z, Donnelly P, Marchini J (2009) Designing genome‐wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet 5: e1000477. 54. Sim X, Ong RT, Suo C, Tay WT, Liu J, et al. (2011) Transferability of type 2 diabetes implicated loci in multi‐ethnic cohorts from Southeast Asia. PLoS Genet 7: e1001363. 55. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, et al. (2007) A genome‐wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316: 1341‐1345. 56. Wang X, Liu X, Sim X, Xu H, Khor CC, et al.
(2012) A statistical method for region‐based meta‐analysis of genome‐wide association studies in genetically diverse populations. Eur J Hum Genet 20: 469‐475. 57. Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, et al. (2009) Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet 41: 56‐65. 58. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, et al. (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467: 832‐838. 59. Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, et al. (2008) Meta‐analysis of genome‐wide association data and large‐scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 40: 638‐645. 60. Stephens M, Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet 10: 681‐690. 61. Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing‐data inference for whole‐genome association studies by use of localized haplotype clustering. Am J Hum Genet 81: 1084‐1097. 62. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome‐wide association studies by imputation of genotypes. Nat Genet 39: 906‐913. 63. Teo YY, Small KS, Kwiatkowski DP (2010) Methodological challenges of genome‐wide association analysis in Africa. Nat Rev Genet 11: 149‐160. 64. Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R (2005) Ascertainment bias in studies of human genome‐wide polymorphism. Genome Res 15: 1496‐1502. 65. Wellcome Trust Case Control Consortium (2007) Genome‐wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661‐678. 66. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M (2010) KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 38: D355‐360. 67. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27‐30. 68.
Kanehisa M, Goto S, Hattori M, Aoki‐Kinoshita KF, Itoh M, et al. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 34: D354‐357. 69. Teo YY, Sim X, Ong RT, Tan AK, Chen J, et al. (2009) Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations. Genome Res 19: 2154‐2162. 70. Han X, Luo Y, Ren Q, Zhang X, Wang F, et al. (2010) Implication of genetic variants near SLC30A8, HHEX, CDKAL1, CDKN2A/B, IGF2BP2, FTO, TCF2, KCNQ1, and WFS1 in type 2 diabetes in a Chinese population. BMC Med Genet 11: 81. 71. Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, et al. (2007) Genome‐wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316: 1331‐1336. 72. Sladek R, Rocheleau G, Rung J, Dina C, Shen L, et al. (2007) A genome‐wide association study identifies novel risk loci for type 2 diabetes. Nature 445: 881‐885. 73. Takeuchi F, Serizawa M, Yamamoto K, Fujisawa T, Nakashima E, et al. (2009) Confirmation of multiple risk Loci and genetic impacts by a genome‐wide association study of type 2 diabetes in the Japanese population. Diabetes 58: 1690‐1699. 74. Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, et al. (2007) Replication of genome‐wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316: 1336‐1341. 75. Hanson RL, Bogardus C, Duggan D, Kobes S, Knowlton M, et al. (2007) A search for variants associated with young‐onset type 2 diabetes in American Indians in a 100K genotyping array. Diabetes 56: 3045‐3052. 76. Wang Y, O'Connell JR, McArdle PF, Wade JB, Dorff SE, et al. (2009) From the Cover: Whole‐genome association study identifies STK39 as a hypertension susceptibility gene. Proc Natl Acad Sci U S A 106: 226‐231. 77. Torkamani A, Topol EJ, Schork NJ (2008) Pathway analysis of seven common diseases assessed by genome‐wide association. Genomics 92: 265‐272. 78.
Perera HK, Clarke M, Morris NJ, Hong W, Chamberlain LH, et al. (2003) Syntaxin 6 regulates Glut4 trafficking in 3T3‐L1 adipocytes. Mol Biol Cell 14: 2946‐2958. 79. Smith EN, Chen W, Kahonen M, Kettunen J, Lehtimaki T, et al. (2010) Longitudinal genome‐wide association of cardiovascular disease risk factors in the Bogalusa Heart Study. PLoS Genet 6. 80. Campbell MC, Tishkoff SA (2008) African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet 9: 403‐433. 81. Cheverud JM (2001) A simple correction for multiple comparisons in interval mapping genome scans. Heredity 87: 52‐58. 82. Nyholt DR (2004) A simple correction for multiple testing for single‐nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet 74: 765‐769. 83. Lin DY (2005) An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics 21: 781‐787. 84. Moskvina V, Schmidt KM (2008) On multiple‐testing correction in genome‐wide association studies. Genet Epidemiol 32: 567‐573. 85. Pan W (2009) Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genet Epidemiol 33: 497‐507. 86. Li J, Ji L (2005) Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95: 221‐227. 87. Mukhopadhyay I, Feingold E, Weeks DE, Thalamuthu A (2010) Association tests using kernel‐based measures of multi‐locus genotype similarity between individuals. Genet Epidemiol 34: 213‐221. 88. Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, et al. (2010) Powerful SNP‐set analysis for case‐control genome‐wide association studies. Am J Hum Genet 86: 929‐942. 89. Pe'er I, Chretien YR, de Bakker PI, Barrett JC, Daly MJ, et al. (2006) Biases and reconciliation in estimates of linkage disequilibrium in the human genome. Am J Hum Genet 78: 588‐603. 90.
Teo YY, Fry AE, Bhattacharya K, Small KS, Kwiatkowski DP, et al. (2009) Genome‐wide comparisons of variation in linkage disequilibrium. Genome Res 19: 1849‐1860. 91. Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, et al. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460: 748‐752. 92. Sawada K, Mitra AK, Radjabi AR, Bhaskar V, Kistner EO, et al. (2008) Loss of E‐cadherin promotes ovarian cancer metastasis via alpha 5‐integrin, which is a therapeutic target. Cancer Res 68: 2329‐2339. 93. Conacci‐Sorrell M, Zhurinsky J, Ben‐Ze'ev A (2002) The cadherin‐catenin adhesion system in signaling and cancer. J Clin Invest 109: 987‐991. 94. Canonici A, Steelant W, Rigot V, Khomitch‐Baud A, Boutaghou‐Cherid H, et al. (2008) Insulin‐like growth factor‐I receptor, E‐cadherin and alpha v integrin form a dynamic complex under the control of alpha‐catenin. Int J Cancer 122: 572‐582. 95. Rogers GJ, Hodgkin MN, Squires PE (2007) E‐cadherin and cell adhesion: a role in architecture and function in the pancreatic islet. Cell Physiol Biochem 20: 987‐994. 96. Tsai FJ, Yang CF, Chen CC, Chuang LM, Lu CH, et al. (2010) A genome‐wide association study identifies susceptibility variants for type 2 diabetes in Han Chinese. PLoS Genet 6: e1000847. 97. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, et al. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56‐65. 98. Mahajan A, Go MJ, Zhang W, Below JE, Gaulton KJ, et al. (2014) Genome‐wide trans‐ancestry meta‐analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat Genet 46: 234‐244. 99. Marchini J, Howie B (2010) Genotype imputation for genome‐wide association studies. Nat Rev Genet 11: 499‐511. 100. Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome‐wide association studies. PLoS Genet 5: e1000529. 101.
Hindorff LA, MacArthur J (European Bioinformatics Institute), Wise A, Junkins HA, Hall PN, et al. A Catalog of Published Genome‐Wide Association Studies. Available at www.genome.gov/gwastudies. 102. Lopes MC, Hysi PG, Verhoeven VJ, Macgregor S, Hewitt AW, et al. (2013) Identification of a candidate gene for astigmatism. Invest Ophthalmol Vis Sci 54: 1260‐1267. 103. Fan Q, Zhou X, Khor CC, Cheng CY, Goh LK, et al. (2011) Genome‐wide meta‐analysis of five Asian cohorts identifies PDGFRA as a susceptibility locus for corneal astigmatism. PLoS Genet 7: e1002402. 104. Khor CC, Ramdas WD, Vithana EN, Cornes BK, Sim X, et al. (2011) Genome‐wide association studies in Asians confirm the involvement of ATOH7 and TGFBR3, and further identify CARD10 as a novel locus influencing optic disc area. Hum Mol Genet 20: 1864‐1872. 105. Ramdas WD, van Koolwijk LM, Ikram MK, Jansonius NM, de Jong PT, et al. (2010) A genome‐wide association study of optic disc parameters. PLoS Genet 6: e1000978. 106. Guggenheim JA, McMahon G, Kemp JP, Akhtar S, St Pourcain B, et al. (2013) A genome‐wide association study for corneal curvature identifies the platelet‐derived growth factor receptor alpha gene as a quantitative trait locus for eye size in white Europeans. Mol Vis 19: 243‐253. 107. Mishra A, Yazar S, Hewitt AW, Mountain JA, Ang W, et al. (2012) Genetic variants near PDGFRA are associated with corneal curvature in Australians. Invest Ophthalmol Vis Sci 53: 7131‐7136. 108. Han S, Chen P, Fan Q, Khor CC, Sim X, et al. (2011) Association of variants in FRAP1 and PDGFRA with corneal curvature in Asian populations from Singapore. Hum Mol Genet 20: 3693‐3698. 109. Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, et al. (2008) Six new loci associated with blood low‐density lipoprotein cholesterol, high‐density lipoprotein cholesterol or triglycerides in humans. Nat Genet 40: 189‐197. 110. Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, et al.
(2008) Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet 40: 161‐169. 111. Aulchenko YS, Ripatti S, Lindqvist I, Boomsma D, Heid IM, et al. (2009) Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nat Genet 41: 47‐55. 112. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466: 707‐713. 113. Cho YS, Chen CH, Hu C, Long J, Ong RT, et al. (2012) Meta‐analysis of genome‐wide association studies identifies eight new loci for type 2 diabetes in east Asians. Nat Genet 44: 67‐72. 114. de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, et al. (2008) Practical aspects of imputation‐driven meta‐analysis of genome‐wide association studies. Hum Mol Genet 17: R122‐128. 115. Vithana EN, Aung T, Khor CC, Cornes BK, Tay WT, et al. (2011) Collagen‐related genes influence the glaucoma risk factor, central corneal thickness. Hum Mol Genet 20: 649‐658. 116. Rasmussen‐Torvik LJ, Pacheco JA, Wilke RA, Thompson WK, Ritchie MD, et al. (2012) High density GWAS for LDL cholesterol in African Americans using electronic medical records reveals a strong protective variant in APOE. Clin Transl Sci 5: 394‐399. 117. Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, et al. (2006) A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 38: 1251‐1260. 118. Hanchard N, Elzein A, Trafford C, Rockett K, Pinder M, et al. (2007) Classical sickle beta‐globin haplotypes exhibit a high degree of long‐range haplotype similarity in African and Afro‐Caribbean populations. BMC Genet 8: 52. 119. Hormozdiari F, Kostem E, Kang EY, Pasaniuc B, Eskin E (2014) Identifying Causal Variants at Loci with Multiple Signals of Association. Genetics. 120. Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, et al.
(2009) Genotype‐imputation accuracy across worldwide human populations. Am J Hum Genet 84: 235‐250. 121. Visscher PM (2008) Sizing up human height variation. Nat Genet 40: 489‐490. 122. McCarthy MI, Hirschhorn JN (2008) Genome‐wide association studies: potential next steps on a genetic journey. Hum Mol Genet 17: R156‐165. 123. Bosse M, Megens HJ, Madsen O, Paudel Y, Frantz LA, et al. (2012) Regions of homozygosity in the porcine genome: consequence of demography and the recombination landscape. PLoS Genet 8: e1003100. 124. Wang Y, Chen YH, Yang Q (2012) Joint rare variant association test of the average and individual effects for sequencing studies. PLoS One 7: e32485. 125. Teo YY, Small KS, Fry AE, Wu Y, Kwiatkowski DP, et al. (2009) Power consequences of linkage disequilibrium variation between populations. Genet Epidemiol 33: 128‐135. 126. Wu Y, Waite LL, Jackson AU, Sheu WH, Buyske S, et al. (2013) Trans‐ethnic fine‐mapping of lipid loci identifies population‐specific signals and allelic heterogeneity that increases the trait variance explained. PLoS Genet 9: e1003379. 127. Hughes T, Sawalha AH (2011) The role of epigenetic variation in the pathogenesis of systemic lupus erythematosus. Arthritis Res Ther 13: 245. 128. Franceschini N, van Rooij FJ, Prins BP, Feitosa MF, Karakas M, et al. (2012) Discovery and fine mapping of serum protein loci through transethnic meta‐analysis. Am J Hum Genet 91: 744‐753. 129. Raychaudhuri S (2011) Mapping rare and common causal alleles for complex human diseases. Cell 147: 57‐69. 130. Asimit J, Zeggini E (2010) Rare variant association analysis methods for complex traits. Annu Rev Genet 44: 293‐308. 131. Morgenthaler S, Thilly WG (2007) A strategy to discover genes that carry multi‐allelic or mono‐allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res 615: 28‐56. 132. Madsen BE, Browning SR (2009) A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5: e1000384.
133. Morris AP, Zeggini E (2010) An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol 34: 188‐193. 134. Wu MC, Lee S, Cai T, Li Y, Boehnke M, et al. (2011) Rare‐variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89: 82‐93. 135. Zhu Q, Ge D, Heinzen EL, Dickson SP, Urban TJ, et al. (2012) Prioritizing genetic variants for causality on the basis of preferential linkage disequilibrium. Am J Hum Genet 91: 422‐434. 136. Wang K, Dickson SP, Stolle CA, Krantz ID, Goldstein DB, et al. (2010) Interpretation of association signals and identification of causal variants from genome‐wide association studies. Am J Hum Genet 86: 730‐742. 137. Fellay J, Thompson AJ, Ge D, Gumbs CE, Urban TJ, et al. (2010) ITPA gene variants protect against anaemia in patients treated for chronic hepatitis C. Nature 464: 405‐408.

... compared the performance of the two classical statistical methods for performing meta-analyses (FE, RE) to the two recently introduced strategies for trans-ethnic meta-analyses (RE-HE and MANTRA) using a series of simulations performed with HAPGEN using seed haplotypes from ten HapMap Phase 3 populations (excluding the admixed population ASW). We simulated 3,000 cases and 3,000 controls for each of the ...

... Identification of genetic variants by risk allele frequency and strength of genetic effect. Genetic effect is compromised by the risk allele frequency. The detection of genetic variants concentrates within the range identified by diagonal dotted lines. Adapted from reference [30].

CHAPTER 2 – AIMS
2.1 Study 1 - Comparing Methods for Performing Trans-Ethnic Meta-Analysis of Genome-wide Association Studies
Whilst early GWAS have primarily focused on genetically homogeneous populations, the next-generation genome-wide surveys are starting to pool studies from ethnically diverse populations within a single meta-analysis. However, the process is hampered by the presence of effect size heterogeneity. In this study, we aim to compare four different strategies for meta-analyzing GWAS across genetically diverse populations, ...

... than genetic exposures, environmental and lifestyle factors can also modify the impact of the genetic contributions to the phenotypes of interest [9].
1.2.2 Statistical approaches for meta-analysis
It is commonly agreed that the aim of the global meta-analysis is to include as many studies as possible to increase the power to detect novel genetic variants, agnostic of the population ancestry or genetic ...

... yielding 30 studies in total and a possible sample size of 90,000 cases and 90,000 controls for the joint analysis of the 30 studies. In calculating the empirical false positive rates, we simulated 300,000 SNPs in each of the 30 studies under the null hypothesis of no association (see Materials and Methods for details). We varied the definition of statistical significance for P-value from 5 ...

... simulations. For each simulation, we generated 30 studies, where each of the ten HapMap3 populations is used to simulate three studies, with 3,000 cases and 3,000 controls in each study. In the simulations to calculate the false positive rates, the allelic relative risk for every causal SNP in each population was set at 1.0 and the meta-analyses were performed at the causal SNPs. The false positive rates for ...

... that are designed to maximize the genetic coverage in Europeans [5]. As such, the level of SNP sharing between the two platforms remained modest at best. In the past few years, technologies for measuring genomic variation have changed rapidly both in terms of SNP density on microarrays as well as the genotyping accuracy. The most striking leap forward is known as the next-generation sequencing (NGS) technology ...

... assessing whether the conditions are relevant for rare variant analyses.

CHAPTER 3 – COMPARING METHODS FOR PERFORMING TRANS-ETHNIC META-ANALYSIS OF GENOME-WIDE ASSOCIATION STUDIES
The content of this chapter has been published in Wang et al. 2013 [31].
Introduction
Genome-wide association studies have seen unprecedented successes at discovering novel genetic variants that influence the severity ...

... order to perform a fair power comparison of the different methods for meta-analysis, we have defined statistical significance as a P-value < 7.9×10^-7 or a Bayes’ factor > 10^5 (see Table 2 for the same comparison at a P-value < 5×10^-8 and the equivalent Bayes’ factor > 10^6.1). We considered five scenarios involving the 30 studies in our simulation where the focal SNP was functional in: (i) all 30 studies ...

... sizes and relatedness between populations, MANTRA has been reported to confer significantly higher power than both FE and RE. Here we perform a comparison of the four strategies for meta-analyzing GWAS across genetically diverse populations to gauge the relative performance in terms of sensitivity and specificity. We achieve this through a series of simulations where we intentionally: (i) vary the effect ...

2.2 Study 2 - A Statistical Method for Region-Based Meta-analysis of Genome-wide Association Studies in Genetically ...
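
The excerpts above name fixed-effects (FE) and random-effects (RE) meta-analysis without showing how the two differ. As a minimal sketch, assuming the standard inverse-variance FE estimator and the DerSimonian-Laird moment estimator of between-study variance (reference 49), the two can be written as follows. The function names and the example inputs are hypothetical and are not taken from the thesis; the thesis may have used different software or estimators.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def dl_tau2(betas, ses):
    """DerSimonian-Laird estimate of between-study variance (tau^2)."""
    k = len(betas)
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_fe, _ = fixed_effects(betas, ses)
    # Cochran's Q: weighted squared deviations from the FE estimate.
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    c = total - sum(w ** 2 for w in weights) / total
    if k < 2 or c <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / c)


def random_effects(betas, ses):
    """Pooled effect and standard error with tau^2 added to each variance."""
    tau2 = dl_tau2(betas, ses)
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def two_sided_p(beta, se):
    """Two-sided P-value from a normal approximation to beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical per-study log odds ratios and standard errors.
    betas = [0.30, 0.00, -0.10]
    ses = [0.05, 0.05, 0.05]
    for name, fn in (("FE", fixed_effects), ("RE", random_effects)):
        beta, se = fn(betas, ses)
        print(name, round(beta, 4), round(se, 4), two_sided_p(beta, se))
```

When the per-study effects agree, the estimated tau^2 is zero and RE reduces to FE. When they disagree, RE widens the standard error, which is one way effect-size heterogeneity across populations can reduce power. The resulting P-value could then be compared against a threshold such as the 7.9×10^-7 quoted in the excerpts.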