CHAPTER 4: DISCUSSION

4.1 LD and Variation of the MHC in Singaporean Chinese

Many association studies have highlighted strong links between the MHC and a number of diseases in the Singaporean Chinese population, and a key to identifying the genetic aetiology of these diseases lies in understanding the LD structure of the MHC. LD describes the non-independence of alleles at different loci, and knowing how LD varies across the MHC will guide the efficient design of markers for population-based disease association studies. However, a haplotype map of the MHC in the local Chinese population is currently not available. To this end, SNP maps of increasing marker density were constructed, and these data sets provide a comprehensive guide to the patterns of variation and LD of the MHC in the Singaporean Chinese population. The LD of the MHC was first analysed in relation to the entire chromosome 6p, before progressively drilling down at high resolution into the MHC proper. SNP data were later integrated with HLA genotypes, providing an opportunity to analyse LD with respect to the HLA haplotype background. The data presented here provide a foundation for future disease association studies in the local Chinese population.

Strong correlation in SNP profile of two Asian populations: Singaporean Chinese and the Han Chinese in Beijing

The HapMap project has become an important resource for genetic variation in the human genome, with the SNP frequencies and haplotype structure of the HapMap populations strongly influencing the design of commercially available SNP panels used in large association studies (e.g. The Wellcome Trust Case Control Consortium 2007). The utility of such a map across other populations around the world is still being investigated, and the SNP maps in this study afforded the opportunity to examine the applicability of the HapMap data to the local Chinese population. Forty-five Beijing Han Chinese (CHB) individuals were
genotyped as part of the HapMap project and these genotype results are readily available from the project’s online resources. From these, we found that the SNP allele frequencies in the Singaporean Chinese population are tightly correlated with those in the CHB population (Figure 3.12), underscoring the reliability of the SNP data sets and highlighting the ancestral proximity of the populations, even though the former consists mainly of southern Chinese while the latter is skewed towards northern Chinese. The tight correlation in allele frequencies also indicates that, in the absence of variation data for other regions of the genome in the Singaporean Chinese population, the CHB panel of the HapMap project will serve as an adequate proxy. The correlation of the Singaporean Chinese SNP frequencies with the other HapMap populations also reflects the genealogy between the populations well, with the Japanese population showing high correlation, followed by the Caucasians and the Africans.

The transferability of HapMap-derived tag SNPs to the local population was also investigated. True to the allele frequency correlations, the tag SNPs generated using the Beijing Han Chinese population outperform all of the other sets, including a combined set of Japanese and Chinese samples (Table 3.8). This again has an important implication for SNP selection in disease association studies: the Japanese and Han Chinese population panels are typically considered as a single unit in HapMap data sets, but when these data are used for studies of the Singaporean Chinese population, better efficiency and power will be achieved if the Han Chinese data set is considered on its own.

High-resolution map efficiently captures SNP variation in the Chinese MHC

The 1877 informative SNPs used in this study to construct the high-resolution variation map of the MHC are a subset of the total SNP variation previously identified in the loci. Using the
genotype data from the HapMap Han Chinese population as a proxy, almost all (98.5%) of the informative SNPs across the region are found to be well represented by at least one partner in this SNP map, and more than half are perfectly correlated with at least one of the 1877 SNPs (Figure 3.13). Hence we find that this map efficiently captures the other untyped variation in the Chinese population, which further supports its effectiveness in describing LD patterns of the local population.

Linkage disequilibrium in the MHC-telomeric region is stronger than that of the MHC

Various linkage analyses and estimates from sperm typing have previously reported that the recombination rate across the MHC is lower than the chromosomal average (Cullen et al. 2002, Kong et al. 2002, Tsunoda et al. 2004). Similarly, within the local Chinese population, sliding window analysis of LD across chromosome 6p shows a striking peak of LD around 28.9Mb, with average D´ and r² plots elevated above the chromosome average in an 8Mb segment that includes the MHC (Figure 3.3). There is, however, evidence that the LD in the MHC-telomeric region is stronger than in the MHC itself, most evident in the LD heatmap of chromosome 6p (Figure 3.2), with strong pairwise D´ seen even between SNPs separated by up to 5Mb. Furthermore, of the haplotype blocks identified from the low-density SNP map, two lie within this telomeric-MHC segment (Figure 3.5). This pattern is repeated in the 2.6kb resolution SNP map, where pairwise LD values between SNPs in the extended class I region are higher than in the rest of the MHC for a similar marker distance (Figure 3.14).

Several explanations for this lengthy tract of high LD exist. Firstly, recombination may be suppressed as a consequence of the tight clustering of histone and tRNA genes in this region. Transcripts of both of these gene families are required in large quantities, and tight clustering of these gene groups may be a result
of selection pressures for maximising transcription activity (Horton et al. 2004). The strong LD pattern across this segment could be due to a hitchhiking effect between the histone/tRNA clusters and the surrounding regions.

A second explanation for this high LD is the association between olfactory receptors and MHC alleles. The genomic arrangement of olfactory receptors and MHC genes is syntenic across humans, mice and rats. Experiments performed with semi-natural mouse populations have shown that there is a distinct preference for MHC-dissimilar partners in mate selection (Potts et al. 1991). A study performed on a reproductively and culturally isolated human population also showed a lower number of MHC-similar spouses than expected if there were no preferential mate selection (Ober et al. 1997). Additionally, it has been shown that odorants from MHC gene products influence individual-specific odours (Yamazaki et al. 1999). The MHC-influenced negative assortative mating may be driven by inbreeding avoidance and/or an MHC-heterozygote advantage against infections or parasites (Beauchamp and Yamazaki 1997, Ehlers et al. 2000). A selection-driven linkage between the olfactory receptor cluster and MHC loci would consequently give rise to the high LD block seen in the data. Indeed, taken in this light, the overall suppressed recombination rate across the MHC loci could be a result of hitchhiking with the telo-MHC region (Horton et al. 2004).

High-resolution LD structure of the MHC in the Singaporean Chinese Population

The high-resolution SNP map provided an opportunity to describe in detail the LD pattern within the MHC in different ways. On a population level, the LD pattern has a block-like structure (most evident in Figure 3.16), with long isolated stretches of consecutive SNPs sharing high pairwise LD flanked by short intervals within which no allelic association is seen between neighbouring SNPs. This block-like structure of LD was first suggested in
2001 in a seminal paper (Daly et al. 2001), and subsequently seen in genome-wide scans (Gabriel et al. 2002), culminating in the construction of a haplotype map of the human genome (International HapMap Consortium 2005, 2007). The data in this study fit this model well. The haplotype blocks were formally defined using a well-established criterion that links segments of consecutive markers in significantly high D´ as haplotype blocks (Gabriel et al. 2002), and in the high-resolution map 203 haplotype blocks were identified in the 4.9Mb region. The characteristics of these haplotype blocks are remarkably similar to those identified in a Caucasian population in terms of average length, diversity and coverage (Miretti et al. 2005), and probably reflect similar historical recombination events in both populations, supporting the single “out of Africa” hypothesis (Gabriel et al. 2002, International HapMap Consortium 2005). As these studies were conducted in different populations and with different marker sets, the haplotype block similarity also lends support to the robustness of the data generated in this study, as well as the haplotype block definitions employed.

The MHC can be divided into sub-regions that reflect the clustering of the different classes of HLA genes within (Horton et al. 2004). To see whether the strength of LD varies depending on the sub-region, LD was analysed separately in each as well as in the MHC as a whole. While the class I, class III and extended class II regions have similar LD patterns, the extended class I and class II regions are found to be at opposite extremes. Linked to the high LD segment of the telo-MHC, the extended class I region is in strong LD and carries longer haplotype blocks. In contrast, across a similar physical length, the class II region has weaker LD with a more fragmented, shorter block structure. This difference in LD agrees with the recombination rates reported for the Caucasian population: a very low 0.195 cM/Mb in the extended class I region,
and several times higher in the class II region (1.712 cM/Mb) (Miretti et al. 2005). There is a positive correlation between polymorphism levels and the recombination rate (Nachman 2001, Kauppi et al. 2003), and as the extremely polymorphic class II region is under balancing selection, this could have driven the recombination rate higher, facilitating the creation of new haplotype variants in an arms race against pathogens (Meyer and Thomson 2001).

This pattern of LD has an important implication for SNP selection in disease association studies of the MHC. In an association scan using a panel of test SNP markers, the power of the panel is related to the LD between the test markers and the disease loci through the LD parameter r² (Pritchard and Przeworski 2001). Therefore, for an association scan with a fixed sample size of cases and controls, it is important to choose a density of SNPs that reflects the LD of the tested regions, with a higher density needed in regions of low LD. This point is reiterated in the generation of tag SNPs for the MHC in Singaporean Chinese (Table 3.7). The tagging efficiency, defined by the number of tag SNPs needed per Mb, is highest in the extended class I region where the LD is strongest (83 tags/Mb), and lowest in the class II region (222 tags/Mb). Consequently, in SNP selection or marker design for future MHC disease association studies, instead of an even distribution of SNPs across the MHC, a proportionally greater number should be typed in the class II region. A major impetus of LD maps is to guide the selection of SNPs in disease association studies based on localized LD patterns. These selected SNPs are called tag SNPs, and we have identified that 701 tag SNPs are sufficient to tag the common variation in the MHC in Singaporean Chinese.

4.2 Haplotype-specific LD Patterns in the Singaporean Chinese MHC

To investigate whether LD in the MHC varies in an HLA allele- or haplotype-specific manner, the
pattern of LD in common HLA alleles was analysed using different approaches, the first of which was simple haplotype counting. When 2- and 3-locus HLA haplotype frequencies are counted, it becomes evident that several HLA pairs appear on the same chromosome more frequently than one would expect by chance, suggesting that LD exists between HLA pairs (Table 3.4). The second type of analysis used SNP homozygosity plots of different combinations of HLA haplotypes (Figures 3.8-3.10, Figure 3.25). These homozygosity plots provide an indication of the conservation of HLA haplotypes, and it is seen that tightly linked HLA allele pairs carry higher levels of similarity in the homozygosity plots, underlining the fact that little recombination occurs in between. Finally, extended haplotype homozygosity (EHH) analysis was employed to describe the similarity of haplotypes carrying identical HLA alleles at increasing distance from the HLA loci, which reflects the strength and number of recombination events in these haplotypes (Figure 3.18).

In each of these analyses there is clear evidence that the LD pattern in the MHC is HLA allele and haplotype specific. The EHH plots show that common alleles such as A*0207, A*3303, B*4601 and B*5801 are associated with long extended stretches of LD, and consequently such alleles are seen with dominant HLA partners at other loci when the extent of LD stretches from one HLA locus to another. This is also reflected in the high D´ between these HLA pairs, and the low p-values of the observed haplotype frequencies indicate that these haplotypes do not exist by chance. The SNP homozygosity plots reiterate these patterns: alleles associated with long EHH show little variability in the haplotypes that carry them, suggesting that they are conserved and not interrupted by recombination. Yet there also exist other common HLA alleles, such as A*1101 and B*4001, which are frequently found with multiple
HLA partners at other loci and short EHH stretches. The majority of SNP homozygosity plots of haplotypes carrying these alleles show high variability at multiple locations, indicating that these haplotypes are not conserved but rather gradually decayed by recombination.

Conserved extended haplotypes in the Singapore Chinese population

From this study we conclusively identify conserved extended haplotypes (CEHs) that are common in the Singaporean Chinese population: A*0203-C*0702-B*3802-DRB1*1602 (A2-B38-DR16), A*0207-C*0102-B*4601-DRB1*0901 (A2-B46-DR9), A*1101-C*0801-B*1502-DRB1*1202 (A11-B15-DR12) and A*3303-C*0302-B*5801-DRB1*0301 (A33-B58-DR3). These conserved haplotypes can be seen through high LD between the corresponding HLA alleles, as well as in the 2-, 3- and 4-locus SNP homozygosity plots.

The haplotype alignments of the samples homozygous for the A2-B46-DR9 and A33-B58-DR3 haplotypes provide confirmation of these CEHs, and emphasize the stretch of genetic fixity in these haplotypes. Although full-length Caucasian MHC sequences have recently been reported (Horton et al. 2008), the homozygous samples here provide a different dimension to the understanding of CEHs, as they represent biological replicates of the extent of conservation and also unambiguously demarcate the boundaries of this extent.

The A33-B58-DR3 conserved extended haplotype

The A33-B58-DR3 CEH is prominent due to its frequency (5%) and its length. The haplotype alignments of the individuals homozygous for this CEH provide 10 unambiguously phased chromosomes that show the stretch of genetic fixity of A33-B58-DR3 haplotypes. The conservation of the A33-B58-DR3 haplotype is striking; of the 10 chromosomes carrying the haplotype are indistinguishable from the telomeric end of the SNP map (28.9Mb) to 33Mb of chromosome 6p, breaking just before the HLA-DPA1 locus (Figure 3.26). This marks a stretch of at least 4Mb that is inherited as a block uninterrupted by recombination, and such a long stretch of genetic
fixity across unrelated individuals has never before been reported in a variation map of this resolution.

[...] inherited from a distal ancestor. The same may be said for the 4-locus A*1101-B*4001-C*0702-DRB1*0901 haplotype in Figure 3.25: of the chromosomes carrying this combination of alleles, none are identical across the interval. The B*4001 allele is ubiquitous in different populations and ethnic groups around the world; besides the Chinese populations in China, Singapore and Taiwan, it has a high frequency in indigenous populations in the Americas, Australia and Asia. It is also found with a frequency of at least 5% in several Caucasian populations across the United States and Europe (Middleton et al. 2003). The wide distribution of this allele suggests that it has been present in the human population for a long time, and that over time repeated recombination events have rendered the B*4001 ancestral haplotype unrecognisable.

Alleles matching at the HLA-A, -C, -B and -DRB1 loci are taken as surrogates for matching the entire MHC haplotype in solid organ or bone marrow transplants. This assumption, however, falls short for B*4001 individuals. A recent study of graft-versus-host disease (GVHD) in bone marrow transplants compared patients and donors matched only at the classical HLA loci with those matched across the entire MHC haplotype. It reported an odds ratio of 4.5 for acute GVHD in MHC haplotype-unmatched compared to MHC haplotype-matched patients (Petersdorf et al. 2007). The data shown here indicate that the non-conservation of B*4001 haplotypes should be taken into consideration when matching donors to recipients in solid organ and hematopoietic cell transplantations.

The short extent of LD in B*4001 haplotypes also indicates that if the allele is seen to segregate with disease in any linkage or association study, the association is unlikely to be due to LD with another disease locus in the MHC. Not surprisingly, there is a dearth of data showing
B*4001 disease associations despite the widespread distribution of this allele and the vast number of association studies performed in the MHC.

Existence of conserved extended haplotypes

The conspicuous difference between non-conserved and conserved MHC haplotypes cannot be accounted for simply by random genetic drift, which would predict that common haplotypes in the population tend to be older and hence exist as shorter LD blocks (Ahmad et al. 2003). Instead, population genetic factors such as admixture and natural selection need to be considered, as well as a mechanistic suppression of recombination in these CEHs.

Recombination suppression has been given as a possible explanation for extended haplotypes. Local cis factors are known to influence the heterogeneity of recombination rate within recombination hotspots, possibly by making the genomic location more amenable to double-strand breaks or access by the recombination machinery (Cullen et al. 2002, Jeffreys and Neumann 2005, Neumann and Jeffreys 2006). While this might explain the lack of recombination activity at certain hotspots, it is unlikely that all the active recombination hotspots in the MHC (at least 12 have been identified by sperm typing; Jeffreys et al. 2000, 2001, Cullen et al. 2002) are inert in all of these CEHs. Furthermore, it would not be correct to say that recombination does not occur in any of these haplotypes; the alleles in these CEHs, including the ones showing the strongest LD such as B*4601 and B*5801, exist in haplotypes outside of the CEHs, albeit at low frequency, indicating that at least some recombination has occurred historically.

Of the CEHs identified in this study, the A2-B46-DR9 and A2-B38-DR16 haplotypes are found almost exclusively in Chinese populations, making these ethnic-specific CEHs. The alleles belonging to the A33-B58-DR3 haplotype are, however, also found at high frequencies in the Indian, Pakistani and Mongolian populations
(Middleton et al. 2003). Similarly, the A11-B15-DR12 haplotype is also found at relatively high frequencies in the Malay and Javanese populations native to South East Asia (Middleton et al. 2003). Hence, we cannot rule out admixture as a reason for the presence of some of these CEHs in the population. A recent introduction of these CEHs into the Chinese gene pool could have resulted in an increase in the frequency of these haplotypes without allowing time for recombination to disrupt the LD (Ardlie et al. 2002).

It is also possible that these extended haplotypes exist as a result of natural selection. Alleles found on these CEHs may have conveyed a selective advantage to the individuals carrying them, possibly through resistance to infectious diseases such as an influenza or cholera pandemic, resulting in a rapid increase in the frequency of the protective allele. The DNA segments surrounding the protective allele are swept along with it, and if the selection event occurred relatively recently in the history of the population, insufficient time may have passed for recombination to break down the haplotype, resulting in these long stretches of LD (Sabeti et al. 2002). Another way that natural selection may have led to the presence of these long haplotypes is if combinations of alleles acting in concert (for example, alleles at HLA-A and HLA-B) provide a selective advantage to the individuals carrying them. Such haplotypes would likewise be swept rapidly to high frequencies in the population, preserving their allelic combinations.

Comparisons between conserved extended haplotypes

The MHC is associated with many autoimmune and infectious diseases. This association is generally seen through the segregation of HLA alleles with the disease condition, but although hundreds of diseases are associated with HLA alleles, to date only a handful of the associations have been conclusively shown to
be due to the HLA allele itself. The allele-specific LD patterns in the MHC complicate the identification of the disease loci, and this is especially so in associations with CEHs, where the genetic fixity makes it difficult to tease out the actual disease locus from spurious signals due to LD.

One approach to narrowing disease loci in CEHs is recombinant mapping (Degli-Esposti et al. 1992a, Price et al. 1999). This involves comparing haplotypes in patients and identifying fragments of the disease-associated CEHs that are common in the patient group. Alternatively, if multiple CEHs are associated with a disease, knowing the segments of the MHC that are common between the CEHs will help narrow down the possible disease locus (Stewart et al. 2004).

To identify shared segments between CEHs, the full-length Caucasian MHC sequences were assigned to the 203 haplotype blocks identified in this study. Haplotype blocks were chosen as the unit for comparison as they have been shown to be well preserved between different populations, particularly those of non-African descent, and the data will be considerably less noisy than comparing the alleles of individual SNPs. Also, identical or similar haplotype configurations in blocks between different CEHs indicate that the shared segment was derived from a common ancestral haplotype, and possibly has similar allelic content within (Durrant et al. 2004).

An interesting observation from the comparisons is the dissimilarity of the A33-B58-DR3 haplotype to the other Chinese CEHs. The total length of similar haplotype blocks between A33-B58-DR3 and each of the other CEHs is less than 500kb. In contrast, the total length of block similarity with the Caucasian A1-B8-DR3 (COX), A3-B7-DR15 (PGF) and A26-B18-DR3 (QBL) haplotypes is 712kb, 526kb and 1,153kb respectively. The other Chinese CEHs share between 740kb and 1,134kb of similar blocks among themselves. This seems to suggest that the A33-B58-DR3 haplotype arose from a different lineage from the
other Chinese CEHs, and the high frequency of this haplotype in central Asia also shows that this is not a CEH unique to the ethnic Chinese. The strong LD and high conservation of the A33-B58-DR3 haplotype suggest that it could have been introduced relatively recently into the population via admixture, possibly accompanied by a rise in frequency due to increased fitness or selection pressures, with insufficient time for recombination events to break down the haplotype.

Regardless of the origins of the haplotypes, the comparisons between CEHs allow for the discovery of shared blocks between disease-associated haplotypes. For example, both the A11-B15-DR12 and A33-B58-DR3 haplotypes are known to be associated with drug allergy-induced Stevens-Johnson syndrome (Chung et al. 2004, Hung et al. 2005). The data in this study show that the CEHs share a 158kb block within the MHC class III region. Genes encoded in this locus include the C4A and C4B complement cascade genes; TNXB, coding for tenascin-X, a protein associated with the extracellular matrix in muscle tissues; CYP21A2, coding for a cytochrome P450 drug metabolizing enzyme; and CREBL1, a cAMP response element binding protein.

A33-B58-DR3 and A2-B46-DR9 also share an association with nasopharyngeal carcinoma, a condition with an incidence rate of up to 30-50 per 100,000 people per year in the Chinese population (Chan et al. 1983, Lu et al. 1990, Hildesheim et al. 2002). There is a dearth of long haplotype blocks common to these CEHs, but blocks greater than 20kb can be identified. One of these carries the BAT1 locus, encoding an RNA-binding protein that is associated with the regulation of cytokine production (Allcock et al. 2001). A second shared block is 30kb long and contains a locus encoding EGF8, an epidermal growth factor. The third shared block lies in a gene desert region between RPP21 and HLA-E.

4.3 Genetic Fixity of HLA Alleles

The HLA loci are the most polymorphic in the human genome.
For example, in the Singaporean Chinese population alone, 45 different HLA-B alleles are seen. The SNP haplotypes around each HLA locus, however, hint at a simpler pattern of diversity. HLA alleles can be classified into different allelic lineages according to sequence specificities, and generally alleles in a family belong to the same clade in phylogenetic trees constructed using entire nucleotide sequences (Gu and Nei 1999, McKenzie et al. 1999). It is clear from the alignments of SNP haplotypes that HLA alleles belonging to the same allelic lineage show invariant haplotypes in stretches of DNA around the HLA locus. For example, the common A02 alleles in the population, A*0201, A*0203 and A*0207, are identical across a block of 140 SNPs covering 380kb (Figure 3.20).

Two-SNP tag captures ancient DR haplotypes

At the HLA-DRB1 locus, this fixity takes on different levels. HLA-DRB1 alleles may be separated into serological groups according to the DRB1 allele, but also into one of DR serological groups. These DR groups carry specific combinations of pseudogenes and DRB3, -4 or -5 antigens (Svensson and Andersson 1997), and arose before the separation of hominoid species. This can also be seen from the DRB1 alleles in chimpanzees; many human DRB1 allelic lineages are found in chimpanzees (Bergstrom et al. 1999). The SNP haplotypes around the HLA-DRB1 locus segregate according to allelic lineage and, interestingly, into the ancient DR lineages. A haplotype tag of two SNPs that flank the HLA-DRB1 locus captures the DR lineages perfectly, showing that no recombination has occurred across that segment since the separation of the hominoid species (Figure 3.23). The SNP haplotypes that stretch across the 150kb DRB region show further haplotype specificity within DRB1 lineages (Figure 3.24), defining fixed blocks around the DRB loci.

Haplotype background similarity between HLA-B*4601 and B*1502 reflects inter-locus gene conversion event

Although no
recombination has occurred in regions around the classical HLA loci, there is clearly an accumulation of polymorphisms within the loci that generates the spectrum of HLA alleles seen in the modern-day population. Many of these are point mutations, especially within the peptide-binding grooves, but gene conversion events between homologous and non-homologous loci also contribute to the diversity. The HLA-B*4601 allele is one such example, having resulted from a gene conversion event that replaced part of the peptide-binding groove of a B15 allele with a homologous section of a Cw1 allele, leaving the B15 haplotype background intact (Zemmour et al. 1992, Barber et al. 1996). Consequently, the SNP haplotypes of B*4601 are indistinguishable from B*1502 haplotypes around the HLA-B locus. While the exon sequence of B*4601 has previously been used to explain the gene conversion arrangement, the SNP haplotypes here unambiguously support and confirm this genetic event.

4.4 Fine-Mapping of Crossover Locations in Sperm Recombinant Hotspots

In general, recombination hotspots have been identified across the MHC using two techniques: 1) mapping of crossover locations by recombinant sperm typing (Jeffreys et al. 2000, 2001, Cullen et al. 2002), and 2) inferring the population recombination rate from high-density SNP maps (International HapMap Consortium 2005). There is, however, a distinction between these types of hotspots: hotspots identified by sperm typing are active sites of recombination in the tested sample pool, while, strictly speaking, population recombination rates inferred from variation maps identify sites of recombination that are active as well as those that occurred within the history of the samples (Arnheim et al. 2003, 2007, Kauppi et al. 2004). Although indisputable determination of meiotic recombination hotspots is only possible using recombinant sperm, it is an expensive and arduous process; hence only a handful of recombination hotspots have been
definitively mapped through recombinant sperm mapping (Jeffreys et al. 1998, 2000, 2001, Cullen et al. 2002, Jeffreys and Neumann 2002, 2005).

A set of recombination hotspot regions was identified across the MHC by Cullen and colleagues using a single-sperm typing technique, and these regions were mapped to intervals ranging from 35kb to 105kb, a resolution constrained by the STR markers used for genotyping. However, it is now known that crossovers in recombination hotspots occur in windows of 2kb or smaller (Jeffreys et al. 1998, 2000, 2001), and we sought to narrow down these hotspots using the population data in this study.

The Singaporean Chinese SNP haplotypes show segments across which recombinant haplotypes are formed; one can identify stretches of homozygosity within certain samples, and these homozygous haplotypes are disrupted abruptly across a SNP interval. When these recombinant haplotypes are aligned, it is revealed that the locations of homozygosity disruption are similar among them, often clustering across a to SNP interval. The locations of these recombinant haplotype breaks agree well with inferred recombination rates and the underlying haplotype block structure, and hence are likely to be recombination hotspots.

The Cullen hotspot regions were narrowed to possible crossover windows by identifying the union of historical recombination data from LD maps and the locations of recombinant haplotype breaks in the SNP haplotypes. Of these hotspots were chosen for re-sequencing, and crossover windows in these were successfully narrowed to sizes of between 660bp and 1.8kb. As a proof of concept, an additional hotspot in the TAP2 gene that was previously fine-mapped to a 1.4kb crossover window was identified and re-sequenced using the same strategy. The crossover window identified in this study overlaps with the sperm-mapped hotspot, indicating that the technique used here is valid.

The HLA-F telomeric hotspot defined by Cullen and co-workers was narrowed to
a 660bp crossover window adjacent to a large composite retrotransposon, an SVA element that is present in the reference human genome sequence but not in any of the samples sequenced. This SVA element is also found in two of the fully sequenced MHC haplotypes (Horton et al. 2008). Insufficient homology is known to suppress homologous recombination between chromosomes, possibly by disrupting the complementary base-pairing required during DNA strand exchange (Silver and Artzt 1981). The large SVA element in such close proximity to this crossover location might similarly influence the distribution of crossover haplotypes at this recombination hotspot, such that individuals heterozygous for the SVA element may have a reduced recombination rate here. Further investigation is needed to understand this, possibly through family linkage studies or sperm typing of heterozygous men.

It has been discovered in yeast that certain cis-acting DNA motifs contribute to the presence of hotspots, possibly by making the localized region more amenable to double-strand break (DSB) formation (Blumental-Perry et al. 2000). However, to date no single DNA motif has been conclusively associated with a recombination hotspot in humans (Arnheim et al. 2007). A recent whole-genome scan of historical recombination hotspots identified several motifs that are over-represented in hotspots as opposed to coldspots (Myers et al. 2005), and while some of these motifs are seen within the narrowed crossover windows identified in this study, they appear ubiquitous across the windows. It is, however, noted that all of the narrowed crossover windows contain transposable elements, and it is possible that these elements are associated with positions on the genome that are more accessible to the recombination or DSB repair complexes, possibly because of an open chromatin structure (Wu and Lichten 1994). Indeed, certain repetitive elements are known to be over-represented in recombination hotspots (Cullen
et al. 2002, Myers et al. 2005).

The handful of recombination hotspots fine-mapped here did not reveal any sequence elements that may be associated with recombination hotspots. However, the resequencing of recombinant haplotypes across these regions successfully narrowed the crossover locations in these hotspots to windows of less than 2kb. This is the first known study that utilises recombinant haplotypes in humans to map out recombination hotspots. The success of this attempt suggests that the approach can be scaled up to other regions of the genome and that, with the exact crossover windows of other hotspots identified, it may become possible to finally understand the elusive features that drive recombination hotspots in the genome.

4.5 Conclusion

A comprehensive knowledge of variation as well as the structure of LD in the MHC will further our understanding of the aetiology of autoimmune and infectious diseases. With this in mind, we successfully genotyped a panel of Singaporean Chinese individuals and constructed SNP maps that were used to describe the variation and LD patterns of the MHC. Furthermore, the addition of high-quality phase information from nuclear families and HLA-homozygous cell lines adds confidence to the data reported here. The data made available through this study will be a valuable contribution to the limited knowledge of genetic variation in Asian populations and serve as a model for future studies of LD and recombination.

Population LD patterns described here will also help guide marker selection in future association studies of the MHC by indicating segments of high LD, where dense marker sets would be redundant, and likewise regions of low LD that would benefit from an increased marker density in order to capture the variation present. Linkage disequilibrium patterns are seen to be HLA allele-specific, and long extended segments of LD are associated with several common HLA alleles, some of which exist as conserved extended haplotypes (CEHs) in the Singaporean
Chinese population. Two of these CEHs, A33-B58-DR3 and A2-B46-DR9, show a remarkably long extent of genetic fixity that has not been seen elsewhere before. These long segments of LD need to be taken into consideration when deciphering the results of association studies, and caution is advised when drawing conclusions from associations that may arise due to linkage on a CEH.

Interpopulation comparisons made between MHC haplotypes illustrate not only the differences but also the similarities between ethnic-specific MHC haplotypes, and knowledge of these shared segments will help us better understand MHC disease associations. As these and more MHC haplotype data are reported in the future, remaining questions about shared lineages and common ancestries between different population groups will be clarified. The haplotype variation of the regions surrounding the classical HLA alleles is seen to segregate according to known HLA allele lineages and is in concordance with phylogenetic relationships constructed from the exon and/or intron sequences of HLA genes. It will be interesting to see if these haplotype patterns are conserved across populations; if the data for the DR haplotypes are representative of the other HLA loci, we believe that they will be.

We have also shown here that haplotype structures identified at a population level can be utilised to select recombinant haplotypes to successfully fine-map crossover locations, and this will be a valuable tool in mapping recombination hotspots across the genome. With the advent of new sequencing technologies, recombinant haplotypes may be rapidly identified and precisely mapped using this novel approach, expanding our knowledge of the mechanisms that drive meiotic recombination in the human genome.

The data in this study also serve as a model to understand the conserved extended haplotypes across the MHC and possibly across the genome. From a population perspective, the LD pattern across the MHC is a disjointed landscape of distinct
haplotype blocks. When considered at an HLA allele or haplotype-specific level, long-range haplotypes are seen to exist across the MHC, some of which stretch up to 4Mb or more. These seemingly discordant views slowly converge when the conserved haplotypes are compared side by side, revealing long stretches of shared blocks between divergent HLA haplotypes.

Haplotype block structures are a result of recombination events occurring at hotspots, many of which are historical but some of which are still active in the present population. The boundaries of the haplotype blocks are robust across different populations and ethnic groups and, as seen in this dataset, agree well with recombination signals from population data as well as sperm-typed recombination hotspots. Chromosomes that carry the same haplotype within a block are not necessarily identical in sequence: balancing selection drives the extreme polymorphism seen in the MHC, particularly at the HLA loci, and polymorphisms accumulate within a haplotype block. However, as recombination does not drive the diversity within the block, the haplotype block background remains intact. In other words, similar haplotypes within these blocks should be considered as belonging to a small number of deep clades, with each clade representing a different ancestral background (Daly et al. 2001).

These blocks are remnants of ancient haplotypes that were slowly disrupted by recombination hotspots over time and shuffled between haplotype backgrounds. Different migratory patterns and selection events led to the expansion of specific haplotypes in each ethnic group or sub-population, resulting in the ethnic-specific CEHs seen in the population today, for example A2-B46-DR9 in the Chinese population and A1-B8-DR3 in the Caucasian population. Additionally, recent admixture events introduced new CEHs into established populations, and these are seen as CEHs shared among human populations (e.g.
A33-B58-DR3). Intermingled with these CEHs are non-conserved haplotypes comprised of shuffled ancestral blocks generated through random drift (e.g. non-conserved B*4001 haplotypes). This leads to the kaleidoscopic structure of MHC haplotypes seen in the modern-day human population. Hence a population-wide view of LD in the MHC is a summation of non-conserved and conserved haplotypes that appears block-like in nature, yet this simplicity belies the underlying mesh of varying genetic fixity lengths.
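
The fine-mapping strategy discussed earlier in this chapter (locating the point at which a run of homozygosity is abruptly disrupted in a sample, then overlapping those break intervals with a coarser hotspot region and an LD-map signal) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The function names, the `min_run` threshold, the reading of the combination step as an interval overlap, and all coordinates below are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp), both inclusive


def homozygosity_break(positions: List[int], is_het: List[bool],
                       min_run: int = 5) -> Optional[Interval]:
    """Find the first SNP interval where a homozygous run is disrupted.

    positions: SNP coordinates in ascending order (hypothetical values).
    is_het:    True where the sample is heterozygous at that SNP.
    min_run:   assumed minimum number of consecutive homozygous SNPs
               (must be >= 1) for the run to count as a stretch of
               homozygosity.
    Returns (last homozygous SNP, first heterozygous SNP), or None.
    """
    run = 0
    for i, het in enumerate(is_het):
        if het:
            if run >= min_run:
                return (positions[i - 1], positions[i])
            run = 0  # run too short; start counting again
        else:
            run += 1
    return None


def intersect(intervals: List[Interval]) -> Optional[Interval]:
    """Return the region common to all intervals, or None if there is none."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None


# Hypothetical example: two samples whose homozygosity breaks cluster
# in the same SNP interval.
positions = [100, 200, 300, 400, 500, 600, 700]
sample_a = [False, False, False, False, True, True, False]
sample_b = [True, False, False, False, True, False, False]

breaks = [homozygosity_break(positions, s, min_run=3)
          for s in (sample_a, sample_b)]

coarse_hotspot = (0, 1000)   # e.g. a sperm-typed region (made-up size)
ld_map_signal = (350, 480)   # e.g. a peak from an LD map (made-up)

window = intersect([coarse_hotspot, ld_map_signal] + breaks)
print(window)  # (400, 480)
```

In this sketch the candidate crossover window shrinks to whatever region every line of evidence shares, which is why additional recombinant haplotypes with slightly different break intervals can only narrow the window further.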