Results

3.4 Conserved Extended Haplotypes in the Singapore Chinese Population

With the earlier lower-resolution SNP map, four conserved extended haplotypes (CEHs) stretching across the MHC were seen in the Singaporean Chinese population (Section 3.1.3). These were: A*0207-C*0102-B*4601-DRB1*0901 (referred to as A2-B46-DR9), A*0203-C*0702-B*3802-DRB1*1602 (A2-B38-DR16), A*1101-C*0801-B*1502-DRB1*1202 (A11-B15-DR12) and A*3303-C*0302-B*5801-DRB1*0301 (A33-B58-DR3). The analysis of these CEHs is expanded in this section, making use of the higher-resolution SNP map to describe these haplotypes in greater detail. The 29 HLA-homozygous samples genotyped were also included in this analysis, providing unambiguous detail on the extent and conservation of these haplotypes.

3.4.1 High-Resolution Single-SNP Homozygosity Plots

Homozygosity at a polymorphic marker can be thought of as the probability that two randomly selected haplotypes are identical at that marker, and is an indication of the conservation and linkage disequilibrium of underlying haplotypes (Sabatti and Risch 2002). To see whether the MHC conserved haplotypes are also visible in this high-resolution SNP map, single-SNP homozygosity plots were created for the CEHs from position 29.5Mb to 33.5Mb of the map (Figure 3.25). The homozygosity plot for the A*1101-C*0702-B*4001-DRB1*0901 (A11-B40-DR9) haplotype, which is present at high frequency in the population but was shown in the earlier section not to be conserved, was also included for comparison. The homozygosity plots were constructed with data from both HLA-homozygous samples and phased haplotypes of unrelated individuals.

The plots show a large contrast between the CEHs and the non-conserved A11-B40-DR9 haplotype. For each CEH, homozygosity is high in the region between the HLA-A and HLA-DRB1 loci, averaging at least 0.96, before decaying noticeably at the telomeric and centromeric ends. By contrast, the average homozygosity for the A11-B40-DR9 haplotype
was only 0.78, with heterogeneity in the genotypes seen across the entire 4Mb.

When compared with the plots generated earlier using the sparser SNP map, some additional variation within each CEH is seen. At first glance, the A11-B15-DR12 haplotypes show heterogeneity in regions where homozygosity values dip below 0.9: a 500kb block centromeric of the HLA-A locus and another telomeric to the HLA-DRB1 locus. However, the variation seen here is present in only one of the chromosomes carrying this haplotype. In other CEHs, variation is seen at sporadic SNPs at different locations, possibly due to recurrent mutations or gene conversion events that accumulated in the evolutionary history of the conserved haplotype.

The SNPs at positions 31,838,993 (dbSNP id: rs707937) and 32,447,625 (rs2050189) show variability in multiple CEHs, with homozygosity values dipping in each haplotype. The former falls within an intron of the MSH5 gene, while the latter lies in the 5´ UTR of the C6orf10 gene and is less than 300bp away from an Alu transposable element. The consistent variability at these loci in multiple CEHs could indicate fragile segments of the genome prone to repeat mutation, double-crossover or gene conversion events.

Figure 3.25 Homozygosity Plots for 4-Locus Haplotypes. Homozygosity plots from 29.5Mb to 33.5Mb of the SNP map for the common 4-locus haplotypes constructed from phased data. The number of chromosomes used to generate each plot is indicated for each haplotype: A*1101-C*0702-B*4001-DRB1*0901 (n=7), A*1101-C*0801-B*1502-DRB1*1202 (n=10), A*0203-C*0702-B*3802-DRB1*1602 (n=6), A*0207-C*0102-B*4601-DRB1*0901 (n=37) and A*3303-C*0302-B*5801-DRB1*0301 (n=33). Each panel plots Homozygosity against Position Along Chromosome 6p (Mb); the HLA-A, HLA-C, HLA-B and HLA-DRB1 loci are labelled on the plots.

The homozygosity value drops to 0.54 at position 30,683,582 (dbSNP id: rs1264420) in A2-B46-DR9 haplotypes, with 65% of the haplotypes carrying the allele ‘G’ and the rest carrying the allele ‘A’.
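As a rough check on the definition of single-SNP homozygosity given in Section 3.4.1, the sketch below computes the probability that two haplotypes drawn at random carry the same allele. It assumes sampling with replacement (the sum of squared allele frequencies); the exact estimator used for the plots is not stated here, so this is an illustration rather than the thesis method. The function name `snp_homozygosity` and the 65/35 sample are hypothetical.

```python
from collections import Counter


def snp_homozygosity(alleles):
    """Probability that two haplotypes drawn with replacement share an allele.

    `alleles` holds one allele call per haplotype; None marks a missing call.
    Assumption: homozygosity is estimated as the sum of squared allele
    frequencies, which may differ from the estimator used for Figure 3.25.
    """
    called = [a for a in alleles if a is not None]
    if not called:
        raise ValueError("no called alleles at this SNP")
    counts = Counter(called)
    total = len(called)
    return sum((count / total) ** 2 for count in counts.values())


# Hypothetical sample mirroring the 65% 'G' / 35% 'A' split reported for rs1264420.
example = ["G"] * 65 + ["A"] * 35
value = snp_homozygosity(example)
print(round(value, 3))
```

With a 65%/35% allele split this gives 0.65² + 0.35² = 0.545, which is close to the reported value of 0.54; a fully conserved site, where every haplotype carries the same allele, gives 1.0.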
The SNP rs1264420 lies within an intron of the PPP1R10 gene, bordering an AluJb transposable element. In A2-B38-DR16 haplotypes, besides positions 31,838,993 and 32,447,625 described above, low homozygosity is also seen at positions 30,230,602, 31,199,971 and 31,569,068, but the variants at these sites are seen in only one chromosome.

To briefly summarise the SNP homozygosity plots, very high homozygosity values are seen across each of the CEHs, but several sites of variation exist. As the haplotypes used to construct these plots come from a mixture of HLA-homozygous samples and phase-inferred unrelated individuals, we cannot rule out that some of the variation within CEHs, especially that seen in only one chromosome, is due to incorrect phase calls. The variations seen in multiple chromosomes within CEHs are, however, likely to represent recurrent mutation, double-crossover or gene conversion events that accumulated within these haplotypes.

3.4.2 Haplotypes of HLA Homozygous Samples

The inclusion of HLA-homozygous individuals in the genotyped samples adds robustness to the description of the CEHs in the population. Three of the four Chinese CEHs are represented in these homozygous samples: there are eight A2-B46-DR9-homozygous and five A33-B58-DR3-homozygous samples in the set, as well as a single A11-B15-DR12-homozygous individual.

The A11-B15-DR12-homozygous individual is completely homozygous from position 29,637,733 to 33,648,670, except at SNP positions 31,838,993 and 32,447,625. These SNP positions correspond to the outliers in the homozygosity plots discussed in the section above, and confirm that these were not artefacts of the in-silico haplotype phasing, reiterating the value that homozygous samples provide.

With five unrelated individuals homozygous for the A33-B58-DR3 haplotype, or 10 independent chromosomes, it is possible to examine the CEH unambiguously without phase-reconstructed haplotypes. The SNP haplotype alignment of
these 10 chromosomes is illustrated in Figure 3.26. All 10 chromosomes are almost identical across the length of the entire MHC. The similarity at the centromeric end starts to fall apart at position 33,011,615, before the location of HLA-DMB, and very close to a recombination hotspot (33,008,000 to 33,010,000) identified in the HapMap project (International HapMap Consortium 2005). At the telomeric end, a single chromosome breaks at position 29,591,947; again, this location is close to a HapMap recombination hotspot at position 29,594,500. Based on most of the 10 chromosomes, the conservation of the A33-B58-DR3 haplotype stretches from RFP at the 29Mb mark to 33Mb at the very least, a full 4Mb of the MHC. In the chromosome 6p LD map (Section 3.1.2), a long segment of strong LD is seen telomeric to the MHC, and it is therefore likely that the A33-B58-DR3 CEH is conserved well beyond the telomeric boundary of this SNP map.

Figure 3.26 Full-Length SNP Haplotype-Alignments of A*3303-C*0302-B*5801-DRB1*0301 Homozygous Samples. The SNP haplotype alignment in this figure is constructed for the entire set of 1877 informative SNPs across the MHC in individuals homozygous for the A*3303-C*0302-B*5801-DRB1*0301 CEH. The colours indicate the SNP allele at each position (A: red, C: blue, G: yellow, T: green). This figure clearly illustrates the strong fixity from positions 29.59Mb to 33.01Mb across the MHC.

The full-length SNP haplotypes for the A2-B46-DR9 haplotypes were also aligned and analysed in a similar manner (Figure 3.27). The 16 chromosomes show very high similarity from position 29,830,949 to 32,847,866 of chromosome 6p, beyond which no obvious consensus haplotype is seen. This puts the telomeric break between MOG and HLA-F, and the centromeric break after the HLA-DQB2 locus. Two other chromosomes break at position 29,791,787, which coincides with the breakpoint seen in HLA-A02 alleles discussed in the previous section. Unlike the A33-B58-DR3 haplotypes, there are
recurrent heterozygous sites in the A2-B46-DR9-homozygous samples. Two of these sites coincide with the ones identified earlier in the homozygosity plots: some of the samples are heterozygous at position 30,683,582 (rs1264420) and at position 32,447,625 (rs2050189), reconfirming the observation in the homozygosity plots. The SNP at position 30,473,719 (dbSNP id: rs3130118) is also heterozygous in some samples. This SNP lies in a gene desert between RPP21 and HLA-E, and is close to an L1-Line repeat at position 30,474,008. Two samples are heterozygous at position 31,551,302 (rs12660382). This SNP lies within a dormant ribosomal RNA large-subunit, and within 50bp of a THE1A element (at position 31,551,334) that is associated with recombination hotspots (Myers et al 2005). Additionally, some samples are heterozygous at position 32,821,245 (rs2239800), which is located within an intron of the HLA-DQA2 locus.

Figure 3.27 Full-Length SNP Haplotype-Alignments of A*0207-C*0102-B*4601-DRB1*0901 Homozygous Samples. The SNP haplotype alignment in this figure is constructed for individuals homozygous for the A*0207-C*0102-B*4601-DRB1*0901 CEH. The colours indicate the SNP allele at each position (A: red, C: blue, G: yellow, T: green). This alignment covers the entire set of 1877 informative SNPs across the MHC, and the 16 chromosomes are nearly identical from positions 29.79Mb to 32.85Mb.

3.4.3 Shared Ancestral Blocks Among Conserved Haplotypes

The MHC is associated with over 100 diseases, including most autoimmune and infectious diseases (Lechler and Warren 2000, Horton et al 2004). Most of these disease associations are identified through segregation of the disease phenotype with an HLA allele, several of which belong to known CEHs in various populations (Price et al 1999, Lechler and Warren 2000, Stewart et al 2004). Very few of these diseases have been conclusively linked to an HLA allele; rather it is thought that non-HLA genes in strong LD
with the HLA locus are major contributing factors, possibly transmitted along within the conserved haplotype. A lack of detailed knowledge of the variation and allelic content of the CEHs beyond the HLA genes has hampered fine-mapping of disease loci. To this end, the MHC haplotype project (Allcock et al 2002, Stewart et al 2004, Traherne et al 2006, Horton et al 2008) set out to completely sequence Caucasian CEHs, providing a wealth of variation data on the MHC in the Caucasian population. The high-resolution SNP map of the Chinese CEHs in this study contributes to this objective and enables comparisons to be made between the Asian and European CEHs. For example, the Caucasian A1-B8-DR3 and the Chinese A2-B46-DR9 haplotypes are both associated with myasthenia gravis (Chan et al 1993, Price et al 1999). Knowing which segments are shared by both CEHs will help narrow down possible disease loci, a form of recombinant mapping between these conserved haplotypes.

In order to compare the full-length Caucasian CEH sequences with the Chinese CEHs, the genotypes for the 1877 SNPs used in this study were distilled from each Caucasian haplotype by mapping the SNP flanking sequences to each full-length sequence using BLAST (Altschul et al 1990). The respective SNP alleles can then be accurately determined for each Caucasian CEH. SNPs with flanking sequences that were poorly matched or had multiple mapping locations in a CEH were ignored, and null alleles were assigned to those positions.

To identify segments that were shared between CEHs, the 203 haplotype blocks identified earlier in Section 3.2.3 were used: each CEH was converted into a string of contiguous haplotype blocks based on the SNP alleles. Haplotype blocks accurately identify regions of low diversity that are uninterrupted by recombination, with sizes and distribution that are robust across non-African populations (International HapMap Consortium 2005), and using these as the basic unit for comparisons will
succinctly describe similarities and differences between CEHs. For the local Chinese CEHs, the reference haplotypes for the A2-B46-DR9 and A33-B58-DR3 CEHs were obtained from the consensus of homozygous samples, while the reference haplotypes for the A11-B15-DR12 and A2-B38-DR16 CEHs were constructed using the HLA-homozygous samples supplemented by the consensus of phase-inferred chromosomes.

The haplotypes of the 12 CEHs (8 Caucasian and 4 Chinese) were compared from 29.0Mb to 33.0Mb and are illustrated as separate 1Mb panels in Figure 3.28. Haplotype blocks are drawn linearly across each panel, with different colours used to distinguish between different haplotypes within each block. Gaps in the diagram signify blocks where the haplotype could not be identified for a particular CEH, either due to gaps in the sequence assembly (for the full-length MHC sequences) or an inability to identify a consensus haplotype due to recombination (local CEHs). Large gaps are seen in five of the Caucasian CEHs (APD, DBB, MANN, MCF and SSTO), as the sequence coverage of the MHC in these haplotypes is not complete (Horton et al 2008).

Figure 3.28 Comparing Full-Length Haplotypes of 12 CEHs Across the MHC. The haplotypes of each CEH are displayed as a series of coloured haplotype blocks, with different colours representing different block haplotypes. The top 4 haplotypes are CEHs in the local Chinese population, while the other 8 are extracted from the fully-sequenced Caucasian MHC haplotypes (Horton et al 2008). Missing blocks indicate segments where SNPs could not be mapped to the full-length sequences due to assembly gaps. Panel A: 29.0Mb to 30.0Mb. Panel B:
30.0Mb to 31.0Mb. Panel C: 31.0Mb to 32.0Mb. Panel D: 32.0Mb to 33.0Mb.

Shared regions between CEHs are clearly identifiable. For instance, the HLA-A2-carrying haplotypes share a common pattern from 29.5Mb to 30.15Mb, replicating the fixed A2 region reported in the earlier section. The shared segments between the local Chinese CEHs and the other haplotypes are also summarised in Table 3.10. In this list, consecutive runs of similar haplotype blocks are considered a contiguous segment, and only shared segments between CEHs that are longer than 20kb are included. The gene loci covered within each segment are also listed. Several interesting observations are described in detail in the following paragraphs.

A33-B58-DR3 CEH Comparisons

The A33-B58-DR3 haplotype remarkably shows more similarity to the Caucasian COX,
PGF and QBL haplotypes than to the other Chinese haplotypes. Of these, A33-B58-DR3 shares the most similar segments, totalling up to 1.15Mb across the MHC, with the QBL haplotype. As this haplotype pair carries the same DR allele, they expectedly share segments of up to 270kb in the Class II region, covering the HLA-DR and DQB1 loci. These haplotypes also show high symmetry within the Class III region; in fact, more than half of the entire Class III region is similar between these haplotypes. This includes large 150kb and 170kb segments carrying the BF, C2, NOTCH4 and HSPA1A loci. These haplotypes are, however, not similar at the RCCX (RP-C4-CYP21-TNX) module that is sandwiched in between. This set of observations confirms data from a recent publication that genotyped STR markers in a collection of CEHs at several locations across the MHC (Dorak et al 2006). The QBL and A33-B58-DR3 haplotype pair also shares long blocks within the Class I region, including regions coding for the DDR1, VARSL, POU5F1 and the psoriasis candidate genes (PSORS1C1,2,3).

Table 3.10 Shared Segments Between CEH Pairs
CEH Pair Common Segments Gene Loci Segment Start (Mb) End (Mb) Length (kb)
32.055 HLA-DQA2 HLA-DQB2 28.30 C6orf10 32.547 28.20 HLA-DRA 30.432 30.472 40.32 32.222 32.253 30.02 AGPAT1 EGFL8 PPT2 PRRT1 31.588 31.613 24.19 BAT1 32.213 158.39 C4A C4B CREBL1 CYP21A2 FKBPL STK19 TNXB 29.772 29.839 66.28 HLA-F 30.432 30.472 40.32 32.790 22.49 31.683 145.35 ATP6V1G2 BAT1 HCG26 HCP5 LST1 LTA LTB MICB NCR3 NFKBIL1 TNF 30.662 92.79 ABCF1 GNL1 HLA-E PRR3 33.007 76.28 PPP1R2P1 PSMB9 32.253 61.36 AGPAT1 CREBL1 EGFL8 FKBPL PPT2 PRRT1 32.734 32.790 55.33 HLA-DQB1 30.187 30.228 41.11 TRIM10 TRIM31 TRIM40 30.432 30.472 40.32 32.674 32.711 36.68 29.444 29.478 33.62 OR12D2 OR12D3 OR5V1 31.887 31.910 23.94 C6orf48 HSPA1A HSPA1B HSPA1L 30.879 30.901 22.54 C6orf214 31.973 32.038 65.89 BF C2 EHMT2 RDBP SKIV2L ZBTB12 32.418 32.487 68.71 BTNL2 C6orf10 HCG23 32.222 32.253 30.02 AGPAT1 EGFL8 PPT2 
PRRT1 31.355 31.382 27.04 29.382 29.444 61.18 OR5U1 OR5V1 31.330 31.382 52.08 HLA-C 32.445 32.482 36.93 BTNL2 C6orf10 HCG23 29.453 29.478 24.73 OR12D2 OR12D3 OR5V1 30.996 31.020 24.32 SFTPG VARSL 29.245 29.389 143.52 OR2J2 OR5U1 32.115 32.213 98.11 CREBL1 CYP21A2 FKBPL TNXB 32.931 33.007 76.28 PPP1R2P1 PSMB9 29.449 29.478 28.67 OR12D2 OR12D3 OR5V1 32.418 32.447 28.30 C6orf10 31.681 A*3303-C*0302-B*5801-DRB1*0301 VS SSTO: A*3201-C*0501-B*4402-DRB1*0403 33.16 32.447 32.191 A*3303-C*0302-B*5801-DRB1*0301 VS PGF: A*0301-C*0702-B*0702-DRB1*1501 32.839 32.931 A*3303-C*0302-B*5801-DRB1*0301 VS MCF: A*0201-C*0304-B*1501-DRB1*0401 32.806 30.569 A*3303-C*0302-B*5801-DRB1*0301 VS MANN: A*2902-C*1601-B*4403-DRB1*0701 40.32 31.538 A*3303-C*0302-B*5801-DRB1*0301 VS DBB: A*0201-C*0602-B*5701-DRB1*0701 30.472 32.767 A*3303-C*0302-B*5801-DRB1*0301 VS COX: A*0101-C*0701-B*0801-DRB1*0301 MICA 30.432 32.055 A*3303-C*0302-B*5801-DRB1*0301 VS A*1101-C*0801-B*1502-DRB1*1202 C4A C4B CREBL1 CYP21A2 FKBPL STK19 TNXB 55.89 32.519 A*3303-C*0302-B*5801-DRB1*0301 VS A*0207-C*0102-B*4601-DRB1*0901 158.39 31.534 32.418 A*3303-C*0302-B*5801-DRB1*0301 VS A*0203-C*0702-B*3802-DRB1*1602 32.213 31.478 31.784 102.81 AIF1 APOM BAT2 BAT3 BAT4 BAT5 C6orf47 CSNK2B LY6G5B LY6G5C LY6G6D 29.712 29.792 79.54 MOG ZFP57 31.304 31.338 33.74 32.445 32.472 27.08 BTNL2 C6orf10 HCG23 142 Results CEH Pair Common Segments Start (Mb) End (Mb) Length (kb) Gene Loci Segment 32.504 207.13 HLA-DRA HLA-DRB1 HLA-DRB5 30.850 31.034 183.85 C6orf214 DDR1 DPCR1 GTF2H4 HCG20 HCG21 SFTPG TIGD1L VARSL 31.887 32.055 168.93 BF C2 C6orf48 DOM3Z EHMT2 HSPA1A HSPA1B HSPA1L NEU1 RDBP SKIV2L SLC44A4 STK19 ZBTB12 31.681 31.835 153.73 AIF1 APOM BAT2 BAT3 BAT4 BAT5 C6orf25 C6orf47 CLIC1 CSNK2B DDAH2 LY6G5B LY6G5C LY6G6C LY6G6D LY6G6E MSH5 31.214 31.338 123.84 CCHCR1 HCG27 POU5F1 PSORS1C1 PSORS1C2 PSORS1C3 TCF19 32.931 33.007 76.63 PPP1R2P1 PSMB9 32.222 32.297 74.97 AGER AGPAT1 EGFL8 GPSM3 NOTCH4 PBX2 PPT2 PRRT1 RNF5 32.734 
A*3303-C*0302-B*5801-DRB1*0301 VS QBL: A*2601-C*0501-B*1801-DRB1*0301 32.711 32.794 59.24 HLA-DQB1 31.681 HCG18 RPP21 TRIM39 95.41 DDR1 GTF2H4 TIGD1L VARSL 30.869 67.73 FLOT1 HCG20 IER3 TUBB 32.767 32.40 HLA-DQB1 31.587 25.49 MICB 31.248 22.96 CCHCR1 POU5F1 TCF19 32.451 32.472 21.31 BTNL2 HCG23 30.173 30.535 362.33 HCG17 HCG18 RPP21 TRIM10 TRIM15 TRIM26 TRIM31 TRIM39 TRIM40 30.709 30.822 113.25 C6orf134 C6orf136 DHX16 FLOT1 IER3 KIAA1949 MDC1 NRM TUBB 31.044 31.122 78.10 C6orf205 31.430 31.477 46.60 HLA-B MICA 31.617 31.663 45.85 ATP6V1G2 BAT1 LST1 LTA LTB NFKBIL1 TNF 31.857 31.886 29.13 HSPA1L LSM2 VARS 30.432 30.535 103.86 31.850 31.886 35.88 32.767 32.794 26.39 30.026 30.052 25.91 HCG9 32.191 32.213 21.83 CREBL1 FKBPL 32.269 32.291 21.39 GPSM3 NOTCH4 32.451 32.472 21.31 BTNL2 HCG23 30.996 31.100 104.19 C6orf205 DPCR1 HCG21 SFTPG VARSL 31.857 31.886 29.13 HSPA1L LSM2 VARS 31.252 31.279 27.33 HCG27 PSORS1C3 30.801 A*1101-C*0801-B*1502-DRB1*1202 VS MANN: A*2902-C*1601-B*4403-DRB1*0701 A*1101-C*0801-B*1502-DRB1*1202 VS MCF: A*0201-C*0304-B*1501-DRB1*0401 203.18 31.004 31.225 A*1101-C*0801-B*1502-DRB1*1202 VS DBB: A*0201-C*0602-B*5701-DRB1*0701 30.535 30.909 31.562 A*1101-C*0801-B*1502-DRB1*1202 VS COX: A*0101-C*0701-B*0801-DRB1*0301 30.332 32.734 A*1101-C*0801-B*1502-DRB1*1202 VS A*0207-C*0102-B*4601-DRB1*0901 616.52 30.801 A*1101-C*0801-B*1502-DRB1*1202 VS A*0203-C*0702-B*3802-DRB1*1602 32.297 AGER AGPAT1 AIF1 APOM BAT2 BAT3 BAT4 BAT5 BF C2 C4A C4B C6orf25 C6orf26 C6orf27 C6orf47 C6orf48 CLIC1 CREBL1 CSNK2B CYP21A2 DDAH2 DOM3Z EGFL8 EHMT2 FKBPL GPSM3 HSPA1A HSPA1B HSPA1L LSM2 LY6G5B LY6G5C LY6G6C LY6G6D LY6G6E MSH5 NEU1 NOTCH4 PBX2 PPT2 PRRT1 30.822 21.51 FLOT1 IER3 TUBB 31.121 31.188 67.44 C6orf15 HCG22 30.709 30.766 57.01 C6orf134 C6orf136 DHX16 KIAA1949 NRM 31.973 32.004 31.43 C2 EHMT2 ZBTB12 C6orf27 HSPA1L LSM2 VARS 143 Results CEH Pair Common Segments Gene Loci Segment Start (Mb) End (Mb) Length (kb) 32.115 28.16 HCG22 31.248 22.96 CCHCR1 POU5F1 TCF19 30.028 
20.10 HLA-A 31.426 85.60 HLA-C 30.485 30.560 74.71 31.546 31.587 40.76 32.790 22.49 31.341 31.426 85.60 HLA-C 31.458 31.486 28.03 MICA 30.801 30.822 21.51 FLOT1 IER3 TUBB 30.686 354.03 ABCF1 GNL1 HCG18 HLA-E PPP1R10 PRR3 RPP21 TRIM39 30.052 259.37 HCG4 HCG9 HLA-A HLA-F HLA-G 29.510 29.638 127.32 GABBR1 MAS1L OR10C1 OR11A1 OR12D3 OR2H1 UBD 30.879 30.914 35.14 C6orf214 32.794 32.97 31.886 29.13 HSPA1L LSM2 VARS 30.801 30.822 21.51 FLOT1 IER3 TUBB 31.887 31.910 23.94 C6orf48 HSPA1A HSPA1B HSPA1L 30.432 30.535 103.86 31.973 32.055 82.91 31.283 31.314 31.68 32.222 32.253 30.02 AGPAT1 EGFL8 PPT2 PRRT1 31.857 31.886 29.13 HSPA1L LSM2 VARS 31.588 31.613 24.19 BAT1 31.787 31.972 184.84 C6orf25 C6orf26 C6orf27 C6orf48 CLIC1 DDAH2 EHMT2 HSPA1A HSPA1B HSPA1L LSM2 LY6G6C LY6G6D LY6G6E MSH5 NEU1 SLC44A4 VARS 32.767 32.844 76.99 HLA-DQA2 HLA-DQB2 31.044 31.100 56.33 C6orf205 30.879 30.914 35.14 C6orf214 31.283 31.314 31.68 32.451 32.482 31.16 BTNL2 HCG23 30.801 30.822 21.51 FLOT1 IER3 TUBB 30.008 30.028 20.10 HLA-A 32.519 32.657 138.09 HLA-DRA HLA-DRB1 HLA-DRB5 31.205 31.254 49.10 CCHCR1 POU5F1 PSORS1C1 PSORS1C2 PSORS1C3 TCF19 32.222 32.253 30.02 AGPAT1 EGFL8 PPT2 PRRT1 31.787 A*0207-C*0102-B*4601-DRB1*0901 VS PGF: A*0301-C*0702-B*0702-DRB1*1501 31.134 31.857 A*0207-C*0102-B*4601-DRB1*0901 VS MANN: A*2902-C*1601-B*4403-DRB1*0701 HSPA1L LSM2 VARS 31.106 32.761 A*0207-C*0102-B*4601-DRB1*0901 VS DBB: A*0201-C*0602-B*5701-DRB1*0701 FLOT1 IER3 MDC1 NRM TUBB 29.13 29.793 A*0207-C*0102-B*4601-DRB1*0901 VS COX: A*0101-C*0701-B*0801-DRB1*0301 61.61 31.886 30.332 A*0207-C*0102-B*4601-DRB1*0901 VS APD: A*0101-B60-DRB1*1301 30.822 31.857 32.767 A*0207-C*0102-B*4601-DRB1*0901 VS A*0203-C*0702-B*3802-DRB1*1602 30.761 31.341 A*1101-C*0801-B*1502-DRB1*1202 VS SSTO: A*3201-C*0501-B*4402-DRB1*0403 AGER AGPAT1 CREBL1 CYP21A2 EGFL8 FKBPL PBX2 PPT2 PRRT1 RNF5 TNXB 30.008 A*1101-C*0801-B*1502-DRB1*1202 VS QBL: A*2601-C*0501-B*1801-DRB1*0301 151.43 31.225 A*1101-C*0801-B*1502-DRB1*1202 VS PGF: 
A*0301-C*0702-B*0702-DRB1*1501 32.266 31.910 123.00 C6orf25 C6orf26 C6orf27 C6orf48 CLIC1 DDAH2 HSPA1A HSPA1B HSPA1L LSM2 LY6G6C LY6G6D LY6G6E MSH5 VARS 30.761 30.869 107.83 FLOT1 HCG20 IER3 MDC1 NRM TUBB 30.569 30.662 92.79 ABCF1 GNL1 HLA-E PRR3 31.973 32.055 82.91 BF C2 DOM3Z EHMT2 RDBP SKIV2L STK19 ZBTB12 HCG26 MICB BF C2 DOM3Z EHMT2 RDBP SKIV2L STK19 ZBTB12 144 Results CEH Pair Common Segments Start (Mb) End (Mb) Length (kb) Gene Loci Segment 32.519 32.657 138.09 HLA-DRA HLA-DRB1 HLA-DRB5 30.679 30.766 86.77 C6orf134 C6orf136 DHX16 KIAA1949 MRPS18B NRM PPP1R10 31.205 31.279 73.61 CCHCR1 HCG27 POU5F1 PSORS1C1 PSORS1C2 PSORS1C3 TCF19 32.263 32.297 34.48 GPSM3 NOTCH4 PBX2 32.418 32.447 28.30 C6orf10 30.485 30.535 50.28 32.222 32.253 30.02 AGPAT1 EGFL8 PPT2 PRRT1 32.519 32.657 138.09 HLA-DRA HLA-DRB1 HLA-DRB5 29.510 29.614 103.83 MAS1L OR10C1 OR11A1 OR12D3 OR2H1 32.712 32.743 30.58 HLA-DQA1 HLA-DQB1 32.418 32.447 28.30 C6orf10 30.801 30.822 21.51 FLOT1 IER3 TUBB 29.245 29.389 143.52 OR2J2 OR5U1 30.432 30.535 103.86 31.850 31.886 35.88 C6orf27 HSPA1L LSM2 VARS 32.451 32.482 31.16 BTNL2 HCG23 32.840 32.867 27.74 32.191 32.213 21.83 CREBL1 FKBPL 32.269 32.291 21.39 GPSM3 NOTCH4 31.381 31.426 45.11 30.879 30.914 35.14 33.094 33.125 31.47 31.857 31.886 29.13 32.767 32.794 26.39 30.801 30.822 21.51 FLOT1 IER3 TUBB 30.008 30.028 20.10 HLA-A A*0203-C*0702-B*3802-DRB1*1602 VS MANN: A*2902-C*1601-B*4403-DRB1*0701 33.014 33.085 70.91 BRD2 HLA-DMA HLA-DMB HLA-DOA 32.418 32.447 28.30 C6orf10 31.458 31.486 28.03 MICA A*0203-C*0702-B*3802-DRB1*1602 VS MCF: A*0201-C*0304-B*1501-DRB1*0401 29.712 29.792 79.54 MOG ZFP57 31.973 32.004 31.43 C2 EHMT2 ZBTB12 31.638 31.663 25.25 LST1 LTA LTB TNF 31.205 31.382 176.44 CCHCR1 HCG27 HLA-C POU5F1 PSORS1C1 PSORS1C2 PSORS1C3 TCF19 32.115 32.266 151.43 AGER AGPAT1 CREBL1 CYP21A2 EGFL8 FKBPL PBX2 PPT2 PRRT1 RNF5 TNXB 32.329 32.447 118.01 C6orf10 30.569 30.662 92.79 ABCF1 GNL1 HLA-E PRR3 32.871 32.906 34.57 HLA-DOB TAP2 31.857 31.886 29.13 HSPA1L 
LSM2 VARS 31.546 31.572 25.42 HCG26 30.801 30.822 21.51 FLOT1 IER3 TUBB 31.138 30.485 32.519 31.188 30.535 32.547 50.75 50.28 28.20 C6orf15 31.562 31.587 25.49 MICB 31.190 31.215 24.24 CDSN PSORS1C1 PSORS1C2 29.464 29.614 150.34 MAS1L OR10C1 OR11A1 OR12D2 OR12D3 OR2H1 OR5V1 30.709 30.742 32.64 C6orf134 C6orf136 DHX16 30.801 30.822 21.51 FLOT1 IER3 TUBB A*0207-C*0102-B*4601-DRB1*0901 VS MCF: A*0201-C*0304-B*1501-DRB1*0401 A*0207-C*0102-B*4601-DRB1*0901 VS QBL: A*2601-C*0501-B*1801-DRB1*0301 A*0207-C*0102-B*4601-DRB1*0901 VS SSTO: A*3201-C*0501-B*4402-DRB1*0403 A*0203-C*0702-B*3802-DRB1*1602 VS COX: A*0101-C*0701-B*0801-DRB1*0301 A*0203-C*0702-B*3802-DRB1*1602 VS DBB: A*0201-C*0602-B*5701-DRB1*0701 A*0203-C*0702-B*3802-DRB1*1602 VS PGF: A*0301-C*0702-B*0702-DRB1*1501 A*0203-C*0702-B*3802-DRB1*1602 VS QBL: A*2601-C*0501-B*1801-DRB1*0301 A*0203-C*0702-B*3802-DRB1*1602 VS SSTO: A*3201-C*0501-B*4402-DRB1*0403 C6orf214 HSPA1L LSM2 VARS HLA-DRA

The A33-B58-DR3 haplotype shares a long block of symmetry (145kb) with the COX (A1-B8-DR3) haplotype in the Class I region. This segment includes the BAT1, LTA, LTB, MICB, NFKBIL1 and TNF genes, and once again supports recently published data (Dorak et al 2006). The PGF (A3-B7-DR15) haplotype shares stretches of symmetry with A33-B58-DR3 in the region telomeric to Class I, a region that includes several olfactory receptor genes. This haplotype pair also shares a similar block in the RCCX locus, confirming the CYP21A2 genotype data reported by Dorak et al (2006).

Compared against the other local Chinese CEHs, A33-B58-DR3 shows more muted segments of symmetry. Notable segments include a similar RCCX module shared with A2-B38-DR16 and A11-B15-DR12. Interestingly, although A33-B58-DR3 and A2-B46-DR9 are similarly associated with nasopharyngeal carcinoma in the Chinese population (Lu et al 1990, Ren et al 1995, Hildesheim et al 2002), these haplotypes have little symmetry except at the BAT1 locus and a block at the
telomeric end of the Class III locus carrying PRRT1, PPT2, AGPAT1 and EGFL8.

A2-B46-DR9 CEH Comparisons

The A2-B46-DR9 CEH is most similar to the A2-B38-DR16 haplotype, especially across the extended Class I and Class I regions, most probably reflecting the common lineage of the HLA-A02 haplotypes. Although the DBB (A2-B57-DR7) and MCF (A2-B15-DR4) Caucasian haplotypes carry HLA-A02 alleles too, the sequence coverage within the Class I region of these haplotypes is too fragmented to allow much comparison here (Horton et al 2008). The similarity between A2-B46-DR9 and A2-B38-DR16 stretches across most of the area between 29.5Mb and 30.6Mb, and includes the olfactory receptor cluster as well as the HLA-A, HLA-E, HLA-F, HLA-G, PPP1R10, RPP21 and TRIM31 coding regions. They differ, however, at MOG and at the TRIM cluster between 30.2Mb and 30.3Mb, possibly due to gene conversion or double-crossover events.

The A11-B15-DR12 haplotype, however, carries a haplotype identical to that of A2-B46-DR9 for this TRIM cluster, part of a 362kb segment identical between these haplotypes, illustrating the block shuffling that occurs between CEHs. A11-B15-DR12 and A2-B46-DR9 also share a 113kb segment containing the gene loci for FLOT1, TUBB, DHX16 and MDC1. There is a 47kb block carrying HLA-B and MICA shared between this pair, coinciding with the segment of fixity within B15 and B46 haplotypes discussed earlier.

Compared against the Caucasian MHC haplotypes, A2-B46-DR9 shows segments of similarity across the DRB locus with MANN (A29-B44-DR7), MCF (A2-B15-DR4) and SSTO (A32-B44-DR4), reflecting the common DR53 lineage of these haplotypes as discussed in an earlier section of this thesis. A2-B46-DR9 also shares long segments of symmetry with the PGF (A3-B7-DR15) haplotype, spanning 400kb across the MHC. These fall within the Class I and Class III regions and include the HSPA1 locus, supporting recently reported results (Dorak et al 2006).

A2-B38-DR16 and A11-B15-DR12 Comparisons
These haplotypes share many common segments across the MHC, totalling 1.13Mb in length. Especially notable is a 617kb segment that runs from 31.68Mb to 32.30Mb, possibly encompassing the entire Class III region. This is the longest known contiguous segment shared between any CEHs. It includes genes from AIF1 at the telomeric end to NOTCH4 at the centromeric end, but does not include the BAT1-LTA-LTB-TNF-NCR3 genes at the telomeric end of the Class III region. These haplotypes also share a 151kb segment with the PGF haplotype across the RCCX module at the centromeric end of the Class III region.

The A2-B38-DR16 haplotype also shows strong symmetry with PGF at several other locations, including a 176kb segment that stretches across HLA-C and includes the psoriasis candidate genes (PSORS1C1,2,3). PGF and A2-B38-DR16 share the same HLA-C allele (C*0702), and this segment is possibly a remnant of the ancestral haplotype.
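The rule used to build the shared-segment list in Table 3.10 (consecutive runs of identical haplotype blocks form one contiguous segment, and only segments longer than 20kb are reported) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function name `shared_segments`, the block coordinates and the haplotype labels are all hypothetical, and the sketch assumes that a block with no assigned haplotype (None) ends a run.

```python
def shared_segments(blocks, hap_a, hap_b, min_len_bp=20_000):
    """Merge consecutive blocks where two CEHs carry the same block haplotype.

    blocks: list of (start_bp, end_bp) tuples, ordered along the chromosome.
    hap_a, hap_b: per-block haplotype labels for the two CEHs; None marks a
        block whose haplotype could not be determined.
    Returns (start_bp, end_bp) segments strictly longer than min_len_bp.
    Assumption: an undetermined block breaks a run of shared blocks.
    """
    segments = []
    run_start = run_end = None
    for (start, end), a, b in zip(blocks, hap_a, hap_b):
        if a is not None and b is not None and a == b:
            if run_start is None:
                run_start = start  # open a new run of shared blocks
            run_end = end          # extend the current run
        else:
            if run_start is not None:
                segments.append((run_start, run_end))
            run_start = run_end = None
    if run_start is not None:      # close a run that reaches the last block
        segments.append((run_start, run_end))
    return [(s, e) for s, e in segments if e - s > min_len_bp]


# Hypothetical blocks: two adjacent shared blocks (30kb in total) pass the
# filter, while an isolated 15kb shared block does not.
blocks = [(0, 15_000), (15_000, 30_000), (30_000, 40_000),
          (40_000, 55_000), (55_000, 80_000)]
ceh_1 = [1, 1, 2, 1, None]
ceh_2 = [1, 1, 3, 1, 1]
print(shared_segments(blocks, ceh_1, ceh_2))
```

Working at the level of haplotype blocks rather than individual SNPs keeps the comparison robust to isolated SNP differences within a block, which is the motivation given for using the 203 blocks as the basic unit of comparison.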