Results

3.3 An Integrated High-Resolution SNP and HLA Haplotype Map of the MHC

The fine-scale organization of linkage disequilibrium across the MHC was described in the previous section using a high-resolution SNP map. In this section, HLA genotype data were merged with the SNP genotypes to construct an integrated LD map, allowing HLA haplotype-specific differences in LD to be analysed at a higher resolution. Data from HLA-homozygous samples were also included in this analysis, providing valuable and unambiguous genotypes for establishing a high-resolution map of the conserved extended haplotypes in the local Chinese population.

Allele-level HLA genotypes at the HLA-A, HLA-B, HLA-C and HLA-DRB1 loci were determined for each sample in this study using sequence-based typing. These multi-allelic genotypes were combined with bi-allelic SNP genotypes, and integrated full-length haplotypes were reconstructed using the program PHASE (Stephens and Scheet 2005). To maximise the use of the phase-certain haplotypes provided by the family-based samples, the phasing was performed in steps. Parental haplotypes were first constructed unambiguously from the parent-offspring genotypes; these phase-certain haplotypes were then seeded into the set of unrelated samples, improving the performance of the algorithm (Stephens and Scheet 2005). PHASE provides an estimate of the quality of the phasing, and this was very high: the average phase certainty for a genotype was over 99%, and 96% of the genotypes had a phase certainty of 100%. Given the high phase certainty, the high density of this map, and the fact that most of the SNPs are in strong LD with at least one other SNP, it is unlikely that the in-silico haplotype reconstruction introduced much bias into the results presented in the following pages.

3.3.1 Capturing HLA Alleles with Tag SNPs and Haplotypes

Having shown that it is possible to generate tag SNPs with this high-resolution map, and that these tag SNPs
sufficiently act as proxies for other SNPs, the tagging of common HLA alleles in the local Chinese population using single SNPs or combinations of SNPs was explored. To do this, an iterative algorithm was created to search for (1) the best single SNP and (2) the best multi-SNP haplotype to tag each common HLA allele, based on the r² coefficient. All combinatorial possibilities of 2- to 6-SNP haplotypes were assembled, and the haplotype that tagged an HLA allele with the maximum r² was taken as the tagging haplotype for that allele. The results are shown in Table 3.9.

Using single-SNP tags, most HLA alleles are not captured well, with only a minority of the 26 common HLA alleles captured with a maximum r² >= 0.8. The mean single-tag maximum r² was 0.67. Interestingly, the ability to tag an HLA allele shows no connection with LD, conserved haplotypes or allele frequency. Alleles that lie on conserved haplotypes, such as A*0207, A*0203, B*4601 and DRB1*0803, are not captured by a single SNP tag. Neither are common alleles such as A*1101 (30% in the population) or DRB1*1202 (9.4% in the population). No single tag SNP captures an HLA allele perfectly (r² = 1).

Multi-SNP haplotype tags, however, perform much better than single tags: 25 of the 26 HLA alleles were tagged with an r² of at least 0.8, with a mean maximum r² of 0.95. In some cases the improvement is dramatic; DRB1*1602 is tagged with an r² of 0.23 using a single SNP but is captured with r² = 0.95 using a 5-SNP haplotype. The HLA-A*0201 allele, however, could not be captured with high r² by either a single-SNP or a multi-SNP tag.

As r² is not merely a measure of linkage disequilibrium but rather a correlation factor of allele frequencies (Ott 1999), a high r² between a single SNP on this map and an HLA allele is unlikely unless that SNP allele is unique to chromosomes carrying that HLA allele. This is reflected in the low single-SNP scores. Similarly, a high
r² between multi-SNP haplotype tags and an HLA allele occurs when that combination of SNP alleles is unique (or nearly so) to chromosomes carrying that HLA allele. By extension, such a multi-SNP haplotype represents a segment of the MHC that is identical-by-descent in most chromosomes carrying that HLA allele, with the underlying SNP haplotype unique to the HLA allele. The inability to capture the HLA-A*0201 allele with high r² reflects the difficulty in identifying segments on A*0201 haplotypes that fulfil these criteria.

Table 3.9 Using Tag SNPs to Define HLA Alleles

             Single-SNP Tags               Multi-SNP Tags
HLA Allele   SNP         Allele  r²        Haplotype      r²     SNPs
A*1101       rs2517713   C       0.74      C,A,A          0.83   rs2517713 rs7739434 rs1633013
A*0201       rs2240070   A       0.31      A,A,G,G,G,G    0.74   rs6457144 rs3807031 rs3129171 rs2844821 rs1811197 rs2074483
A*0207       rs3130667   A       0.50      A,G,T,A        0.97   rs3807031 rs2853961 rs6457144 rs3094116
A*2402       rs2394186   G       0.88      A,G,A          0.92   rs3094165 rs2860580 rs6901541
A*3303       rs1061235   A       0.91      G,T            1.00   rs1633019 rs1002044
A*0203       rs4713210   A       0.35      A,C,A,A,A      0.82   rs2240070 rs2442728 rs1269556 rs3893538 rs2523442
B*1301       rs2844586   A       0.82      C,G            1.00   rs2844573 rs2253908
B*4601       rs7452206   G       0.70      G,C            0.97   rs707913 rs2233965
B*4001       rs2253908   A       0.93      G,G,A          0.99   rs3819294 rs2844580 rs2523497
B*5801       rs130077    A       0.98      G,G,G          1.00   rs6904669 rs7451190 rs7452233
B*3802       rs1811197   A       0.65      A,G,A          1.00   rs1811197 rs3095307 rs1265048
B*1502       rs2524043   G       0.44      A,G,G,C        0.96   rs6929796 rs3130573 rs2074489 rs2442728
C*0702       rs2394953   G       0.94      G,G,G          0.99   rs2394953 rs2853961 rs4713438
C*0304       rs7745906   A       0.63      A,C,A          0.92   rs2074489 rs2894189 rs2853950
C*0302       rs130077    A       0.98      G,G,G          1.00   rs6904669 rs7452233 rs7451190
C*0102       rs4361609   C       0.73      A,A            0.94   rs4122189 rs3130696
C*0801       rs1049853   A       0.91      C,A,G          0.95   rs7759127 rs1633013 rs3095254
DRB1*0901    rs2395185   A       0.38      A,A,A          0.99   rs2395185 rs2516049 rs2517645
DRB1*1501    rs6931337   A       0.71      A,G,G          0.98   rs6931337 rs389512 rs6903608
DRB1*0803    rs1041885   A       0.71      G,C,G,A        1.00   rs9272346 rs2395181 rs3135339 rs2857596
DRB1*1202    rs2187823   A       0.45      G,A,G          0.97   rs660895 rs2856683 rs4713340
DRB1*0301    rs2040410   A       0.88      A,T,A          1.00   rs4988889 rs2254618 rs3948793
DRB1*0405    rs17499655  A       0.59      G,G,G          0.97   rs660895 rs9275555 rs2157337
DRB1*1602    rs701831    A       0.23      C,C,A,G,A      0.95   rs9296032 rs6931337 rs6457617 rs7451962 rs204993
DRB1*1101    rs4639334   A       0.65      G,A,T,A,A      0.96   rs3763317 rs4639334 rs7743563 rs3129943 rs6901541
Mean r²                          0.67                     0.95

3.3.2 Extended Haplotype Homozygosity of HLA Alleles

Data from the first SNP map described in Section 3.1.3 of this thesis showed that LD patterns across the MHC vary in an HLA haplotype/allele-specific manner. To describe this in detail using the high-resolution SNP map, the extent of LD in haplotypes carrying specific HLA alleles was examined using Extended Haplotype Homozygosity (EHH) analysis. Briefly, EHH calculated at a point X is defined as the probability that two randomly chosen chromosomes carrying an allele (or haplotype) of interest at an anchor locus are identical-by-descent from the anchor locus to the point X (Sabeti et al 2002). Linkage disequilibrium decays with increasing distance from the anchor locus as the number of historical recombination events within the interval increases, leading to a similar decay in EHH values. A segment of high EHH therefore indicates the conservation of an extended haplotype without recombination, and segments of high EHH are identical-by-descent in the respective haplotypes.

EHH calculations were performed with each of the classical HLA loci (HLA-A, -B, -C and -DRB1) as the anchor locus. Only common HLA alleles (>= 5% in the population) were included in this analysis. The EHH values were plotted as a function of physical distance and are shown as a set of charts in Figure 3.18 (panels A to D). Figure 3.19 summarizes the extent of genetic fixity for chromosomes carrying each common HLA allele by depicting the segments of high EHH as coloured blocks.

Figure 3.18
Extended Haplotype Homozygosity Plots for Common HLA Alleles
(Plots, panels A to D. X-axis: Position Along Chr 6p (Mb); primary Y-axis: Extended Haplotype Homozygosity (EHH); secondary Y-axis: Recombination Rate (cM/Mb).)

Extended Haplotype Homozygosity (EHH) plots of 1Mb wide segments centred on the anchor locus are shown in this figure. EHH values (primary Y-axis) are plotted against physical location along chromosome 6p (X-axis). The panels are anchored on different classical HLA loci: Panel A, HLA-A; Panel B, HLA-C; Panel C, HLA-B; Panel D, HLA-DRB1. Recombination hotspots are mapped onto the plots. Vertical red bars rising from the X-axis are hotspots inferred from the recombination rates published by the HapMap project (International HapMap Consortium 2005); the recombination rate is shown on the secondary Y-axis of the plots. Translucent orange areas mark the boundaries of recombination hotspots identified by Cullen et al 2002 through genotyping of short tandem repeats in recombinant sperm. The names of these hotspots are shown in red lettering above each plot and correspond to the labels in Cullen et al 2002. Green bars are the locations of recombination hotspots precisely mapped by Jeffreys et al 2000, 2001 through DNA sequencing of recombinant sperm. Labels of these hotspots correspond to those published by Jeffreys et al. Where there are multiple hotspots within a small 5-10kb window, these are collapsed into a single bar on the plots.

Figure 3.19 Extent of EHH of Common HLA Alleles
This figure summarizes the extent of EHH for all common alleles at the HLA loci (A, B, C and DRB1). The top table indicates the
segments of high EHH for each common HLA allele; segments with EHH >= 0.5 and EHH >= 0.8 are listed. The bottom panel illustrates the extent of EHH for all common HLA alleles. The darker shades indicate stretches where EHH is higher than 0.8, while the lighter shades mark stretches where EHH is greater than 0.5. The long stretch of EHH for the conserved extended haplotype A*3303-C*0302-B*5801-DRB1*0301 is clearly seen in the figure.

Figure 3.19 (top table)

             Segment with EHH >= 0.8              Segment with EHH >= 0.5
Allele       Start (Mb)  End (Mb)  Length (Mb)    Start (Mb)  End (Mb)  Length (Mb)
A*0201       29.803      30.032    0.23           29.680      30.187    0.51
A*0203       29.769      30.048    0.28           29.632      30.230    0.60
A*0207       29.792      30.429    0.64           29.595      30.684    1.09
A*1101       29.947      30.032    0.09           29.839      30.036    0.20
A*2402       29.830      30.032    0.20           29.638      30.032    0.39
A*3303       29.463      30.742    1.28           28.970      31.544    2.57
B*1301       31.111      31.544    0.43           31.102      31.568    0.47
B*1502       31.203      31.681    0.48           30.844      31.839    0.99
B*3802       31.214      31.544    0.33           31.200      31.544    0.34
B*4001       31.429      31.545    0.12           31.382      31.545    0.16
B*4601       31.189      31.551    0.36           30.901      32.134    1.23
B*5801       30.900      31.681    0.78           29.447      32.882    3.44
C*0102       31.247      31.429    0.18           31.189      31.551    0.36
C*0302       30.900      31.681    0.78           29.447      32.882    3.44
C*0304       31.249      31.359    0.11           31.240      31.359    0.12
C*0702       31.214      31.425    0.21           31.214      31.425    0.21
C*0801       31.214      31.427    0.21           31.209      31.429    0.22
DRB1*0301    32.448      32.906    0.46           29.592      33.012    3.42
DRB1*0405    32.490      32.803    0.31           32.448      32.821    0.37
DRB1*0803    32.448      32.792    0.34           32.298      32.800    0.50
DRB1*0901    32.498      32.821    0.32           32.490      32.839    0.35
DRB1*1101    32.482      32.821    0.34           32.447      32.821    0.37
DRB1*1202    32.550      32.767    0.22           32.448      32.803    0.36
DRB1*1501    32.484      32.734    0.25           32.447      32.734    0.29
DRB1*1602    32.527      32.795    0.27           32.448      32.795    0.35

(Bottom panel: coloured blocks showing the extent of EHH for each common HLA allele. X-axis: Position Along Chr 6p (Mb), 29 to 33.)

In the plots of EHH as a function of physical distance, the decay of EHH with increasing physical distance from the anchor locus is evident. EHH does not decay in a gradual curve but rather in a step-wise manner, with segments of constant EHH punctuated by sudden breakdowns in value. Sites of EHH decay across different allelic backgrounds tend to cluster at the same locations. To see whether these sites of clustered EHH decay are consistent with the locations of established recombination hotspots, the latter were mapped onto the EHH plots.

There are two sources of information on recombination hotspots across the MHC. One is the set of hotspots inferred from predicted recombination rates in HapMap populations (International HapMap Consortium 2005). The other is the set of recombination hotspots identified by genotyping recombinant sperm from random individuals; two groups separately published sperm crossover locations within the MHC. By genotyping short tandem repeats in recombinant single sperm, the first group identified recombination hotspot regions across the MHC (Cullen et al 2002). Using a slightly different approach of pooled-sperm typing, the second group very precisely mapped further recombination hotspots within the class II region of the MHC (Jeffreys et al 2000, 2001).

The locations of inferred and sperm-recombinant hotspots show strong agreement with the sites of EHH decay. In fact, every recombination hotspot, whether inferred from the HapMap data or determined from sperm recombinants, lies at a location where there is significant decay in EHH. This lends weight to the hypothesis that regions of high EHH are stretches of DNA that are identical-by-descent and unbroken by recombination, interrupted by sites of clustered EHH drops that are likely to be a result of increased homologous recombination activity.

There is a great amount of variability in the rate of decay of EHH depending on the
HLA allele background. The A*3303-C*0302-B*5801-DRB1*0301 conserved extended haplotype (CEH) stands out immediately from the plots. On chromosomes with a B*5801, C*0302 or DRB1*0301 background, the half-life of EHH (the span over which EHH remains above 0.5) stretches more than 3Mb across the classical MHC region (Figure 3.19). Put another way, if two chromosomes carrying any one of these alleles are picked at random, there is a greater than 50% chance that they are completely identical-by-descent across the entire MHC. The other conserved extended haplotypes identified earlier do not exhibit the same remarkable extent of EHH; the half-life of EHH is about 1Mb for alleles A*0207, B*4601 and B*1502, and for the other CEH-associated alleles there are no significant differences from non-CEH alleles.

The segments of strong homozygosity (EHH >= 0.8) and high homozygosity (EHH >= 0.5) seen at each HLA locus point to regions that are in strong allelic association with HLA alleles at that locus. All HLA-A alleles, with the exception of A*1101, show strong EHH (EHH >= 0.8) at the telomeric end that stretches at least to HLA-G (located at 29.90Mb) and HLA-F (29.81Mb). It is therefore likely that each HLA-A allele is associated with only one HLA-F and one HLA-G allele. If EHH >= 0.5 is considered, most HLA-A alleles have high homozygosity to at least position 29.68Mb, a region that includes the MOG gene locus (29.74Mb). The centromeric boundary of strong EHH typically lies just after the HLA-A locus for most A alleles. Haplotypes carrying A*0207 show strong homozygosity to position 30.4Mb, with high EHH to position 30.7Mb. This segment of the chromosome contains a TRIM cluster (30.17Mb – 30.30Mb) and the HLA-E locus (30.56Mb). For chromosomes carrying allele A*3303, the segment of strong EHH stretches more than 1Mb. This segment includes the olfactory receptor cluster at position 29.50Mb. The half-life EHH of A*3303 chromosomes stretches beyond the telomeric end of this SNP variation map and beyond the
HLA-B locus at the centromeric end, encroaching into the MHC class III region.

At the HLA-B locus, all HLA-B alleles have strong homozygosity (EHH >= 0.8) that reaches past the MICA locus (31.48Mb) and the HCP5 gene (31.54Mb). Consequently, each HLA-B allele is likely to be associated with a single allele at those loci. The centromeric boundary of strong homozygosity for four out of six HLA-B alleles lies at a recombination hotspot around position 31.54Mb. Strong homozygosity for the other two alleles, B*1502 and B*5801, stretches to the recombination hotspot between LTA and BAT2 at around position 31.68Mb. These alleles are likely to have strong allelic association with genes in the class III region, including MICB, BAT1, NFKBIL1, LTA, TNF, LTB, NCR and AIF1. The EHH half-life segment for allele B*5801 stretches more than 3.44Mb, from the telomeric boundary of this SNP map to beyond the DRB1 region. Apart from allele B*4001, the telomeric boundary of strong EHH for HLA-B alleles reaches at least to position 31.2Mb, coinciding with a recombination hotspot at position 31.214Mb seen in the HapMap data. This segment includes the psoriasis susceptibility candidate genes PSORS1C1, PSORS1C2 and PSORS1C3, as well as HLA-C and POU5F1, which encodes the OCT4 transcription factor necessary for maintaining stem cell pluripotency. This block of high EHH, consistent across all HLA-B alleles, coincides with the Cw-B frozen block described previously (Degli-Esposti et al 1992b, Yunis et al 2003).

The telomeric boundary of strong homozygosity for HLA-C alleles is similar to that of HLA-B alleles, breaking at the recombination hotspot at position 31.214Mb. Hence, HLA-C alleles are similarly likely to have strong allelic association with alleles at the PSORS1C1, PSORS1C2, PSORS1C3 and POU5F1 loci. With the exception of allele C*0302, which belongs to the A*3303-C*0302-B*5801-DRB1*0301 CEH, none of the other common HLA-C alleles show strong homozygosity into the HLA-B locus. On the other hand, if
the plots are anchored on HLA-B instead, high EHH in all HLA-B alleles (with the exception of B*4001) is seen to stretch across the HLA-C locus. This apparent dichotomy is due to the different levels of diversity at the HLA-C and HLA-B loci. There are more than twice as many HLA-B alleles as HLA-C alleles (45 vs 20) in the local Chinese population represented in this data set. Consequently, a single HLA-C allele is associated with more than one HLA-B allele in Singaporean Chinese chromosomes. In contrast, all HLA-B alleles have a unique HLA-C partner, with the exception of B*4001, which is seen with more than one HLA-C partner (Table 3.4).

The HLA-DRB1 alleles all have a consistently sized segment of strong EHH, framed by recombination hotspots mapped in recombinant sperm: the TSBP-DRA and the DQB1-DQB3 hotspots. This segment of strong EHH contains the gene loci for C6orf10 (position 32.40Mb), BTNL2 (32.48Mb), HLA-DRA (32.52Mb), HLA-DQA1 (32.71Mb) and HLA-DQB1 (32.74Mb). This segment coincides with the HLA-DR/HLA-DQ frozen block (Degli-Esposti et al 1992b, Yunis et al 2003).

3.3.3 Stretches of Genetic Fixity within HLA Lineages

HLA alleles can be classified into serological families, and alleles in a family are generally derived from a recent common ancestor, as seen from phylogenetic analysis of HLA sequences (Gu and Nei 1999, McKenzie et al 1999). To see whether this translates into shared haplotypes in the SNP map unbroken by recombination, the SNP haplotypes across the HLA loci were analysed in chromosomes carrying these monophyletic alleles. Using EHH data from the previous section as a guide, segments of the MHC conserved in haplotypes carrying each of these HLA alleles were identified and compared.

3.3.3.1 Genetic Fixity in HLA-A02 Haplotypes

Three of the common HLA-A alleles (A*0201, A*0203, A*0207) belong to the same A2 serological family and have been shown to cluster together in a clade in phylogenetic trees constructed from HLA-A sequences
(McKenzie et al 1999). Haplotypes carrying these alleles are indistinguishable across a 380kb region (positions 29,791,787 to 30,171,347) in the high-resolution SNP map, a segment containing 140 genotyped SNPs (Figure 3.20 Panel A). This segment includes the gene loci coding for HLA-F (position 29.81Mb) and HLA-G (29.90Mb), as seen in the gene map in Figure 3.20 Panel B. Outside this segment, the conserved haplotypes start to break at positions demarcated by recombination hotspots: one telomeric to HLA-F, mapped in recombinant sperm (Cullen et al 2002), and another inferred from HapMap recombination rates. These breakpoints also coincide with the haplotype boundaries described earlier.

Figure 3.20 Genetic Fixity in HLA-A02 Haplotypes
(Panel A rows: A*0207 haplotypes, n=72; A*0203 haplotypes, n=39; A*0201 haplotypes, n=55. Marked positions: 29,791,787; HLA-F; HLA-G; HLA-A; 30,171,347.)
The 380kb segment (marked out by the black border) is highly conserved among HLA-A2 haplotypes. This segment includes the HLA-F, HLA-G and ZNRD1 genes (Panel B). Panel A: The alignment of SNP haplotypes carrying HLA-A02 alleles. The column colours represent different SNP alleles (A: red, C: blue, G: yellow, T: green).

The similarity across the HLA-A locus in A2 haplotypes explains why it is difficult to find tag SNPs that capture the A2 alleles with a high r²; the best-performing 6-SNP haplotype only captures the A*0201 allele with an r² of 0.74 (Section 3.3.1). The haplotype conserved among A2 alleles means that none of the 140 SNPs within the 380kb region would be useful as tag SNPs, as they would be indistinguishable between the A2 alleles. Furthermore, as A*0201 haplotypes start to differ outside this region, a consistent stretch of SNPs unique to A*0201 chromosomes cannot be identified. The homozygosity of A*0203 and A*0207 haplotypes stretches further into the telomeric regions (as seen in the EHH data), including segments that form haplotypes unique to these alleles. Therefore, it is possible to
identify multi-SNP tags for the A*0203 and A*0207 alleles.

3.3.3.2 Genetic Fixity in Cw3 Chromosomes

A similar fixity across the HLA-C locus is seen within C*0302 and C*0304 haplotypes, both belonging to the same Cw3 (Cw10) serological family (Lefranc et al 1999). This fixed segment is not as extensive as the one seen in A02 alleles; it stretches from position 31,325,794 to position 31,358,621 of the map and consists of a run of 20 consecutive SNPs (Figure 3.21). Haplotypes carrying these alleles are indistinguishable within this segment except for outliers carrying the C*0304 allele. The haplotype breaks at both ends also coincide with the haplotype block boundaries defined earlier. This 33kb frozen block is likely identical-by-descent from the last common ancestor of the C*0302 and C*0304 alleles; recombination events over time have shuffled the segments telomeric and centromeric to this block, leaving the centre intact.

Figure 3.21 Genetic Fixity of Cw3 Alleles
(Rows: C*0302 haplotypes, n=46; C*0304 haplotypes, n=57. Marked positions: 31,325,794; HLA-C, 31,344,499 - 31,347,914; 31,358,621.)
This alignment of SNP haplotypes shows the genetic fixity across a 33kb block in haplotypes carrying C*0302 and C*0304 alleles. The column colours represent different SNP alleles (A: red, C: blue, G: yellow, T: green).

3.3.3.3 Genetic Fixity between B*1502 and B*4601 Chromosomes

HLA-B*1502 and B*4601 alleles do not belong to the same serological family and do not cluster together in phylogenetic trees (McKenzie et al 1999), and hence are not expected to have been derived from a recent common ancestor. However, chromosomes carrying these alleles share a segment that is identical-by-descent, spanning 50kb from position 31,428,517 to 31,478,346, indicating that these alleles must have derived from a similar MHC haplotype background (Figure 3.22). Interestingly, B*4601 has been shown to have arisen from a gene conversion event between an HLA-C allele and an HLA-B allele, a substitution
that replaces part of an exon of a B15 allele with a donor sequence from a Cw1 allele (Zemmour et al 1992, Barber et al 1996). The shared segments on the SNP haplotypes clearly confirm this; all haplotypes carrying either a B*1502 or a B*4601 allele are identical across a stretch of 22 SNPs crossing the HLA-B locus. The 2.6kb resolution of this SNP map does not pick up the variation within the HLA-B locus, but subsequent re-genotyping of the HLA-B locus in some of these samples confirmed the differing alleles here.

Figure 3.22 Segment Identical-By-Descent in B*1502 and B*4601 Haplotypes
(Rows: B*1502 haplotypes, n=25; B*4601 haplotypes, n=89. Marked positions: 31,428,517; HLA-B, 31,429,622 - 31,433,001; 31,478,346.)
Although B*4601 and B*1502 do not belong to the same HLA serotype group, it has been shown that B*4601 alleles arose through an inter-locus gene conversion event on a B15 background. The alignment of SNP haplotypes confirms this, clearly showing a 50kb segment that is identical between haplotypes carrying these alleles. The column colours represent different SNP alleles (A: red, C: blue, G: yellow, T: green).

3.3.3.4 Ancient Markers at the HLA-DRB Locus

HLA-DR haplotypes may be broadly classified into five major haplotypes: DR1, DR8, DR51, DR52 and DR53. All of these haplotypes carry the DRB1, DRB9 and DRA loci but vary in the number of other DRB paralogs in between (Bergström et al 1999). DR51, DR52 and DR53 haplotypes are believed to have branched off from the primordial DRB trunk before the separation of humans from the other hominids, while the DR1 and DR8 haplotypes diverged later from DR51 and DR52 respectively (Svensson and Andersson 1997, Bergström et al 1999). Based on their DRB1 alleles, almost all of the DR haplotypes in the local Chinese population fall into four of these haplotypes (DR8, DR51, DR52 and DR53). The corresponding SNP haplotypes were aligned to see whether there were similarities between haplotypes from the same DR group. Using the EHH data, a 249kb segment around the HLA-DR
loci that showed high EHH across all DR alleles was isolated from the SNP haplotypes. The SNP map coverage around the HLA-DR loci is rather sparse; however, a distinct signature formed by two SNPs within this DR locus, rs701831 (at position 32,657,379) and rs9270657 (32,673,999), segregates clearly with the major DR serogroups (Figure 3.23). DR51 chromosomes carry a distinct A-A haplotype for these SNPs, DR52 chromosomes carry a G-C haplotype, and DR53 chromosomes have a G-A signature. DR8 chromosomes have the same G-C haplotype as DR52 chromosomes, and it has been acknowledged that the DR8 haplotypes most likely diverged from a DR52 background (Svensson et al 1996). These SNPs effectively tag the ancestral DR haplotypes.

Figure 3.23 2-SNP Signature Segregates DR Haplotypes
(Rows: DR51, n=81, with PGF-DR51; DR8, n=51; DR52, n=154, with COX-DR52, QBL-DR52 and APD-DR52; DR53, n=167, with MCF-DR53, SSTO-DR53, DBB-DR53 and MANN-DR53. Labelled loci: DRB9, DRB5, DRB6, DRB1, DQA1, DQB1. Marked positions: 32536263, 32657379, 32673999, 32785066.)
This figure shows a SNP haplotype alignment of the major DR8, DR51, DR52 and DR53 haplotypes within a 249kb segment across the DRB-DQ region as seen in the local Chinese population. Also included, in red labels, are the SNP haplotypes derived from the fully sequenced MHC haplotypes published recently (Horton et al 2008). Different colours represent the SNP allele at each location (A: red, C: blue, G: yellow, T: green). The grey sites represent SNPs not mapped in the full-length MHC sequences because of gaps in the assembly. Two SNPs (marked by the blue border) segregate with these ancient DR haplotypes and have not been broken apart by homologous recombination since the separation of humans from other hominids.

The full-length MHC haplotypes from homozygous cell lines were published recently (Horton et al 2008), and by mapping the SNPs in this study onto these sequences, the same 249kb SNP block was derived for these cell lines. The two tag SNPs also capture the DR haplotypes in these cell
lines, confirming the validity of the tags and indicating that these tags do not apply only to the local Chinese population.

The DR52 and DR53 haplotype alignments are expanded in Figure 3.24 and illustrate the different resolutions of fixity that exist across the DR locus. At its core, the 2-SNP tag is fixed across all haplotypes belonging to the same DR52 or DR53 family. Outside this core, the haplotypes segregate according to allele-level specificities and exhibit high homozygosity, with clear haplotype patterns unique to each DRB1 allele. These segments represent “fixed blocks” within which recombination has not occurred since the formation of these DRB1 alleles.

Figure 3.24 DR52 and DR53 Haplotypes across the DRB region

Panel A: DR52 Haplotypes
(Rows: DRB1*1201, n=14; DRB1*1202, n=41; DRB1*1401, n=12; DRB1*1405, n=8; DRB1*1101, n=24; DRB1*0301, n=40; QBL-0301; COX-0301. Labelled loci: DRB9, DRB5, DRB6, DRB1, DQA1, DQB1. Marked positions: 32536263, 32657379, 32673999, 32785066.)
This SNP haplotype alignment covers the 249kb segment across the DRB region, as seen in Figure 3.23. The colours differentiate the SNP alleles at each location (A: red, C: blue, G: yellow, T: green). Haplotypes for common DRB1 alleles that fall within the DR52 group are shown in this alignment. The QBL and COX MHC reference haplotypes carry the DRB1*0301 allele (belonging to the DR52 group) and are included for comparison. The SNPs that effectively tag the DR52 haplotype are outlined with a blue border. The haplotype across the segment is unique for each DRB1 allele in the local Chinese population, but all retain the same 2-SNP signature at the DRB core.

Panel B: DR53 Haplotypes
(Rows: DRB1*0403, n=20; DRB1*0405, n=32; DRB1*0406, n=15; MCF-0401; SSTO-0403; DRB1*0701, n=16; DBB-0701; MANN-0701; DRB1*0901, n=92. Labelled loci and marked positions as in Panel A.)
Haplotypes for common DRB1 alleles that fall within the DR53 group are shown here. The
MCF, SSTO, DBB and MANN reference MHC haplotypes carry DRB1 alleles that belong to the DR53 group and are included for comparison. The SNPs that effectively tag the DR53 haplotype are outlined with a blue border.
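
The tag search of Section 3.3.1 and the EHH statistic of Section 3.3.2 can be illustrated with a minimal Python sketch. It is not the implementation used in this study, which the text does not give. The function names (r_squared, best_tag, ehh), the encoding of each phased chromosome as a string of SNP alleles, and the toy data are all assumptions made for illustration. The EHH estimator is the usual pairwise form (the fraction of carrier pairs whose haplotypes are identical over the interval), assumed here to match the definition cited from Sabeti et al 2002, and the SNP index passed as the anchor is a stand-in for the SNP nearest the HLA locus.

```python
from collections import Counter
from itertools import combinations


def r_squared(x, y):
    """Squared correlation between two equal-length lists of 0/1 indicators."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # one indicator is constant, so no correlation is defined
    d = pxy - px * py
    return d * d / denom


def best_tag(snp_haps, hla, target, max_snps=6):
    """Exhaustively search SNP subsets of size 1..max_snps.

    Returns (r2, snp_indices, allele_pattern) for the SNP haplotype whose
    presence/absence correlates best with carrying the `target` HLA allele.
    """
    carries = [1 if a == target else 0 for a in hla]
    n_snps = len(snp_haps[0])
    best = (0.0, (), ())
    for k in range(1, max_snps + 1):
        for cols in combinations(range(n_snps), k):
            sub = [tuple(h[c] for c in cols) for h in snp_haps]
            for pattern in set(sub):
                has = [1 if s == pattern else 0 for s in sub]
                score = r_squared(has, carries)
                if score > best[0]:
                    best = (score, cols, pattern)
    return best


def ehh(snp_haps, hla, target, anchor, upto):
    """Pairwise EHH among chromosomes carrying `target`, over SNP indices
    anchor..upto inclusive (either direction)."""
    lo, hi = sorted((anchor, upto))
    segs = [h[lo:hi + 1] for h, a in zip(snp_haps, hla) if a == target]
    n = len(segs)
    if n < 2:
        return float("nan")  # fewer than two carriers: EHH is undefined
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(segs).values())
    return identical_pairs / (n * (n - 1) // 2)


# Hypothetical toy data: six phased chromosomes typed at four SNPs.
haps = ["ACGT", "ACGT", "ACGA", "ACTA", "ATTA", "GCTA"]
hla = ["A*0207", "A*0207", "A*0207", "A*0207", "A*1101", "A*1101"]

if __name__ == "__main__":
    print(best_tag(haps, hla, "A*0207", max_snps=1))
    print(best_tag(haps, hla, "A*0207", max_snps=2))
    print([ehh(haps, hla, "A*0207", 0, x) for x in (1, 2, 3)])
```

On this toy data the best single SNP reaches an r² of only 0.5, whereas the two-SNP haplotype A-C at the first two SNPs is carried by exactly the A*0207 chromosomes and so reaches an r² of 1 (up to floating-point rounding), mirroring the gain from multi-SNP tags reported in Table 3.9. EHH for the A*0207 carriers falls from 1.0 to 0.5 to about 0.17 as the interval is extended one SNP at a time. The exhaustive search grows combinatorially with the number of SNPs, so on a real map it would need to be restricted to a window around each HLA locus.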