Results

3.2 A High-Resolution Linkage Disequilibrium Map of the MHC

In the preceding section, the low-resolution, first-generation SNP map provided an overview of the linkage disequilibrium (LD) patterns of the MHC in the Singaporean Chinese population. That map revealed the block-like structure of LD, and the conserved extended haplotypes (CEHs) stretching across megabases were described. However, the density of the first-generation SNP map limits its ability to resolve fine-scale recombination patterns in the MHC: only 31.5% of that map falls within a haplotype block, whereas the recent HapMap publication concluded that haplotype blocks cover more than 65% of the genome (International HapMap Consortium, 2005).

To delineate the fine-scale recombination patterns, a higher-resolution SNP variation map of the MHC in the local Chinese population was created. Rapid improvements in SNP genotyping technology, together with the growing body of polymorphism data from the International HapMap Project and from MHC studies in other populations (e.g. Miretti et al. 2005; de Bakker et al. 2006), made the construction of such a map feasible. Because the conserved haplotypes had been established in the previous study, they were taken into consideration, and HLA-homozygous samples were sourced and included in the sample set. Genotyping these homozygous samples at high resolution provides a high-quality dataset for studying the conserved haplotypes of the local population in detail and for comparing these CEHs with those reported in other populations. The data reported here also provide a resource for studying and understanding HLA-disease associations.

3.2.1 High-Resolution SNP Variation Map of the MHC

To construct a fine-scale variation map of the MHC, 2360 SNPs were genotyped in 284 Singaporean Chinese individuals. The bulk of these samples consisted of 214 randomly selected, unrelated individuals, of whom 77 overlapped with the samples genotyped for the previous map. Another 29
samples were taken from archived B-lymphoblastoid cell lines that had been tested and selected for homozygosity at the HLA loci listed in Table 3.5. These samples were representative of the CEHs identified in the previous section. The final 41 samples were taken from 12 parent-offspring families, each comprising both parents and at least one child. These 12 families provided 48 phase-unambiguous haplotypes, which were useful for improving haplotype reconstruction in the unrelated individuals. A breakdown of the 284 samples is shown in Table 3.5.

Table 3.5: Composition of Samples Used in Constructing the High-Resolution SNP Variation Map

Sample Category                                    Count
Unrelated Chinese individuals                        214
Chinese nuclear families (12 families)                41
Homozygous cell lines (see breakdown below)           29
Total                                                284

Haplotype Breakdown of Homozygous Samples
A*0207, B*4601, DRB1*0901
A*3303, B*5801, DRB1*0301
A*0207, B*4601
A*1101, B*4001
A*1101, B*4001, DRB1*1101
A*1101, B*4001, DRB1*0901
A*0203, B*3802, DRB1*4003
A*1101, B*1502, DRB1*1202
A*1101, B*5401, DRB1*0803
A*2402, B*3501, DRB1*1501
A*1101, B*4001, DRB1*1201
A*1101, B*4001, DRB1*0405
Counts: 3, 2, 1, 1, 1 (total 29)

SNP genotyping was once again performed using the Illumina GoldenGate assay on the BeadArray platform. Of the 2360 SNP positions attempted, 2290 were successfully genotyped. The overall genotyping quality was very high: the locus success rate was over 97%, the call rate over 99% and the reproducibility higher than 99.99%. Of the 284 Chinese samples genotyped, results were not obtained for 6 (all belonging to the unrelated-individuals group), giving a sample success rate of 97.9%.

A series of filters was employed to remove uninformative and possibly erroneously called genotypes. Only SNPs with a minor allele frequency (MAF) of at least 5% and in Hardy-Weinberg equilibrium (using a p-value threshold of 0.001) were retained. The MAF and heterozygosity distributions of the 2290 markers are shown in Figure 3.11. These charts show
that the 2290 SNPs had a broadly uniform MAF distribution, with more SNPs skewed towards the higher end of the heterozygosity scale.

Figure 3.11 Minor Allele Frequency and Heterozygosity Distributions of the High-Resolution MHC SNP Map. The MAF and heterozygosity distributions of the successfully genotyped SNPs are shown in the bar charts. Bars in grey indicate the proportion of markers deemed non-informative and excluded from subsequent analyses. Of the 2290 successfully genotyped SNPs, 1877 were retained in all.

To weed out other potential genotyping errors, SNPs with genotypes discordant with the pedigree structure in more than one family were also removed. The locations of the SNPs were re-confirmed by mapping the flanking sequences used in the design of the SNP assays back to the human genome assembly. This resulted in the remapping of two SNPs within the MHC: rs2308655 was remapped from 31,345,141 to 31,430,282, and rs1611627 from 29,965,650 to 29,905,761. In both cases the error lay in the Illumina annotation, and it was reported to Illumina.

In total, 1877 markers were retained, establishing a SNP map that covers a 4.91Mb segment of chromosome 6p, from positions 28.97 to 33.88Mb. With an average gap of 2.6kb (median 1.6kb) between consecutive SNPs, this map is about times denser than the previous one. Gap intervals range from 18bp to 71kb, with over 88% of the gaps shorter than 5kb. Six distinct gaps span more than 25kb; these are listed in Table 3.6. Two of the largest gaps lie within the hyper-variable HLA-DRB (71kb) and RCCX (59kb) loci, which exhibit MHC haplotype-specific lengths and gene content (Dawkins et al. 1999): individuals carrying different MHC haplotypes may differ in the number of HLA-DRB paralogues as well as in the number of copies of the C4A/C4B genes within the RCCX locus. The other large gaps cover segments densely packed with large tracts of repetitive and
transposable elements. These gaps most probably reflect difficulties in designing SNP assays in regions with repetitive sequences and variable-length polymorphisms, resulting in a lack of genotype information there.

Table 3.6: Gaps Larger than 25kb in the High-Resolution SNP Map

Gap Length (kb)   Position Along Chromosome 6p (Mb)   Description of Loci
71.05             32.59 – 32.66                        Hyper-variable DRB region
61.04             29.95 – 30.01                        Gene desert densely filled with large transposable elements
59.29             32.06 – 32.11                        Hyper-variable RCCX region
55.56             33.54 – 33.59                        Gene desert densely filled with large transposable elements
43.35             31.38 – 31.43                        Gene desert densely filled with large transposable elements
27.87             33.79 – 33.82                        IHPK3 gene loci interspersed with repeat elements

The large gaps in this map coincide with regions of complex polymorphism and repeat elements, reflecting the difficulty of designing SNP assays there.

Only genotype data from the 208 unrelated individuals (6 samples failed the Illumina genotyping assays) were used to construct the LD and haplotype maps of the Singaporean Chinese population. Because the 29 specifically chosen homozygous cell lines and the 41 family samples are not a random sample of the local Chinese population, they were excluded from the population LD maps. However, the genotypes of the HLA-homozygous cell lines are a valuable source of extended haplotypes across the MHC, and they were used in the subsequent analysis of HLA haplotypes and recombination breakpoints. The family-based genotypes were used to reconstruct phase-unambiguous haplotypes, which were subsequently used to improve the haplotype phasing of the unrelated individuals (see Methods).

The allele frequencies of the SNPs in this dataset were compared with those reported for the populations genotyped as part of the HapMap project (International HapMap Consortium, 2005). As expected, of the four HapMap populations, the allele frequencies in the local Chinese show the tightest
correlation with those reported in the Beijing Chinese (CHB) samples (R2 = 0.94), confirming the quality and reliability of the genotyping data. There is also good correlation with the Japanese (JPT) allele frequencies (R2 = 0.84), reflecting the relatively recent shared ancestry of the two ethnic groups. The CHB and JPT datasets are frequently combined in HapMap data releases, but these results indicate that when HapMap data are used to design informative genotyping panels for the local Chinese population, it is better to consider the CHB data only.

[Figure 3.12 panels: Allele Frequencies of Singaporean Chinese vs. Allele Frequencies of HapMap Population Panels; R2 = 0.94, 0.84, 0.45, 0.57]

Figure 3.12 Comparing Allele Frequencies with HapMap Panels. Allele frequencies for the 1877 informative SNPs genotyped in the local Chinese population were plotted against the corresponding allele frequencies from each HapMap population, and the squared Pearson correlation coefficient (R2) was calculated. Clockwise from top left: CHB – Han Chinese (Beijing), JPT – Japanese (Tokyo), CEU – Caucasian (CEPH), YRI – African (Yoruba, Nigeria). Data were obtained from HapMap release 22.

3.2.2 Estimating Coverage of Known Variation in the MHC Using the High-Resolution SNP Map

The MHC is known to be the most polymorphic region of the genome, and the 1877 SNPs genotyped in this study are only a subset of the known variation there (Horton et al. 2008). Publicly available HapMap data offer the opportunity to assess how effectively this 2.6kb-resolution SNP map serves as a proxy for the other known SNPs in the Chinese population. Having established above that the HapMap Han Chinese data are a good surrogate for allele frequencies in the local Chinese population, these data were used as a test set. The Han Chinese genotypes deposited in HapMap release 22 comprise 9479 SNPs across the MHC, including the 1877 informative SNPs genotyped in this study. To test how well these 1877 SNPs represent the variation in
the remaining HapMap Han Chinese SNPs not genotyped in this study, the allelic correlation (r2) between each of these SNPs and the 1877 SNPs was calculated from the HapMap Han Chinese genotypes. The results are plotted as bar charts in Figure 3.13. The panel of SNPs used in this study represents most of the variation in the HapMap Han Chinese population well. Of the 7602 HapMap SNP loci not genotyped in this study, more than half (51.1%) are represented by a perfect proxy (r2 = 1) within the 1877-marker set, and on average the 7602 SNPs were represented by a proxy SNP with a mean r2 of 0.84. Uninformative SNPs make up the bulk of the 341 HapMap SNPs that were poorly represented (defined as a SNP without a good proxy, or r2
=0.9 to darker blue regions indicating D´ = 0.8

Population panel*   Tag SNPs   Average r2 of captured SNPs   SNPs captured at r2 ≥ 0.8   Non-tag SNPs with perfect tag proxy (r2 = 1)
CHB                 790        0.92                          90%                         34%
JPT                 632        0.84                          70%                         23%
JPT+CHB             780        0.91                          87%                         33%
YRI                 872        0.86                          74%                         34%
CEU                 806        0.87                          76%                         33%
HAPMAP              1034       0.93                          88%                         41%
SG Chinese          710        0.96                          100%                        35%
* CHB: Han Chinese, JPT: Japanese, YRI: Yoruba-African, CEU: Caucasian CEPH

Of the HapMap population panels, tag SNPs generated using the Han Chinese clearly outperform those from any of the other populations, with transferability similar to that of tag SNPs generated in the local population. This again reiterates the tight correlation in allele frequencies between the Beijing and local Chinese populations. The Japanese population is more homogeneous than the other populations, and therefore the number of tag SNPs defined using the Japanese panel is smaller than for the rest. However, the performance of Japanese tag SNPs in the local Chinese population is clearly the worst: they captured the fewest SNPs at r2 ≥ 0.8, had the lowest average r2 of captured SNPs, and had the lowest proportion of perfect proxies. As the number of Japanese and Chinese samples typed in the HapMap project is smaller than those of the African and Caucasian samples, the HapMap routinely groups the
populations together as a combined “East Asian” population. The transferability tests show that no advantage is gained by grouping the East Asian population samples together; doing so may in fact lead to poorer performance in terms of average r2 and the number of perfect proxy tags. However, grouping the entire set of HapMap population panels together may be beneficial. Because this combined set includes all the variation seen in the individual populations, tags generated from it are all-encompassing. As a result, this set of tag SNPs shows the best transferability of the HapMap-derived sets, in terms of both the number of perfect proxies and the average r2. This, however, comes at the price of a larger set of tag SNPs and increased genotyping requirements.
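The SNP quality-control step described in Section 3.2.1 keeps only markers with a minor allele frequency of at least 5% that are in Hardy-Weinberg equilibrium at a p-value threshold of 0.001. The following is a minimal sketch of such a filter, assuming genotypes coded as 0/1/2 copies of one allele with `None` for a missing call. The function names are hypothetical, and because the text does not name the HWE test used, a 1-degree-of-freedom chi-square goodness-of-fit test is assumed here for illustration (exact tests are often preferred for small samples or rare alleles).

```python
import math
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the coded allele; None = no call


def called(genotypes: Sequence[Genotype]) -> List[int]:
    """Drop missing calls."""
    return [g for g in genotypes if g is not None]


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency estimated from the called genotypes."""
    calls = called(genotypes)
    if not calls:
        return 0.0
    coded_freq = sum(calls) / (2 * len(calls))
    return min(coded_freq, 1.0 - coded_freq)


def hwe_chisq_pvalue(genotypes: Sequence[Genotype]) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium
    (an assumed choice of test, used here only for illustration)."""
    calls = called(genotypes)
    n = len(calls)
    if n == 0:
        return 1.0
    observed = [calls.count(0), calls.count(1), calls.count(2)]
    p = (2 * observed[0] + observed[1]) / (2 * n)  # frequency of the allele coded 0
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: nothing to test
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def passes_qc(genotypes: Sequence[Genotype],
              min_maf: float = 0.05,
              hwe_p_threshold: float = 0.001) -> bool:
    """Keep a SNP only if MAF >= 5% and its HWE p-value is not below 0.001."""
    return (minor_allele_frequency(genotypes) >= min_maf
            and hwe_chisq_pvalue(genotypes) >= hwe_p_threshold)


# Toy genotype vectors (hypothetical, not data from this study):
at_hwe = [0] * 25 + [1] * 50 + [2] * 25  # exactly the HWE proportions, MAF 0.5
no_hets = [0] * 50 + [2] * 50            # no heterozygotes: strong HWE deviation
print(passes_qc(at_hwe))   # True
print(passes_qc(no_hets))  # False
```

Applying `passes_qc` to each SNP's genotype vector and keeping the SNPs that return `True` mirrors the MAF and HWE filters; the pedigree-consistency filter applied afterwards needs family structure and is not covered by this sketch.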
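The map statistics reported in Section 3.2.1 (average and median spacing between consecutive SNPs, the proportion of gaps under 5kb, and the list of gaps over 25kb in Table 3.6) all follow from the sorted SNP positions. As a quick arithmetic check, 1877 markers spanning 4.91Mb leave 1876 gaps, so the mean gap is 4,910,000 / 1876 ≈ 2,617bp, in line with the reported 2.6kb. The sketch below computes these summaries; the function name and the toy positions are hypothetical.

```python
import statistics
from typing import Iterable


def gap_summary(positions_bp: Iterable[int],
                small_gap_bp: int = 5_000,
                large_gap_bp: int = 25_000) -> dict:
    """Summarise the spacing between consecutive SNPs along one chromosome segment."""
    pos = sorted(positions_bp)
    if len(pos) < 2:
        raise ValueError("need at least two SNP positions")
    pairs = list(zip(pos, pos[1:]))
    gaps = [right - left for left, right in pairs]
    return {
        "span_bp": pos[-1] - pos[0],
        # The gaps telescope, so the mean equals span / (number of SNPs - 1).
        "mean_gap_bp": statistics.mean(gaps),
        "median_gap_bp": statistics.median(gaps),
        "min_gap_bp": min(gaps),
        "max_gap_bp": max(gaps),
        "fraction_below_small": sum(g < small_gap_bp for g in gaps) / len(gaps),
        # (start, end, length) of every gap longer than large_gap_bp, as in Table 3.6.
        "large_gaps": [(left, right, right - left)
                       for left, right in pairs if right - left > large_gap_bp],
    }


# Toy positions (hypothetical), not the positions of the 1877 MHC markers:
summary = gap_summary([0, 1_000, 3_000, 40_000, 41_000])
print(summary["median_gap_bp"])  # 1500.0
print(summary["large_gaps"])     # [(3000, 40000, 37000)]
```

Reporting the median alongside the mean is useful here because a handful of very large gaps (such as the 71kb HLA-DRB gap) pull the mean upwards, which is why the reported median (1.6kb) sits below the mean (2.6kb).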
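Both the coverage analysis of Section 3.2.2 (the best proxy r2 for each of the 7602 HapMap SNPs not genotyped here) and the tag SNP transferability comparison reduce to the same computation: for every target SNP, find its maximum r2 with any SNP in a given panel, then report the mean best r2, the fraction captured at r2 ≥ 0.8 and the fraction with a perfect proxy (r2 = 1). The following is a minimal sketch under stated assumptions: complete 0/1/2 dosage data, hypothetical function names, and r2 approximated by the squared Pearson correlation of genotype dosages. LD software usually reports r2 from phased or estimated haplotypes instead, so values from this sketch will differ somewhat from those in the tables and figures.

```python
import math
from typing import Mapping, Sequence

Dosages = Sequence[float]  # 0/1/2 allele counts per individual, no missing data


def genotype_r2(x: Dosages, y: Dosages) -> float:
    """Squared Pearson correlation between two SNPs' dosage vectors
    (an approximation of haplotype-based r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP cannot act as, or be captured by, a proxy
    return sxy * sxy / (sxx * syy)


def proxy_summary(panel: Mapping[str, Dosages],
                  targets: Mapping[str, Dosages],
                  capture_r2: float = 0.8) -> dict:
    """Best-proxy r2 of each target SNP against a SNP panel, summarised as
    mean best r2, fraction captured and fraction with a perfect proxy."""
    best = {
        name: max(genotype_r2(g, p) for p in panel.values())
        for name, g in targets.items()
    }
    n = len(best)
    return {
        "mean_best_r2": sum(best.values()) / n,
        "fraction_captured": sum(r >= capture_r2 for r in best.values()) / n,
        "fraction_perfect_proxy": sum(math.isclose(r, 1.0) for r in best.values()) / n,
        "best_r2": best,
    }


# Toy data (hypothetical): two panel SNPs and three target SNPs in six individuals.
toy_panel = {"t1": [0, 1, 2, 0, 1, 2], "t2": [0, 0, 1, 1, 2, 2]}
toy_targets = {
    "a": [0, 1, 2, 0, 1, 2],  # identical to t1: perfect proxy
    "b": [2, 1, 0, 2, 1, 0],  # mirror image of t1: still r2 = 1
    "c": [0, 0, 0, 1, 1, 1],  # best proxy is t2 with r2 = 2/3
}
result = proxy_summary(toy_panel, toy_targets)
print(round(result["mean_best_r2"], 3))       # 0.889
print(round(result["fraction_captured"], 3))  # 0.667
```

For the transferability test, `panel` would hold the tag SNPs selected in one population and `targets` the remaining SNPs typed in the local Chinese samples; for the coverage analysis, `panel` would be the 1877-marker set and `targets` the untyped HapMap SNPs.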