A linkage disequilibrium map of the human major histocompatibility complex in Singapore Chinese: conserved extended haplotypes and ancestral blocks

CHAPTER 2: METHODS AND MATERIALS

2.1 Subjects

All subjects in this study were Singaporean blood donors who described themselves as being of Chinese ancestry (tracing back to their grandparents) and self-declared as having no personal history of autoimmune conditions. Appropriate informed consent was obtained.

For the first SNP map (Section 3.1), subjects consisted of 93 males and 105 females. B-lymphoblastoid cell lines (BLCL) were established from whole blood obtained from these subjects.

For the high-resolution SNP map (Section 3.2), subjects consisted of 214 unrelated blood donors and 41 individuals from 12 nuclear families. BLCLs were established from whole blood obtained from these subjects. Also included in this second set of samples was DNA from selected archived B-lymphoblastoid cell lines established previously from healthy, unrelated Singaporean Chinese blood donors. The archived samples were genotyped at classical HLA loci (HLA-A, -B, -DRB1) and screened for homozygosity at either 2 loci (HLA-A, -B) or 3 loci (HLA-A, -B, -DRB1). A total of 29 HLA-homozygous samples were eventually selected and genotyped in the high-resolution SNP map. The gender distribution of this set of samples was 159 males and 125 females.

2.2 Establishing B-Lymphoblastoid Cell Lines from Whole Blood Samples

To establish a renewable source of DNA, BLCLs were established from whole blood of subjects: peripheral blood mononuclear cells were first isolated from anticoagulated blood by density-gradient separation, followed by Epstein-Barr virus (EBV) transformation of B cells. Briefly, heparinised blood was first layered over Ficoll-Histopaque (Gibco, USA) in a tube and centrifuged at 900xg for 15mins at room temperature. The interface containing mononuclear cells was collected and rinsed twice with 10ml of RPMI medium. To infect the cell pellets with EBV, 2ml of cell culture medium
(RPMI with 10% FCS and PIS) was first added, followed by 0.5ml of supernatant from EBV-infected B95-8 cell lines. The cells were then incubated overnight at 37ºC. Incubated cells were rinsed with 10ml of RPMI and resuspended in the cell culture medium. The cells were then split and re-incubated at 37ºC. The density of the cells was monitored daily and culture medium was added when necessary. As the cell culture approached confluency, it was transferred into 25ml and subsequently 75ml culture flasks. With each transfer, 20ml of fresh cell culture medium was added.

A portion of the cultured cells was reserved for freezing. These cells were first rinsed in RPMI and re-suspended in 1.5ml of ice-cold freezing medium (see Appendix). They were then kept at -70ºC for a minimum of hours before being stored in liquid nitrogen. Remaining cells kept for DNA extractions were first rinsed in 1ml of phosphate buffered saline, and cell pellets were stored at -20ºC for up to days.

2.3 DNA Extraction from Cell Pellets

Genomic DNA was extracted from cell pellets using either Gentra’s Autopure LS system with Gentra’s Puregene DNA extraction kit (Gentra Systems, USA) or Chemicon’s Non-organic DNA extraction kit (Millipore, USA). Extracted DNA was re-suspended in 10mM Tris/1mM EDTA (TE buffer) and kept at -20ºC for short-term storage (less than weeks) or -80ºC for longer-term storage. Special care was taken to avoid repeated freeze-thaw cycles that may degrade the DNA.

For SNP genotyping, the most important criteria for success are DNA quality and accurate quantification. DNA was checked for degradation by gel electrophoresis on 0.7% agarose gels to ensure that genomic DNA was not fragmented. The minimum quantity of DNA needed for SNP genotyping was 80µl of DNA at a concentration of not less than 50ng/µl (a minimum of 4µg of DNA per sample). DNA quantification was mostly performed by DNA staining with PicoGreen (Invitrogen, USA), an ultra-sensitive fluorescent nucleic acid stain
for dsDNA. Concentrations for several isolated samples were determined using an ND-1000 NanoDrop spectrophotometer (Thermo Fisher Scientific, USA).

Briefly, the PicoGreen DNA quantification protocol first involved constructing a single-replicate 8-point DNA standard curve, ranging from to 75ng/µl, from Lambda DNA (Invitrogen, USA). DNA working standards were created by serial dilution of Lambda DNA with TE buffer to the appropriate concentrations, and pipetted into 16 wells of a spectrofluorometer plate. Equal amounts of working-concentration PicoGreen were added to each DNA standard and incubated at room temperature for mins, protected from light to prevent photodegradation of PicoGreen. Spectrofluorometric analysis was then performed on a Tecan Genios fluorescence reader (Tecan, Switzerland) and analysed using the Magellan v4.0 software (Tecan, Switzerland). Fluorescence readings from the DNA standards were used to construct a standard curve. DNA samples were subsequently stained with PicoGreen using the same protocol, and concentrations were determined by comparing fluorescence readings against the standard curve.

2.4 SNP Selection and Genotyping

The 1152 SNPs that make up the first-generation map (Section 3.1) were selected from release 121 of dbSNP (Smigielski et al 2000). Using a “picket fence” approach, SNPs were selected to achieve a desired coverage of SNP per 100kb across chromosome 6p, with a higher resolution of SNP per 10kb in the segment between positions 30.0Mb and 37.0Mb of the chromosome arm that contains the MHC. To avoid SNPs that may not be informative in an Oriental population, SNPs that were annotated as “validated” in dbSNP, as well as those reported to have a minor allele frequency of at least 5% in an East Asian population, were preferentially chosen.

For the high-resolution SNP map (Section 3.2), SNPs from Illumina’s MHC Panel Set (Illumina, USA) were used. This set comprised 2360 SNPs that were previously genotyped
successfully and established to be informative (Miretti et al 2005). It was also designed to provide a SNP per 2kb resolution from positions 28.97Mb to 33.88Mb, covering known gene loci as well as intergenic regions across the MHC.

SNP genotyping was carried out with the Illumina GoldenGate Assay on a BeadArray Platform (Illumina, USA). Briefly, the GoldenGate assay is a highly multiplexed PCR-based genotyping assay that uses a combination of allele-specific oligonucleotides and a locus-specific oligonucleotide to determine the genotype of each targeted SNP in a sample. Genomic DNA is first immobilised on a solid support, and sets of oligonucleotides targeted to specific SNPs are annealed to the DNA. Unbound oligonucleotides are next removed in a series of stringency washes. Following this, allele-specific extension and ligation are used to extend correctly matched allele-specific oligonucleotides to the locus-specific oligonucleotides. A subsequent PCR reaction using fluorescently labelled primers, with a specific dye for each allele, amplifies the appropriate product using universal PCR sites on the oligonucleotides. The PCR products are then hybridised onto a 1152-spot microarray that captures locus-specific signatures, and by analysing the fluorescent signals on the microarray, the genotype of each assayed SNP in a sample can be accurately determined.

Genotype results were next filtered using a set of quality controls:

i. SNP genotypes were checked for deviation from Hardy-Weinberg equilibrium using Fisher’s exact test at a significance level of 0.001.
ii. SNPs with a minor allele frequency of less than 5% were discarded.

Additionally, for the high-resolution SNP map with family data:

iii. SNPs that had genotypes discordant with pedigree structure in more than one family were discarded.

Illumina provides a list of flanking sequences that were used for designing the SNP assays. To ensure that the location of the SNPs map to the genome as
annotated, the flanking sequences were mapped back to the human genome assembly using BLAST (Altschul et al 1990).

2.5 HLA Sequence-Based Typing

The HLA genotype at the classical HLA loci HLA-A, -B, -C and -DRB1 was determined for each sample by sequence-based typing. For the HLA-A, -B and -C genes, this involved PCR amplification of exons and using specific primers, followed by direct DNA sequencing of the PCR products in opposite directions. For HLA-DRB1, exon was amplified using a cocktail of allele-specific primers at the 5´ end and a locus-specific primer containing an M13 priming sequence at the 3´ end. The sequences of the PCR primers are shown in Table 2.1.

Table 2.1: List of Oligonucleotide Primers for HLA-A, -B, -C and -DRB1 Amplification
Loci  Name  Sequence (5´ to 3´)
A1 ggg TCC CAg TTC TAA AgT CCC CAC B2 CCA TCC CCg gCg ACC TAT C1 AgC gAg gKg CCC gCC Cgg CgA C2 C gAA ACS gCC TCT gYg ggg AgA AgC AA B1 B Tgg CCC CTg gTA CCC gT A2 A ggA gAT ggg gAA ggC TCC CCA CT DRB1-10 DRB1-04 Agg AAA CAg CTT ATg ACC TgA gAC gCA CgT TTC TTg gAg CAg gTT AAA C DRB1-01 DRB1 CAg gAA ACA gCT ATg ACC TgA AgA CCA CgT TTC TTg gAg gAg g CAg gAA ACA gCT ATg ACC TgA gAC gCA CgT TTC TTg Tgg SAg CTT AAg TT
DRB1-09  CAg gAA ACA gCT ATg ACC TgA CCA gCA CgT TTC TTg AAg CAg gAT AAg TT
DRB1-52.1  CAg gAA ACA gCT ATg ACC CCC ACA gCA CgT TTC TTg gAg TAC YCT A
DRB1-07  CAg gAA ACA gCT ATg ACC TgA gAC TCA CgT TTC CTg Tgg CAg ggT AAR TAT A
DRB1-15  CAg gAA ACA gCT ATg ACC TgA gAC TCA CgT TTC CTg Tgg CAg CCT AAg A
DRB1-M13  TgT AAA ACg ACg gCC AgT gCT YAC CTC gCC KCT gCA C

PCR reactions for HLA-A, -B and -C were performed in total reaction volumes of 50µl comprising: 5µl of 2mM dNTPs, 5µl of 10x PCR buffer, 5µl of 5µM 5´ primer, 5µl of 5µM 3´ primer, 100ng of genomic DNA and 0.2µl of High Fidelity Taq DNA polymerase (Lifecodes Corporation, USA). The following thermal cycling profile was used: 98ºC for 2mins, [94ºC for 30secs, 65ºC for 1min, 72ºC for 2mins] for 32 cycles, 72ºC for 10mins.
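As a quick check on the HLA-A, -B and -C reaction mix above, the final concentration of each stock reagent follows from C1 x V1 = C2 x V2. The sketch below is illustrative only and is not part of the original protocol; the helper name `final_concentration` is hypothetical, and only the stock concentrations and volumes are taken from the text.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after diluting a stock into the final reaction volume.

    Applies C1 * V1 = C2 * V2; the result is in the same unit as stock_conc.
    """
    if total_volume_ul <= 0:
        raise ValueError("total reaction volume must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul


# Values from the 50 µl HLA-A, -B, -C reaction described in the text.
TOTAL_UL = 50.0
dntp_mM = final_concentration(2.0, 5.0, TOTAL_UL)    # 2 mM dNTP stock, 5 µl
primer_uM = final_concentration(5.0, 5.0, TOTAL_UL)  # 5 µM primer stock, 5 µl
buffer_x = final_concentration(10.0, 5.0, TOTAL_UL)  # 10x PCR buffer, 5 µl

print(f"dNTPs: {dntp_mM} mM, each primer: {primer_uM} µM, buffer: {buffer_x}x")
```

Under these inputs each reagent is diluted tenfold, so the buffer ends up at its 1x working strength.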
PCR reactions for HLA-DRB1 were performed in total reaction volumes of 30µl comprising: 3µl of 2mM dNTPs, 3µl of 10x PCR buffer, 0.3µl of 20µM of each primer, 200ng of genomic DNA and 0.2µl of High Fidelity Taq DNA polymerase (Roche Diagnostics, Germany). The following thermal cycling profile was used: 95ºC for 5mins, [95ºC for 30secs, 62ºC for 10secs, 72ºC for 30secs] for 30 cycles, 72ºC for 10mins.

All PCR reactions were carried out on a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems, USA). PCR products were purified using the QIAquick gel extraction kit (Qiagen, Germany) following the manufacturer’s instructions. The purified PCR templates were sequenced with the corresponding primers listed in Table 2.2 to derive the HLA genotype of each sample.

Table 2.2: List of Oligonucleotide Primers Used for HLA Sequencing
Primer Name  Sequence (5´ to 3´)  Loci
INT2F  TTA CCC ggT TTC ATT TTC Ag  HLA-A exon
INT2R  ggA TCT Cgg ACC Cgg Ag  HLA-A exon
B2F  gCC gCg CCg ggA ggA gg  HLA-B exon
18CIN3 C1 gCT CCC ACT CCA TgA M13R CCC ACT gCC CCT ggT ACC CAg gAA ACA gCT ATg ACC HLA-B, -C exon HLA-C exon HLA-DRB1 exon

2.6 PCR Amplification for Sequencing Recombination Hotspots

For re-sequencing of recombination hotspots, PCR primers were designed such that overlapping PCR fragments tile across each hotspot interval. Primers were designed from repeat-masked human genome sequence with the help of the Primer3 application (Rozen and Skaletsky 2000). Table 2.3 lists the pairs of PCR primers used for each hotspot and includes the annealing temperatures that were specific to the thermal cycling profile for each PCR reaction.

Table 2.3: List of Oligonucleotide Primers Used to Amplify the Hotspot Intervals for Re-Sequencing
Hotspot  Primer Name  Sequence (5´ to 3´)  Chromosome Location¹  Fragment Size (bp)  Annealing Temp (ºC)
HLA-F Telo  1.1F  TCT CAT gTC ATC TgC TTC TTg g  29,790,748  1,941  64
HLA-F Telo  1.1R  CTC ACA Tgg CTA TTg gAA AAg g  29,792,689
HLA-F Telo  1.2F  gAC CAg ACA ACA Agg AAC TgA g  29,792,666  3,470  64
HLA-F Telo  1.2R  TTg gTT gAA TAA CTg ggg TCA g  29,796,136
HLA-F Telo  1.3F  CCT gAC CCC AgT TAT TCA ACC  29,796,134  2,623  62
HLA-F Telo  1.3R  TTC TgC CTC CCA TTC CAT AC  29,798,757
NCR3 - AIF1  2.1F  gCT ggC Tgg ATg TAg TTg Ag  31,676,300  2,195  62
NCR3 - AIF1  2.1R  TgA TTg gAg Tgg ACA Agg Tg  31,678,495
NCR3 - AIF1  2.2F  TCA CCT TgT CCA CTC CAA TC  31,678,494  1,922  62
NCR3 - AIF1  2.2R  gCC TTC CTT CAT TCT CTC ACC  31,680,416
NCR3 - AIF1  2.3F  gAT ggg TAg TgT ggT TTC TgC  31,679,635  1,671  62
NCR3 - AIF1  2.3R  CCT TTg AgC CTg ggA gTg g  31,681,306
DPA1-Telo  3.1F  TCT Tgg gTA CAT gCT AAC TTC C  33,128,468  3,108  64
DPA1-Telo  3.1R  ggA CAg ggC ACA TTC TTA gg  33,131,576
DPA1-Telo  3.2F  CTg AAT gAA CCC TgC TCT gg  33,130,471  2,392  62
DPA1-Telo  3.2R  TgA ATA ACA AAT ggg CAA ACC  33,132,863
DPA1-Telo  3.3F  CCT AAg AAT gTg CCC TgT CC  33,131,576  2,797  62
DPA1-Telo  3.3R  gAg CAA Agg gCT gAA gAT TAT g  33,134,373
TAP2  4.1F  ACT CTg CCT TTC CTC ATC AAA C  32,907,046  2,342  62
TAP2  4.1R  TCC TCA TCA ACA CCA CTT TCT g  32,909,388
TAP2  4.2F  TCC CAA ggA gCC ACA gAT Ag  32,908,868  2,062  64
TAP2  4.2R  TCA CAA TAg CAg Cgg AgA Ag  32,910,930
TAP2  4.3F  gCC TCC TTT CCC TAT gCT g  32,910,481  2,982  64
TAP2  4.3R  ggA CgT TgT gAg TTg gAg gT  32,913,463
TAP2  4.4F  CCA AAg gAg AAg Agg CAC AT  32,913,295  2,257  66
TAP2  4.4R  Tgg gAg gTg gTT TCA ATC Tg  32,915,552
¹ Chromosome Location: coordinates are based on the reference human genome sequence assembly (NCBI36.1). Fragment size and annealing temperature apply to each forward/reverse primer pair and are shown on the forward-primer row.

PCR reactions were performed in total reaction volumes of 30µl comprising: 3µl of 2mM dNTPs, 3µl of 10x PCR buffer with 1.5mM MgCl2, 1.5µl of 5µM forward primer, 1.5µl of 5µM reverse primer, 1.5µl of 100% DMSO (Sigma-Aldrich, USA), 100ng of genomic DNA and 0.2µl of High Fidelity Taq DNA polymerase (Roche Diagnostics, Germany). The following thermal cycling profile was used: 94ºC for 2mins, [94ºC for 15secs, TA for 30secs, 72ºC for 2mins] for 30 cycles, 72ºC for 7mins (where TA refers to the primer annealing temperature specific to each pair of primers, as listed in Table 2.3).

All PCR reactions were carried out on a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems, USA). PCR
products were purified using the QIAquick gel extraction kit (Qiagen, Germany) following the manufacturer’s instructions. The purified PCR templates were sequenced with the corresponding primers listed in Table 2.4 below.

Table 2.4: List of Oligonucleotide Primers Used for Hotspot Re-Sequencing

A. Hotspot Telomeric to HLA-F
Primer Name  Sequence (5´ to 3´)  Chromosome Location
1.1F  TCT CAT gTC ATC TgC TTC TTg g  29,790,748
1.s1  TCT gAg TAA AAA gTg CCT ggT g  29,791,266
1.s2  CAC CAg gCA CTT TTT ACT CAg A  29,791,284
1.s3  CCT ATT gTg TTT TCC ATT CC  29,791,689
1.s4  gAA TgC CCA CCC AgT AgC  29,791,829
1.s5  CCT CAA TAC CCA Agg CTC Tg  29,792,260
1.s6  CAg AgC CTT ggg TAT TgA gg  29,792,279
1.2F  gAC CAg ACA ACA Agg AAC TgA g  29,792,666
1.s7  gCC TTT TCC AAT AgC CAT gT  29,792,687
1.1R  CTC ACA Tgg CTA TTg gAA AAg g  29,792,689
1.s8  TAC Agg CAT gAg CCA CCA  29,793,275
1.s9  Tgg Tgg CTC ATg CCT gTA  29,793,297
1.s10  TgA ATg ACC AAg gTT ACA Cg  29,796,114
1.3F  CCT gAC CCC AgT TAT TCA ACC  29,796,134
1.2R  TTg gTT gAA TAA CTg ggg TCA g  29,796,136
1.s11  CCC TAA ACA gAg AAC CCT CCA  29,796,539
1.s12  ggC TgC AAg TAA TCC TCC Tg  29,796,665
1.s13  Tgg ATA ACA gAg ggA gAC CA  29,796,880
1.s14  gTg ACA gAg Tgg Tgg ggA CT  29,797,355
1.s15  CAg TCA CAA TgC CCC TCA C  29,797,704
1.s16  TCA ggg CTA Tgg AAT gAA gg  29,797,838
1.s17  gTT AgC CAg gAT ggT CTC g  29,798,083
1.s18  AAA CTg gTC TCT gTC CTA TTT CA  29,798,392
1.3R  TTC TgC CTC CCA TTC CAT AC  29,798,757

B. Hotspot Between NCR3 and AIF1
Primer Name  Sequence (5´ to 3´)  Chromosome Location
2.1F  gCT ggC Tgg ATg TAg TTg Ag  31,676,300
2.s1  TTg AgA CAg AgT TTT gCT gTT g  31,676,691
2.s2  ATC ACg CCA TTg CAC TCC  31,676,712
2.s3  TgA CCT CgT gAT CCA CCT g  31,677,268
2.s4  Agg CAg gTg gAT CAC gAg  31,677,282
2.s5  ggg ATT ACA ggC gTg AgC  31,677,300
2.s6  CTg TgT TAg CCA gCA Tgg TC  31,677,816
2.s7  ggg CTg gAg TgC AAT gAC  31,677,985
2.s8  TCC TTC TAT gTT gCC CAg AC  31,678,065
2.2F  TCA CCT TgT CCA CTC CAA TC  31,678,494
2.1R  TgA TTg gAg Tgg ACA Agg Tg  31,678,495
2.s9  ggg ATg ACA ggA ggC TgA  31,678,683
2.s10  gAC AgA ATT TTg CTC TTg TTg C  31,678,725
2.s11  Agg Cgg CTT ACC CTg AAT  31,679,338
2.s12  ggg ATg ACA ggA ggC TgA  31,679,400
2.3F  gAT ggg TAg TgT ggT TTC TgC  31,679,635
2.s13  ggA Agg TgC TCg CTg AAT  31,679,968
2.s14  TTT TTg CgC TCT CAg CTC  31,680,018
2.s15  Tgg TCT gCC TCT CCg TCT  31,680,257
2.2R  gCC TTC CTT CAT TCT CTC ACC  31,680,416
2.s16  TCA CTC TgT CgC CCA gAC  31,680,806
2.s17  gCA gCA gCg ACA gAA AAg  31,680,855
2.3R  CCT TTg AgC CTg ggA gTg g  31,681,306

C. Hotspot Telomeric to HLA-DPA1
Primer Name  Sequence (5´ to 3´)  Chromosome Location
3.1F  TCT Tgg gTA CAT gCT AAC TTC C  33,128,468
3.s1  TgC TCA CAT ACC Agg ACT AAA AC  33,128,850
3.s2  ggA CAA AAC TgC TTC TAA TTg C  33,128,992
3.s3  TTC CTT CAT TTC TTA TCT CAT ACT CC  33,129,086
3.s4  CAC CCA ATC gCA TCA ATT TT  33,129,555
3.s5  Tgg gAT AAT TTg TTT AgC CAg TC  33,129,623
3.s6  TgT gAg CTg gCA TAA ACT gg  33,129,913
3.s7  ggg TAA AgA CCC TgC AAC AC  33,130,068
3.2F  CTg AAT gAA CCC TgC TCT gg  33,130,471
3.s8  gCA ggT AgC TTC AAA TCA gg  33,130,904
3.s9  TAT TCC CCg TgA CAg ACC TC  33,131,009
3.s10  ACC CCA AgT AAA gTC CAT gC  33,131,198
3.1R  ggA CAg ggC ACA TTC TTA gg  33,131,576
3.3F  CCT AAg AAT gTg CCC TgT CC  33,131,576
3.s11  TTC CCC AgA ggT gTT gTT TC  33,131,952
3.s12  CTC CTg ggg TTC ACT CTT Tg  33,132,111
3.s13  TTC TAg gAg gCC AAC CAg TC  33,132,506
3.s14  ggC TTC CAT ATC Cgg TTT TAC  33,132,547
3.s15  TTT TAC CTT gCC Tgg gTT Tg  33,132,849
3.2R  TgA ATA ACA AAT ggg CAA ACC  33,132,863
3.s16  CTC AAA TCT CAT Tgg CTg gAg  33,133,216
3.s17  CTg TCT gTT TgA Tgg TTg AAg C  33,133,328
3.s18  AgC AgA gAg ggA AgT gTT gC  33,133,448
3.s19  ATC CTC CCA TTT gCT CTC TC  33,133,603
3.s20  AAC ACA CCC CTg CCT gAC  33,133,827
3.s21  AAA AgC AgA ACC Agg ATT gg  33,133,899
3.3R  gAg CAA Agg gCT gAA gAT TAT g  33,134,373

D. Hotspot at TAP2 Locus
Primer Name  Sequence (5´ to 3´)  Chromosome Location
4.1F  ACT CTg CCT TTC CTC ATC AAA C  32,907,046
4.s1  ggA TTg ATg ggT ggA Tgg  32,907,489
4.s2  TCC ATC CAT CCA CCC ATC  32,907,494
4.s3  AAT TAC TTg Cgg gTT TTg g  32,907,966
4.s4  AAA ACC AAA ACC CgC AAg  32,907,971
4.s5  Tgg gCT CCT TTC ACA ACC  32,908,349
4.s6  gAC CgT TCg CAg TTT Tgg  32,908,488
4.2F  TCC CAA ggA gCC ACA gAT Ag  32,908,868
4.1R  TCC TCA TCA ACA CCA CTT TCT g  32,909,388
4.s7  TgT gAA ggg CAT gTg CAg  32,909,395
4.s8  Tgg TgT TgA TgA ggA TgT gg  32,909,395
4.s9  Agg gAC ACA gCC AgA TCg  32,909,495
4.s10  gCA gCA ggg Agg AgA AAA g  32,909,876
4.s11  AgC AgC TCC ACT TCT AAg TA  32,909,876
4.s12  Tgg gAC Agg gTg gAg AAg  32,909,998
4.s13  TTA TCC ATT CAT CTg TCA AT  32,909,998
4.s14  ggg AAA ggA ggC gTC ATC  32,910,475
4.3F  gCC TCC TTT CCC TAT gCT g  32,910,481
4.2R  TCA CAA TAg CAg Cgg AgA Ag  32,910,930
4.s15  ggT gAA ACA gAg gAg CAA gC  32,911,584
4.s16  gCT CAC CAT gTC TCC TCT TTC  32,911,719
4.s17  ACT CTC CCA AAC gCC TCT TC  32,912,181
4.s18  TgA CAg ggA TgT gTT CTg Ag  32,912,233
4.s19  CAC ATA TAA gAT ggT ggA C  32,912,493
4.s20  TCC CAg TCC CAg CCT TAT C  32,912,725
4.s21  TgC CTC CTA CCT CCT ACC C  32,912,950
4.s22  gAA Tgg TgA CTA CAT TCA CC  32,913,151
4.4F  CCA AAg gAg AAg Agg CAC AT  32,913,295
4.3R  ggA CgT TgT gAg TTg gAg gT  32,913,463
4.s23  CAg ACA gAg Cgg gAg CAg  32,913,776
4.s24  ggA ggg gTg TAC ggA TgA  32,914,140
4.s25  gCT CCg CTC CCT CCT ATC  32,914,292
4.s26  AgT TCg ggC TCC AgT TCC  32,914,622
4.s27  TgT CCT TgC TTT gTA ATT ggA g  32,914,772
4.s28  CTC AgC CTC CCg AgT AgC  32,915,049
4.4R  Tgg gAg gTg gTT TCA ATC Tg  32,915,552

2.7 Sequencing of PCR Amplification Products

Cycle sequencing of purified PCR products was performed using the ABI BigDye Terminator v3.1 Cycle Sequencing Kit. As per the manufacturer’s instructions, each reaction was performed in a total volume of 10µl, comprising: 2µl of BigDye v3.1 reaction mix, 1µl of
5x sequencing buffer, 1µl of 5µM sequencing primer and 50ng of PCR product. The reagents were mixed well, spun down briefly and placed in a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems, USA). Thermal cycle sequencing was performed for 25 cycles with this profile: [96ºC for 30secs, 50ºC for 5secs, 60ºC for 4mins].

The extension products were then purified using an ethanol/EDTA/sodium acetate precipitation protocol to remove excess dye terminators. Briefly, the entire contents of each sequencing reaction were mixed with 1µl of 3M sodium acetate (pH 4.6) and 25µl of ice-cold absolute ethanol. The mixture was placed on ice for 15 minutes to allow the extension products to precipitate. Following this, precipitated products were spun down at 16,000xg for 20 minutes. The supernatant was carefully aspirated and discarded. The pellet was rinsed with 200µl of 75% ethanol, spun down at 16,000xg for minutes, and once again the supernatant was carefully aspirated and discarded. The pellet was then dried on a heat block at 70ºC for not more than 10 minutes.

Each sample was then re-suspended in 15µl of a template suppressive agent supplied by the manufacturer (Applied Biosystems, USA) and heated to 90ºC to denature the extension products, before being rapidly chilled on ice. Sequencing was immediately performed by electrophoresis on 96-well plates in an ABI Prism 3100 Genetic Analyser (Applied Biosystems, USA).

2.8 Linkage Disequilibrium Calculations

The linkage disequilibrium parameter is defined as (Ott 1999):

$$D = p_{AB} - p_A \, p_B \tag{1}$$

From this, Lewontin’s D´ is calculated as:

$$D_{min} = \max(-p_A p_B,\; -\bar{p}_A \bar{p}_B), \qquad D_{max} = \min(p_A \bar{p}_B,\; \bar{p}_A p_B) \tag{2}$$

$$D' = \frac{D}{D_{min}} \;\; (D < 0) \quad \text{or} \quad D' = \frac{D}{D_{max}} \;\; (D > 0) \tag{3}$$

where $\bar{p}_A = 1 - p_A$ and $\bar{p}_B = 1 - p_B$.

Similarly, r² is calculated as:

$$r^2 = \frac{D^2}{p_A \, \bar{p}_A \, p_B \, \bar{p}_B} \tag{4}$$

LD parameters for HLA allele pairs were calculated with the above formulas, while LD for SNP pairs was calculated using the same formulas as implemented in Haploview (Barrett et al 2005).

For allele and haplotype
counting, 2x2 contingency tables were constructed to compare expected and observed frequencies in 2-locus haplotypes, with p-values calculated using the two-tailed Fisher’s exact test:

$$p = \frac{(a+c)!\,(b+d)!\,(a+b)!\,(c+d)!}{a!\,b!\,c!\,d!\,n!} \tag{5}$$

where a, b, c and d are the four cell counts of the 2x2 table and n is their sum.

For 3-locus haplotypes, the one-sample z-test was used to obtain p-values for the difference between observed and expected frequencies. The corresponding test statistic is calculated as:

$$z = \frac{p_o - p_e}{\sqrt{p_o (1 - p_o) / N}} \tag{6}$$

The homozygosity of a polymorphic marker can be defined as the probability that a randomly selected pair of chromosomes from a population is identical at that marker. Homozygosity reflects both the genetic fixity and linkage disequilibrium of the underlying haplotype structure. It is calculated as the sum of the squared population marker frequencies, corrected for sampling (Sabatti and Risch 2002):

$$H = \frac{\sum_i p_i^2 - 1/n}{1 - 1/n} \tag{7}$$

Similarly, haplotype homozygosity (HH) is defined as the probability that a pair of chromosomes selected at random is identical across a defined haplotype. Equation (7) is applied, except that haplotype frequencies are used instead of marker frequencies. Extended haplotype homozygosity (EHH) plots are then constructed from haplotype homozygosity values calculated one marker at a time from the anchor locus (Sabeti et al 2002).

Haplotype blocks were defined using the criteria described in Gabriel et al 2002, which identify segments of consecutive SNPs in significantly high LD and penalise sparse haplotype blocks with few markers. Briefly, population D´ confidence limits were established for each SNP pair by calculating the probability of the observed data for all possible values of D´. The upper (CU) and lower (CL) bounds represent the 5% tails of the overall probability. For blocks that consist of 5 or more SNPs, 95% of the SNP pairs must be in strong LD, with CU >= 0.98 and CL >= 0.7. For blocks of 4 SNPs, the same LD criteria for larger SNP blocks remain,
with the additional restriction that 4-SNP blocks not exceed 30kb in length. Three-marker blocks must be less than 30kb in length with LD confidence limits of CL >= 0.5 and CU >= 0.98. Two-marker blocks must be within LD confidence limits of CL >= 0.8 and CU >= 0.98 and not span more than 20kb. This algorithm is implemented in Haploview (Barrett et al 2005).

Tag SNPs for the MHC were selected based on the pairwise r² values between SNPs, using an aggressive iterative search to identify a minimal number of tags that serve as proxies for other SNPs with a high r² (>= 0.8). The application Tagger was used; it is also implemented in the Haploview software (Barrett et al 2005, de Bakker et al 2005).

2.9 Haplotype Phasing

Haplotype reconstruction (phasing) was performed using a Bayesian algorithm implemented in the software PHASE v2.2 (Stephens and Scheet 2005). Briefly, this Bayesian approach to haplotype reconstruction treats unknown haplotypes as random quantities and attempts to calculate the posterior distribution, that is, the conditional distribution of the haplotype frequencies given the observable genotype allele frequencies. It does this by combining prior information (patterns of haplotypes expected to be observed in the population samples) with likelihood estimates from the observed data. Prior information is estimated using a model based on coalescent theory, in which a mutant offspring will differ only slightly from the progenitor sequence (Stephens and Donnelly 2003). Haplotypes for each individual can then be estimated by picking the most likely haplotype from the posterior distribution.

This algorithm was selected because it has been found to be the most accurate in phasing SNP data from families as well as unrelated individuals (Marchini et al 2006). It has also recently been determined to be more accurate in estimating HLA haplotypes than other alternatives (Bettencourt et al 2008).

To maximise the
phase-unambiguous information from the family samples, haplotype phasing for the high-resolution SNP map was done in stages. First, the phase-unambiguous family haplotypes were obtained by running PHASE with pedigree information supplied. This produced a set of 24 distinct phase-unambiguous haplotypes. Next, these haplotypes were treated as “known” haplotypes and seeded together with the unrelated individuals for haplotype reconstruction with PHASE. The phase-unambiguous data provide prior information for the coalescent model used in the PHASE algorithm and improve the accuracy of the phase estimation (Stephens and Scheet 2005). HLA alleles were translated into unique digits and phased together with the SNPs. All phasing runs were performed on a 60-node Apple Xserve G4 cluster (Apple Inc, USA).

2.10 Tagging HLA Alleles using SNP Haplotypes

To tag HLA alleles using SNPs, an iterative algorithm was written in Perl that computes the r² values of common HLA alleles with all possible SNP haplotype combinations, starting with single SNPs and extending to a maximum of 6-SNP haplotypes. As the number of possible SNP haplotype combinations becomes impractically large with increasing haplotype size (there are 6×10¹⁶ possible 6-SNP haplotypes from 1877 SNPs), only SNP or haplotype variants that have an r² >= 0.7 with the tested HLA allele were carried forward to the next iteration. The iteration was stopped when the minimal-sized haplotype that tags the HLA allele with an r² value of 1 was found. The iteration was also stopped if the r² value reached a plateau for an HLA allele. The smallest haplotype combination with the largest r² value was designated the tag. This algorithm was run on a 60-node Apple Xserve G4 cluster (Apple Inc, USA).

2.11 Data Sources and Bioinformatics Tools

MHC Sequence Annotations
All genome annotations were obtained from the VEGA server http://vega.sanger.ac.uk/ (Wilming et al 2008).

Caucasian MHC Full-Length Sequences
The Caucasian MHC sequences
were obtained from the MHC haplotype project http://www.sanger.ac.uk/HGP/Chr6/MHC/ (Horton et al 2008).

Human Genome Build
All physical map locations in this thesis refer to coordinates based on NCBI Build 36.1 of the human genome assembly (http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html).

HapMap Data
HapMap data used in this study were taken from release 22 of the HapMap project (http://www.hapmap.org). When using genotype data from the HapMap project, care was taken to ensure that the SNP calls were based on the same chromosome strand as the ones used in this study.

Bioinformatics Tools and Data Storage
Data were handled and analysed with Perl scripts built on BioPerl libraries (Stajich et al 2002). All data were stored in a MySQL relational database.
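To illustrate the kind of script-level calculation described in Section 2.8, the pairwise LD measures of equations (1) to (4) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Perl scripts or the Haploview implementation used in this study: the function name `ld_measures` and the example frequencies are hypothetical, and D is normalised by the negative bound when D < 0 and by the positive bound when D > 0, so that D´ lies between 0 and 1.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two loci.

    p_ab     -- frequency of the haplotype carrying both allele A and allele B
    p_a, p_b -- frequencies of allele A and allele B
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")

    q_a, q_b = 1.0 - p_a, 1.0 - p_b      # frequencies of the alternative alleles
    d = p_ab - p_a * p_b                 # equation (1)

    # Equation (2): the bound on D depends on its sign.
    if d < 0:
        bound = max(-p_a * p_b, -q_a * q_b)   # D_min (negative)
    else:
        bound = min(p_a * q_b, q_a * p_b)     # D_max (positive)

    d_prime = d / bound if d != 0 else 0.0    # equation (3)
    r2 = d * d / (p_a * q_a * p_b * q_b)      # equation (4)
    return d, d_prime, r2


# Hypothetical example: alleles at 40% and 50%, joint haplotype at 30%.
d, d_prime, r2 = ld_measures(p_ab=0.3, p_a=0.4, p_b=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

Selecting the bound by the sign of D keeps D´ non-negative, which is how the measure is usually reported when comparing allele pairs with different frequencies.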
