Results

3.5 Fine Mapping of Recombination Hotspots in the MHC

The vast majority of meiotic recombination hotspots annotated in the human genome are derived from population recombination rates estimated using data from genome-wide variation maps (Myers et al. 2005, International HapMap Consortium 2005), but very few recombination hotspots have been confirmed and finely mapped in recombinant haplotypes. In humans, indisputable determination of meiotic recombination hotspots is only possible by family linkage analyses and crossover mapping in recombinant sperm. However, the former is not sufficiently powered for accurate fine mapping of hotspots (Arnheim et al. 2003), while the latter does not scale up well (Kauppi et al. 2004). Hence only a handful of recombination hotspots have been definitively mapped through recombinant sperm mapping (Jeffreys et al. 1998, 2000, 2001, Jeffreys and Neumann 2002, 2005, Cullen et al. 2002).

In the MHC, using allele-specific PCR primers to selectively amplify recombinant haplotypes in pooled sperm DNA, sperm-crossover locations within the Class II region were previously finely mapped to hotspots of around 2kb in length (Jeffreys et al. 2000, 2001). Using a different approach, a separate study genotyped polymorphic STR sites in amplified DNA from single sperm to map recombination hotspot regions across the entire MHC (Cullen et al. 2002). However, these hotspots were identified at a lower resolution, ranging from 35kb to 105kb.

The underlying LD structure in the Chinese population correlates strongly with the location of the sperm-mapped recombination hotspots; all the hotspots precisely mapped by Jeffreys and co-workers (Jeffreys et al. 2000, 2001) occur outside of defined haplotype blocks. Although the lower resolution of the hotspot regions identified by Cullen and co-workers (Cullen et al. 2002) precludes comparing them to the more precise boundaries of SNP haplotype blocks, this set of hotspot regions matches up
well with the EHH plots; EHH values reflect the decay of haplotypes due to recombination (Sabeti et al. 2002), and within each of these Cullen regions the EHH values of different HLA haplotypes are seen to drop (Figure 3.18). Furthermore, there are multiple examples of individuals who carry recombinant haplotypes in which the haplotype breaks occur within the Cullen regions. These observations suggest that population genetics data could be utilised to narrow down crossover locations in the Cullen hotspot regions.

With this in mind, the utility of the SNP variation data in this study for fine mapping crossover locations was explored. Within each of the Cullen hotspot segments, the location of the crossover was refined by identifying intervals on the SNP map using the following factors: 1) spikes in the population recombination rate, 2) locations of EHH decay, 3) boundaries of haplotype blocks, and 4) presence of recombinant haplotypes across the interval.

The Cullen hotspot segments were first mapped to the MHC using the corresponding STR markers (obtained from Cullen et al. 2002). Within each of these segments, the distribution of population recombination rate was overlaid using data from the HapMap project (International HapMap Consortium 2005). Next, using the EHH data for HLA haplotypes in Section 3.3.2 of this thesis, localized sites of EHH decay across two or more HLA haplotypes were identified within the Cullen segments. Detailed LD structure from the haplotype boundaries defined in Section 3.2.3 was also overlaid across each segment. Finally, SNP haplotypes in the local Chinese population were scanned for recombinant haplotypes within these Cullen segments: samples that carry SNP haplotypes that are homozygous telomeric to the recombination hotspot but heterozygous centromeric to it (telomeric-homozygous samples) and, similarly, samples that are homozygous centromeric to the hotspot but heterozygous telomeric to it (centromeric-homozygous samples). The homozygous breakpoints in these
samples with recombinant haplotypes define an interval of chromosomal crossover associated with the recombination hotspot. SNP intervals at which the indicators above converge were identified as the likely crossover interval within the Cullen hotspot segments.

Having narrowed down the hotspot location to SNP intervals, the samples with recombinant haplotypes were re-sequenced across these intervals and additional polymorphic markers were identified. The genotypic pattern of these polymorphic sites in the recombinant haplotypes was used to mark out the boundary where chromosome crossover occurred: between the last common homozygous markers in the telomeric- and centromeric-homozygous samples respectively. In this way, the crossover window in these recombination hotspots can be unambiguously identified at base-pair level.

The Cullen recombination hotspot regions and the corresponding crossover intervals narrowed using the SNP maps are listed in Table 3.11. Sets of PCR and DNA sequencing primers were designed to amplify and re-sequence across 3 of these SNP intervals. In each of these 3, the re-sequencing results uncovered additional polymorphic sites across the interval, and these were used to frame the crossover location. As a proof of concept, the recombination hotspot mapped within the TAP2 locus (Jeffreys et al. 2000) was also re-sequenced using the same approach of SNP interval and sample selection. The re-sequencing results for each of these hotspots are detailed in the following pages.

Table 3.11 Narrowing of Sperm-Mapped Recombination Hotspot Regions Using Population Data

Columns 1-3: Hotspot Segments Identified using STR Markers Typed in Single-Sperm (Cullen et al. 2002)
Columns 4-5: Refined Locations of Hotspots Using LD Structure and HapMap Population Recombination Rate

STR Markers       | Location                | Size (bp) | Interval on SNP Map     | Size (bp)
MOG-CA to RF      | 29,753,470 - 29,798,698 | 45,228    | 29,791,787 - 29,792,613 | 826
3-7 to BAT-CA     | 31,643,332 - 31,685,519 | 42,187    | 31,676,448 - 31,680,935 | 4,487
RNG-CA to DPB2A2  | 33,070,852 - 33,172,428 | 101,576   | 33,129,170 - 33,133,471 | 4,301
1-9 to 1-9d       | 29,914,298 - 29,995,182 | 80,884    | 29,946,621 - 30,007,656 | 61,035
3-3A to DRA-CA    | 32,406,317 - 32,511,466 | 105,149   | 32,446,673 - 32,450,515 | 3,842
G5-11525 to G4-96 | 32,778,014 - 32,813,417 | 35,403    | 32,789,623 - 32,792,322 | 2,699

The locations and sizes of the recombination intervals mapped by Cullen et al. 2002 through single-sperm typing are listed in this table. Using population data from the HapMap project and the SNP variation map in this study, the hotspots in these intervals were narrowed. The first three hotspots on this list were re-sequenced in recombinant haplotypes in this study.

3.5.1 TAP2 Recombination Hotspot

In the SNP variation map, homozygous haplotypes are seen to break across an interval from positions 32,912,195 to 32,913,448 of the SNP map, located within the TAP2 gene. Fourteen individuals carrying recombinant haplotypes across this interval are seen: some have haplotypes that are homozygous up to the telomeric end of this interval, while the others are homozygous up to the centromeric end. A spike in the estimated population recombination rate is reported in the HapMap data in this region of the chromosome, and it coincides with a decay in EHH across multiple HLA-DRB1 haplotypes, including DRB1*0301, DRB1*1602 and DRB1*1501 found in CEHs (Figure 3.29, panel B). This interval also falls in between the haplotype blocks described earlier (Figure 3.29, panel A).

To accurately determine the location of the haplotype breaks in the telomeric- and centromeric-homozygous samples, 4 overlapping PCR fragments spanning 8.5kb from position 32,907,046 to 32,915,571 were amplified from genomic DNA of these 14 samples. A total of 36 sequencing primers were designed and utilized to tile across the segment in a bidirectional manner. Polymorphisms were identified from the sequence reads, and a locus was tagged as polymorphic only if the alternate allele was seen in a minimum number of samples. A total of 40
polymorphic sites were identified across the interval; 2 of these were indels and the remaining 38 were bi-allelic SNPs. The genotypes of these 40 polymorphic sites in the 14 re-sequenced samples are listed in Table 3.12. The results indicate that all the telomeric-homozygous recombinant samples show a crossover after position 32,912,197, while the centromeric-homozygous recombinant samples show signs of crossing over before position 32,914,099. Each of the 14 samples carries its first heterozygous marker within this 1.9kb window, allowing the definition of a narrow segment in which homozygosity breaks and crossovers are clustered. This crossover window starts within the first exon of the TAP2 gene and ends within its second intron, raising the possibility that the open reading frame may be affected by the crossover (Figure 3.30).

[Figure 3.29: panels A, B and C. Panel C shows both SNP haplotypes of the telomeric-homozygous and centromeric-homozygous samples, with their linked DRB1 alleles, across SNP positions 32899381 to 32926752.]

Figure 3.29 Recombination Hotspot in the TAP2 Locus

From the SNP variation map, several indicators point to a 1294bp
interval in the TAP2 vicinity as a recombination hotspot. Panel A: The LD heatmap shows that the locus falls between haplotype blocks defined earlier. Panel B: EHH plot across the interval. Several DRB1 haplotypes show a decay of EHH across this locus. There is also a spike in the population recombination rate estimated in the HapMap project (black vertical lines). A crossover location mapped by recombinant sperm typing (Jeffreys et al. 2000) maps to this interval and is indicated by a downward red triangle. Panel C: The 14 individuals that carry recombinant haplotypes across this interval are shown in this panel. Both haplotypes of each individual are shown, together with the linked DRB1 allele. The telomeric-homozygous samples are homozygous up to the telomeric edge of the SNP at position 32912195, and the homozygosity breaks after this locus. Similarly, the centromeric-homozygous samples are homozygous centromeric to position 32913027, with the homozygous haplotypes breaking across the locus. This establishes a window across which multiple crossover events occurred. The TAP2 locus is centromeric to HLA-DRB1, and the DRB1 alleles of these samples are provided as a reference.

Using a pooled-sperm approach and selective PCR to amplify recombinant haplotypes, Jeffreys and colleagues precisely mapped a recombination hotspot to a 1.4kb window between positions 32,912,006 and 32,913,385, centring in the 2nd intron of the TAP2 gene (Jeffreys et al. 2000). This sperm-mapped recombination hotspot overlaps precisely with the crossover cluster seen here in recombinant samples, with the approximate centre of the sperm recombination hotspot (32,912,695) lying within this crossover window. The agreement between these sets of data lends support to the strategy employed here of using population SNP variation data as a guide for fine mapping crossover clusters.

The sequence within the 1.9kb crossover window is shown in Figure 3.31. The telomeric end of the
window contains a full-length MER7A, a DNA transposon. The centromeric end of the window contains the TAP2 exons. There are also 18 motifs in this sequence that match the 6- to 9bp motifs found to be over-represented in recombination hotspots (Myers et al. 2005), with most of these motifs residing in the exon of TAP2.

Table 3.12 Re-sequencing the Recombination Hotspot in the TAP2 Locus

This table lists the genotypes of 14 samples at 40 polymorphic sites obtained by re-sequencing across the 8.5kb interval carrying the recombination hotspot within the TAP2 locus. These 14 samples carry recombinant haplotypes and comprise samples that are homozygous in the region telomeric to this hotspot and samples that are homozygous in the region centromeric to it. Locations highlighted in green coincide with SNPs genotyped in the Illumina SNP panel. Red genotypes highlight sites that are heterozygous in the individual. The telomeric-homozygous samples are homozygous up to at least position 32,912,197, which marks the start of a segment showing crossovers in the samples. This segment continues to 32,914,099, the last homozygous marker of the centromeric-homozygous samples. This defines a 1902bp window within which chromosomal crossover occurred in these haplotypes. Positions 32,909,374 and 32,909,725 are single-base indels. A genotype of “11” at these locations indicates a homozygous insertion and “22” a homozygous deletion.

Figure 3.30 Genomic Location of the TAP2 Hotspot

This figure illustrates the location of the crossover window identified through re-sequencing of individuals carrying recombinant haplotypes. The crossover window is located within the TAP2 gene locus and is marked out by the pink silhouette. The exon-intron structure of the TAP2 gene is shown as orange blocks at the top of the figure, and grey tracks mark the locations of repeat elements in the region. The genotypes of the 14 re-sequenced samples are indicated at the bottom, with polymorphic sites drawn as
circles. Blue and red circles indicate that the individual is homozygous and heterozygous at that location respectively. Larger circles indicate SNPs genotyped in the SNP panel, while smaller circles are additional polymorphic loci uncovered via re-sequencing. The centre of the recombination hotspot identified in recombinant sperm (Jeffreys et al. 2000) lies within this crossover window (red arrow), supporting the strategy used to select SNP intervals and recombinant haplotypes for re-sequencing.

[MER7A, DNA Transposon]
32912197 G A G T C A G A C A T G A T A T A A T G A G G G T T T G T A C T T T A A T G A C A G G G A T G T G T
32912247 T C T G A G A A A T G T G T C G T T A G A T G G T T T C A T C G T T G T A T G A A C A T C A T A G A
32912297 G T G T A C T T A C A C A A A C C T A G A T G T C A T A G C C T A C T G C A C A C C T A G G C C A T
32912347 G T A G T T T A G C C T A T T G C T C C T A G G C T A C A A A T C T G T A C A A C A T A T A A C T G
32912397 C A C C T A A C A C T G T G G G C G A C T G T A A C A C A G A A G T A A G T A T T T G T G T A T C T
32912447 A A A C A T A G A A A A G G T A C A G T A A A A A T A T G G T A T T A T A A T C T T G T G G G T C C
32912497 A C C A T C T T A T A T G T G G C C C A T C A T T G A C C T A A A C T T C G T T A T G C A G T G C A
32912547 C G A C T G T A G T T T C A G C A G A A A G C A G C C A G G A T G G A A T G A A G G C A C A A T G A
[Approximate Centre of Sperm Recombination Hotspot (Jeffreys et al. 2000)]
32912597 A A T G G T T T T C G A G G G T A C T C T A A A T T A A G T A T G A C C A T A A A A A T A G A G A C
[MER96, DNA Transposon]
32912647 A A T C A G G C C G G C T G G G A T T T G G G T A A G G T G A G T G C A C A C C T C C T T A A A C T
32912697 T T G C A C C C C A G G T G C C T C G C T C A C C T C A T C C C A G T C C C A G C C T T A T C A A A
32912747 C A G T T T G T T T G T T T G A G T A T G T C T A G A A A G A G A A A G G A A A G C A A G T G A A G
32912797 G G A A A A G A G T A A T G A T T C T G G A A A G A A A G G T G A T A A G C C T C A G A G T A A G A
32912847 T C T T C A G G G A C T G G C A A G A T G A G C T G G G A A A G A A G A G T G A A A G G G A G A A G
32912897 C A T A C C C A T C C T G A G G G A G T G A C C C T G G A G A G A T A C T T T G G A G A C A G A C T
32912947 T A G G G G T A G G A G G T A G G A G G C A G A A A G A A A T G G A A T T T C A T G G A C C T A G G
32912997 A A T G T T G A G A G A C A A C T G A G A G A C A T T C C A T C T G A G A C T T A A A T T C C T T T
32913047 T G T A C T A C C T T C A C T C A T A A C T T G T T C C T A T A A T A A G A T C A G A T A A A C T T
32913097 T G A A G A T A T T G G A T G A A T A T G A A C G A A G G A A G A A A T G A A T G G A T A G A T G A
32913147 A A C A G A A T G G T G A C T A C A T T C A C C A T A T T T T A G T T T A A G T A T T T T T G T G T
32913197 T T T G C G C C T G A A A G G G C C T A G A A A T G G A G T T A G G G A A G T G A A G A C C C C T A
32913247 T A A A G A T T T G G G G C T A G C A A A T G G A C C C A G C T G C C C A C C A C C T A C C T G C C
[TAP2 Exon2]
32913297 A A A G G A G A A G A G G C A C A T G A A G A A G A T G G C A C T G G C A A A G G C A T G G G G G T
32913347 C A A A A T C A C C T C C C A G G A T G T C A A T C A C A C G A C C A G A A T A G T G A G G G A T T
32913397 A A T G T C T C A C C T G A A A G A G G C A T G A A A A A T A A C A C A A G A A T G T G C T G G T G
32913447 C C C A G G C C C T T T T A C C A C C T C C A A C T C A C A A C G T C C T C T C C T G A C T C A C C
[TAP2 Exon1]
32913497 C A A A A C A G C A A G G A C A A G G A A G A A G A A G G C G G C A A C G A G G A G A G G C A G G T
32913547 C C G G C C T G G A G A G C T T C A G C A G C C T C C A C A T C A A G A C T T T G T T G T T C A C C
32913597 T G G T C C T G C T C C T T C T C C T G G G C T C C A G G A G G G C T C A G A A C A G C C C A C A G
32913647 T G A C C A G C T G A G C C C C G C A G C C C C G T A C C C C A C C A G C A G C C A G C T C C A A G
32913697 G G G C T G A A G C G A C T C T G G C T G G G G G A G C A C G T G A G G C C C C C G C G A C C A G G
32913747 G C T C T C A G G G A G A C A G T C A G G G G G G T G G C C A G A C A G A G C G G G A G C A G C A G
32913797 T G T C C C C A C A A A T C C C A G C A G C C C T C T T A G C T T T A G C A G C C C C C A C A G C C
32913847 C T C C C A G C C G C A G G G T C C C C T C C A G C C A T A G T C C T G G C A G C C C T T G A G G A
32913897 A G C A A A G T C C C C A G A G G G C C C T G A A G C A G C C A C A G T A A A G C C G C G T C C A C
32913947 C A G C A G C A G G G A G G T C C A G G G T C T C A G G T C A G G G A G C C G C A T G G C T C T G T
32913997 C A A C G G A T A C G A G A T G A G A A A T C A T G G G G G T G G A G T C C C A A T C C T T G T C C
32914047 C T G C C C T C C T A C C C G C C C G G C T C C G C C T A A C C C G T C C A T C G G C T T C T C A T
32914097 T T T

Figure 3.31 Sequence of the Crossover Window within the TAP2 Recombination Hotspot

The 1902bp sequence inside the TAP2 crossover window is detailed in this figure. Polymorphic sites uncovered through re-sequencing are shown as highlighted orange residues. The approximate centre of the sperm recombination hotspot mapped by Jeffreys et al. 2000 is indicated in pink highlights. Two DNA transposons lie within this window, marked out by black borders. This window also overlaps with TAP2 exons, identified by blue borders. Eighteen 6- to 9bp motifs that coincide with those found to be over-represented in recombination hotspots (Myers et al. 2005) are shown in red.
[Figure 3.32: panels A, B and C. Panel C shows both SNP haplotypes of the telomeric-homozygous and centromeric-homozygous samples, with their linked HLA-A alleles, across SNP positions 29754144 to 29811241.]

Figure 3.32 Recombination Hotspot Telomeric to HLA-F

A recombination hotspot region was identified within a 45kb segment from single-sperm typing experiments (Cullen et al. 2002). Using data from the SNP variation map and the HapMap project, this hotspot was narrowed to a SNP interval on the SNP map. Panel A: The LD heatmap shows this SNP interval falling between haplotype blocks defined earlier. Panel B: EHH plot across the segment. The position of the hotspot segment mapped by Cullen and co-workers is indicated by the red bar at the top of the chart. EHH values of HLA haplotypes decay across this interval. This interval also coincides with the breaks of HLA-A02 ancestral haplotypes discussed earlier (Section 3.3). There is also a spike in the population recombination rate estimated in the HapMap project (black vertical lines) across the interval. Panel C: 19 individuals who carry
recombinant haplotypes across this interval are listed in this panel, with both SNP haplotypes of each individual shown together with the linked HLA-A allele. Seven samples are homozygous up to the telomeric edge of the SNP at position 29,791,787, while 12 samples are homozygous centromeric to position 29,792,613. All of these 12 samples carry A2 alleles at the HLA-A locus. The haplotypes of the recombinant samples narrow the crossover window within this hotspot to 826bp.

Fourteen additional SNPs were identified and genotyped through re-sequencing, including 2 that fall within the 826bp breakpoint interval. The genotypes of the 19 samples are listed in Table 3.13. The data from the telomeric- and centromeric-homozygous samples show a clean break between positions 29,791,953 and 29,792,613, a 660bp window that did not have any polymorphic markers within.

[Table 3.13 body: genotypes of 19 samples at 17 polymorphic sites.
Positions: 29790868, 29791787, 29791923, 29791953, 29792613, 29792946, 29796480, 29796646, 29798035, 29798508, 29798512, 29798514, 29798530, 29798641, 29798648, 29798673, 29798998.
Telomeric Homozygous Samples: CM99003, BM05/195, CM01510, CM0791, NP192, NUH042, WGP030.
Centromeric Homozygous Samples: BM05/075, CF0914, CM01232, CM01661, CM0452, CM0835, NP368, NP507, NP744, NUH032, WGP006, WGP050.
At the 4 sites up to 29791953, the telomeric-homozygous samples are homozygous and the centromeric-homozygous samples are heterozygous; at the 13 sites from 29792613 onwards the pattern is reversed, apart from one missing genotype call (NN) in a centromeric-homozygous sample.]

Table 3.13 Re-sequencing the Recombination Hotspot Telomeric to HLA-F

This table lists the genotypes of the 19 samples at 17 polymorphic sites identified by re-sequencing across an 8kb interval containing the recombination hotspot telomeric to HLA-F. These 19 samples carry recombinant haplotypes and comprise 7 samples homozygous telomeric to this hotspot and 12 samples homozygous centromeric to this hotspot. Locations highlighted in green coincide with SNPs genotyped in the Illumina SNP panel. Red genotypes highlight sites that are heterozygous in the individual. The telomeric-homozygous samples are all homozygous up to at least position 29,791,953, which marks the start of a segment showing crossovers in the samples. This segment continues to 29,792,613, the last homozygous marker of the centromeric-homozygous samples. This confines the crossover locations to a 660bp window.

The genomic features around this crossover window are shown in Figure 3.33. The 660bp crossover window is almost completely covered by transposable elements: a Charlie9 DNA transposon and an MLT1F1 LTR retrotransposon. The large SVA insertion in the PGF haplotype is also seen to border the crossover location. The sequence within this crossover window is detailed in Figure 3.34, and motifs associated with recombination hotspots (Myers et al. 2005) can be identified within it.

Figure 3.33 Genomic Location of the Hotspot Telomeric to HLA-F

This figure shows the location of the hotspot mapped to a region telomeric to the HLA-F locus. The genotypes of the 19 re-sequenced samples used to define the crossover window are illustrated in the panel, with the consensus crossover location marked by a pink silhouette. Blue circles indicate that the individual is homozygous at that SNP position, while heterozygous loci are marked by red circles. Larger
circles indicate SNPs genotyped in the Illumina panel, while smaller circles are polymorphic loci identified through re-sequencing. The grey tracks at the top of the figure mark the locations of repeat elements in the region. The large grey shadow indicates the location of a 3kb SVA insertion seen in the PGF haplotype. This insertion is not seen in any of the 19 samples sequenced, and is also not found in the COX and QBL haplotypes.

[Charlie9, MER1-Type, DNA Transposon]
29791953 T G A G A T T C A C A T T T T T T C T A C C T T A A A A G C A A T C T G A T T T G G C A A A T A T T
29792003 T T T A A A G A T G A T A T T T G A A T G A G A A A A T T G G C A T T T G G G A C A T T C T T A A A
29792053 C T A A A T T T G A G A C A T C T T A G G C A A A A C A A A T A C T T A T T T T T A A G G C A C T A
29792103 T T G T T A T G G C A C T G A A G T C T T G G A A C T A T T T G A T C T A G T T A C T G T A A G T T
29792153 C T C A G C T G T G T T G C A A C T C A T T A A A G A G A A C A T T G T T A T T A A A G G T A T T T
29792203 G C A A G A A A A A C T T A G A G A T A C T A T A G T A T C T C C T T T C T C T G T C T C A A A C T
29792253 T T T T T C C C C T C A A T A C C C A A G G C T C T G T G A T G T C T C A A A T T T T A A T C A T T
29792303 A C T T T A A A A A G A G A A G T T T A A A G C A T T A A A G A A T T A T A A T C A G A T G A A A G
29792353 C A G C T T T G G A T T T A T A A A A T T C T G A A A C A A T A A T T T T A A T T T T G C T T A A T
[MLT1F1, LTR Retrotransposon]
29792403 T T T G C T T T T A A C A T A T A T G C A A A T T C T T T G A T A C T C T C C A C T T T G C A G A G
29792453 G T G C A G G T T C A T T C C C T C C C T G T G A G T G T G G C C T G G A C T T A A T G A T T C A C
29792503 T T C T A T C T G A T G G A G T G A C T G T T G G T G T A G A A C A A A A A A C T T A C C G T A G C
29792553 T T C T A C C T T T G C T C T C T C T G T C T C T G G G A T C A T G A A C T C T G G G G G A A G C C
29792603 A G C T G C T G T G T

Figure 3.34 Sequence of the Crossover Window
Associated with the Recombination Hotspot Telomeric to HLA-F. The 660bp sequence inside the crossover window is detailed in this figure. The polymorphic sites used to define this crossover boundary are shown as orange highlighted residues at either end of the window. Almost the entire length of the crossover window is covered by a Charlie9 DNA transposon at one end and an LTR retrotransposon at the other. Three 6- to 9bp motifs that were found to be over-represented in recombination hotspots (Myers et al 2005) are shown in red.

3.5.3 Recombination Hotspot Between the NCR3 and AIF1 Loci

Through single-sperm typing, a segment framed by STR markers 3-A and BAT-CA was identified as a recombination hotspot region with a recombination rate 5.2 times greater than expected (Cullen et al 2002). These markers define a 42kb segment from position 31.643Mb to 31.685Mb. Using the SNP variation map in this study, the hotspot location was narrowed to a 4487bp interval between positions 31,676,448 and 31,680,935. This interval falls between clearly defined haplotype blocks (Figure 3.35, panel A), and the EHH of almost all HLA-B haplotypes (including B*4601, B*5801, B*1301, B*3802 and B*4001) is seen to decay here. The population recombination rate estimates from the HapMap project also show a spike across this interval (Figure 3.35, panel B). From the SNP variation data, 16 individuals carrying recombinant haplotypes that break across this interval can be identified: eight telomeric-homozygous samples and eight centromeric-homozygous samples (Figure 3.35, panel C).

A 5kb segment bridging the hotspot interval, from 31,676,300 to 31,681,324, was re-sequenced in the 16 individuals. The segment was amplified as overlapping PCR fragments and sequenced in a tiling manner with 23 sequencing primers. Eight polymorphic sites were identified, including those that matched positions genotyped in the SNP map. The genotyping results are detailed in Table 3.14. The telomeric-homozygous samples start
to carry heterozygous markers at position 31,680,613, while centromeric-homozygous samples begin to break at 31,678,910, defining a 1703bp window across which the crossovers occurred. This crossover window lies within a gene-desert region flanked by the NCR3 and AIF1 loci and is littered with repetitive elements (Figure 3.36). Within the crossover window, LINE and SINE elements contribute more than 50% of the underlying sequence (Figure 3.37). Nine DNA motifs associated with recombination hotspots (Myers et al 2005) can also be found in the sequence.
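The boundary logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes that coordinates increase from telomere to centromere, that each sample's genotypes are available as two-letter calls keyed by position, and that every sample carries at least one heterozygous call. The function names, sample names, positions and genotypes are all invented for the example and are not the values of Table 3.14. The window runs from the most centromeric heterozygous call among the centromeric-homozygous samples to the most telomeric heterozygous call among the telomeric-homozygous samples.

```python
from typing import Dict, List, Tuple

Calls = Dict[int, str]  # position -> two-letter genotype call, e.g. "AG"


def is_het(call: str) -> bool:
    """A call is heterozygous when its two alleles differ."""
    return call[0] != call[1]


def het_positions(calls: Calls) -> List[int]:
    """Sorted positions at which a sample is heterozygous."""
    hets = sorted(pos for pos, call in calls.items() if is_het(call))
    if not hets:
        raise ValueError("sample has no heterozygous call")
    return hets


def crossover_window(telomeric_hom: Dict[str, Calls],
                     centromeric_hom: Dict[str, Calls]) -> Tuple[int, int]:
    """Consensus window (hypothetical helper, see assumptions in the text)."""
    # Telomeric-homozygous samples: the crossover lies at or before the
    # first heterozygous marker, so take the smallest such position.
    right = min(het_positions(c)[0] for c in telomeric_hom.values())
    # Centromeric-homozygous samples: the crossover lies at or after the
    # last heterozygous marker, so take the largest such position.
    left = max(het_positions(c)[-1] for c in centromeric_hom.values())
    if left >= right:
        raise ValueError("the two groups do not bracket a common window")
    return left, right


# Toy genotypes (illustrative only).
TELOMERIC = {
    "tel_1": {100: "GG", 200: "AA", 300: "CC", 400: "AG", 500: "CT"},
    "tel_2": {100: "GG", 200: "AA", 300: "CC", 400: "GG", 500: "CT"},
}
CENTROMERIC = {
    "cen_1": {100: "AG", 200: "AC", 300: "CC", 400: "GG", 500: "TT"},
    "cen_2": {100: "AG", 200: "AA", 300: "CC", 400: "GG", 500: "TT"},
}

window = crossover_window(TELOMERIC, CENTROMERIC)
print(window, window[1] - window[0])  # (200, 400) 200
```

Only the heterozygous calls are used as bounds because a homozygous call beyond the crossover can arise by chance when both haplotypes happen to carry the same allele, whereas a heterozygous call shows that the two haplotypes have already diverged.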
[Figure 3.35, panel C: both haplotypes of each of the 16 recombinant samples (eight telomeric-homozygous, eight centromeric-homozygous), shown with the linked HLA-B allele at SNP positions 31629281, 31630648, 31632830, 31633298, 31633427, 31637862, 31639194, 31641357, 31644203, 31646223, 31646476, 31648292, 31648535, 31648763, 31650287, 31650455, 31651010, 31663109, 31675401, 31676448, 31680935, 31683255, 31686751, 31690004, 31693198, 31695540, 31695917, 31697655, 31698877, 31700043, 31701455, 31703466, 31706468, 31708799, 31709822, 31710946, 31711570, 31711749, 31713178 and 31717792.]

Figure 3.35 Recombination Hotspot Between NCR3 and AIF1. A recombination
hotspot region was mapped to a 42kb segment from single-sperm typing experiments (Cullen et al 2002) and narrowed to a 4.5kb SNP interval on the SNP map. This interval falls between the NCR3 and AIF1 loci. Panel A: The LD heatmap shows this SNP interval falling between haplotype blocks defined earlier. Panel B: EHH plot across the segment. The position of the hotspot region mapped by Cullen and co-workers is indicated by the red bar at the top of the chart. EHH values of HLA haplotypes decay across a narrow interval within this hotspot, including haplotypes carrying HLA-B alleles found on CEHs (B*4601 and B*5801). There is also a spike in the population recombination rate estimated in the HapMap project (black vertical lines) across the interval. Panel C: The 16 individuals who carry recombinant haplotypes across this interval are shown in this panel, with both haplotypes of each individual shown together with the linked HLA-B allele. Half of the samples are homozygous on the telomeric side up to the telomeric edge of the 4.5kb interval, and the other half are homozygous on the centromeric side up to the centromeric edge.

[Table 3.14: genotypes of the 16 re-sequenced samples (telomeric-homozygous: BM05-206, CF0899, NP240, NP507, WGP028, CM0388, CM99004, NUH035; centromeric-homozygous: BM03-126, NP744, CM0807, NP500, CM1136, CM1173, WGP055, WGP036) at positions 31675401, 31676448, 31677683, 31678649, 31678910, 31680613, 31680906, 31680935, 31681089 and 31683255.]

Table 3.14 Re-sequencing the Recombination Hotspot Between NCR3 and AIF1. The 16 recombinant haplotypes were re-sequenced
across a 5kb segment containing the recombination hotspot. Eight polymorphic sites were identified within the segment, including those matching locations on the SNP variation map. The genotypes are listed in this table. Locations highlighted in green are SNPs genotyped in the Illumina SNP panel. Red genotypes highlight sites that are heterozygous in the individual. The crossover boundary is narrowed to a 1.7kb window between positions 31,678,910 and 31,680,613.

Figure 3.36 Genomic Location of the Hotspot Between NCR3 and AIF1. This figure shows the location of the hotspot mapped to a gene desert between NCR3 and AIF1. The grey tracks at the top of the figure mark the locations of repeat elements, which are seen to dominate this segment. The genotypes of the 16 re-sequenced samples used to define the crossover window are illustrated in the panel, with the consensus crossover location marked in a pink silhouette. Blue circles indicate that the individual is homozygous at that SNP position, while heterozygous loci are marked by red circles. Larger circles indicate SNPs genotyped in the Illumina panel, while smaller circles are polymorphic loci identified through re-sequencing.

FLAM_A, SINE Element
31678910 G G T C A G G C T G G T C T C G A A C T C C T G A C C T C A A G T G A T C C G C C T G C C T C G G C
31678960 C T C C C A A A G T G C T G G G A T T A T A G G C A T G A G C T A G C A C C C C T G G C C C C A C T
FLAM_A, SINE Element
31679010 T T C T T T T T A A A A A G T G T T A T T A T A T A T T T T T T A T T A T A T A T A T T T T T G A G
31679060 A T G A G A T C T C A C T A T G T T G C C C A G G C T A G T C T C A A A G T C C T G A C T C C G G G
31679110 C T T T A G G T G T T C C T C C G A C C T C A G C C T T T C A C G T A G C T G G G A T T A T A G G C
L2, LINE Element
31679160 A T G C A C C T G G C T T C C C A C T T T C A T T C A A T A A A T T T T G C G C A T C T A C C A T G
31679210 G C T T T C C T A G G C A A T C C T G T C A T A G C C A C A G T T G T C A C T A C T G C T T A T T C
31679260 T C T G T C A A G T C C C C A A T C T A C A T C T C C C C C T C A G G C C T C T T T C T T G A G A C
31679310 C T A A G T C C A C A C T A T C T A A C T G C T C T C T A G G C G G C T T A C C C T G A A T A C T C
31679360 C A C A G G C A T T T C A A A G T C A T C A G T G T C C A C T C A G A C C A G G T C A G C C T C C T
31679410 G T C A T C C C T G T C C C A G T G A A T G G A A A C A C A A A G C C C C A G T C A C T T A A G G C
31679460 A A A C A C C T G G G A T T C A T C C T A C T C T G C C T T C T C C C T C A G T T C C C C C A T C C
31679510 A A A A G A T C T C C A G G C C C T G T C C A T T T T G C T T C T G A A A G A T C G C A G G T G T C
31679560 T T T C C C T T G C T C T T C A T T C C A C T G G T T G C T A A A T C C C T C A T C A A C T C A A G
31679610 G G G A A A C G A G C A G A G T G C T T C T C T G A T G G G T A G T G T G G T T T C T G C A C A G
31679660 C A T C C C C T T C A T C C C A C C A C T G C T G G G C A T T G A G G T T C A T T C A T C T A T T C
31679710 A G C A T T G C T C T T C A C G A G G G C C T T C C A T G G G C C A G A C A C C C T A T C T T C A T
31679760 C T C T C T T A A T C G C T C T T T T C A G T A T C T C T C T C C T T A T C T C T C A T A T T T C C
31679810 C A C A G C T C T G T C C A C A A C T C T T T C T G T C T C A C C A T G T T A T T C A T A T T A C T
31679860 T G T T T C T T C C C C C G T G T C C A C T C A A A C G C C A C A T C T C T A C A C A C C C C T A C
L2, LINE Element
31679910 C C C T C T G C C T C T C T G T C A C A T G C A T A C A C A C T T C T G C T T A T T C A C T C A T T
31679960 C A A C A A A T A T T C A G C G A G C A C C T T C C A C G T G A G A C A T T C T A T T T T T T T T C
AluSx, SINE Element
31680010 T T T T T T T T T T T T T G C G C T C T C A G C T C A C T G T A A C C T C C A C C T C C C A G G T T
31680060 C A A A T G A T T C T C C T G C C C C A G C C T C C A G A G T A G C T G G G A T T A C A G G C A C A
31680110 T G C C A C C A C C C C T G G C T A A T T T T T G T A T T T T T A G T A G A G A T G G G G T T T T G
31680160 C C A T G T T G G C C A G G C T G G T C T T G A A C T C C T G G C C T C A A G T G A T C C A C C T G
31680210 C C T C A G C C T C C C A A A G T G C T G G G A T T A C A G G T G T G A G C T G C C G T G T C T G G
31680260 T C T G C C T C T C C G T C T T T C T C T C T C T C T G T C T T C C T C C A T C T C T C T T C G C A
31680310 T C G C T T T C T G C C T C C C C A T C A T T C T C C A T G T T T T C C C T T C C C A T C T C T C C
31680360 C C A T C T A C A T A C C T T A T T C T T T T A C T C C A T T T C T C T T C C T T C C C C A T T T C
31680410 T C T C T G G G T G A G A G A A T G A A G G A A G G C T A G T G A C T A G T C A C C T C T T C C C T
31680460 T T A G G G G C C A G A G T T C A G G C C T G C C T C A G C T C T G C C A G G C T G G T T G G C A C
31680510 T A C T C T T G T T T G C C C T T G G A G T C T C T G C A C A A G G A T G C T T A A A A A A A A A A
AluSg, SINE Element
31680560 G T T T A G G C C A G G C A C A G T G G C T A C C G C T T G T A A T C C C A A C A C T T T G G G A G
31680610 G C C G

Figure 3.37 Sequence of the Crossover Window Associated with the Recombination Hotspot Between NCR3 and AIF1. The 1.7kb sequence inside the crossover window associated with this hotspot is detailed in this figure. The polymorphic sites used to define this crossover boundary are shown as orange highlighted residues at either end of the window. This segment is dominated by SINE and LINE interspersed repeats, marked out in black borders. Nine 6- to 9bp motifs (marked out in red borders) that are over-represented in recombination hotspots (Myers et al 2005) are also found in the sequence.

3.5.4 Recombination Hotspot Telomeric to HLA-DPA1

Cullen and co-workers reported a segment between STR markers RNG-CA and DPB2A2 with a recombination rate 2.5 times higher than expected. These markers were mapped to a 102kb segment (positions 33.070Mb to 33.172Mb) lying at the centromeric end of the MHC. Using these locations as a guide, the SNP variation map
data was analysed for evidence of a recombination hotspot within the segment, and based on the criteria described earlier, a 4.3kb interval (between positions 33,129,170 and 33,133,471) lying 10kb telomeric to the HLA-DPA1 locus was identified as a crossover junction. Firstly, this interval fell within a segment of very low LD between clearly defined haplotype blocks. Secondly, the EHH plots of HLA-DRB1 haplotypes, including those associated with CEHs in the population (DRB1*0301 and DRB1*0901), clearly break across this interval, coinciding with a spike in the population recombination rate reported by the HapMap project. Finally, samples carrying recombinant haplotypes across the interval can be readily identified. Ten of these were homozygous telomeric to the interval, with haplotypes clearly diverging at the centromeric end. Another 12 were homozygous centromeric to the interval, with haplotypes breaking at the telomeric end (Figure 3.38).

To finely map the crossover locations in these recombinant haplotypes, a 5.9kb segment from positions 33,128,468 to 33,134,394 covering the SNP interval was re-sequenced in the 22 samples. This segment was amplified using overlapping PCR fragments and sequenced using 27 primers, producing reads that tiled across the 5.9kb on both strands. From these, a total of 41 polymorphic sites were identified, including those that were markers on the SNP variation map. The genotype results for all 22 samples at these polymorphic loci are listed in Table 3.15.
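The third criterion, picking out samples that are homozygous on one side of the interval and heterozygous on the other, can be illustrated with a short Python sketch. The names `classify_recombinant` and `min_flank`, the threshold of three flanking markers, and the toy positions and genotypes are all assumptions made for illustration; the selection criteria actually used are those described earlier in the chapter. Coordinates are assumed to increase toward the centromere.

```python
from typing import Dict, Optional, Tuple


def is_het(call: str) -> bool:
    """A two-letter genotype call is heterozygous when its alleles differ."""
    return call[0] != call[1]


def classify_recombinant(calls: Dict[int, str],
                         interval: Tuple[int, int],
                         min_flank: int = 3) -> Optional[str]:
    """Label a sample by the side of the interval on which it is homozygous.

    Hypothetical helper: returns None when the sample is not informative,
    i.e. homozygous or heterozygous on both sides, or too sparsely typed.
    """
    left, right = interval
    # Markers at or beyond each edge of the candidate interval.
    telomeric = [is_het(c) for pos, c in calls.items() if pos <= left]
    centromeric = [is_het(c) for pos, c in calls.items() if pos >= right]
    if len(telomeric) < min_flank or len(centromeric) < min_flank:
        return None
    if not any(telomeric) and any(centromeric):
        return "telomeric-homozygous"
    if not any(centromeric) and any(telomeric):
        return "centromeric-homozygous"
    return None


# Toy data (illustrative only): candidate interval between markers 300 and 400.
INTERVAL = (300, 400)
SAMPLES = {
    "sample_a": {100: "GG", 200: "AA", 300: "CC", 400: "AG", 500: "CT", 600: "GG"},
    "sample_b": {100: "AG", 200: "AA", 300: "CT", 400: "GG", 500: "TT", 600: "AA"},
    "sample_c": {100: "GG", 200: "AA", 300: "CC", 400: "GG", 500: "TT", 600: "AA"},
    "sample_d": {100: "AG", 200: "AA", 300: "CC", 400: "GG", 500: "CT", 600: "AA"},
}

for name, calls in SAMPLES.items():
    print(name, classify_recombinant(calls, INTERVAL))
```

Samples labelled by such a rule would then be the candidates for re-sequencing across the interval, as was done for the 22 samples above.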
[Figure 3.38, panel C: both haplotypes of each of the 22 recombinant samples, shown with the linked HLA-DRB1 allele at SNP positions 33092429, 33092766, 33093354, 33093970, 33095752, 33096673, 33098948, 33101244, 33104328, 33105652, 33106707, 33109466, 33111043, 33113622, 33114847, 33116524, 33124706, 33125435, 33129170, 33132110, 33132251, 33133471, 33133678, 33134088, 33136695, 33141000, 33141792, 33142574, 33146347, 33148693, 33150269, 33150858, 33152166, 33152235, 33152366, 33155009, 33155590, 33157189, 33157362 and 33157704.]
Figure 3.38 Recombination Hotspot Telomeric to HLA-DPA1. A recombination hotspot region was mapped to a 102kb segment around the HLA-DPA1 locus by single-sperm typing experiments (Cullen et al 2002). This was narrowed to a 4.3kb SNP interval on the SNP map telomeric to HLA-DPA1. Panel A: The LD heatmap shows this SNP interval falling between haplotype blocks defined earlier. Panel B: EHH plot across the segment. The position of the hotspot mapped by Cullen and co-workers is indicated by the red bar at the top of the chart. EHH values of HLA-DRB1 haplotypes decay across a narrow interval within this hotspot, including haplotypes carrying alleles found on CEHs (DRB1*0301 and DRB1*0901). There is also a spike in the population recombination rate estimated in the HapMap project (black vertical lines) across the interval. Panel C: A total of 22 individuals who carry recombinant haplotypes across this interval were identified from the SNP variation map. Both haplotypes of each individual are shown together with the linked HLA-DRB1 allele. Ten of the samples are homozygous on the telomeric side up to the telomeric edge of the 4.3kb interval, and the other 12 are homozygous on the centromeric side up to the centromeric edge.

Table 3.15 Re-sequencing the Recombination Hotspot Telomeric to HLA-DPA1. The 22 recombinant haplotypes were re-sequenced across a 5.9kb segment containing the recombination hotspot. A total of 41 polymorphic sites were identified, including those matching locations on the SNP variation map. The genotypes are listed in this table and used to narrow the crossover boundary to a 1.8kb window between positions 33,131,377 and 33,133,152. Locations highlighted in green are SNPs genotyped in the Illumina SNP panel. Red genotypes highlight sites that are heterozygous in the individual.

The genotypes of the recombinant haplotypes clearly mark out a window within which chromosomal crossovers have occurred. Crossing-over for
telomeric-homozygous samples begins after position 33,131,377, while the last homozygous marker common to the centromeric-homozygous samples is at position 33,133,152, marking out a 1,776bp segment within which the crossovers in the recombinant haplotypes are clustered. The LTR (long terminal repeat) segment of a 6.8kb endogenous retrovirus, HERVK22, encroaches into the crossover window (Figure 3.39). The sequence of the crossover window is shown in Figure 3.40, and motifs associated with recombination hotspots (Myers et al 2005) are found within it.

Figure 3.39 Genomic Location of the Hotspot Telomeric to HLA-DPA1. The hotspot is located approximately 7kb telomeric to the HLA-DPA1 locus. The genotypes of the 22 re-sequenced samples used to define the crossover window are illustrated in the panel, with the consensus crossover location marked in a pink silhouette. Blue circles indicate that the individual is homozygous at that SNP position, while heterozygous loci are marked by red circles. Larger circles indicate SNPs genotyped in the Illumina panel, while smaller circles are polymorphic loci identified through re-sequencing. The grey track at the top of the figure marks the centromeric end of a 6.8kb endogenous retrovirus, HERVK22, that encroaches into the crossover window.

HERVK22, Endogenous Retrovirus
33131377 G A A C T G A T A T T T C C A T T A C A C C C C A A A A C T C C T C A T T T G T G A C C C A G G T G
33131427 C A G A A A C A G G C T T G G T T T G C C T C A T G T A T C A C T A A T T A T A A T A C A T C T A A
33131477 T T T A A A T A T T A C T A G T G T C A T G G T A T T A A G G A G A C A A T C T G A G G C A T T C C
33131527 T A C C C A G T C A A T T T G A C A T G C G A T T G G C A A A G T T C C T C T G C C C T T G C C A C
33131577 C T A A G A A T G T G C C C T G T C C T A G G C C A G A C C C A A A A G A T A C A T A G G C A C A C
33131627 T T A C A G C C T T T A T A G T C T C A G C C A C A G T C A T C C T A G C A A C T G C T A G T G T G
33131677 G C T G T A G C A G C T A T T A C T G A A T C A G T A C A A A T A G G T G C T T T T G T A G A T A A
33131727 T T T G G C C A G A A A T G T G T C T A A T G A A C T T C T C T T A C A G C A G G G T A T A G A G C
33131777 A A A A G A T T C T T G C A C G T C T G C A A G C C C T T G A G G C C C T T G A G G C T G C C C T G
33131827 G A A T A T A T G G G G G A G T A A C A A G A T G C A C T G G T A T T C T A A C A G C A A C T A A A
33131877 C T G C G A C T G G G C G C A T A A A C A T A T C T G C G T C A C T T C T C T A T C A T A G A A T C
33131927 A A T C A A T A C A T A G T T G G G A T G A A G T G A A A C A A C A C C T C T G G G G A A C A T T T
33131977 C A T G A C A A T T T A A T A G C A G A T G T A A A G C A A C T T C A A A C T A A A A T T T T A G A
33132027 A T C C C T T C C C A C T A T A G A T C T A C A C A C C C A A C A A A C A G C C A T A T G G A A G G
33132077 G T G T G C A A G A T C A T C A C T C C T G G T T A G A C C C C C G C T C C T G G G G T T C A C T C
33132127 T T T G A C T G G A A A A G A A T A T T G C T A A T T A T T C T C A T G A T T G T C T T A T G T T A
33132177 T T T G C T A A T T C T A G G A T G C A A A G C C G G A A T G A A A G C G A T G A C T G C C T T G C
33132227 C T G A C A G A C G T G T T G C T G C A C A C A T C T G T A C A C T T C A G T C A A C A G A A G A G
33132277 C G T G T G C A C T T A T T A G A T G A G T G T T G G T A T C A A T G G A T A C A T G T A C A T A T
33132327 C T T A G T T T T C C A A A T T C A G G G A T G T A A C T T C T G T T T G C C A T A A C T G A T T A
33132377 G G T T C T A A T C C T C T A G G G T T A A C A C C T T T T G T G T T A G A A C C T G T G A A G T A
33132427 G A A G T A A C T C A G A A G T G C T C C T C A G A G A G T A G A C A G C T C T T T C T C T A A C C
33132477 G T T T C C A G C T C A G T A G A A T T T A G A A A G G C T T C T A G G A G G C C A A C C A G T C T
33132527 T T T T G A T C C A A C A T T G A A T T G T A A A A C C G G A T A T G G A A G C C A A A T T T C A C
33132577 A G T G G A T C T A A C A A A G T A G C T A A T G G G T A C T A T G C T T C T G A G A A C C T G A A
33132627 C A G G C A T C T G A G A G C T G T A A C T A G A A G A A A A G T A A A G A C T C C G G A C T C C A
33132677 G C A C C A A G C A G G T T T T C C T T A G C A A T T T A C A A C C T G A A G C T C C A A G G A A A
33132727 A A C T A T T T T A G C A T A C A C C A A A A C T A T T C C C A T G T G C C A A C A G G T A A G G A
33132777 G A C T T G T A C T T A T A T T C T G T T T T A T T C T T C T C T A A C T C G T T T C T G T G C A C
33132827 T A T T T C T A T G T T T T C T C C T T A G T T T T A C C T T G C C T G G G T T T G C C C A T T T G
33132877 T T A T T C A T A T C T A T T T A T C A A T C C C A A A A T A C T A A A A G G A T C C A G G C A G G
33132927 G C A G C T T T A A T T G G T G G C T G C A C A G A G T G C T A C T T C C T G T G A G G C A G C A A
33132977 T T C T A A C C C T A G T T G G C A T A C A C T T C A G A T T T T C T C A A C A G C A G A A G A T G
33133027 T A A C C T T C C C A A G A A C C C A C T T C A A C C C T C A G T T C T C T C A C T T C T A T G T C
33133077 C A C G T G A C A C T C T G A T A C A T T T T C C C A C T A T G A A A A G A A C T G T T C T C T T G
33133127 A T A G C A T G C C A T A C C T C T C C C T T C T C

Figure 3.40 Sequence of the Crossover Window Associated with the Recombination Hotspot Telomeric to HLA-DPA1. The 1.8kb sequence inside the crossover window associated with this hotspot is detailed in this figure. The polymorphic sites identified through re-sequencing are shown as orange highlighted residues. The telomeric half of the crossover window coincides with the end of a long endogenous retrovirus, marked out in a black border. Eight 6- to 9bp motifs (marked out in red borders) that are over-represented in recombination hotspots (Myers et al 2005) are also found in the sequence, most falling within the transposable element.

3.5.5 Narrowing the Location of Sperm-Mapped Recombination Hotspots

The results in this section show that data from SNP variation maps can be used to
identify individuals who carry recombinant haplotypes that break across sperm-mapped recombination hotspots. When these recombinant haplotypes are aligned, the locations of homozygous disruptions cluster around defined SNP intervals, and these intervals correlate with spikes in population recombination rate, haplotype block structure and the underlying LD variation. By re-sequencing across these SNP intervals, the crossover windows in the recombinant haplotypes can be accurately determined. Using this approach, the crossover window in the TAP2 recombination hotspot was identified, confirming the location fine-mapped by sperm typing (Jeffreys et al 2000). The method was then applied to the recombination hotspot segments reported across the MHC by Cullen et al (2002); with data from the SNP variation map and the HapMap population recombination rate, the crossover locations in these regions can be narrowed to SNP intervals. Three of these were re-sequenced in this study, and the corresponding crossover locations were fine-mapped to windows less than 2kb wide. The narrowed crossover locations in each of the hotspots are summarized in Figure 3.41. Finally, the success in fine-mapping the crossover locations suggests that this approach is a viable alternative to sperm typing, and that it can be easily scaled up to other regions of the genome while avoiding the male-biased nature of sperm recombination maps.

Figure 3.41 Summary of Fine-Mapped Crossover Locations in Sperm Recombination Hotspots. The chromosomal crossover windows of the recombination hotspots identified using STR markers by Cullen et al (2002) are summarized in this figure. The first hotspots were successfully re-sequenced in this study and mapped to windows of less than 2kb each.
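The degree of narrowing can be tabulated from the coordinates quoted in Sections 3.5.3 and 3.5.4. The sketch below is only arithmetic on those positions: the dictionary layout and the name `span` are illustrative choices, widths are taken as plain coordinate differences, and the STR-defined segments are entered at the kilobase precision given in the text.

```python
# Coordinates quoted in Sections 3.5.3 and 3.5.4 (base pairs).
# The STR-defined segments are stated in the text only to the nearest kilobase.
HOTSPOTS = {
    "NCR3-AIF1": {
        "single-sperm segment": (31_643_000, 31_685_000),
        "SNP interval": (31_676_448, 31_680_935),
        "crossover window": (31_678_910, 31_680_613),
    },
    "HLA-DPA1": {
        "single-sperm segment": (33_070_000, 33_172_000),
        "SNP interval": (33_129_170, 33_133_471),
        "crossover window": (33_131_377, 33_133_152),
    },
}


def span(interval):
    """Width of an interval as a plain coordinate difference."""
    start, end = interval
    return end - start


for name, levels in HOTSPOTS.items():
    widths = {level: span(iv) for level, iv in levels.items()}
    fold = widths["single-sperm segment"] / widths["crossover window"]
    print(name, widths, f"narrowed about {fold:.0f}-fold")
```

For the HLA-DPA1 window the coordinate difference is 1,775bp; the 1,776bp quoted in Section 3.5.4 corresponds to counting both end positions.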