bacterial artificial chromosomes, volume 2

METHODS IN MOLECULAR BIOLOGY™, Volume 256
Bacterial Artificial Chromosomes, Volume 2: Functional Studies
Edited by Shaying Zhao and Marvin Stodolsky

1
Use of BAC End Sequences for SNP Discovery
Michael M. Weil, Rashmi Pershad, Ruoping Wang, and Sheng Zhao

1. Introduction
Genetic markers have evolved over the years, increasing in their numbers and utility. Beginning with phenotypes such as smooth or wrinkled, the selection of genetic markers broadened to include blood group and histocompatibility antigens, and protein allotypes. Around 1980, DNA itself became the marker (1), first with restriction fragment length polymorphisms (RFLPs) and then with amplification polymorphisms based on simple sequence lengths (SSLPs) (2). Each advance in the availability and usefulness of genetic markers has contributed to advances in fundamental and applied genetics.

Single nucleotide polymorphisms (SNPs) are particularly powerful markers for genetic studies because they occur frequently in the genome, allowing the construction of dense genetic maps. Also, SNP-based genotyping should be more amenable to automation and multiplexing than genotyping based on other currently available markers.

A variety of strategies have been used for SNP discovery. These include resequencing approaches based on standard dideoxy cycle sequencing methodology or on DNA “chips.” Recently, we undertook a search for SNPs between commonly used inbred strains of laboratory mice using a resequencing approach. We took advantage of bacterial artificial chromosome (BAC) end sequence data generated by others for the public mouse genome sequencing effort. These sequences allowed us to design polymerase chain reaction (PCR) primers for amplification of homologous sequences in different mouse strains.
We then sequenced the PCR products and identified sequence variations between the strains. Whenever possible, we used publicly available software and commercially available reagents. The approach is suitable for any organism for which some sequence data are available.

From: Methods in Molecular Biology, vol. 256: Bacterial Artificial Chromosomes, Volume 2: Functional Studies. Edited by: S. Zhao and M. Stodolsky © Humana Press Inc., Totowa, NJ

2. Materials
2.1. Sequence Selection and Primer Design Programs
1. CLEAN N is publicly available at http://odin.mdacc.tmc.edu/anonftp/
2. INPUT PRIMER is available at http://odin.mdacc.tmc.edu/anonftp/
3. Primer3 is available from the Whitehead Institute/MIT Center for Genome Research at http://www-genome.wi.mit.edu/genome_software/other/primer3.html

2.2. Amplification
1. 10X buffer: 15 mM MgCl2, 0.5 M KCl, 0.1 M Tris-HCl, pH 8.3 (Sigma).

2.3. Preparation for Sequencing
1. Exonuclease I (Amersham Life Science).
2. Shrimp Alkaline Phosphatase (USB).
3. Sybr Green Dye (Molecular Probes).

2.4. Sequencing
1. ABI Prism Big Dye Terminator Ready Reaction Kit v2.0 (Perkin-Elmer).
2. 5X Reaction Buffer (Perkin-Elmer).
3. Multiscreen plate (Millipore, MAHV4510).
4. Sephadex G50 Superfine (Amersham).
5. Deionized and distilled water (VWR).
6. 45 µL column loader (Millipore).

2.5. SNP Identification
1. PolyBayes software is available from Washington University in St. Louis (http://genome.wustl.edu/gsc/Informatics/polybayes/).

3. Methods
3.1. Selection of BAC End Sequences and Primer Design
1. The initial step is to select sequences that are long enough to take advantage of the full accurate read length of the sequencer that will be used. Repetitive sequences are excluded to avoid designing PCR primers that will amplify more than one genomic region.
In addition, some SNP genotyping assays are not well suited for discriminating SNPs within a repetitive sequence, so focusing SNP discovery on nonrepetitive sequences will avoid genotyping difficulties later. Sequence selection can be automated with CLEAN N, an in-house computer program that we developed. The input for this program is a flat sequence file in FASTA format in which repetitive sequences are masked with “N” symbols (see Note 1). CLEAN N removes sequences shorter than 600 nucleotides and those containing one or more “N” symbols.
2. The remaining sequences are put into an input format for the primer design program by another in-house program, INPUT PRIMER. We then use the Primer3 program (3), which is available from the Whitehead Institute/MIT Center for Genome Research, to design PCR primers. The basic conditions for designing the primer pairs are as follows:
   a. Exclude region from base 100 to base 500.
   b. Exclude primers with more than three identical bases in a row.
   c. Use default value for optimum Tm (60.0°C), minimal Tm (57.0°C), maximum Tm (63.0°C).
   d. Use default value for optimum size (20 bases), minimal size (18 bases), maximum size (27 bases).
   e. Use default value for optimum GC content (50%), minimal GC (20%), maximum GC (80%).

3.2. Amplification
The optimal annealing temperature for each primer set is determined empirically by amplifying DNA with the primers in a gradient thermocycler, with annealing temperatures covering a 12°C range at 2°C intervals centered on the Primer3-predicted annealing temperature. The PCR products are analyzed by agarose gel electrophoresis, and the annealing temperature that generates a single-band PCR product of the expected size is noted. Primer sets that do not generate a single amplification product are discarded. Suitable amplification primer sets are then used to amplify DNA from the strains or individuals being surveyed for SNPs.

The PCR conditions are as follows:
1. Template: 2 µL (200 ng).
2. Primer: 0.1 µM each.
3. dNTPs: 200 µM each.
4. 10X buffer: 2.5 µL.
5. Taq Polymerase: 0.02 U/µL.
6. Total volume: 25 µL.

PCR cycling conditions are as follows:
1. Presoak: 95°C for 4 min.
2. Denaturation: 95°C for 30 s.
3. Annealing: as determined above, 30 s.
4. Polymerization: 72°C for 30 s.
5. PCR cycles: 36.
6. Final extension: 72°C for 7 min.

3.3. Preparation for Sequencing
1. The amplification products are prepared for sequencing by treatment with Exonuclease I and Shrimp Alkaline Phosphatase. Each 25 µL reaction mixture receives 1 µL Exonuclease I and 1 µL Shrimp Alkaline Phosphatase.
2. The plate is returned to the thermocycler and incubated at 37°C for 30 min and then at 80°C for 15 min.
3. The concentrations of the PCR products are determined by Sybr Green Dye fluorescence quantified on a Storm Fluorimager (Molecular Dynamics). 1 µL of each PCR product is transferred to a microtiter plate well containing 4 µL of water and 5 µL of 5X Sybr Green. The fluorescence intensity of each sample is compared to a standard curve encompassing 7.5–200 ng/µL.

3.4. Sequencing
1. The sequencing reactions are assembled in 96-well microtiter plates as follows:
   a. x µL PCR product (10 ng per 100 bases to be sequenced) (see Note 2).
   b. 3 µL Primer 1 pmol/mL (one of the PCR primers is used as the sequencing primer).
   c. 4 µL Big Dye Terminator Ready Reaction Mix (see Note 3).
   d. 4 µL 5X Reaction Buffer (see Note 4).
   e. dH2O to a total reaction volume of 20 µL.
2. The standard thermocycling protocol outlined in the ABI Prism Dye Terminator Ready Reaction protocol is followed, except that the 4 min extension at 60°C is reduced to 2 min because the PCR products are short (see Note 5):
   a. Presoak: 96°C for 5 min.
   b. Denaturation: 96°C for 30 s.
   c. Annealing: 50°C for 30 s.
   d. Polymerization: 60°C for 2 min.
   e. PCR cycles: 25.
3. Excess dye terminator molecules are removed by gel filtration on superfine Sephadex G50 spin columns made in the wells of a Millipore multiscreen plate (see Millipore Tech Note TN053 for detailed protocol) as follows:
   a. Dry Sephadex is added to the wells of the multiscreen plate with a 45-µL column loader. 300 µL of water is added to each well and the Sephadex is allowed to swell for 2 h at room temperature (at this point, the plates can be stored in Ziplock bags at 4°C).
   b. In preparation for sample loading, the multiscreen plate is assembled with a 96-well collection plate using an alignment frame (Millipore) and centrifuged at 450 RCF for 2 min.
   c. The sequencing reactions are loaded onto the Sephadex and the multiscreen plate is reassembled with a collection plate.
   d. Following centrifugation at 450 RCF for 2 min, the purified sequencing reactions are in the collection plate. They are dried in a vacuum centrifuge designed to accept 96-well microtiter plates, and then resuspended in 15 µL of deionized and distilled water.
4. The collection plates are loaded onto the deck of a 3700 DNA Analyzer (see Note 6). Samples are injected at 2500 V for 55 s and run under standard conditions:
   a. Cuvet temperature: 40°C.
   b. Run temperature: 50°C.
   c. Run voltage: 5250 V.
   d. Sheath flow volume: 5 mL.
   e. Run time: 4167 s.
   f. Sample volume: 2.5 µL.
   g. Polymer: POP6.
5. Chromatograms generated from the sequencing run are then electronically transferred to a DEC Alpha machine for downstream processing.

3.5. SNP Identification
The software program Phred/Phrap, which is part of the Phrap package, is provided by the University of Washington (http://www.phrap.org/). Phred/Phrap runs phred and phrap, which create quality information for each base and assemble the sequences from the same primer into a contig or contigs. The output from the Phred/Phrap program is used by the SNP detection program PolyBayes (4), available from Washington University in St. Louis.
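As background to the per-base quality information mentioned above: a phred quality value Q is conventionally related to the estimated base-calling error probability P by Q = −10 × log10(P), so Q20 corresponds to roughly one error in 100 bases. The short Python sketch below only illustrates that scale. It is not part of the protocol or of the Phred/Phrap package; the function names and the Q20 cutoff are assumptions chosen for illustration.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a phred quality value Q (Q = -10 * log10(P))."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality value corresponding to an error probability P, 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def count_high_quality(quals, min_q=20):
    """Count bases whose quality is at least min_q (illustrative cutoff, not from the protocol)."""
    return sum(1 for q in quals if q >= min_q)


if __name__ == "__main__":
    # Print the error probability implied by a few common quality values.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

A helper of this kind can be used to check what fraction of a read exceeds a chosen quality cutoff before the read is passed on to SNP detection.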
We run PolyBayes using the default setting of P = 0.003 (1 polymorphic site in 333 bp) as the total a priori probability that a site is polymorphic and a SNP detection threshold of 0.4 (see Note 7).

4. Notes
1. If the available DNA sequences are not masked, masking can be done using RepeatMasker software from the University of Washington Genome Center (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker). RepeatMasker screens a sequence in FASTA format and returns it with simple sequence repeats, low complexity DNA sequences, and interspersed repeats replaced with “N” symbols. Repetitive element libraries available for use with RepeatMasker are primates, rodents, other mammals, other vertebrates, Arabidopsis, grasses, and Drosophila.
2. The amount of DNA used in the sequencing reaction is based on the size of the PCR product, using 10 ng per 100 bases to be sequenced as a guide. In general, we have found that this approximation produces a balanced sequencing reaction for products up to 1 kb in size.
3. The version 2.0 Big Dye kit was used in preference to version 1.0 because it produces longer reads on the 3700 platform.
4. The 5X Reaction Buffer used in the cycle sequencing reaction contains 400 mM Tris-HCl at pH 9.0 and 10 mM magnesium chloride. This buffer allows 50% less Big Dye Ready Reaction Mix to be used, reducing sequencing reaction costs.
5. In our cycle sequencing protocol, cutting the extension time from 4 min to 2 min per cycle reduces the overall cycling time by 50 min (2 min saved in each of 25 cycles). This time saving can increase productivity in a high-throughput environment.
6. Initially, problems were encountered with the electrokinetic injection of DNA when in-house deionized water was used. Chemical impurities present in the water may have been preferentially injected into the capillary, resulting in low-quality sequence data.
This problem was remedied by switching to a commercial water source.
7. The PolyBayes setting was not optimized for mouse SNP detection.

Acknowledgment
BAC end sequences were provided by Dr. Shaying Zhao at The Institute for Genomic Research. This work was supported by Grant CA-16672 from the National Cancer Institute (NIH) and HG02057 from the National Human Genome Research Institute (NIH).

References
1. Botstein, D., White, R. L., Skolnick, M., and Davis, R. W. (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Amer. J. Hum. Genet. 32, 314–331.
2. Weber, J. L. and May, P. E. (1989) Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Amer. J. Hum. Genet. 44, 388–396.
3. Rozen, S. and Skaletsky, H. (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 132, 365–386.
4. Marth, G. T., Korf, I., Yandell, M. D., et al. (1999) A general approach to single-nucleotide polymorphism discovery. Nat. Genet. 23, 452–456.

2
Exon Trapping for Positional Cloning and Fingerprinting
Scott E. Wenderfer and John J. Monaco

1. Introduction
Positional cloning involves the genetic, physical, and transcript mapping of specific parts of a genome (1). Linkage analysis can map specific activities, or phenotypes, to a quantitative trait locus (QTL), a genomic region no smaller than 1 centiMorgan (cM) or megabase (Mb) in length. Physical mapping can then provide a map of higher resolution. Physical maps are constructed from clones identified by screening genomic libraries. Genomic clones can be characterized by fingerprinting and ordered to create a contig, a contiguous array of overlapping clones. Transcript identification from the clones in the contig results in a map of genes within the physical map. Finally, expressional and functional studies must be performed to verify gene content.

Bacterial artificial chromosomes (BACs) and P1 artificial chromosomes (PACs), both based on Escherichia coli (E. coli) and its single-copy plasmid F factor, can maintain inserts of 100–300 kilobases (kb). Their stability and relative ease of isolation have made them the vectors of choice for the development of physical maps. Once BAC clones are obtained, exon trapping can be performed as a method of transcript selection even before characterization of the contig is complete. Trapped exons are useful reagents for expressional and functional studies as well as for physical mapping of BAC clones to form the completed contig.

Exon trapping was first used by Apel and Roth (2) and popularized by Buckler and Housman (3). A commercially available vector, pSPL3 (4), has been used in multiple positional cloning endeavors (5–8). Exon trapping relies on the conservation of sequence at intron–exon boundaries in all eukaryotic species (see Note 1). When a genomic fragment is cloned into the intron of an expression vector, exons encoded in the genomic fragment are spliced into the transcript encoded on the expression vector (see Fig. 1). Reverse transcriptase polymerase chain reaction (RT-PCR) using primers specific for the transcript on the expression vector then provides a product for analysis by electrophoresis and sequencing.

Fig. 1. (A) Exon splicing is conserved in eukaryotes. The sequences at the splice junctions are conserved. The gray box represents the 5′ exon and the checkered box represents the 3′ exon. The white box represents the intron. The bold bases indicate the 3′ splice acceptor, the branch point A, and the 5′ splice donor from left to right.
(B) Because splicing is conserved, a genomic fragment (white bar) containing an exon (black box) from any species can be inserted within the intron of an expression construct for exon trapping. COS7 cells are transfected with the construct and 48 h later RNA is collected. The expressed recombinant mRNA can be isolated by RT-PCR using primers for the upstream and downstream exon of the expression construct. Genomic fragments lacking an exon would allow the upstream and downstream exons of the expression construct to splice together, resulting in a smaller RT-PCR product (the 177 bp band). We screened BAC clones by shotgun cloning small fragments into the intron of the HIV tat gene behind an SV40 early promoter. The RT-PCR products from two exon trapping experiments are shown.

Because the expression vector utilizes its own exogenous promoter, exon trapping is independent of transcript abundance and tissue expression. Moreover, exon trapping provides rapid sequence availability. It has proven to be a very sensitive method for transcript identification (9,10) (see Note 2). By pooling subclones via shotgun cloning of cosmids, BACs, or yeast artificial chromosomes (YACs) into the pSPL3 vector, 30 kb–3 Mb can be screened in a single experiment.

Disadvantages include dependence on introns and on splice donor and acceptor sites. False negatives are caused by missing genes with only one or two exons, interrupting exons by cloning into the expression vector, and possibly by not meeting unidentified splicing requirements. False positives are caused by cryptic splice sites (11), exon skipping (12), and pseudogenes.

No one method for transcript identification has become the stand-alone method for positional cloning. Genomic sequence analysis, when sequence is available, should be the primary tool for identification of genes within a genomic region of interest.
Bulk sequencing provides a template for computer selection of gene candidates via long open reading frames (ORFs), sequence homology, or motif identification. Gene Recognition and Assembly Internet Link (GRAIL) analysis can be performed manually at a rate of 100,000 kb per person-hour (13). PCR primer pairs can be made for each set of GRAIL exon clusters. Alternatively, predicted GRAIL exons may be represented in the expressed sequence tag (EST) database, a collection of sequences obtained from clones randomly selected from cDNA libraries encompassing a wide range of tissues or cell types. If an EST exists, corresponding cDNA clones can be purchased from the IMAGE consortium (14). Motif and ORF searching suffer from a lack of specificity and sensitivity and tend to be both time consuming and software/hardware dependent. Exon trapping is an excellent tool for verification of genes predicted in the sequence, as well as for identification of genes missed by computational techniques. A cluster of trapped exons likely encodes a functional gene product if several correspond to exons also predicted by GRAIL and together they encode a long ORF.

When no genomic sequence is available, exon trapping is the method of choice for initially identifying genes. Not only are new genes identified and known genes mapped, but trapped exons, whether bona fide or false positives, also become markers for the generation of a physical map. Southern or colony blots made from BAC clones can be hybridized with exon probes to map them to specific locations on individual BACs, or to BACs in a contig. Trapped exon probes can also be used to screen further genomic BAC libraries. In our experience, more than 100 markers were generated for every 1 Mb region, resulting in a marker density of one per 10 kb. Therefore, the number of markers generated during a completed exon trapping study will be sufficient for genome [...]
pSPL3 with 20,000 U T4 DNA ligase for 1 h at 42°C and transform DH10b bacterial cells by electroporation at 1.8 kV, 25 µF, 200 Ω (see Note 5).
5. Grow transformants overnight in 50 mL LB-amp broth, isolate DNA from shotgun subclones, and test heterogeneity by running a PvuII digest on a 1% agarose gel.

3.2. Transient Transfections
1. Plate 2 × 10^6 COS7 cells/75 mm^2 dish and preincubate 24 h.
2. Harvest cells...

of the Wilms tumor gene WT1. Proc. Natl. Acad. Sci. USA 88, 9618–9622.
6. Taylor, S. A., Snell, R. G., Buckler, A., et al. (1992) Cloning of the alpha-adducin gene from the Huntington’s disease candidate region of chromosome 4 by exon amplification. Nat. Genet. 2, 223–227.
7. Lucente, D., Chen, H. M., Shea, D., et al. (1995) Localization of 102 exons to a 2.5 Mb region involved in Down syndrome. Hum. Mol. Genet. 4, 1305–1311...

methylation of CpG islands coupled with epigenetic silencing of their associated genes is found (12,13, see 14 for a review). Why CpG islands are protected from methylation is not certain. However, the finding that deletion...

Biol. 196, 261–282.
6. Larsen, F., Gunderson, G., Lopez, R., and Prydz, H. (1992) CpG islands as gene markers in the human genome. Genomics 13, 1095–1107.
7. Ewing, B. and Green, P. (2000) Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet. 25, 232–234.
8. International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
9. Riggs,...

2. Materials
2.1. Preparation of the MBD Column
1. LB broth: 1% bacto tryptone, 0.5% bacto yeast extract, and 1% NaCl (all w/v).
2. LB agar: As LB broth with the addition of 12 g/L Bacto agar.
3. 100 mM isopropyl β-D thiogalactopyranoside (IPTG) in water, filter-sterilized. Store at 20 °C.
4. 2X SMASH buffer: 125 mM Tris-HCl (pH 6.8), 20% glycerol, 4% sodium dodecyl sulfate (SDS), 1 mg/mL bromophenol blue, 286...

mM MgCl2, 100 mM KCl, and 1 mM 2-mercaptoethanol.
9. DNA replication buffer contains a final concentration of 0.2 M HEPES, 50 mM Tris-HCl pH 6.8, 5 mM MgCl2, 10 mM 2-mercaptoethanol, 0.4 mg/mL bovine serum albumin (BSA), 10 µM dATP, 10 µM dGTP, 10 µM dTTP, and 5 OD260 U/mL random hexamers mix.
10. [γ-32P]dATP and [α-32P]dATP. Proper shielding should be used when handling all solutions containing 32P.
11. pSPL3VV...

µg total RNA (final concentration = 0.15 µg/mL) with 200 U Superscript II RT and 1 µM SA2 oligo in 20 µL 1st strand buffer for 30 min at 42°C.
3. Preincubate cDNA 5 min at 55°C, then treat with 2 U RNase H for 10 min, store at 4°C.
4. Perform PCR on 5 µL cDNA (approx 1.2 g) with 2.5 U Taq DNA polymerase and 1 µM each oligos SA2 and SD6 in 40 µL PCR buffer for a total of six cycles...

X-100, 10 mM β-mercaptoethanol.
12. Buffer E: 50 mM NaCl, 20 mM HEPES (pH 7.9), 10% glycerol, 0.1% Triton X-100, 10 mM β-mercaptoethanol, 8 mM imidazole.
13. 1 M imidazole in water, filter-sterilized. Store at room temperature.

2.2. Basic Protocol for Running an MBD Column
1. MBD buffer: 20 mM HEPES (pH 7.9), 10% glycerol, 0.1% Triton X-100.
2. MBD buffer/x M NaCl: 20 mM HEPES (pH 7.9), x M NaCl,...
methylation in the rest of the genome (22). Here a method is described by which largely intact CpG islands can be isolated from BAC clones by exploiting the differential affinity of DNA fragments containing different numbers of methyl-CpGs for a methyl-CpG binding domain (MBD) column (23,24). These columns consist of the MBD of the protein MeCP2 (25,26) coupled to a resin. MeCP2 is one of a family of proteins...

Acad. Sci. USA 88, 9623–9627.
19. Fan, W. F., Wei, X., Shukla, H., et al. (1993) Application of cDNA selection techniques to regions of the human MHC. Genomics 17, 575–581.
20. Goei, V. L., Parimoo, S., Capossela, A., Chu, T. W., and Gruen, J. R. (1994) Isolation of novel non-HLA gene fragments from the hemochromatosis region (6p21.3) by cDNA hybridization selection. Amer. J. Hum. Genet. 54, 244–251.
21. Schuler, G. D.,...

Date posted: 11/04/2014, 00:42
