
Ebook Introduction to genetic analysis (9th edition): Part 2


DOCUMENT INFORMATION

Pages: 366
File size: 47.28 MB

Contents

Part 2 of Introduction to Genetic Analysis covers the following topics: gene isolation and manipulation; genomics; the dynamic genome (transposable elements); genetic regulation of cell number (normal and cancer cells); population genetics; quantitative genetics; evolutionary genetics; and other topics.

44200_11_p341-388 3/9/04 1:17 PM Page 341

11 GENE ISOLATION AND MANIPULATION

KEY QUESTIONS
• How is a gene isolated and amplified by cloning?
• How are specific DNAs or RNAs identified in mixtures?
• How is DNA amplified without cloning?
• How is amplified DNA used in genetics?
• How are DNA technologies applied to medicine?

OUTLINE
11.1 Generating recombinant DNA molecules
11.2 DNA amplification in vitro: the polymerase chain reaction
11.3 Zeroing in on the gene for alkaptonuria: another case study
11.4 Detecting human disease alleles: molecular genetic diagnostics
11.5 Genetic engineering

[Chapter-opening photograph] Injection of foreign DNA into an animal cell. The microneedle used for injection is shown at right and a cell-holding pipette at left. [Copyright M. Baret/Rapho/Photo Researchers, Inc.]

CHAPTER OVERVIEW

Genes are the central focus of genetics, and so clearly it is desirable to be able to isolate a gene of interest (or any DNA region) from the genome and amplify it to obtain a working amount to study. DNA technology is a term that describes the collective techniques for obtaining, amplifying, and manipulating specific DNA fragments. Since the mid-1970s, the development of DNA technology has revolutionized the study of biology, opening many areas of research to molecular investigation. Genetic engineering, the application of DNA technology to specific biological, medical, or agricultural problems, is now a well-established branch of technology. Genomics is the ultimate extension of the technology to the global analysis of the nucleic acids present in a nucleus, a cell, an organism, or a group of related species (Chapter 12).

How can working samples of individual DNA segments be isolated?
That task initially might seem like finding a needle in a haystack. A crucial insight was that researchers could create the large samples of DNA that they needed by tricking the DNA replication machinery into replicating the DNA segment in question. Such replication could be done either within live bacterial cells (in vivo) or in a test tube (in vitro).

Figure 11-1 How to amplify an interesting gene. Two methods are (a) in vivo, by tricking the replication machinery of a bacterium into amplifying recombinant DNA containing the gene, and (b) in vitro, in the test tube. Both methods employ the basic principles of molecular biology: the ability of specific proteins to bind to DNA (the proteins shown in yellow) and the ability of complementary single-stranded nucleic acid segments to hybridize together (the primer used in the test-tube method).

In the in vivo approach (Figure 11-1a), the investigator begins with a sample of DNA molecules containing the gene of interest. This sample is called the donor DNA, and most often it is an entire genome. Fragments of the donor DNA are inserted into nonessential “accessory” chromosomes (such as plasmids or modified bacterial viruses). These accessory chromosomes will “carry” and amplify the gene of interest and are hence called vectors.

First, the donor DNA molecules are cut up by using enzymes called restriction endonucleases as molecular “scissors.” These enzymes are a class of DNA-binding proteins that bind to the DNA and cut the sugar-phosphate backbone of each of the two strands of the double helix at a specific sequence. They cut long chromosome-sized DNA molecules into hundreds or thousands of fragments of more
manageable size. Next, each fragment is fused with a cut vector chromosome to form recombinant DNA molecules. Union with the vector DNA typically depends on short terminal single strands produced by the restriction enzymes. They bond to complementary sequences at the ends of the vector DNA. (The ends act like Velcro to join the different DNA molecules together to produce the recombinant DNA.) The recombinant DNAs are inserted into bacterial cells, and generally only one recombinant molecule is taken up by each cell. Because the accessory chromosome is normally amplified by replication, the recombinant molecule is similarly amplified during the growth and division of the bacterial cell in which the chromosome resides. This process results in a clone of identical cells, each containing the recombinant DNA molecule, and so this technique of amplification is called DNA cloning. The next stage is finding the rare clone containing the DNA of interest.

In the in vitro approach (Figure 11-1b), a specific gene of interest is amplified chemically by replication machinery extracted from special bacteria. The system “finds” the gene of interest by the complementary binding of specific short primers to the ends of that sequence. These primers then guide the replication process, which cycles exponentially, resulting in a large sample of copies of the gene of interest.

We will see repeatedly that DNA technology depends on two basic foundations of molecular biology research:
• The ability of specific proteins to recognize and bind to specific base sequences within the DNA double helix (examples are shown in yellow in Figure 11-1).
• The ability of complementary single-stranded DNA or RNA sequences to spontaneously unite to form double-stranded molecules. Examples are the binding of the sticky ends and the binding of the primers.

The remainder of the chapter will explore examples of uses to which we put amplified DNA. These uses range from routine gene isolation for basic biological research to
gene-based therapy of human disease.

11.1 Generating recombinant DNA molecules

To illustrate how recombinant DNA is made, let’s consider the cloning of the gene for human insulin, a protein hormone used in the treatment of diabetes. Diabetes is a disease in which blood sugar levels are abnormally high either because the body does not produce enough insulin (type I diabetes) or because cells are unable to respond to insulin (type II diabetes). In mild forms of type I, diabetes can be treated by dietary restrictions but, for many patients, daily insulin treatments are necessary. Until about 20 years ago, cows were the major source of insulin protein. The protein was harvested from the pancreases of animals slaughtered in meat-packing plants and purified at large scale to eliminate the majority of proteins and other contaminants in the pancreas extracts. Then, in 1982, the first recombinant human insulin came on the drug market. Human insulin could be made purer, at lower cost, and on an industrial scale because it was produced in bacteria by recombinant DNA techniques. The recombinant insulin makes up a higher proportion of the proteins in the bacterial cell; hence the protein purification is much easier. We shall follow the general steps necessary for making any recombinant DNA and apply them to insulin.

Type of donor DNA

The choice of DNA to be used as the donor might seem to be obvious, but there are actually three possibilities.
• Genomic DNA. This DNA is obtained directly from the chromosomes of the organism under study. It is the most straightforward source of DNA. It needs to be cut up before cloning is possible.
• cDNA. Complementary DNA (cDNA) is a double-stranded DNA version of an mRNA molecule. In higher eukaryotes, an mRNA is a more useful predictor of a polypeptide sequence than is a genomic sequence, because the introns have been spliced out. Researchers prefer to use cDNA rather than mRNA itself because RNAs are inherently less stable than DNA and techniques for routinely
amplifying and purifying individual RNA molecules do not exist. The cDNA is made from mRNA with the use of a special enzyme called reverse transcriptase, originally isolated from retroviruses. Using an mRNA molecule as a template, reverse transcriptase synthesizes a single-stranded DNA molecule that can then be used as a template for double-stranded DNA synthesis (Figure 11-2). cDNA does not need to be cut in order to be cloned.
• Chemically synthesized DNA. Sometimes, a researcher needs to include in a recombinant DNA molecule a specific sequence that for some reason cannot be isolated from available natural genomic DNA or cDNAs. If the DNA sequence is known (often from a complete genome sequence), then the gene can be synthesized chemically by using automated techniques.

To create bacteria that express human insulin, cDNA was the choice because bacteria do not have the ability to splice out introns present in natural genomic DNA.

Figure 11-2 The synthesis of double-stranded cDNA from mRNA. A short oligo(dT) chain is hybridized to the poly(A) tail of an mRNA strand. The oligo(dT) segment serves as a primer for the action of viral reverse transcriptase, an enzyme that uses the mRNA as a template for the synthesis of a complementary DNA strand. The resulting cDNA ends in a hairpin loop. When the mRNA strand has been degraded by treatment with NaOH, the hairpin loop becomes a primer for DNA polymerase I, which completes the paired DNA strand. The loop is then cleaved by S1 nuclease (which acts only on the single-stranded loop) to produce a double-stranded cDNA molecule. [From J. D. Watson, J. Tooze, and D. T. Kurtz, Recombinant DNA: A Short Course. Copyright 1983 by W. H. Freeman and Company.]

Figure 11-3 Formation of a recombinant DNA molecule. The restriction enzyme EcoRI cuts a circular DNA molecule bearing one target sequence, resulting in a linear molecule with single-stranded sticky ends. Because of complementarity, other linear molecules with EcoRI-cut sticky ends can hybridize with the linearized circular DNA, forming a recombinant DNA molecule.

Cutting genomic DNA

Most cutting is done using bacterial restriction enzymes. These enzymes cut at specific DNA target sequences, called restriction sites, and this property is one of the key features that make restriction enzymes suitable for DNA manipulation. Purely by chance, any DNA molecule, be it derived from virus, fly, or human, contains restriction-enzyme target sites. Thus a restriction enzyme will cut the DNA into a set of restriction fragments determined by the locations of the restriction sites. Another key property of some restriction enzymes is that they make “sticky ends.” Let’s look at an example. The restriction enzyme EcoRI (from E. coli) recognizes the following sequence of six nucleotide pairs in the DNA of any organism:

5′-GAATTC-3′
3′-CTTAAG-5′

This type of segment is called a DNA palindrome, which means that both strands have the same nucleotide sequence but in antiparallel orientation. Different restriction enzymes cut at different palindromic sequences. Sometimes the cuts are in the same position on each of the two antiparallel strands. However, the most useful restriction enzymes make cuts that are offset, or staggered. For example, the enzyme EcoRI makes cuts only between the G and the A nucleotides on each strand of the palindrome:

5′-G AATTC-3′
3′-CTTAA G-5′

These staggered
cuts leave a pair of identical sticky ends, each a single strand four bases long (5′-AATT-3′). The ends are called sticky because, being single-stranded, they can base-pair (that is, stick) to a complementary sequence. Single-strand pairing of this type is sometimes called hybridization. Figure 11-3 (top left) illustrates the restriction enzyme EcoRI making a single cut in a circular DNA molecule such as a plasmid; the cut opens up the circle, and the resulting linear molecule has two sticky ends. It can now hybridize with a fragment of a different DNA molecule having the same complementary sticky ends.

Dozens of restriction enzymes with different sequence specificities are now known, some of which are listed in Figure 11-4. Some enzymes, such as EcoRI or PstI, make staggered cuts, whereas others, such as SmaI, make flush cuts and leave blunt ends. Even flush cuts, which lack sticky ends, can be used for making recombinant DNA. Special enzymes can join blunt ends together. Other enzymes can make short sticky ends from blunt ends.

Enzyme    Source organism               Recognition site (5′→3′; ^ marks the cut)    Cleaved ends
(a)
EcoRI     Escherichia coli              G^AATTC                                      5′ overhang
PstI      Providencia stuartii          CTGCA^G                                      3′ overhang
SmaI      Serratia marcescens           CCC^GGG                                      Blunt ends
(b)
HaeIII    Haemophilus aegyptius         GG^CC                                        Blunt ends
HpaII     Haemophilus parainfluenzae    C^CGG                                        5′ overhang

Figure 11-4 The specificity and results of restriction enzyme cleavage. The 5′ end of each DNA strand and the site of cleavage (small red arrows) are indicated. The large dot indicates the site of rotational symmetry of each recognition site. Note that the recognition sites differ for different enzymes. In addition, the positions of the cut sites may differ for different enzymes, producing single-stranded overhangs (sticky ends) at the 5′ or 3′ end of each double-stranded DNA molecule or producing blunt ends if the cut sites are not offset. (a) Three hexanucleotide (six-cutter) recognition sites and the restriction enzymes that cleave them. Note that one site produces a 5′ overhang, another a 3′ overhang, and the third a blunt end. (b) Examples of enzymes that have tetranucleotide (four-cutter) recognition sites.

MESSAGE Restriction enzymes cut DNA into fragments of manageable size, and many of them generate single-stranded sticky ends suitable for making recombinant DNA.

Attaching donor and vector DNA

Most commonly, both donor and vector DNA are digested by a restriction enzyme that produces complementary sticky ends and are then mixed in a test tube to allow the sticky ends of vector and donor DNA to bind to each other and form recombinant molecules. Figure 11-5a shows a bacterial plasmid DNA that carries a single EcoRI restriction site; so digestion with the restriction enzyme EcoRI converts the circular DNA into a single linear molecule with sticky ends. Donor DNA from any other source, such as human DNA, also is treated with the EcoRI enzyme to produce a population of fragments carrying the same sticky ends. When the two populations are mixed under the proper physiological conditions, DNA fragments from the two sources can hybridize, because double helices form between their sticky ends (Figure 11-5b). There are many opened-up plasmid molecules in the solution, as well as many different EcoRI fragments of donor DNA. Therefore a diverse array of plasmids recombined with different donor fragments will be produced. At this stage, the hybridized molecules do not have covalently joined sugar-phosphate backbones. However, the backbones can be sealed by the addition of the enzyme DNA ligase, which creates phosphodiester linkages at the junctions (Figure 11-5c). cDNA can be joined to the vector using ligase alone, or short sticky ends can be added to each end of a plasmid and vector.

Another consideration at this stage is that, if the cloned gene is to be transcribed and translated in the bacterial host, it must be inserted next to bacterial regulatory sequences. Hence, to be able to produce human insulin in bacterial cells, the gene must be adjacent to the correct bacterial regulatory sequences.

Figure 11-5 Method for generating a recombinant DNA plasmid containing genes derived from donor DNA. (a) Cleavage of plasmid vector and donor DNA by EcoRI endonuclease; (b) hybridization; (c) sealing by DNA ligase to give a recombinant plasmid. [After S. N. Cohen, “The Manipulation of Genes.” Copyright 1975 by Scientific American, Inc. All rights reserved.]

Amplification inside a bacterial cell

Amplification takes advantage of prokaryotic genetic processes, including those of bacterial transformation, plasmid replication, and bacteriophage growth, all discussed in an earlier chapter. Figure 11-6 illustrates the cloning of a donor DNA segment. A single recombinant vector enters a bacterial cell and is amplified by the replication that takes place in cell division. There are generally many copies of each vector in each bacterial cell. Hence, after amplification, a colony of bacteria will typically contain billions of copies of the single donor DNA insert fused to its accessory chromosome. This set of amplified copies of the single donor DNA fragment within the cloning vector is the recombinant DNA clone. The replication of recombinant molecules exploits the normal mechanisms that the bacterial cell uses to replicate chromosomal DNA. One basic requirement is the presence of an origin of DNA replication (as described in Chapter 7).

CHOICE OF CLONING VECTORS
Vectors must be small molecules for
convenient manipulation. They must be capable of prolific replication in a living cell in order to amplify the inserted donor fragment. They must also have convenient restriction sites at which the DNA to be cloned may be inserted. Ideally, the restriction site should be present only once in the vector because then restriction fragments of donor DNA will insert only at that one location in the vector. It is also important that there be a way to identify and recover the recombinant molecule quickly. Numerous cloning vectors are in current use, suitable for different sizes of DNA insert or for different uses of the clone. Some general classes of cloning vectors follow.

Figure 11-6 How amplification works. Restriction-enzyme treatment of donor DNA and vector allows the insertion of single fragments into vectors. A single vector enters a bacterial host, where replication and cell division result in a large number of copies of the donor fragment.

Plasmid vectors. As described earlier, bacterial plasmids are small circular DNA molecules that replicate their DNA independent of the bacterial chromosome. The plasmids that are routinely used as vectors are those that carry genes for drug resistance. These drug-resistance genes provide a convenient way to select for cells transformed by plasmids: those cells still alive after exposure to the drug must carry the plasmid vectors containing the DNA insert, as shown at the left in Figure 11-7. Plasmids are also an efficient means of amplifying cloned DNA because there are many copies per cell, as many
as several hundred for some plasmids. Examples of some specific plasmid vectors are shown in Figure 11-7.

Bacteriophage vectors. Different classes of bacteriophage vectors can carry different sizes of donor DNA insert. A given bacteriophage can harbor a standard amount of DNA as an insert “packaged” inside the phage particle. Bacteriophage λ (lambda) is an effective cloning vector for double-stranded DNA inserts as long as about 15 kb. Lambda phage heads can package DNA molecules no larger than about 50 kb in length (the size of a normal λ chromosome). The central part of the phage genome is not required for replication or packaging of λ DNA molecules in E. coli and so can be cut out by using restriction enzymes and discarded. The deleted central part is then replaced by inserts of donor DNA. An insert will be from 10 to 15 kb in length because this size insert brings the total chromosome size back to its normal 50 kb (Figure 11-8).

As Figure 11-8 shows, the recombinant molecules can be directly packaged into phage heads in vitro and then introduced into the bacterium. Alternatively, the recombined molecules can be transformed directly into E. coli. In either case, the presence of a phage plaque on the bacterial lawn automatically signals the presence of recombinant phage bearing an insert.

Figure 11-7 Two plasmids designed as vectors for DNA cloning, showing general structure and restriction sites. Insertion into pBR322 is detected by inactivation of one drug-resistance gene (tetR), indicated by the tetS (sensitive) phenotype. Insertion into pUC18 is detected by inactivation of the β-galactosidase function of lacZ′, resulting in an inability to convert the artificial substrate X-Gal into a blue dye. The polylinker has several alternative restriction sites into which donor DNA can be inserted.

Figure 11-8 Cloning in phage λ. A nonessential central region of the phage chromosome is discarded, and the ends are ligated to random 15-kb fragments of donor DNA. A linear multimer (concatenate) forms, which is then stuffed into phage heads one monomer at a time by using an in vitro packaging system. [After J. D. Watson, M. Gilman, J. Witkowski, and M. Zoller, Recombinant DNA, 2d ed. Copyright 1992 by Scientific American Books.]

Vectors for larger DNA inserts. The standard plasmid and phage λ vectors just described can accept donor DNA of sizes as large as 25 to 30 kb. However, many experiments require inserts well in excess of this upper limit.
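The digestion and size-selection idea behind vector capacity can be illustrated with a minimal Python sketch, under stated assumptions. Fragments are produced by cutting at a restriction site, and only fragments inside a capacity window are kept. The function names, the toy sequence, and the 5-to-7-base window are hypothetical and chosen only for illustration; a real λ library uses a partial Sau3A digest and a window of roughly 15 kb, and only the top strand is modeled here.

```python
from typing import List, Tuple


def cut_positions(seq: str, site: str, offset: int) -> List[int]:
    """Top-strand cut indices for every occurrence of `site` in `seq`.

    `offset` is how many bases into the site the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC between the G and the A).
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq: str, site: str, offset: int) -> List[str]:
    """Split a linear top strand at every cut position (a complete digest)."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: List[str], min_len: int,
                max_len: int) -> Tuple[List[str], List[str]]:
    """Separate fragments that fit the capacity window from those that do not."""
    kept = [f for f in fragments if min_len <= len(f) <= max_len]
    rejected = [f for f in fragments if not (min_len <= len(f) <= max_len)]
    return kept, rejected


if __name__ == "__main__":
    donor = "AAGAATTCTTGAATTCGG"  # toy donor DNA with two EcoRI sites
    fragments = digest_linear(donor, "GAATTC", 1)
    kept, rejected = size_select(fragments, 5, 7)  # hypothetical window
    print("fragments:", fragments)
    print("kept:", kept)
    print("rejected:", rejected)
```

Each fragment downstream of a cut begins with AATTC, whose first four bases correspond to the single-stranded EcoRI overhang. Fragments outside the window are rejected rather than cloned, which is the capacity limit that larger-insert vectors are designed to overcome.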
To meet these needs, the following special vectors have been engineered. In each case, after the DNAs have been delivered into the bacterium, they replicate as large plasmids.

Cosmids are vectors that can carry 35- to 45-kb inserts. They are engineered hybrids of λ phage DNA and bacterial plasmid DNA. Cosmids are inserted into λ phage particles, which act as the “syringes” that introduce these big pieces of recombinant DNA into recipient E. coli cells. The plasmid component of the cosmid provides sequences necessary for the cosmid’s replication. Once in the cell, these hybrids form circular molecules that replicate extrachromosomally in the same manner as plasmids.

PAC (P1 artificial chromosome) vectors deliver DNA by a similar system but can accept inserts ranging from 80 to 100 kb. In this case, the vector is a derivative of bacteriophage P1, a type that naturally has a larger genome than that of λ.

BAC (bacterial artificial chromosome) vectors, derived from the F plasmid, can carry inserts ranging from 150 to 300 kb (Figure 11-9). The DNA to be cloned is inserted into the plasmid, and this large circular recombinant DNA is introduced into the bacterium by a special type of transformation. BACs are the “workhorse” vectors for the extensive cloning required by large-scale genome-sequencing projects (discussed in Chapter 12).

Finally, inserts larger than 300 kb require a eukaryotic vector system called YACs (yeast artificial chromosomes, described later in the chapter).

For cloning the gene for human insulin, a plasmid host was selected to carry the relatively short cDNA inserts of approximately 450 bp. This host was a special type of plasmid called a plasmid expression vector. Expression vectors contain bacterial promoters that will initiate transcription at high levels when the appropriate allosteric regulator is added to the growth medium. The expression vector induces each plasmid-containing bacterium to produce
large amounts of recombinant human insulin 44200_11_p341-388 3/9/04 1:18 PM Page 350 350 Chapter 11 • Gene Isolation and Manipulation T7 promoter Sp6 promoter Hin dIII Bam HI NotI NotI cosN Cloning strip parB CM R BAC kb F which enters the cell and forms a plasmid chromosome (Figure 11-10a) When phages are used, the recombinant molecule is combined with the phage head and tail proteins These engineered phages are then mixed with the bacteria, and they inject their DNA cargo into the bacterial cells Whether the result of injection will be the introduction of a new recombinant plasmid (Figure 11-10b) or the production of progeny phages carrying the recombinant DNA molecule (Figure 11-10c) depends on the vector system If the latter, the resulting free phage particles then infect nearby bacteria When ␭ phage is used, through repeated rounds of reinfection, a plaque full of phage particles, each containing a copy of the original recombinant ␭ chromosome, forms from each initial bacterium that was infected Recovery of amplified recombinant molecules parA oriS repE Figure 11-9 Structure of a bacterial artificial chromosome (BAC), used for cloning large fragments of donor DNA CM R is a selectable marker for chloramphenicol resistance oriS, repE, parA, and parB are F genes for replication and regulation of copy number cosN is the cos site from ␭ phage HindIII and BamHI are cloning sites at which donor DNA is inserted The two promoters are for transcribing the inserted fragment The NotI sites are used for cutting out the inserted fragment Entry of recombinant molecules into the bacterial cell Transduction (b) + (c) introduction of single recombinant vectors into recipient bacterial cells, followed by the amplification of these molecules as a result of the natural tendency of these vectors to replicate We have seen how to make and amplify individual recombinant DNA molecules Any one clone represents a small part of the genome of an organism or only one of thousands of mRNA 
molecules that the organism can synthesize To ensure that we have cloned the DNA Figure 11-10 The modes of delivery of recombinant Transformation (a) + MESSAGE Gene cloning is carried out through the Making genomic and cDNA libraries Foreign DNA molecules can enter a bacterial cell by two basic paths: transformation and transducing phages (Figure 11-10 ) In transformation, bacteria are bathed in a solution containing the recombinant DNA molecule, + The recombinant DNA packaged into phage particles is easily obtained by collecting phage lysate and isolating the DNA that they contain For plasmids, the bacteria are chemically or mechanically broken apart The recombinant DNA plasmid is separated from the much larger main bacterial chromosome by centrifugation, electrophoresis, or other selective techniques Infection Lysis DNA into bacterial cells (a) A plasmid vector is delivered by DNA-mediated transformation (b) Certain vectors such as cosmids are delivered within bacteriophage heads (transduction); however, after having been injected into the bacterium, they form circles and replicate as large plasmids (c) Bacteriophage vectors such as phage ␭ infect and lyse the bacterium, releasing a clone of progeny phages, all carrying the identical recombinant DNA Progeny molecule within the phages phage genome 44200_21_p679-706 3/12/04 3:58 PM Page 692 Percentage of total globin synthesis 692 Chapter 21 • Evolutionary Genetics α α 50 in Amino Acid Sequences Among Human Globin Chains β γ 40 TABLE 21-3 Percentage of Similarity 30 ⑀ 20 ζ 10 β 12 18 24 γ δ 30 36 Postconceptual age (weeks) Birth 12 18 24 30 36 42 ␤ ␥ ⑀ ␣ ␨ ␤ ␥ 58 42 34 39 38 73 37 37 75 80 48 Table 21-3 shows the percentage of amino acid identity among these chains, and Figure 21-11 shows the chromosomal locations and intron — exon structures of the genes encoding them The story is remarkably consistent The ␤, ␦, ␥, and ⑀ chains all belong to a “␤-like” group; they have very similar amino acid sequences and are 
encoded by genes of identical intron — exon structure that are all contained in a 60-kb stretch of DNA on chromosome 11 The ␣ and ␨ chains belong to an “␣like” group and are encoded by genes contained in a 40kb region on chromosome 16 In addition, Figure 21-11 shows that on both chromosome 11 and chromosome 16 are pseudogenes, labeled ␺␣ and ␺␤ These pseudogenes are duplicate copies of the genes that did not acquire new functions but accumulated random mutations that render them nonfunctional What is remarkable is that the order of genes on each chromosome is the same as the temporal order of appearance of the globin chains in the course of development In regard to hemoglobin, the duplicated DNA encodes a new protein that performs a function closely related to the function encoded by the original gene But duplicated DNA can diverge dramatically in function An example of such a divergence is shown in Figure 21-12 Birds and mammals, like other eukaryotic organisms, have a gene encoding lysozyme, a protective enzyme that breaks down the bacterial cell wall This gene has been duplicated in mammals to produce a second sequence that encodes a completely different, nonenzymatic protein, ␣-lactalbumin, a nutritional component of milk Figure 21-12 shows that the duplicated gene has the same intron – exon structure as that of the lysozyme gene, whose array of four exons and three introns itself ␣-like and ␤-like globins that make up human hemoglobin things can happen: (1) the production of the polypeptide may simply increase; (2) the general function of the original sequence is maintained in the new DNA, but there is some differentiation of the sequences by accumulated mutations so that variations on the same protein theme are produced, allowing a somewhat more complex molecular structure; or (3) the new segment may diverge more dramatically and take a whole new function A classic example of the second case is the set of gene duplications and divergences that underlie the 
production of human hemoglobin Adult hemoglobin is a tetramer consisting of two ␣ polypeptide chains and two ␤ chains, each with its bound heme molecule The gene encoding the ␣ chain is on chromosome 16 and the gene for the ␤ chain is on chromosome 11, but the two chains are about 49 percent identical in their amino acid sequences, an identity that clearly points to the common origin However, in fetuses, until birth, about 80 percent of ␤ chains are substituted by a related ␥ chain These ␤ and ␥ polypeptide chains are 75 percent identical Furthermore, the gene for the ␥ chain is close to the ␤-chain gene on chromosome 11 and has an identical intron – exon structure This developmental change in globin synthesis is part of a larger set of developmental changes that are shown in Figure 21-10 The early embryo begins with ␣, ␥, ⑀, and ␨ chains and, after about 10 weeks, the ⑀ and ␨ chains are replaced by ␣, ␤, and ␥ Near birth, ␤ replaces ␥ and a small amount of yet a sixth globin, ␦, is produced Chromosomal distribution of the genes for the ␣ family of globins on chromosome 16 and the ␤ family of globins on chromosome 11 in humans Gene structure is shown by black bars (exons) and colored bars (introns) ␨ Postnatal age (weeks) Figure 21-10 Developmental changes in the synthesis of the Figure 21-11 ␣ 60 50 40 30 20 Chromosome 16 ζ2 ζ1 ψα1 10 kb α2 α1 3′ 5′ Chromosome 11 ⑀ 5′ Gγ A γ ψβ1 δ β 3′ 44200_21_p679-706 3/12/04 3:58 PM Page 693 693 21.6 Origin of new genes Goat α-lactalbumin gene Exon Intron GTGAGT TAG 76 327 Hen lysozyme gene Exon Intron GTAAGT CAG 82 1270 Exon 159 Exon 162 Intron GTGAG AG 474 Exon Intron GTGAG AG 1810 Exon 76 79 Intron GTGAG CAG 2303 Exon Intron GTGAG CAG 79 Exon 58 69 Figure 21-12 Structural homology of the gene for hen lysozyme and mammalian ␣-lactalbumin Exons and introns are indicated by dark green bars and light green bars, respectively Nucleotide sequences at the beginning and end of each intron are indicated, and the numbers refer to the 
nucleotide lengths of each segment. [After I. Kumagai, S. Takeda, and K.-I. Miura, “Functional Conversion of the Homologous Proteins α-Lactalbumin and Lysozyme by Exon Exchange,” Proceedings of the National Academy of Sciences USA 89, 1992, 5887–5891.]

suggests an earlier multiple duplication event in the origin of lysozyme.

Imported DNA

DNA duplications are not the only source of new DNA that is the basis of new functions; DNA can also be imported. Repeatedly in evolution, extra DNA has been imported into the genome from outside sources by mechanisms other than normal sexual reproduction. DNA can be inserted into chromosomes from other chromosomal locations and even from other species. In some cases, genes from totally unrelated organisms can become incorporated into cells to become a functional part of the recipient cell’s genome.

CELLULAR ORGANELLES Eukaryotic cells have obtained some of their organelles in this way. Both the chloroplasts of photosynthetic organisms and mitochondria are the descendants of prokaryotes that entered the eukaryotic cells either as infections or by being ingested. These prokaryotes became symbionts, transferring much of their genomes to the nuclei of their eukaryotic hosts but retaining genes that are essential to cellular functions. Mitochondria have retained about three dozen genes concerned with cellular respiration as well as some tRNA genes, whereas chloroplast genomes have about 130 genes encoding enzymes of the photosynthetic cycle as well as ribosomal proteins and tRNAs.

Important evidence for the extracellular origin of mitochondria is to be found in their genetic code. The “universal” DNA–RNA code of nuclear genes is not, in fact, universal and differs in some respects from that in mitochondria. Table 21-4 shows that, for 5 of the 64 RNA triplets, mitochondria differ in their coding from the nuclear genome. Moreover, mitochondria in different organisms differ from one another for these coding elements, providing evidence that eukaryotic cells must have been invaded by prokaryotes at least five times, each time by a prokaryote with a different coding system. For the vertebrates, worms, and insects, the mitochondrial code is more regular than the universal nuclear code. In the nuclear genome, for example, isoleucine is the only amino acid redundantly encoded by precisely three triplets: ATT, ATC, and ATA. The transition of the third base from A to G yields the fourth member of this codon group, ATG, but it codes for methionine. In contrast, in mitochondria, this codon group contains two codons for methionine and two for isoleucine, separated by a transversion.

TABLE 21-4 Comparison of the Universal Nuclear DNA Code with Several Mitochondrial Codes for Five Triplets in Which They Differ

Code                 TGA    ATA    AGA    AGG    AAA
Nuclear              Stop   Ile    Arg    Arg    Lys
Mitochondrial
  Mammalia           Trp    Met    Stop   Stop   Lys
  Aves               Trp    Met    Stop   Stop   Lys
  Amphibia           Trp    Met    Stop   Stop   Lys
  Echinoderms        Trp    Ile    Ser    Ser    Asn
  Insecta            Trp    Met    Ser    Stop   Lys
  Nematodes          Trp    Met    Ser    Ser    Lys
  Platyhelminth      Trp    Met    Ser    Ser    Asn
  Cnidaria           Trp    Ile    Arg    Arg    Lys

HORIZONTAL TRANSFER It is now clear that the nuclear genome is open to the insertion of DNA both from other parts of the same genome and from outside. Within a genome, DNA can be transferred through the action of transposable elements (see Chapter 13). The chromosomes of an individual Drosophila, for example, contain a large variety of families of transposable elements with multiple copies of each distributed throughout the genome. As much as 25 percent of the DNA of Drosophila may be of transposable origin. What role this mobile DNA plays in functional evolution is not clear. When transposable elements are introduced into zygotes at mating, such as the P elements of Drosophila (see Chapter 11, page 371), the result is an explosive proliferation of the elements in the recipient genome. When a mobile element is inserted into a gene, the effect on the organism is usually
drastic and deleterious, but this effect may be an artifact of the methods used to detect the presence of such elements. The results of laboratory selection experiments on quantitative characters have shown that transposition can act as an added source of selectable variation.

There is also the possibility that genes are transferred from the nuclear genome of one species to the nuclear genome of another by retroviruses (see Chapter 13). Retroviruses can be carried between very distantly related species by common disease vectors such as insects or by bacterial infections; so any foreign genetic material carried by a retrovirus could be a powerful source of new functions.

Relation of genetic to functional change

There is no simple relation between the amount of change in a gene’s DNA and the amount of change in the encoded protein’s function. At one extreme, almost the entire amino acid sequence of a protein can be replaced while maintaining the original function. Eukaryotes, from yeast to humans, produce lysozyme, an enzyme that breaks down bacterial cell walls, as mentioned earlier. Virtually every amino acid in this protein has been replaced since yeast and vertebrate lines diverged from an ancient common ancestor; so an alignment of their two protein or DNA sequences would not reveal any similarity. The evidence that yeast and human lysozyme genes are descended from an original common ancestral gene comes from comparisons of evolutionarily intermediate forms that show more and more divergence of sequence as species are more divergent. The enzyme has maintained its function despite the replacement of the amino acids because just the right amino acids were substituted to maintain the enzyme’s three-dimensional structure.

In contrast, it is possible to change the function of an enzyme by a single amino acid substitution. The sheep blow fly, Lucilia cuprina, has developed resistance to organophosphate insecticides used widely to control it. R.
Newcombe, P. Campbell, and their colleagues showed that this resistance is the consequence of a single substitution of an aspartic acid for a glycine residue in the active site of an enzyme that is ordinarily a carboxylesterase. The mutation causes complete loss of the carboxylesterase activity and its replacement by esterase specificity. Three-dimensional modeling of the molecule indicates that the substituted protein gains the ability to bind a water molecule close to the site of attachment of the organophosphate, which is then hydrolyzed by the water.

MESSAGE There is no regular relation between how much DNA change takes place in evolution and how much functional change results.

When more than one mutation is required for a new function to arise, the order in which these mutations occur in the evolution of the molecule may be critical. B. Hall has experimentally changed a gene to a new function in E. coli by a succession of mutations and selection. In addition to the lacZ genes specifying the usual lactose-fermenting activity in E. coli, another structural gene locus, ebg, specifies another β-galactosidase that does not ferment lactose, although it is induced by lactose. The natural function of this second gene is unknown. Hall was able to select mutations of this extra gene to enable E. coli to live, without any lactose, on a wholly new substrate, lactobionate. To do so, he first had to mutate the regulatory sequence of ebg so that it became constitutive and no longer required lactose to induce its translation. Next, he tried to select mutants that would ferment lactobionate, but he failed. First, it was necessary to select a form that would ferment a related substrate, lactulose, and then he could mutagenize the lactulose fermenters and select from among the mutants those able to operate on lactobionate. Moreover, only some of the independent mutants from lactose fermentation to lactulose utilization could be further mutated and selected to operate on lactobionate. The others
were dead ends. Thus, the sequence of evolution had to be (1) from an inducible to a constitutive enzyme, followed by (2) just the right mutation from lactose to lactulose fermentation, followed by (3) a mutation to ferment lactobionate.

MESSAGE In the evolution of new functions by mutation and selection, particular pathways through the array of mutations must be followed. Other pathways come to dead ends that do not allow further evolution.

21.7 Rate of molecular evolution

There is no simple relation between the number of mutations in DNA or substitutions of amino acids in proteins and the amount of functional change in those proteins. Although it is possible that only one or a few mutations can lead to a major change in the function of a protein, the more usual situation is that DNA accumulates substitutions over long periods of evolution without any qualitative change to the functional properties of the encoded proteins. Some of the substitutions may, however, have smaller effects, influencing the kinetic properties, timing of production, or quantities of the encoded proteins that, in turn, will affect the fitness of the organism that carries them.

Mutations of DNA can have three effects on fitness. First, they may be deleterious, reducing the probability of survival and reproduction of their carriers. All of the laboratory mutants used by the experimental geneticist have some deleterious effect on fitness. Second, they may increase fitness by increasing efficiency or by expanding the range of environmental conditions in which the species can make a living or by enabling the organism to track changes in the environment. Third, they may have no effect on fitness, leaving the probability of survival and reproduction unchanged; they are the so-called neutral mutations.

For the purposes of understanding the rate of molecular evolution, however, we need to make a slightly different distinction: that between effectively neutral and effectively selected mutations. It is possible to prove that, in a finite population of N individuals, the process of random genetic drift will not be materially altered if the intensity of selection, s, on an allele is of lower order than 1/N. Thus the class of evolutionarily neutral mutations includes both those that have absolutely no effect on fitness and those whose effects on fitness are less than the reciprocal of population size, so small as to be effectively neutral. On the other hand, if the intensity of selection, s, is of a greater order than 1/N, then the mutation will be effectively selected.

We would like to know how much of molecular evolution is a consequence of new, favorable adaptive mutations sweeping through a species, the picture presented by a simplistic Darwinian view of evolution, and how much is simply the accumulation of effectively neutral mutations by random fixation. Mutations that are effectively deleterious need not be considered, because they will be kept at low frequencies in populations and will not contribute to evolutionary change.

If a newly arisen mutation is effectively neutral, then, as pointed out in Chapter 19, there is a probability of 1/(2N) that it will replace the previous allele because of random genetic drift. If μ is the rate of appearance of new effectively neutral mutations at a locus per gene copy per generation, then the absolute number of new mutational copies that will appear in a population of N diploid individuals is 2Nμ. Each one of these new copies has a probability of 1/(2N) of eventually taking over the population. Thus, the absolute rate of replacement of old alleles by new ones at a locus per generation is their rate of appearance multiplied by the probability that any one of them will eventually take over by drift:

rate of neutral replacement = 2Nμ × 1/(2N) = μ

That is, we expect that in every generation there will be μ substitutions of a new allele for an old one at each locus in the population, purely from genetic drift of effectively neutral mutations. With illustrative values of N = 1000 and μ = 10^-6, for example, 2Nμ = 0.002 new neutral copies arise per generation and each takes over with probability 1/2000, giving 10^-6 replacements per generation; the population size cancels.

MESSAGE The rate of replacement in evolution resulting from the random genetic drift of effectively neutral mutations is equal to the mutation rate to such alleles, μ.

The constant rate of neutral substitution predicts that, if the number of nucleotide differences between two species is plotted against the time since their divergence from a common ancestor, the result should be a straight line with slope equal to μ. That is, evolution should proceed according to a molecular clock that is ticking at the rate μ. Figure 21-13 shows such a plot for the β-globin gene. The results are quite consistent with the claim that nucleotide substitutions have been effectively neutral in the past 500 million years.

Figure 21-13 The amount of nucleotide divergence at synonymous and nonsynonymous sites of the β-globin gene as a function of time since divergence. [Axes: number of substitutions per nucleotide versus divergence time (× 10^8).]

Two sorts of nucleotide substitutions are plotted: synonymous substitutions that are from one alternative codon to another, making no change in the amino acid, and nonsynonymous substitutions that result in an amino acid change. Figure 21-13 shows a much lower slope for nonsynonymous substitutions than that for synonymous changes, which means that the mutation rate to selectively neutral nonsynonymous substitutions is much lower than that to synonymous ones. This is precisely what we expect. Mutations that cause an amino acid substitution should have a deleterious effect above the threshold for neutral evolution more often than synonymous substitutions that do not change
the protein. It is important to note that these observations do not show that synonymous substitutions have no selective constraints on them; rather they show that these constraints are, on the average, not as strong as those for mutations that change amino acids.

Another prediction of neutral evolution is that different proteins will have different clock rates, because the metabolic functions of some proteins will be much more sensitive to changes in their amino acid sequences. Proteins in which every amino acid makes a difference will have a smaller effectively neutral mutation rate because a smaller proportion of their mutations will be neutral compared with proteins that are more tolerant of substitution. Figure 21-14 shows a comparison of the clocks for fibrinopeptides, hemoglobin, and cytochrome c. That fibrinopeptides have a much higher proportion of neutral mutations is reasonable because these peptides are merely a nonmetabolic safety catch, cut out of fibrinogen to activate the blood-clotting reaction. From a priori considerations, why hemoglobins are less sensitive to amino acid changes than is cytochrome c is less obvious.

MESSAGE The rate of neutral evolution for the amino acid sequence of a protein depends on the sensitivity of a protein’s function to amino acid changes.

TABLE 21-5 Synonymous and Nonsynonymous Polymorphisms and Species Differences for Alcohol Dehydrogenase in Three Species of Drosophila

                      Nonsynonymous   Synonymous   Ratio
Species differences   7               17           0.29 : 0.71
Polymorphisms         2               42           0.05 : 0.95

Source: J. McDonald and M. Kreitman, Nature 351, 1991, 652–654.

The demonstration of the molecular clock argues that most nucleotide substitutions that have occurred in evolution were neutral, but it does not tell us how much of molecular evolution has been adaptive. One way of detecting adaptive
evolution of a protein is by comparing the synonymous and nonsynonymous polymorphisms within species with the synonymous and nonsynonymous changes between species. Under the operation of neutral evolution by random genetic drift, polymorphism within a species is simply a stage in the eventual fixation of a new allele; so, if all mutations are neutral, the ratio of nonsynonymous to synonymous polymorphisms within a species should be the same as the ratio of nonsynonymous to synonymous substitutions between species. On the other hand, if the amino acid changes between species have been driven by positive adaptive selection, there ought to be an excess of nonsynonymous changes between species. Table 21-5 shows an application of this principle by J. McDonald and M. Kreitman to the alcohol dehydrogenase gene in three closely related species of Drosophila. Clearly, there is an excess of amino acid replacements between species over what is expected from the polymorphisms: the nonsynonymous fraction is 0.29 among species differences but only 0.05 among polymorphisms.

Figure 21-14 Number of amino acid substitutions in the evolution of the vertebrates as a function of time since divergence. The three proteins (fibrinopeptides, hemoglobin, and cytochrome c) differ in rate because different proportions of their amino acid substitutions are selectively neutral. [Axes: number of amino acid substitutions per 100 residues versus millions of years since divergence.]

21.8 Genetic evidence of common ancestry in evolution

When we think of evolution, we think of change. The species living at any particular time are different from their ancestors, having changed in form and function by the mechanisms reviewed up to now in the discussion of the genetics of the evolutionary process. But there is a second feature of the diversity of life, one that Darwin took as an important argument for the reality of evolution. Not only have present organisms descended from previous, different organisms; but, if we go back in time, organisms that are currently very
different are descended from a single ancestral form. Indeed, if we go back far enough in time to the origin of life, all the organisms on earth are descended from a single common ancestor. Thus we expect to find that apparently different species have underlying similarities, attributes of their common ancestor that have been conserved through evolutionary time despite all the changes that have taken place.

Before the tools of modern biochemistry and genetics were available, the chief evidence of underlying similarity of apparently different structures in different species was taken from anatomical observations of adult and embryonic forms. So, the similar bone structures of the wings of bats and the forelimbs of running mammals make it evident that these structures were derived evolutionarily from a common mammalian ancestor. Moreover, the anatomy of the wings of birds points to the common ancestry of mammals and birds (Figure 21-15). It is even argued that the basic segmentation patterns of the bodies of insects and of vertebrates are evolutionary variants on a common ancestral pattern derived from the common ancestor of invertebrates and vertebrates. Although this argument may seem to push the claim of evolutionary conservation too far, it turns out, as we have seen in the discussion of the Hox and HOM-C genes in Chapter 18, that genetic analysis of patterns of development provides a powerful demonstration of the common ancestry of animals as different as insects and mammals. We saw in Chapter 18 that such disparate organisms as the fly, the mouse, and human beings have similar sequences for the genes controlling the development of body form. (The same is true for the worm C. elegans.)
The simplest explanation is that the Hox and HOM-C genes are the vertebrate and insect descendants of a homeobox gene cluster present in a common ancestor some 600 million years ago.

Figure 21-15 The bone structures of a bat wing, a bird wing, and a human arm and hand. These bone structures show the underlying anatomical similarity between them and the way in which different bones have become relatively enlarged or diminished to produce these different structures. [Labeled bones: humerus, radius, ulna, carpals, metacarpals, phalanges.] [After W. T. Keeton and J. L. Gould, Biological Science. W. W. Norton & Company, 1986.]

The evolutionary conservation of the HOM-C and Hox genes is not a singular occurrence. Many examples have been uncovered of strongly conserved genes and even entire pathways that are similar in function. For example, the pathways for activating the Drosophila DL and mammalian NFκB transcription factors are essentially completely conserved from a common ancestral pathway (Figure 21-16). The Drosophila protein at any step in the DL activation pathway is similar in amino acid sequence to its counterpart in the mammalian NFκB activation pathway. (Don’t worry about what the particular proteins do; just appreciate the incredible conservation of cellular and developmental pathways as demonstrated by the similarity between the corresponding components of the two pathways, indicated by similarly shaped objects in the diagrams. We indeed know that DL and NFκB participate in some equivalent developmental decisions.)
Indeed, as can be seen from a selection from the known examples, such evolutionary and functional conservation seems to be the norm rather than the exception. What has made developmental genetics into an extraordinarily exciting field of biological inquiry is the demonstration, by means of genetic analysis, that basic developmental pathways and their genetic basis have been conserved over hundreds of millions of years of evolution.

Figure 21-16 Two parallel signaling pathways. The signaling pathway for activation of the Drosophila DL morphogen parallels a mammalian signaling pathway for activation of NFκB, the transcription factor that activates the transcription of genes encoding antibody subunits. There are structural protein similarities between SPZ and IL-1, TOLL and IL-1R, CACT and IκB, and DL and NFκB. [After H. Lodish, D. Baltimore, A. Berk, S. L. Zipursky, P. Matsudaira, and J. Darnell, Molecular Cell Biology, 3d ed. Copyright Scientific American Books, 1995.]

MESSAGE Developmental strategies in animals are quite ancient and highly conserved. In essence, a mammal, a worm, and a fly are put together with the same basic genetic building blocks and regulatory devices. Plus ça change, plus c’est la même chose!
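The sequence similarity invoked above, such as that between each protein of the DL pathway and its NFκB-pathway counterpart, is commonly summarized as percent identity over an alignment. The sketch below is only an illustration under stated assumptions: the function name `percent_identity` is hypothetical, the two sequences are assumed to be already aligned to equal length with "-" marking gaps, and the two peptides are made-up toy strings, not real DL or NFκB sequences.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Alignment columns in which either sequence has a gap are skipped.
    (Hypothetical helper for illustration; not taken from the text.)
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0  # ungapped columns examined
    matches = 0   # columns in which both residues agree
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up toy peptides (assumptions for illustration only).
toy_fly = "MKTLLVAGQE"
toy_mammal = "MKSLLVAGHE"
print(percent_identity(toy_fly, toy_mammal))  # 8 of 10 positions match: 80.0
```

A figure such as the 49 percent identity quoted for the α and β globin chains is this kind of number; real comparisons first require an alignment step, which this sketch deliberately leaves out.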
At a second, deeper level, we can observe the common evolutionary origin of organisms in the structure of their proteins and of their genomes. The advantage of direct observation of the protein and DNA sequences is that we do not have to depend on observing similarity of function among the proteins or anatomical structures that result from the possession of particular genes. We have already seen that replacing a single amino acid can change the function of a protein from an esterase to an acid phosphatase. Yet, despite this change in function, we have no difficulty in determining that the two enzymes are produced by reading genes that are virtually identical, one of which was derived by a single mutational step from the other as resistance to insecticides evolved by natural selection.

Over evolutionary time, genes that have descended from a common ancestor will diverge in DNA sequence and in their physical position in the genome, as a result of mutations and chromosomal rearrangements. If enough time elapsed and there were no counteracting force of natural selection, this divergence would finally result in the loss of any observable similarity in genes or proteins between different species, even if they were descended from a common ancestor. In fact, even the time since the common ancestor of present-day vertebrates and invertebrates has not erased the similarity of DNA and amino acid sequences between Drosophila and mice. Not only are mutation rates not high enough to cause complete loss of similarity even over hundreds of millions of years, but also most new mutations are not preserved, because they cause a deleterious loss or change in function of a protein or in the control of the time and place of protein production. Thus the amount of divergence that has been preserved in evolution has been limited.

21.9 Comparative genomics and proteomics

As we saw in Chapter 12, a major effort of molecular genetics is directed toward determining the complete DNA sequence of a variety
of different species. At the time at which this paragraph was written, the genomes had been sequenced from more than forty species of bacteria; two species of yeast; the fungus Neurospora crassa; the nematode Caenorhabditis elegans; two species of Drosophila; two plants, Arabidopsis and rice; the mouse; and humans. By the time you read these lines, many more genomes of many more species will have been sequenced. The availability of such data makes it possible to reconstruct the evolution of the genomes of widely diverse species from their common ancestors. Moreover, it is now possible to infer the similarities and differences in the proteomes of these species by comparing the gene sequences in various species with gene sequences that code for the amino acid sequences of proteins with known function.

Comparing the proteomes among distant species

With our current state of knowledge, we can suggest functions for about half the proteins in the proteome of each of the eukaryotes whose genomes have been sequenced, by using the similarity of their sequences with proteins of known function. Figure 21-17 depicts the distribution of this half of each proteome into general functional categories. Strikingly, the group of proteins engaged in defense and immunity has expanded greatly in humans compared with the other species. For other functional categories, though there are greater numbers of proteins in the human lineage, there is no case in which the differences between humans and all other eukaryotes are as pronounced. As discussed in Chapter 10, gene expression is often controlled through regulation of transcription by proteins called transcription factors. Perhaps as a manifestation of the many cell types that differentiate in humans, the size and distribution of the families of specific transcription factors in humans far exceed the numbers for the other sequenced eukaryotes, with the
exception of mustard weed (Arabidopsis thaliana, see Figure 21-17).

Figure 21-17 The distribution of eukaryotic proteins according to broad categories of biological function. [Bar chart of number of proteins in yeast, mustard weed, worm, fly, and human for each category: cellular processes, metabolism, DNA replication/modification, transcription/translation, intracellular signaling, cell–cell communication, protein folding and degradation, transport, multifunctional proteins, cytoskeletal/structural, defense and immunity, and miscellaneous function.] [Reprinted by permission from Nature 409 (15 February 2001), 902, “Initial Sequencing and Analysis of the Human Genome,” The International Human Genome Sequencing Consortium. Copyright 2001 Macmillan Magazines Ltd.]

The distribution of proteins described in the preceding paragraph is a description of only half of each proteome. What about the other half? It can be broken down into two components. One component, comprising about 30 percent of each proteome, consists of proteins that have relatives among the different genomes, but none have had a function ascribed to them. The other component, comprising the remaining 20 percent or so of each proteome, consists of proteins that are unrelated by amino acid sequence to any protein known in another branch of the eukaryotic evolutionary tree. We can imagine two possible explanations for these novel polypeptides. One possibility is that some of these polypeptides first evolved after the sequenced species having a common ancestor diverged from one another. Because none of these species are evolutionarily closer than a few hundred million years, it is perhaps not surprising to find this frequency of newly evolved proteins. The other possibility is that some of these proteins are very rapidly evolving, and so their ancestry has been essentially erased by the overlay of new mutations that have accumulated. It is almost
