Part 2 of the book “Essentials of Genetics” covers: regulation of gene expression, the genetics of cancer, genomics, bioinformatics, and proteomics, applications and ethics of genetic engineering and biotechnology, developmental genetics, quantitative genetics and multifactorial traits, population and evolutionary genetics,… and other topics.
14 Gene Mutation, DNA Repair, and Transposition

CHAPTER CONCEPTS
■ Mutations comprise any change in the base-pair sequence of DNA.
■ Mutations are a source of genetic variation and provide the raw material for natural selection. They are also the source of genetic damage that contributes to cell death, genetic diseases, and cancer.
■ Mutations have a wide range of effects on organisms depending on the type of base-pair alteration, the location of the mutation within the chromosome, and the function of the affected gene product.
■ Mutations can occur spontaneously as a result of natural biological and chemical processes, or they can be induced by external factors, such as chemicals or radiation.
■ Single-gene mutations cause a wide variety of human diseases.
■ Organisms rely on a number of DNA repair mechanisms to detect and correct mutations. These mechanisms range from proofreading and correction of replication errors to base excision and homologous recombination repair.
■ Mutations in genes whose products control DNA repair lead to genome hypermutability, human DNA repair diseases, and cancers.
■ Transposable elements may move into and out of chromosomes, causing chromosome breaks and inducing mutations both within coding regions and in gene-regulatory regions.

Pigment mutations within an ear of corn, caused by transposition of the Ds element.

The ability of DNA molecules to store, replicate, transmit, and decode information is the basis of genetic function. But equally important are the changes that occur to DNA sequences. Without the variation that arises from changes in DNA sequences, there would be no phenotypic variability, no adaptation to environmental changes, and no evolution. Gene mutations are the source of new alleles and are the origin of genetic variation within populations. On the downside, they are also the source of genetic changes that can lead to cell death, genetic diseases, and cancer.

Mutations also provide the basis for genetic analysis. The
phenotypic variations resulting from mutations allow geneticists to identify and study the genes responsible for the modified trait. In genetic investigations, mutations act as identifying “markers” for genes so that they can be followed during their transmission from parents to offspring. Without phenotypic variability, classical genetic analysis would be impossible. For example, if all pea plants displayed a uniform phenotype, Mendel would have had no foundation for his research.

Earlier in the text (see Chapter 6), we examined mutations in large regions of chromosomes, known as chromosomal mutations. In contrast, the mutations we will now explore are those occurring primarily in the base-pair sequence of DNA within individual genes, known as gene mutations. We will also describe how the cell defends itself from such mutations using various mechanisms of DNA repair.

14.1 Gene Mutations Are Classified in Various Ways

A mutation can be defined as an alteration in DNA sequence. Any base-pair change in any part of a DNA molecule can be considered a mutation. A mutation may comprise a single base-pair substitution, a deletion or insertion of one or more base pairs, or a major alteration in the structure of a chromosome. Mutations may occur within regions of a gene that code for protein or within noncoding regions of a gene such as introns and regulatory sequences.

Mutations may or may not bring about a detectable change in phenotype. The extent to which a mutation changes the characteristics of an organism depends on which type of cell suffers the mutation and the degree to which the mutation alters the function of a gene product or a gene-regulatory region. Mutations can occur in somatic cells or within germ cells. Those that occur in germ cells are heritable and are the basis for the transmission of genetic diversity and evolution, as well as genetic diseases. Those that occur in somatic cells are not transmitted to the next
generation but may lead to altered cellular function or tumors.

Because of the wide range of types and effects of mutations, geneticists classify mutations according to several different schemes. These organizational schemes are not mutually exclusive. In this section, we outline some of the ways in which gene mutations are classified.

Classification Based on Type of Molecular Change

Geneticists often classify gene mutations in terms of the nucleotide changes that constitute the mutation. A change of one base pair to another in a DNA molecule is known as a point mutation, or base substitution (Figure 14–1). A change of one nucleotide of a triplet within a protein-coding portion of a gene may result in the creation of a new triplet that codes for a different amino acid in the protein product. If this occurs, the mutation is known as a missense mutation. A second possible outcome is that the triplet will be changed into a stop codon, resulting in the termination of translation of the protein. This is known as a nonsense mutation. If the point mutation alters a codon but does not result in a change in the amino acid at that position in the protein (due to degeneracy of the genetic code), it can be considered a silent mutation.

You will often see two other terms used to describe base substitutions. If a pyrimidine replaces a pyrimidine or a purine replaces a purine, a transition has occurred. If a purine replaces a pyrimidine, or vice versa, a transversion has occurred.

Another type of change is the insertion or deletion of one or more nucleotides at any point within the gene. As illustrated in Figure 14–1, the loss or addition of a single nucleotide causes all of the subsequent three-letter codons to be changed. These are called frameshift mutations because the frame of triplet reading during translation is altered. A frameshift mutation will occur when any number of bases is added or deleted, except multiples of three, which would reestablish the initial frame of reading. It is possible that one of the many altered triplets will be UAA, UAG, or UGA, the translation termination codons. When one of these triplets is encountered during translation, polypeptide synthesis is terminated at that point. Obviously, the results of frameshift mutations can be very severe, especially if they occur early in the coding sequence.

FIGURE 14–1 Analogy showing the effects of substitution, deletion, and insertion of one letter in a sentence composed of three-letter words to demonstrate point and frameshift mutations.
Original sentence: THE CAT SAW THE DOG
Substitution (change of one letter; point mutation): THE BAT SAW THE DOG / THE CAT SAW THE HOG / THE CAT SAT THE DOG
Deletion (loss of C; frameshift mutation): THE ATS AWT HED OG
Insertion (insertion of M; frameshift mutation): THE CMA TSA WTH EDO G

Classification Based on Phenotypic Effects

Depending on their type and location, mutations can have a wide range of phenotypic effects, from none to severe. As discussed earlier in the text (see Chapter 4), a loss-of-function mutation is one that reduces or eliminates the function of the gene product. Any type of mutation, from a point mutation to deletion of the entire gene, may lead to a loss of function. Mutations that result in complete loss of function are known as null mutations.

Most loss-of-function mutations are recessive. A recessive mutation results in a wild-type phenotype when present in a diploid organism whose other allele is wild type. In this case, the presence of less than 100 percent of the gene product is sufficient to bring about the wild-type phenotype.

Some loss-of-function mutations can be dominant. A dominant mutation results in a mutant phenotype in a diploid organism, even when the wild-type allele is also present. Dominant mutations can have two different types of effects. Haploinsufficiency occurs when the single wild-type copy of the gene does not produce enough gene product to bring about a wild-type phenotype. In humans, Marfan
syndrome is an example of a disorder caused by haploinsufficiency, in this case as a result of a loss-of-function mutation in one copy of the FBN1 gene.

In contrast, a dominant gain-of-function mutation results in a gene product with enhanced, negative, or new functions. This may be due to a change in the amino acid sequence of the protein that confers a new activity, or it may result from a mutation in a regulatory region of the gene, leading to expression of the gene at higher levels or at abnormal times or places. A dominant negative mutation may directly interfere with the function of the product of the wild-type allele. Often this occurs when the mutant nonfunctional gene product binds to the wild-type gene product, inactivating it.

The most easily observed mutations are those affecting a morphological trait. These mutations are known as visible mutations and are recognized by their ability to alter a normal or wild-type visible phenotype. For example, all of Mendel’s pea characteristics and many genetic variations encountered in Drosophila fit this designation, since they cause obvious changes to the morphology of the organism.

Some mutations give rise to nutritional or biochemical effects. In bacteria and fungi, a typical nutritional mutation results in a loss of ability to synthesize an amino acid or vitamin. In humans, sickle-cell anemia and hemophilia are examples of diseases resulting from biochemical mutations. Although such mutations do not always affect morphological characters, they affect the function of proteins that can impinge on the well-being and survival of the affected individual.

Still another category consists of mutations that affect the behavior patterns of an organism. The primary effect of behavioral mutations is often difficult to analyze. For example, the mating behavior of a fruit fly may be impaired if it cannot beat its wings. However, the defect may be in the flight muscles, the nerves leading to them, or the brain, where the nerve impulses that
initiate wing movements originate.

Another group of mutations, regulatory mutations, affects the regulation of gene expression. A mutation in a regulatory gene or a gene control region can disrupt normal regulatory processes and inappropriately activate or inactivate expression of a gene. For example, as we will see with the lac operon discussed later in the text (see Chapter 15), a regulatory gene produces a product that controls the transcription of the entire lac operon. Mutations within this regulatory gene can lead to the production of a regulatory protein with abnormal effects on the lac operon. Our knowledge of genetic regulation has been dependent on the study of such regulatory mutations. Regulatory mutations may also occur in regions such as splice junctions, promoters, or other regulatory regions of a gene that affect many aspects of gene regulation, including transcription initiation, mRNA splicing, and mRNA stability.

It is also possible that a mutation may adversely affect a gene product that is essential to the survival of the organism. In this case, it is referred to as a lethal mutation. Various inherited human biochemical disorders are examples of lethal mutations. For example, Tay–Sachs disease and Huntington disease are caused by mutations that result in lethality, but at different points in the human life cycle.

Another interesting class of mutations exerts effects on the organism in ways that depend on the environment in which the organism finds itself. Such mutations are called conditional mutations because they are present in the genome of an organism but can be detected only under certain conditions. Among the best examples of conditional mutations are temperature-sensitive mutations. At a “permissive” temperature, the mutant gene product functions normally, but it loses its function at a different, “restrictive” temperature. Therefore, when the organism is shifted from the permissive to the
restrictive temperature, the effect of the mutation becomes apparent. The temperature-sensitive coat color variations in Siamese cats and Himalayan rabbits, discussed earlier in the text (see Chapter 4), are striking examples of the effects of conditional mutations.

A neutral mutation is a mutation that can occur either in a protein-coding region or in any part of the genome, and its effect on the genetic fitness of the organism is negligible. For example, a neutral mutation within a gene may change a lysine codon (AAA) to an arginine codon (AGA). The two amino acids are chemically similar (both are basic, positively charged amino acids); therefore, this change may be insignificant to the function of the protein. Because eukaryotic genomes consist mainly of noncoding regions, the vast majority of mutations are likely to occur in the large portions of the genome that do not contain genes. These may be considered neutral mutations if they do not affect gene products or gene expression.

Classification Based on Location of Mutation

Mutations may be classified according to the cell type or chromosomal locations in which they occur. Somatic mutations are those occurring in any cell in the body except germ cells. Autosomal mutations are mutations within genes located on the autosomes, whereas X-linked and Y-linked mutations are those within genes located on the X or Y chromosome, respectively.

Mutations arising in somatic cells are not transmitted to future generations. When a recessive autosomal mutation occurs in a somatic cell of a diploid organism, it is unlikely to result in a detectable phenotype. The expression of most such mutations is likely to be masked by expression of the wild-type allele within that cell. Somatic mutations will have a greater impact if they are dominant or, in males, if they are X-linked, since such mutations are most likely to be immediately expressed. Similarly, the impact of dominant or X-linked somatic mutations will be more noticeable if they occur
early in development, when a small number of undifferentiated cells replicate to give rise to several differentiated tissues or organs. Dominant mutations that occur in cells of adult tissues are often masked by the activity of thousands upon thousands of nonmutant cells in the same tissue that perform the nonmutant function.

Mutations in germ cells are of greater significance because they may be transmitted to offspring through the gametes. They have the potential of being expressed in all cells of an offspring. Inherited dominant autosomal mutations will be expressed phenotypically in the first generation. X-linked recessive mutations arising in the gametes of a homogametic female may be expressed in hemizygous male offspring, provided that the male offspring receives the affected X chromosome. Because of heterozygosity, the occurrence of an autosomal recessive mutation in the gametes of either males or females (even one resulting in a lethal allele) may go unnoticed for many generations, until the resultant allele has become widespread in the population. Usually, the new allele will become evident only when a chance mating brings two copies of it together into the homozygous condition.

Spontaneous and Induced Mutations

Mutations can be classified as either spontaneous or induced, although these two categories overlap to some degree. Spontaneous mutations are changes in the nucleotide sequence of genes that appear to occur naturally. No specific agents are associated with their occurrence, and they are generally assumed to be accidental. Many of these mutations arise as a result of normal biological or chemical processes in the organism that alter the structure of nitrogenous bases. Often, spontaneous mutations occur during the enzymatic process of DNA replication, as we discuss later in this chapter.

In contrast to spontaneous mutations, mutations that result from the influence of extraneous factors are considered to be induced mutations. Induced mutations may be the
result of either natural or artificial agents. For example, radiation from cosmic and mineral sources and ultraviolet radiation from the sun are energy sources to which most organisms are exposed and, as such, may be factors that cause induced mutations. The earliest demonstration of the artificial induction of mutations occurred in 1927, when Hermann J. Muller reported that X rays could cause mutations in Drosophila. In 1928, Lewis J. Stadler reported that X rays had the same effect on barley. In addition to various forms of radiation, numerous natural and synthetic chemical agents are also mutagenic.

Several generalizations can be made regarding spontaneous mutation rates in organisms. The mutation rate is defined as the likelihood that a gene will undergo a mutation in a single generation or in forming a single gamete. First, the rate of spontaneous mutation is exceedingly low for all organisms. Second, the rate varies between different organisms. Third, even within the same species, the spontaneous mutation rate varies from gene to gene.

Viral and bacterial genes undergo spontaneous mutation at an average of about 1 in 100 million (10⁻⁸) replications or cell divisions. Maize and Drosophila demonstrate rates several orders of magnitude higher. The genes studied in these groups average between 1 in 1,000,000 (10⁻⁶) and 1 in 100,000 (10⁻⁵) mutations per gamete formed. Some mouse genes are another order of magnitude higher in their spontaneous mutation rate, 1 in 100,000 to 1 in 10,000 (10⁻⁵ to 10⁻⁴).

It is not clear why such large variations occur in mutation rates. The variation between genes in a given organism may be due to inherent differences in mutability in different regions of the genome. Some DNA sequences appear to be highly susceptible to mutation and are known as mutation hot spots. The variation between organisms may, in part, reflect the relative efficiencies of their DNA proofreading and repair systems. We will discuss these systems later in the chapter.

ESSENTIAL POINT
Mutations can be spontaneous or induced, somatic or germ-line, autosomal or X-linked. They can have many different effects on gene function, depending on the type of nucleotide changes that comprise the mutation. Phenotypic effects can range from neutral or silent to loss of function or gain of function to lethality.

14–1 If one spontaneous mutation occurs within a human egg cell genome, and this mutation changes an A to a T, what is the most likely effect of this mutation on the phenotype of an offspring that develops from this mutated egg?
HINT: This problem asks you to predict the effects of a single base-pair mutation on phenotype. The key to its solution involves an understanding of the organization of the human genome as well as the effects of mutations on coding and noncoding regions of genes, and the effects of mutations on development.

14.2 Spontaneous Mutations Arise from Replication Errors and Base Modifications

In this section, we will outline some of the processes that lead to spontaneous mutations. It is useful to keep in mind, however, that many of the DNA changes that occur during spontaneous mutagenesis also occur, at a higher rate, during induced mutagenesis.

DNA Replication Errors and Slippage

As we learned earlier in the text (see Chapter 10), the process of DNA replication is imperfect. Occasionally, DNA polymerases insert incorrect nucleotides during replication of a strand of DNA. Although DNA polymerases can correct most of these replication errors using their inherent 3′ to 5′ exonuclease proofreading capacity, misincorporated nucleotides may persist after replication. If these errors are not detected and corrected by DNA repair mechanisms, they may lead to mutations. Replication errors due to mispairing predominantly lead to point mutations. The fact that bases can take several forms, known as tautomers, increases the chance of mispairing during DNA replication,
as we explain next.

In addition to mispairing and point mutations, DNA replication can lead to the introduction of small insertions or deletions. These mutations can occur when one strand of the DNA template loops out and becomes displaced during replication, or when DNA polymerase slips or stutters during replication. If a loop occurs in the template strand during replication, DNA polymerase may miss the looped-out nucleotides, and a small deletion in the new strand will be introduced. If DNA polymerase repeatedly introduces nucleotides that are not present in the template strand, an insertion of one or more nucleotides will occur, creating an unpaired loop on the newly synthesized strand. Insertions and deletions may lead to frameshift mutations, or amino acid insertions or deletions in the gene product.

Replication slippage can occur anywhere in the DNA but seems distinctly more common in regions containing tandemly repeated sequences. Repeat sequences are hot spots for DNA mutation and in some cases contribute to hereditary diseases, such as fragile-X syndrome and Huntington disease. The hypermutability of repeat sequences in noncoding regions of the genome is the basis for current methods of forensic DNA analysis.

Tautomeric Shifts

Purines and pyrimidines can exist in tautomeric forms, that is, in alternate chemical forms that differ by only a single proton shift in the molecule. The biologically important tautomers are the keto–enol forms of thymine and guanine and the amino–imino forms of cytosine and adenine. These shifts change the bonding structure of the molecule, allowing hydrogen bonding with noncomplementary bases. Hence, tautomeric shifts may lead to permanent base-pair changes and mutations. Figure 14–2 compares (a) standard base-pairing arrangements with (b) anomalous base-pairing arrangements that arise from tautomeric shifts.
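The tautomer mechanism can be traced step by step with a toy model of semiconservative replication. The Python sketch below is a simplified illustration, not a chemical model. It assumes a single base pair, assumes the top-strand base is in its rare tautomeric form only while it is being copied in the first round (as in the adenine example described for Figure 14–3), and uses hypothetical names (`NORMAL_PARTNER`, `TAUTOMER_PARTNER`, `copy_base`, `replicate_pair`, `two_rounds`) chosen for this example. The tautomer pairings follow those named in the text (imino adenine with cytosine, imino cytosine with adenine, enol thymine with guanine), plus enol guanine with thymine.

```python
# Toy model (illustrative only): how a transient tautomeric shift
# can turn an A=T pair into a G≡C pair over two rounds of replication.
# A duplex is represented as a (top_strand_base, bottom_strand_base) tuple.

# Partners of the common (amino/keto) forms: standard Watson-Crick pairing.
NORMAL_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Simplified partners of the rare tautomers: imino A pairs with C,
# imino C with A, enol T with G, and enol G with T.
TAUTOMER_PARTNER = {"A": "C", "C": "A", "T": "G", "G": "T"}


def copy_base(template, shifted=False):
    """Return the base inserted opposite `template` during replication."""
    table = TAUTOMER_PARTNER if shifted else NORMAL_PARTNER
    return table[template]


def replicate_pair(pair, shift_top=False):
    """Semiconservatively replicate one base pair.

    The two parental strands separate and each receives a new partner.
    If `shift_top` is True, the top-strand base is in its rare tautomeric
    form at the moment it is copied.
    """
    top, bottom = pair
    from_top = (top, copy_base(top, shifted=shift_top))
    from_bottom = (copy_base(bottom), bottom)
    return [from_top, from_bottom]


def two_rounds(pair):
    """Round 1 with the top base shifted, then round 2 with normal pairing."""
    first = replicate_pair(pair, shift_top=True)
    second = []
    for duplex in first:
        second.extend(replicate_pair(duplex))
    return second


result = two_rounds(("A", "T"))
print(result)  # [('A', 'T'), ('G', 'C'), ('A', 'T'), ('A', 'T')]
```

After two rounds, one of the four duplexes carries G≡C where the parent carried A=T. Because one purine (A) has been replaced by another purine (G), the model reproduces a transition, the outcome the text attributes to tautomeric shifts.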
FIGURE 14–2 Standard base-pairing relationships (a) compared with examples of the anomalous base pairing that occurs as a result of tautomeric shifts (b). Panel (b) shows enol thymine pairing with keto guanine and imino cytosine pairing with amino adenine. The long triangle indicates the point at which the base bonds to the pentose sugar.

Anomalous T≡G and C=A pairs, among others, may be formed as a result of tautomeric shifts. A mutation occurs during DNA replication when a transiently formed tautomer in the template strand pairs with a noncomplementary base. In the next round of replication, the “mismatched” members of the base pair are separated, and each becomes the template for its normal complementary base. The end result is a point mutation (Figure 14–3).

FIGURE 14–3 Formation of an A=T to G≡C transition mutation as a result of a tautomeric shift in adenine. Without a tautomeric shift, semiconservative replication produces no mutation. If adenine shifts to its imino form, an anomalous C=A base pair forms; after adenine shifts back to its amino form, the next round of replication yields a G≡C pair, a transition mutation.

Depurination and Deamination

Some of the most common causes of spontaneous mutations are two forms of DNA base damage: depurination and deamination. Depurination is the loss of one of the nitrogenous bases in an intact double-helical DNA molecule. Most frequently, the base is a purine, either guanine or adenine. These bases may be lost if the glycosidic bond linking the 1′-C of the deoxyribose and the number 9 position of the purine ring is broken, leaving an apurinic site on one strand of the DNA. Geneticists estimate that thousands of such spontaneous lesions are formed daily in the DNA of mammalian cells in culture. If apurinic sites are not repaired, there will be no base at that position to act as a template during DNA replication. As a result, DNA polymerase may introduce a nucleotide at random at that site.

In deamination, an amino group in cytosine or adenine is converted to a keto group (Figure 14–4). In these cases, cytosine is converted to uracil, and adenine is changed to hypoxanthine. The major effect of these changes is an alteration in the base-pairing specificities of these two bases during DNA replication. For example, cytosine normally pairs with guanine. Following its conversion to uracil, which pairs with adenine, the original G≡C pair is converted to an A=U pair and then, in the next replication, is converted to an A=T pair. When adenine is deaminated, the original A=T pair is converted to a G≡C pair because hypoxanthine pairs naturally with cytosine. Deamination may occur spontaneously or as a result of treatment with chemical mutagens such as nitrous acid (HNO₂).

FIGURE 14–4 Deamination of cytosine and adenine, leading to new base pairing and mutation. Cytosine is converted to uracil, which base-pairs with adenine. Adenine is converted to hypoxanthine, which base-pairs with cytosine.

Oxidative Damage

DNA may also suffer damage from the by-products of normal cellular processes. These by-products include reactive oxygen species (electrophilic oxidants) that are generated during normal aerobic respiration. For example, superoxides (O₂⁻), hydroxyl radicals (·OH), and hydrogen peroxide (H₂O₂) are created during cellular metabolism and are constant threats to the integrity of DNA. Such reactive oxidants, also generated by exposure to high-energy radiation, can produce more than 100 different types
of chemical modifications in DNA, including modifications to bases, loss of bases, and single-stranded breaks.

ESSENTIAL POINT
Spontaneous mutations occur in many ways, ranging from errors during DNA replication to changes in DNA base pairing caused by tautomeric shifts, depurinations, deaminations, and reactive oxidant damage.

14–2 One of the most famous cases of an X-linked recessive mutation in humans is that of hemophilia found in the descendants of Britain’s Queen Victoria. The pedigree of the royal family indicates that Victoria was heterozygous for the trait; however, her father was not affected, and there is no evidence that her mother was a carrier. What are some possible explanations of how the mutation arose? What types of mutations could lead to the disease?
HINT: This problem asks you to determine the sources of new mutations. The key to its solution is to consider the ways in which mutations occur, the types of cells in which they can occur, and how they are inherited.

14.3 Induced Mutations Arise from DNA Damage Caused by Chemicals and Radiation

Induced mutations are those that increase the rate of mutation above the spontaneous background. All cells on Earth are exposed to a plethora of agents called mutagens, which have the potential to damage DNA and cause induced mutations. Some of these agents, such as some fungal toxins, cosmic rays, and ultraviolet light, are natural components of our environment. Others, including some industrial pollutants, medical X rays, and chemicals within tobacco smoke, can be considered unnatural or human-made additions to our modern world. On the positive side, geneticists harness some mutagens for use in analyzing genes and gene functions. The mechanisms by which some of these natural and unnatural agents lead to mutations are outlined in this section.

Base Analogs

One category of mutagenic chemicals is base analogs, compounds that can substitute for purines or pyrimidines during nucleic acid biosynthesis. For
example, 5-bromouracil (5-BU), a derivative of uracil, behaves as a thymine analog but is halogenated at the number 5 position of the pyrimidine ring. If 5-BU is chemically linked to deoxyribose, the nucleoside analog bromodeoxyuridine (BrdU) is formed. Figure 14–5 compares the structure of this analog with that of thymine. The presence of the bromine atom in place of the methyl group increases the probability that a tautomeric shift will occur. If 5-BU is incorporated into DNA in place of thymine and a tautomeric shift to the enol form occurs, 5-BU base-pairs with guanine. After one round of replication, an A=T to G≡C transition results. Furthermore, the presence of 5-BU within DNA increases the sensitivity of the molecule to ultraviolet (UV) light, which itself is mutagenic.

There are other base analogs that are mutagenic. For example, 2-aminopurine (2-AP) can act as an analog of adenine. In addition to its base-pairing affinity with thymine, 2-AP can also base-pair with cytosine, leading to possible transitions from A=T to G≡C following replication.

Alkylating, Intercalating, and Adduct-Forming Agents

A number of naturally occurring and human-made chemicals alter the structure of DNA and cause mutations. The sulfur-containing mustard gases, discovered during World War I, were some of the first chemical mutagens identified in chemical warfare studies. Mustard gases are alkylating agents; that is, they donate an alkyl group, such as CH₃ or CH₃CH₂, to amino or keto groups in nucleotides. Ethylmethane sulfonate (EMS), for example, alkylates the keto groups in the number 6 position of guanine and in the number 4 position of thymine. As with base analogs, base-pairing affinities are altered, and transition mutations result. For example, 6-ethylguanine acts as an analog of adenine and pairs with thymine (Figure 14–6).

Intercalating agents are chemicals that have dimensions and shapes that allow them to wedge between the base pairs of DNA. When bound between base pairs, intercalating
agents cause base pairs to distort and DNA strands to unwind. These changes in DNA structure affect many functions, including transcription, replication, and repair. Deletions and insertions occur during DNA replication and repair, leading to frameshift mutations.

Figure 14–5 Similarity of the chemical structure of 5-bromouracil (5-BU) and thymine. In the common keto form, 5-BU base-pairs normally with adenine, behaving as a thymine analog. In the rare enol form, it pairs anomalously with guanine.

Another group of chemicals that cause mutations are known as adduct-forming agents. A DNA adduct is a substance that covalently binds to DNA, altering its conformation and interfering with replication and repair. Two examples of adduct-forming substances are acetaldehyde (a component of cigarette smoke) and heterocyclic amines (HCAs). HCAs are cancer-causing chemicals that are created during the cooking of meats such as beef, chicken, and fish. HCAs are formed at high temperatures from amino acids and creatine. Many HCAs covalently bind to guanine bases. At least 17 different HCAs have been linked to the development of cancers, such as those of the stomach, colon, and breast.

Figure 14–6 Conversion of guanine to 6-ethylguanine by the alkylating agent ethylmethane sulfonate (EMS). The 6-ethylguanine base-pairs with thymine.

Ultraviolet Light
All electromagnetic radiation consists of energetic waves that we define by their different wavelengths (Figure 14–7). The full range of wavelengths is referred to as the electromagnetic spectrum, and the energy of any radiation in the spectrum varies inversely with its wavelength. Waves in the range of visible light and longer are benign when they interact with most organic molecules. However, waves of shorter length than visible light, being inherently more energetic, have the potential to disrupt organic molecules. As we know, purines and pyrimidines absorb ultraviolet (UV) radiation most intensely at a wavelength of about 260 nm. Although Earth’s ozone layer absorbs the most dangerous types of UV radiation, sufficient UV radiation can induce thousands of DNA lesions per hour in any cell exposed to this radiation. One major effect of UV radiation on DNA is the creation of pyrimidine dimers, chemical species consisting of two identical pyrimidines, particularly ones consisting of two thymine residues (Figure 14–8). The dimers distort the DNA conformation and inhibit normal replication. As a result, errors can be introduced in the base sequence of DNA during replication. When UV-induced dimerization is extensive, it is responsible (at least in part) for the killing effects of UV radiation on cells.

Figure 14–7 The regions of the electromagnetic spectrum and their associated wavelengths. In order of decreasing wavelength and increasing energy: radio waves, microwaves, infrared, visible light (about 750 nm to 380 nm), UV, X rays, gamma rays, and cosmic rays.

Ionizing Radiation
As noted above, the energy of radiation varies inversely with wavelength. Therefore, X rays, gamma rays, and cosmic rays are more energetic than UV radiation (Figure 14–7). As a result, they penetrate deeply into tissues, causing ionization of the molecules encountered along the way. Hence, this type of radiation is called ionizing radiation. As ionizing radiation penetrates cells, stable molecules and atoms are transformed into free radicals, chemical species containing one or more unpaired electrons. Free radicals can directly or indirectly affect the genetic material. They alter purines and pyrimidines in DNA, break phosphodiester bonds, and disrupt the integrity of chromosomes. They also produce a variety of chromosomal aberrations, such as deletions, translocations, and chromosomal fragmentation. Research has shown that the relationship between ionizing radiation dose and mutation rate is linear: for each doubling of the dose, twice as many mutations are induced.

ESSENTIAL POINT
Mutations can be induced by many types of chemicals and radiation. These agents can damage both DNA bases and the sugar-phosphate backbones of DNA molecules.

14–3 The cancer drug melphalan is an alkylating agent of the mustard gas family. It acts in two ways: by causing alkylation of guanine bases and by crosslinking DNA strands together. Describe two ways in which melphalan might kill cancer cells. What are two ways in which cancer cells could repair the DNA-damaging effects of melphalan?
HINT: This problem asks you to consider the effect of the alkylation of guanine on base pairing during DNA replication. The key to its solution is to consider the effects of mutations on cellular processes that allow cells to grow and divide. In Section 14.5, you will learn about the ways in which cells repair the types of mutations introduced by alkylating agents.

Figure 14–8 Induction of a thymine dimer by UV radiation, leading to distortion of the DNA. The dimer forms between adjacent thymidine residues along a DNA strand; the covalent crosslinks occur between the atoms of the pyrimidine rings.

14.4 Single-Gene Mutations Cause a Wide Range of Human Diseases
Although most human genetic diseases are polygenic (that is, caused by variations in several genes), even a single base-pair change in one of the approximately 20,000 human genes can lead to a serious inherited disorder. These monogenic diseases can be caused by many different types of single-gene mutations. Table 14.1 lists some examples of the types of single-gene mutations that can lead to serious genetic diseases.

Table 14.1 Examples of Human Disorders Caused by Single-Gene Mutations
Type of DNA Mutation | Disorder | Molecular Change
Missense | Achondroplasia | Glycine to arginine at position 380 of the FGFR3 gene
Nonsense | Marfan syndrome | Tyrosine to STOP codon at position 2113 of the fibrillin-1 gene
Insertion | Familial hypercholesterolemia | Various short insertions throughout the LDLR gene
Deletion | Cystic fibrosis | Three-base-pair deletion of the phenylalanine codon at position 508 of the CFTR gene
Trinucleotide repeat expansion | Huntington disease | More than 40 repeats of the (CAG) sequence in the coding region of the Huntingtin gene

A comprehensive database of human genes, mutations, and disorders is available in the Online Mendelian Inheritance in Man (OMIM) database, which is described in the “Exploring Genomics” feature earlier in the text (see
Chapter 3). As of 2015, the OMIM database has catalogued more than 4400 human phenotypes for which the molecular basis is known.

Geneticists estimate that approximately 30 percent of mutations that cause human diseases are single base-pair changes that create nonsense mutations. These mutations not only code for a prematurely terminated protein product but also trigger rapid decay of the mRNA. Many more mutations are missense mutations, which alter the amino acid sequence of a protein, or frameshift mutations, which alter the protein sequence and create internal nonsense codons. Other common disease-associated mutations affect the sequences of gene promoters, mRNA splicing signals, and other noncoding sequences that affect transcription, processing, and stability of mRNA or protein. One recent study showed that about 15 percent of all point mutations that cause human genetic diseases result in abnormal mRNA splicing. Approximately 85 percent of these splicing mutations alter the sequence of 5′ and 3′ splice signals. The remainder create new splice sites within the gene. Splicing defects often result in degradation of the abnormal mRNA or creation of abnormal protein products.

Another type of single-gene mutation is caused by expansions of trinucleotide repeat sequences, which are specific short DNA sequences repeated many times. Normal individuals have a low number of repetitions of these sequences. In contrast, in over 20 different human disorders, affected individuals appear to have abnormally large numbers of repeat sequences (in some cases, over 200) within and surrounding specific genes. Examples of diseases associated with these trinucleotide repeat expansions are fragile-X syndrome (discussed in Chapter 6), myotonic dystrophy, and Huntington disease (discussed in Chapter 4). When trinucleotide repeats such as (CAG)n occur within a coding region, they can be translated into long tracts of glutamine. These glutamine tracts may cause the proteins to aggregate abnormally. When the repeats occur outside
coding regions, but within the mRNA, it is thought that the mRNAs may act as “toxic” RNAs that bind to important regulatory proteins, sequestering them away from their normal functions in the cell. Another possible consequence of long trinucleotide repeats is that the regions of DNA containing the repeats may become abnormally methylated, leading to silencing of gene transcription.

The mechanisms by which the repeated sequences expand from generation to generation are of great interest. It is thought that expansion may result from errors during either DNA replication or DNA damage repair. Whatever the cause, these short, unstable repeat sequences appear to be prevalent in humans and in many other organisms.

14.5 Organisms Use DNA Repair Systems to Detect and Correct Mutations
Living systems have evolved a variety of elaborate repair systems that counteract both spontaneous and induced DNA damage. These DNA repair systems are absolutely essential to the maintenance of the genetic integrity of organisms and, as such, to the survival of organisms on Earth. The balance between mutation and repair results in the observed mutation rates of individual genes and organisms. In addition, DNA repair systems correct the genetic damage that would otherwise result in human genetic diseases and cancer. The link between defective DNA repair and cancer susceptibility is described in detail later in the text (see Chapter 16). We now embark on a review of some systems of DNA repair, with emphasis on the major approaches that organisms use to counteract genetic damage.

Proofreading and Mismatch Repair
Some of the most common types of mutations arise during DNA replication when an incorrect nucleotide is inserted by DNA polymerase. The major DNA-synthesizing enzyme in bacteria (DNA polymerase III) makes an error approximately once every 100,000 insertions, an error rate of 10⁻⁵.
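A short calculation shows why this rate matters. The Python sketch below is a back-of-the-envelope model, not a measured result. It multiplies an error rate by the number of nucleotides inserted in one round of replication. The 10⁻⁵ rate comes from this section, and so does the 99 percent proofreading figure taken up in the next paragraph. The genome size is an illustrative assumption: about 4.6 million base pairs, roughly that of the E. coli chromosome. The function names are hypothetical.

```python
# Toy model of replication fidelity; numbers are illustrative, not measurements.

def residual_error_rate(base_rate: float, fraction_corrected: float) -> float:
    """Error rate remaining after a correction step removes a fraction of errors."""
    return base_rate * (1.0 - fraction_corrected)


def expected_errors(genome_bp: int, error_rate: float) -> float:
    """Expected misincorporations in one round of replication.

    Both strands are synthesized, so 2 * genome_bp nucleotides are inserted.
    Errors are treated as independent events.
    """
    return 2 * genome_bp * error_rate


POL_III_ERROR_RATE = 1e-5   # about 1 error per 100,000 insertions (from the text)
PROOFREADING_CATCH = 0.99   # proofreading catches 99 percent of errors (from the text)
GENOME_BP = 4_600_000       # assumption: approximate size of the E. coli chromosome

after_proofreading = residual_error_rate(POL_III_ERROR_RATE, PROOFREADING_CATCH)

print(f"{after_proofreading:.0e}")                                # 1e-07
print(round(expected_errors(GENOME_BP, POL_III_ERROR_RATE)))      # 92
print(round(expected_errors(GENOME_BP, after_proofreading), 2))   # 0.92
```

With these inputs, the sketch prints 1e-07, 92, and 0.92. That is on the order of 90 misincorporations per round of replication without proofreading, and fewer than one with it. The model ignores mismatch repair, which removes many of the remaining errors.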
Fortunately, DNA polymerase proofreads each step, catching 99 percent of those errors. If an incorrect nucleotide is inserted during polymerization, the enzyme can recognize the error and “reverse” its direction. It then behaves as a 3′ to 5′ exonuclease, cutting out the incorrect nucleotide and replacing it with the correct one. This improves the efficiency of replication 100-fold, creating only one mismatch in every 10⁷ insertions, for a final error rate of 10⁻⁷.

To deal with errors such as base–base mismatches, small insertions, and deletions that remain after proofreading, another mechanism, called mismatch repair, may be activated. During mismatch repair, the mismatches are detected, the incorrect nucleotide is removed, and the correct nucleotide is inserted in its place. But how does the repair system recognize which nucleotide is correct (on the template strand) and which nucleotide is incorrect (on the newly synthesized strand)? If the mismatch is recognized but no such discrimination occurs, the excision will be random, and the strand bearing the correct base will be clipped out 50 percent of the time. Hence, strand discrimination is a critical step.

The process of strand discrimination has been elucidated in some bacteria, including E. coli, and is based on DNA methylation. These bacteria contain an enzyme, adenine methylase, which recognizes the DNA sequence

5′-GATC-3′
3′-CTAG-5′

as a substrate, adding a methyl group to each of the adenine residues during DNA replication. Following replication, the newly synthesized DNA strand remains temporarily unmethylated, as the adenine methylase lags behind the DNA polymerase. Prior to methylation, the repair enzyme recognizes the mismatch and binds to the unmethylated (newly synthesized) DNA strand. An endonuclease enzyme creates a nick in the backbone of the unmethylated DNA strand, either 5′ or 3′ to the mismatch. An exonuclease unwinds and degrades the nicked DNA strand until the region of the mismatch is reached. Finally, DNA
polymerase fills in the gap created by the exonuclease, using the correct DNA strand as a template. DNA ligase then seals the gap.

A series of E. coli gene products, MutH, MutL, and MutS, as well as exonucleases, DNA polymerase III, and ligase, are involved in mismatch repair. Mutations in the mutH, mutL, and mutS genes result in bacterial strains deficient in mismatch repair. While the preceding mechanism occurs in E. coli, similar mechanisms involving homologous proteins exist in yeast and in mammals. In humans, mutations in genes that code for DNA mismatch repair proteins are associated with hereditary nonpolyposis colon cancer. Examples of such genes are hMSH2 and hMLH1, the human equivalents of the mutS and mutL genes of E. coli. Mismatch repair defects are commonly found in other cancers, such as leukemias, lymphomas, and tumors of the ovary, prostate, and endometrium. Cells from these cancers show genome-wide increases in the rate of spontaneous mutation. The link between defective mismatch repair and cancer is supported by experiments with mice. Mice that are engineered to have deficiencies in mismatch repair genes accumulate large numbers of mutations and are cancer-prone.

Postreplication Repair and the SOS Repair System
Another DNA repair system, called postreplication repair, responds after damaged DNA has escaped repair and has failed to be completely replicated. As illustrated in Figure 14–9, DNA bearing a lesion of some sort (such as a pyrimidine dimer) may be replicated. In that case, DNA polymerase may stall at the lesion and then skip over it, leaving an unreplicated gap on the newly synthesized strand. To correct the gap, the RecA protein directs a recombinational exchange with the corresponding region on the undamaged parental strand of the same polarity (the “donor” strand). When the undamaged segment of the donor strand DNA replaces the gapped segment, a gap is created on the donor strand. The gap can be filled by repair synthesis as replication proceeds. Because a
recombinational event is involved in this type of DNA repair, it is considered to be a form of homologous recombination repair.

Still another repair pathway, the E. coli SOS repair system, also responds to damaged DNA, but in a different way. In the presence of a large number of unrepaired DNA mismatches and gaps, bacteria can induce the expression of about 20 genes (including lexA, recA, and uvr) whose products allow DNA replication to occur even in the presence of these lesions. This type of repair is a last resort to minimize DNA damage, hence its name. During SOS repair, DNA synthesis becomes error-prone, inserting random and possibly incorrect nucleotides in places that would normally stall DNA replication. As a result, SOS repair itself becomes mutagenic, although it may allow the cell to survive DNA damage that would otherwise kill it.

Figure 14–9 Postreplication repair occurs if DNA replication has skipped over a lesion such as a thymine dimer. Through the process of recombination, the correct complementary sequence is recruited from the parental strand and inserted into the gap opposite the lesion. The new gap is filled by DNA polymerase and DNA ligase.

Photoreactivation Repair: Reversal of UV Damage
As illustrated in Figure 14–8, UV light is mutagenic as a result of the creation of pyrimidine dimers. UV-induced damage to E. coli DNA can be partially reversed if, following irradiation, the cells are exposed briefly to light in the blue range of the visible spectrum. The process is dependent on the activity of a protein called photoreactivation enzyme (PRE). The enzyme’s mode of action is to cleave the bonds between thymine dimers, thus directly reversing the effect of UV radiation on DNA (Figure 14–10). Although the enzyme will associate with a thymine dimer in the dark, it must absorb a photon of light to cleave the dimer. PRE reduces the number of UV-induced mutations, yet photoreactivation repair is not absolutely essential in E. coli. We know this because a mutation creating a null allele in the gene coding for PRE is not lethal. Nonetheless, the enzyme is detectable in many organisms, including bacteria, fungi, plants, and some vertebrates, though not in humans. Humans and other organisms that lack photoreactivation repair must rely on other repair mechanisms to reverse the effects of UV radiation.

Figure 14–10 Damaged DNA repaired by photoreactivation repair. The bond creating the thymine dimer is cleaved by the photoreactivation enzyme (PRE), which must be activated by blue light in the visible spectrum.

Base and Nucleotide Excision Repair
A number of light-independent DNA repair systems exist in all prokaryotes and eukaryotes. The basic mechanisms involved in these types of repair, collectively referred to as excision repair or cut-and-paste mechanisms, consist of the following three steps:
1. The distortion or error present on one of the two strands of the DNA helix is recognized and enzymatically clipped out by an endonuclease. Excisions in the phosphodiester backbone usually include a number of nucleotides adjacent to the error as well, leaving a gap on one strand of the helix.
2. A DNA polymerase fills in the gap by inserting nucleotides complementary to those on the intact strand, which it uses as a replicative template. The enzyme adds these nucleotides to the free 3′-OH end of the clipped DNA. In E. coli, this step is usually performed by DNA polymerase I.
3. DNA ligase seals the final “nick” that remains at the 3′-OH end of the last nucleotide inserted, closing the gap.

There are two types of excision repair: base excision repair and nucleotide excision repair. Base excision repair (BER) corrects DNA that contains a damaged DNA base. The first step in the BER pathway in E. coli involves the recognition of the altered base by an enzyme called DNA glycosylase. There are a number of DNA glycosylases, each of which recognizes a specific base (Figure 14–11). For example, the enzyme uracil DNA glycosylase recognizes the presence of uracil in DNA. DNA glycosylases first cut the glycosidic bond between the base and the sugar, creating an apyrimidinic or apurinic site. The sugar with the missing base is then recognized by an enzyme called AP endonuclease. The AP endonuclease makes a cut in the phosphodiester backbone at the apyrimidinic or apurinic site. Endonucleases then remove the deoxyribose sugar, and the gap is filled by DNA polymerase and DNA ligase. Although much has been learned about the mechanisms of BER in E. coli, BER systems have also been detected in eukaryotes from yeast to humans. Experimental evidence shows that both mouse and human cells that are defective in BER activity are hypersensitive to the killing effects of gamma rays and oxidizing agents.

Nucleotide excision repair (NER) pathways repair “bulky” lesions in DNA that alter or distort the double helix. These lesions include the UV-induced pyrimidine dimers and DNA adducts discussed previously. The NER pathway (Figure 14–12) was first discovered in E. coli by Paul Howard-Flanders and coworkers, who isolated several independent mutants that are sensitive to UV radiation. One group of genes was designated uvr (ultraviolet repair) and included the uvrA, uvrB, and uvrC mutations. In the NER pathway, the uvr gene products are involved in recognizing and clipping out lesions in the DNA. Usually, a specific number of nucleotides is clipped out around both
sides of the lesion. In E. coli, usually a total of 13 nucleotides is removed, including the lesion. The repair is then completed by DNA polymerase I and DNA ligase, in a manner similar to that occurring in BER. The undamaged strand opposite the lesion is used as a template for the replication, resulting in repair.

Figure 14–11 Base excision repair (BER) accomplished by uracil DNA glycosylase, AP endonuclease, DNA polymerase, and DNA ligase. Uracil is recognized as a noncomplementary base, excised, and replaced with the complementary base (C).

Figure 14–12 Nucleotide excision repair (NER) of a UV-induced thymine dimer. The uvr gene products recognize the lesion, a nuclease excises it, and DNA polymerase and DNA ligase fill the gap. During repair, 13 nucleotides are excised in prokaryotes, and 28 nucleotides are excised in eukaryotes.

Nucleotide Excision Repair and Xeroderma Pigmentosum in Humans
The mechanism of NER in eukaryotes is much more complicated than that in prokaryotes and involves many more proteins, encoded by about 30 genes. Much of what is known about the system in humans has come from detailed studies of individuals with xeroderma pigmentosum (XP). XP is a rare recessive genetic disorder that predisposes individuals to severe skin abnormalities and skin cancers. It also causes a wide range of other symptoms, including developmental and neurological defects. Patients with XP are extremely sensitive to UV radiation in sunlight. In addition, they have a 2000-fold higher rate of cancer, particularly skin cancer, than the general population. The
condition is severe and may be lethal, although early detection and protection from sunlight can arrest it (Figure 14–13).

The repair of UV-induced lesions in XP has been investigated in vitro, using human fibroblast cell cultures derived from normal individuals and those with XP. (Fibroblasts are undifferentiated connective tissue cells.) The results of these studies suggest that the XP phenotype is caused by defects in NER pathways and by mutations in more than one gene. In 1968, James Cleaver showed that cells from XP patients were deficient in DNA synthesis other than that occurring during chromosome replication, a phenomenon known as unscheduled DNA synthesis. Unscheduled DNA synthesis is elicited in normal cells by UV radiation. This type of synthesis is thought to represent the activity of DNA polymerization during NER. The lack of unscheduled DNA synthesis in XP patients therefore suggested that XP may be a deficiency in NER.

The involvement of multiple genes in NER and XP was further investigated by studies using somatic cell hybridization. Fibroblast cells from any two unrelated XP patients, when grown together in tissue culture, can fuse together, forming heterokaryons. A heterokaryon is a single cell with two nuclei from different organisms but a common cytoplasm. NER in the heterokaryon can be measured by the level of unscheduled DNA synthesis. If the mutation in each of the two XP cells occurs in the same gene, the heterokaryon, like the cells that fused to form it, will still be unable to undergo NER. This is because there is no normal copy of the relevant gene present in the heterokaryon. However, if NER does occur in the heterokaryon, the mutations in the two XP cells must have been present in two different genes. Hence, the two mutants are said to demonstrate complementation, a concept also discussed earlier in the text (see Chapter 4). Complementation occurs because the heterokaryon has at least one normal copy of each gene in the fused cell. By fusing XP cells
from a large number of XP patients, researchers were able to determine how many genes contribute to the XP phenotype. Based on these and other studies, XP patients were divided into seven complementation groups, indicating that at least seven different genes are involved in nucleotide excision repair in humans. A gene representing each of these complementation groups, XPA to XPG (Xeroderma Pigmentosum gene A to G), has now been identified, and a homologous gene for each has been identified in yeast. Approximately 20 percent of XP patients do not fall into any of the seven complementation groups. Cells from most of these patients have mutations in the gene coding for DNA polymerase η and are defective in repair DNA synthesis.

As a result of the study of defective genes in XP, a great deal is now known about how NER counteracts DNA damage in normal cells. The first step in humans is recognition of the damaged DNA by proteins encoded by the XPC, XPE, and XPA genes. These proteins then recruit the remainder of the repair proteins to the site of DNA damage. The XPB and XPD genes encode helicases, and the XPF and XPG genes encode nucleases. The excision repair complex containing these and other factors is responsible for the excision of an approximately 28-nucleotide-long fragment from the DNA strand that contains the lesion.

Figure 14–13 Two individuals with xeroderma pigmentosum. These XP patients show characteristic XP skin lesions induced by sunlight, as well as mottled redness (erythema) and irregular pigment changes to the skin, in response to cellular injury.

Double-Strand Break Repair in Eukaryotes
Thus far, we have discussed repair pathways that deal with damage or errors within one strand of DNA. We conclude our discussion of DNA repair by considering what happens when both strands of the DNA helix are cleaved, as a result of exposure to ionizing radiation, for example. These
types of damage are extremely dangerous to cells, leading to chromosome rearrangements, cancer, or cell death. In this section, we will discuss double-strand breaks in eukaryotic cells.

Specialized forms of DNA repair, the DNA double-strand break repair (DSB repair) pathways, are activated and are responsible for reattaching two broken DNA strands. Recently, interest in DSB repair has grown because defects in these pathways are associated with X-ray hypersensitivity and immune deficiency. Such defects may also underlie familial predisposition to breast and ovarian cancer. Several human disease syndromes, such as Fanconi anemia and ataxia telangiectasia, result from defects in DSB repair.

One pathway involved in double-strand break repair is homologous recombination repair. The first step in this process involves the activity of an enzyme that recognizes the double-strand break and then digests back the 5′-ends of the broken DNA helix, leaving overhanging 3′-ends (Figure 14–14). One overhanging end searches for a region of sequence complementarity on the sister chromatid and then invades the homologous DNA duplex, aligning the complementary sequences. Once aligned, DNA synthesis proceeds from the 3′ overhanging ends, using the undamaged DNA strands as templates. The interaction of two sister chromatids is necessary because, when both strands of one helix are broken, there is no undamaged parental DNA strand available to use as a source of the complementary template DNA sequence during repair. After DNA repair synthesis, the resulting heteroduplex molecule is resolved and the two chromatids separate.

Figure 14–14 Steps in homologous recombination repair of double-stranded breaks. The break is detected and the 5′-ends are digested; a 3′-end invades the homologous region of the sister chromatid; DNA synthesis proceeds across the damaged region; and the heteroduplex is resolved, with gaps filled by DNA synthesis and ligation.

DSB repair by homologous recombination usually occurs during the late S or early G2 phase of the cell cycle, after DNA replication, a time when sister chromatids are available to be used as repair templates. Because an undamaged template is used during repair synthesis, homologous recombination repair is an accurate process.

A second pathway, called nonhomologous end joining, also repairs double-strand breaks. However, as the name implies, the mechanism does not recruit a homologous region of DNA during repair. This system is activated in G1, prior to DNA replication. End joining involves a complex of many proteins, which may include the DNA-dependent protein kinase and the breast cancer susceptibility gene product, BRCA1. These and other proteins bind to the free ends of the broken DNA, trim the ends, and ligate them back together. Because some nucleotide sequences are lost in the process of end joining, it is an error-prone repair system. In addition, if more than one chromosome suffers a double-strand break, the wrong ends could be joined together. This leads to abnormal chromosome structures, such as those discussed earlier in the text (see Chapter 6).

14–4 Geneticists often use ethylmethane sulfonate (EMS) to induce mutations in Drosophila. Why is EMS a mutagen of choice for genetic research? What would be the effects of EMS in a strain of Drosophila lacking functional mismatch repair systems?
HINT: This problem asks you to evaluate EMS as a useful mutagen and to determine its effects in the absence of DNA repair. The key to its solution is to consider the chemical effects of EMS on DNA. Also, consider the types of DNA repair that may operate on EMS-mutated DNA and the efficiency of these processes.

ESSENTIAL POINT
Organisms counteract mutations by using a range of DNA repair systems. Errors in DNA synthesis can be repaired by proofreading, mismatch repair, and postreplication repair. DNA damage can be repaired by photoreactivation repair, SOS repair, base excision repair, nucleotide excision repair, and double-strand break repair.

14.6 The Ames Test Is Used to Assess the Mutagenicity of Compounds
There is great concern about the possible mutagenic properties of any chemical that enters the human body, whether through the skin, the digestive system, or the respiratory tract. Examples of synthetic chemicals that concern us are those found in air and water pollution, food preservatives, artificial sweeteners, herbicides, pesticides, and pharmaceutical products. Mutagenicity can be tested in various organisms, including fungi, plants, and cultured mammalian cells. However, one of the most common tests, which we describe here, uses bacteria.

The Ames test uses a number of different strains of the bacterium Salmonella typhimurium that have been selected for their ability to reveal the presence of specific types of mutations. For example, some strains are used to detect base-pair substitutions, and other strains detect various frameshift mutations. Each strain contains a
mutation in one of the genes of the histidine operon. The mutant strains are unable to synthesize histidine (his⁻ strains) and therefore require histidine for growth (Figure 14–15). The assay measures the frequency of reverse mutations that occur within the mutant gene, yielding wild-type bacteria (his⁺ revertants). These Salmonella strains also have an increased sensitivity to mutagens due to the presence of mutations in genes involved in two processes. The first is DNA damage repair. The second is the synthesis of the lipopolysaccharide barrier that coats bacteria and protects them from external substances.

Figure 14–15 The Ames test, which screens compounds for potential mutagenicity. His⁻ auxotrophs are spread on agar medium without histidine. A filter paper disk carrying a potential mutagen plus liver enzymes is placed on the surface of the medium, and the plate is incubated at 37°C. The number of his⁺ revertants induced by the mutagen is compared with the number of spontaneous his⁺ revertants on a control plate.

Many substances entering the human body are relatively innocuous until activated metabolically, usually in the liver, to more chemically reactive products. Thus, the Ames test includes a step in which the test compound is incubated in vitro in the presence of a mammalian liver extract. Alternatively, test compounds may be injected into a mouse, where they are modified by liver enzymes, and then recovered for use in the Ames test.

In the initial use of Ames testing in the 1970s, a large number of known carcinogens, or cancer-causing agents, were examined, and more than 80 percent of these were shown to be strong mutagens. This is not surprising, as the transformation of cells to the malignant state occurs as a result of mutations. For example, more than 60 compounds found in cigarette smoke test positive in the Ames test and cause cancer in animal tests. Although a positive response in the Ames test does not prove that a compound is carcinogenic, the Ames test is useful as a preliminary screening device. The test is used extensively during the development of
industrial and pharmaceutical chemical compounds.

14.7 Transposable Elements Move within the Genome and May Create Mutations

Transposable elements, also known as transposons or “jumping genes,” can move or transpose within and between chromosomes, inserting themselves into various locations within the genome. Transposable elements are present in the genomes of all organisms from bacteria to humans. Not only are they ubiquitous, but they also comprise large portions of some eukaryotic genomes. For example, almost 50 percent of the human genome is derived from transposable elements. Some organisms with unusually large genomes, such as salamanders and barley, contain hundreds of thousands of copies of various types of transposable elements. Although the function of these elements is unknown, data from human genome sequencing suggest that some genes may have evolved from transposable elements and that the presence of these elements may help to modify and reshape the genome.

Transposable elements are also valuable tools in genetic research. Geneticists harness transposons as mutagens, as cloning tags, and as vehicles for introducing foreign DNA into model organisms. In this section, we discuss transposable elements as naturally occurring mutagens. The movement of transposable elements from one place in the genome to another has the capacity to disrupt genes and cause mutations, as well as to create chromosomal damage such as double-strand breaks.

Insertion Sequences and Bacterial Transposons

There are two types of transposable elements in bacteria: insertion sequences and bacterial transposons. Insertion sequences (IS elements) can move from one location to another and, if they insert into a gene or gene-regulatory region, may cause mutations. IS elements were first identified during analyses of mutations in the gal operon of E. coli. Researchers discovered that certain mutations in this operon were due to the presence of several hundred base pairs of extra DNA inserted into the beginning of the operon. Surprisingly, the segment of mutagenic DNA could spontaneously excise from this location, restoring wild-type function to the gal operon. Subsequent research revealed that several other DNA elements could behave in a similar fashion, inserting into bacterial chromosomes and affecting gene function.

IS elements are relatively short, not exceeding 2000 bp (2 kb). The first insertion sequence to be characterized in E. coli, IS1, is about 800 bp long. Other IS elements, such as IS2, IS3, and IS4, are about 1250 to 1400 bp in length. IS elements are present in multiple copies in bacterial genomes. For example, the E. coli chromosome contains five to eight copies of IS1 and five copies each of IS2 and IS3, and further copies of IS elements are carried on plasmids such as F factors.

All IS elements contain two features that are essential for their movement. First, they contain a gene that encodes an enzyme called transposase. This enzyme is responsible for making staggered cuts in chromosomal DNA, into which the IS element can insert. Second, the ends of IS elements contain inverted terminal repeats (ITRs). ITRs are short segments of DNA that have the same nucleotide sequence as each other but are oriented in the opposite direction (Figure 14–16). Although Figure 14–16 shows the ITRs to consist of only a few nucleotides, IS ITRs usually contain about 20 to 40 nucleotide pairs. ITRs are essential for transposition and act as recognition sites for the binding of the transposase enzyme.

FIGURE 14–16 An insertion sequence (IS), shown in purple. The terminal sequences are perfect inverted repeats of one another. (Diagram: an internal sequence flanked by inverted terminal sequences; top strand 5′-ATCCG ... CGGAT-3′, bottom strand 3′-TAGGC ... GCCTA-5′.)

Bacterial transposons (Tn elements) are larger than IS elements and contain protein-coding genes that are unrelated to their transposition. Some Tn elements, such as Tn10, are composed of a
drug-resistance gene flanked by two IS elements present in opposite orientations. The IS elements encode the transposase enzyme that is necessary for transposition of the Tn element. Other types of Tn elements, such as Tn3, have shorter inverted repeat sequences at their ends and encode their transposase enzyme from a transposase gene located in the middle of the Tn element. Like IS elements, Tn elements are mobile in both bacterial chromosomes and plasmids and can cause mutations if they insert into genes or gene-regulatory regions.

Tn elements are currently of interest because they can introduce multiple drug resistance onto bacterial plasmids. These plasmids, called R factors, may contain many Tn elements conferring simultaneous resistance to heavy metals, antibiotics, and other drugs. These elements can move from plasmids onto bacterial chromosomes and can spread multiple drug resistance between different strains of bacteria.

The Ac–Ds System in Maize

About 20 years before the discovery of transposons in bacteria, Barbara McClintock discovered mobile genetic elements in corn plants (maize). She did this by analyzing the genetic behavior of two mutations, Dissociation (Ds) and Activator (Ac), expressed in either the endosperm or aleurone layers. She then correlated her genetic observations with cytological examinations of the maize chromosomes. Initially, McClintock determined that Ds was located at a particular site on one maize chromosome. If Ac was also present in the genome, Ds induced breakage at a point on the chromosome adjacent to its own location. If chromosome breakage occurred in somatic cells during their development, progeny cells often lost part of the broken chromosome, causing a variety of phenotypic effects. The chapter-opening photo illustrates the types of phenotypic effects caused by Ds mutations in kernels of corn. Subsequent analysis suggested to McClintock that both Ds and Ac elements sometimes moved to new chromosomal
locations. While Ds moved only if Ac was also present, Ac was capable of autonomous movement. Where Ds came to reside determined its genetic effects: it might cause chromosome breakage, or it might inhibit expression of a certain gene. In cells in which Ds caused a gene mutation, Ds might move again, restoring the mutant gene to wild type. Figure 14–17 illustrates the types of movements and effects brought about by Ds and Ac elements. In McClintock’s original observation, pigment synthesis was restored in cells in which the Ds element transposed away from its original chromosomal site.

McClintock concluded that the Ds and Ac genes were mobile controlling elements. We now commonly refer to them as transposable elements, a term coined by another great maize geneticist, Alexander Brink. Several Ac and Ds elements have now been analyzed, and the relationship between the two elements has been clarified. The first Ds element studied (Ds9) is nearly identical to Ac except for a 194-bp deletion within the transposase gene. The deletion of part of the transposase gene in the Ds9 element explains its dependence on the Ac element for transposition. Several other Ds elements have also been sequenced, and each contains an even larger deletion within the transposase gene. In each case, however, the ITRs are retained.

Although the significance of Barbara McClintock’s mobile controlling elements was not fully appreciated following her initial observations, molecular analysis has since verified her conclusions. She was awarded the Nobel Prize in Physiology or Medicine in 1983.

Copia and P Elements in Drosophila

There are more than 30 families of transposable elements in Drosophila, each of which is present in 20 to 50 copies in the genome. Together, these families make up a portion of the Drosophila genome and over half of the middle repetitive DNA of this organism. One study suggests that 50 percent of all visible mutations in Drosophila are the result of the insertion of transposons into otherwise
wild-type genes.

FIGURE 14–17 Effects of Ac and Ds elements on gene expression. (a) If Ds is present in the absence of Ac, there is normal expression of a distantly located hypothetical gene W. (b) In the presence of Ac, Ds may transpose to a region adjacent to W. Ds can induce chromosome breakage, which may lead to loss of a chromosome fragment bearing the W gene. (c) In the presence of Ac, Ds may transpose into the W gene, disrupting W-gene expression. If Ds subsequently transposes out of the W gene, W-gene expression may return to normal.

In 1975, David Hogness and his colleagues David Finnegan, Gerald Rubin, and Michael Young identified a class of DNA elements in Drosophila melanogaster that they designated as copia. These elements are transcribed into “copious” amounts of RNA (hence their name). Copia elements are present in 10 to 100 copies in the genomes of Drosophila cells. Mapping studies show that they are transposable to different chromosomal locations and are dispersed throughout the genome. Each copia element consists of approximately 5000 to 8000 bp of DNA, including a long direct terminal repeat (DTR) sequence of 267 bp at each end. Within each DTR is an inverted terminal repeat (ITR) of 17 bp (Figure 14–18).

FIGURE 14–18 Structural organization of a copia transposable element in Drosophila melanogaster, showing the terminal repeats. (Diagram: the copia gene region, about 5000 bp, flanked at each end by a 267-bp DTR, each containing a 17-bp ITR.)

Insertion of copia is
dependent on the presence of the ITR sequences and seems to occur preferentially at specific target sites in the genome. The copia-like elements demonstrate regulatory effects at the point of their insertion in the chromosome. Certain mutations, including those affecting eye color and segment formation, are due to copia insertions within genes. For example, the eye-color mutation white-apricot is caused by an allele of the white gene that contains a copia element within the gene. Transposition of the copia element out of the white-apricot allele can restore the allele to wild type.

Perhaps the most significant Drosophila transposable elements are the P elements. These were discovered while studying the phenomenon of hybrid dysgenesis, a condition characterized by sterility, elevated mutation rates, and chromosome rearrangements in the offspring of crosses between certain strains of fruit flies. Hybrid dysgenesis is caused by high rates of P element transposition in the germ line, in which transposons insert themselves into or near genes, thereby causing mutations. P elements range from 0.5 to 2.9 kb long, with 31-bp ITRs. Full-length P elements encode at least two proteins, one of which is the transposase enzyme that is required for transposition, and another is a repressor protein that inhibits transposition. The transposase gene is expressed only in the germ line, accounting for the tissue specificity of P element transposition. Strains of flies that contain full-length P elements inserted into their genomes are resistant to further transpositions due to the presence of the repressor protein encoded by the P elements.

Mutations can arise from several kinds of insertional events. If a P element inserts into the coding region of a gene, it can terminate transcription of the gene and destroy normal gene expression. If it inserts into the promoter region of a gene, it can affect the level of expression of the gene. Insertions into introns can affect splicing or cause the
premature termination of transcription.

Geneticists have harnessed P elements as tools for genetic analysis. One of the most useful applications of P elements is as vectors to introduce transgenes into Drosophila, a technique known as germ-line transformation. P elements are also used to generate mutations and to clone mutant genes. In addition, researchers are perfecting methods to target P element insertions to precise single-chromosomal sites, which should increase the precision of germ-line transformation in the analysis of gene activity.

Transposable Elements in Humans

The human genome, like that of other eukaryotes, is riddled with DNA derived from transposons. Recent genomic sequencing data reveal that approximately half of the human genome is composed of transposable element DNA. As discussed earlier in the text (see Chapter 11), the major families of human transposable elements are the long interspersed elements and short interspersed elements (LINEs and SINEs, respectively). Together, they comprise over 30 percent of the human genome.

Although most human transposable elements appear to be inactive, the potential mobility and mutagenic effects of these elements have far-reaching implications for human genetics, as can be seen in a recent example of a transposable element “caught in the act.” The case involves a male child with hemophilia. One cause of hemophilia is a defect in blood-clotting factor VIII, the product of an X-linked gene. Haig Kazazian and his colleagues found LINEs inserted at two points within the gene. Researchers were interested in determining if one of the mother’s X chromosomes also contained this specific LINE. If so, the unaffected mother would be heterozygous and pass the LINE-containing chromosome to her son. The surprising finding was that the LINE sequence was not present on either of her X chromosomes but was detected on chromosome 22 of both parents. This suggests that this mobile element may have transposed from one chromosome to another
in the gamete-forming cells of the mother, prior to being transmitted to the son.

LINE insertions into the human dystrophin gene have resulted in at least two separate cases of Duchenne muscular dystrophy. In one case, a LINE inserted into exon 48, and in another case, it inserted into exon 44, both leading to frameshift mutations and premature termination of translation of the dystrophin protein. There are also reports that LINEs have inserted into the APC and c-myc genes, leading to mutations that may have contributed to the development of some colon and breast cancers. In the latter cases, the transposition had occurred within one or a few somatic cells. As of 2012, researchers had determined that at least 25 LINE element insertions have resulted in single-gene human diseases.

SINE insertions are also responsible for more than 30 cases of human disease. In one case, an Alu element integrated into the BRCA2 gene, inactivating this tumor-suppressor gene and leading to a familial case of breast cancer. Other genes that have been mutated by Alu integrations are the factor IX gene (leading to hemophilia B), the ChE gene (leading to acholinesterasemia), and the NF1 gene (leading to neurofibromatosis).

Transposons, Mutations, and Evolution

Transposons can have a wide range of effects on genes. The insertion of a transposon into the coding region of a gene may disrupt the gene’s normal translation reading frame or may induce premature termination of translation of the mRNA transcribed from the gene. Many transposable elements contain their own promoters and enhancers, as well as splice sites and polyadenylation signals. The presence of these regulatory sequences can have effects on nearby genes. The insertion of a transposable element containing polyadenylation or transcription termination signals into a gene’s intron may bring about termination of the gene’s transcription within the element. In addition, it can cause aberrant
splicing of an RNA transcribed from the gene. Insertions of a transposon into a gene’s transcription regulatory region may disrupt the gene’s normal regulation or may cause the gene to be expressed differently as a result of the presence of the transposon’s own promoter or enhancer sequences. The presence of two or more identical transposons in a genome creates the potential for recombination between the transposons, leading to duplications, deletions, inversions, or chromosome translocations. Any of these rearrangements may bring about phenotypic changes or disease.

It is thought that about 0.2 percent of detectable human mutations may be due to transposable element insertions. Other organisms appear to suffer more damage due to transposition. About 10 percent of new mouse mutations and 50 percent of Drosophila mutations are caused by insertions of transposable elements in or near genes.

Because of their ability to alter genes and chromosomes, transposons may contribute to the variability that underlies evolution. For example, the Tn elements of bacteria carry antibiotic resistance genes between organisms, conferring a survival advantage to the bacteria under certain conditions. Another example of a transposon’s contribution to evolution is provided by Drosophila telomeres. LINE-like elements are present at the ends of Drosophila chromosomes, and these elements act as telomeres, maintaining the length of Drosophila chromosomes over successive cell divisions. Other examples of evolved transposons are the RAG1 and RAG2 genes in humans. These genes encode recombinase enzymes that are essential to the development of the immune system, and they appear to have evolved from transposons.

Transposons may also affect the evolution of genomes by altering gene-expression patterns in ways that are subsequently retained by the host. For example, the human amylase gene contains an enhancer that causes the gene to be expressed in the parotid gland. This enhancer evolved from
transposon sequences that were inserted into the gene-regulatory region early in primate evolution. Other examples of gene-expression patterns that were affected by the presence of transposon sequences are T-cell-specific expression of the CD8 gene and placenta-specific expression of the leptin and CYP19 genes.

ESSENTIAL POINT
Transposable elements can move within a genome, creating mutations and altering gene expression. Besides creating mutations, transposons may contribute to evolution. Geneticists use transposons as research tools to create mutations, clone genes, and introduce genes into model organisms.

CASE STUDY
Genetic dwarfism

Seven months pregnant, an expectant mother was undergoing a routine ultrasound. While prior tests had been normal, this one showed that the limbs of the fetus were unusually short. The doctor suspected that the baby might have a genetic form of dwarfism called achondroplasia. He told her that the disorder was due to an autosomal dominant mutation and occurred with a frequency of about 1 in 25,000 births. The expectant mother had studied genetics in college and immediately raised several questions. How would you answer them?
1. How could her baby have a dominantly inherited disorder if there was no history of this condition on either side of the family?
2. Is the mutation more likely to have come from the mother or the father?
3. If this child has achondroplasia, is there an increased chance that their next child would also have this disorder?
4. Could this disorder have been caused by X rays or ultrasounds she had earlier in pregnancy?
INSIGHTS AND SOLUTIONS

1. The base analog 2-aminopurine (2-AP) substitutes for adenine during DNA replication, but it may base-pair with cytosine. The base analog 5-bromouracil (5-BU) substitutes for thymine, but it may base-pair with guanine. Follow the double-stranded trinucleotide sequence shown here (A T G paired with T A C) through three rounds of replication, assuming that, in the first round, both analogs are present and become incorporated wherever possible. Before the second and third rounds of replication, any unincorporated base analogs are removed. What final sequences occur?

Solution: In round I, 5-BU substitutes for T and 2-AP substitutes for A in the new strands, so the daughter duplexes are A T G / 5BU 2AP C and 2AP 5BU G / T A C. In round II, with free analogs removed, the original strands again template normal partners (A T G / T A C), but the analogs now in template strands mispair: 5-BU pairs with G and 2-AP pairs with C, so the analog-containing strands template G C G and C G C, respectively. In round III, these new strands are copied normally. The final sequences are therefore the unchanged A T G / T A C duplexes and mutant G C G / C G C duplexes (A·T → G·C transitions at the first two positions), along with the analog-containing strands paired with G C G and C G C.

2. A rare dominant mutation expressed at birth was studied in humans. Records showed that six cases were discovered in 40,000 live births. Family histories revealed that in two cases, the mutation was already present in one of the parents. Calculate the spontaneous mutation rate for this mutation. What are some underlying assumptions that may affect our conclusions?

Solution: Only four cases represent a new mutation. Because each live birth represents two gametes, the sample size is from 80,000 meiotic events. The rate is equal to 4/80,000 = 1/20,000 = 5 × 10⁻⁵. We have assumed that the mutant gene is fully penetrant and is expressed in each individual bearing it. If it is not fully penetrant, our calculation may be an underestimate because one or more mutations may have gone undetected. We have also assumed that the screening was 100 percent accurate. One or more mutant individuals may have been “missed,” again leading to an underestimate. Finally, we assumed that the viability of the mutant and nonmutant individuals is equivalent and that they survive equally in utero. Therefore, our assumption is that the number of mutant individuals at birth is equal to the number at conception. If this were not true, our calculation would again be an underestimate.
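The reasoning in the first two solutions can be checked with a short Python sketch. It is a toy model, not a biochemical one: it assumes, as the solution does, that free analogs are always incorporated in round I and that an analog sitting in a template strand always takes its alternative partner (5-BU with G, 2-AP with C) in later rounds. Strands are written position by position next to their partners, so antiparallel orientation is ignored, and the function names (`copy_strand`, `replicate`, `label`, `mutation_rate`) are illustrative only.

```python
from collections import Counter

# Toy model of problem 1. Strands are aligned position by position with
# their partners; antiparallel orientation is deliberately ignored.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}  # normal partners
ANALOG_FOR = {"A": "2AP", "T": "5BU"}            # analog that can replace a base
ANALOG_PARTNER = {"5BU": "G", "2AP": "C"}        # assumed mispairing in a template


def copy_strand(template, analogs_available):
    """Synthesize the new strand opposite one template strand."""
    new = []
    for base in template:
        if base in ANALOG_PARTNER:
            # An analog already in the template pairs with its alternative partner.
            new.append(ANALOG_PARTNER[base])
            continue
        partner = PAIR[base]
        if analogs_available and partner in ANALOG_FOR:
            # Free analogs are incorporated wherever possible (round I here).
            partner = ANALOG_FOR[partner]
        new.append(partner)
    return tuple(new)


def replicate(duplexes, analogs_available):
    """One round of semiconservative replication of every duplex."""
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, copy_strand(strand_a, analogs_available)))
        daughters.append((strand_b, copy_strand(strand_b, analogs_available)))
    return daughters


def label(duplex):
    """Order-independent label for a duplex, e.g. ('A T G', 'T A C')."""
    return tuple(sorted(" ".join(strand) for strand in duplex))


def mutation_rate(cases, inherited_cases, live_births):
    """New mutations per gamete (each live birth represents two gametes)."""
    return (cases - inherited_cases) / (2 * live_births)


# Problem 1: the A T G / T A C duplex through three rounds of replication.
duplexes = [(tuple("ATG"), tuple("TAC"))]
duplexes = replicate(duplexes, analogs_available=True)   # round I
duplexes = replicate(duplexes, analogs_available=False)  # round II
duplexes = replicate(duplexes, analogs_available=False)  # round III
final = Counter(label(d) for d in duplexes)
for (strand_1, strand_2), count in final.most_common():
    print(count, strand_1, "/", strand_2)

# Problem 2: six cases, two inherited, 40,000 live births.
rate = mutation_rate(cases=6, inherited_cases=2, live_births=40_000)
print(rate)  # 5e-05
```

Running it yields eight duplexes after round III: four unchanged A T G / T A C duplexes, two mutant C G C / G C G duplexes, and one duplex for each analog-containing strand, which is consistent with the written solution. Labeling each duplex by its sorted strands is simply a convenience so that identical duplexes are counted together regardless of which strand served as the template.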
3. Consider the following estimates:
a. There are 7 × 10⁹ humans living on this planet.
b. Each individual has about 20,000 (0.2 × 10⁵) genes.
c. The average mutation rate at each locus is 10⁻⁵.
How many spontaneous mutations are currently present in the human population? Assuming that these mutations are equally distributed among all genes, how many new mutations have arisen in each gene in the human population?

Solution: First, since each individual is diploid, there are two copies of each gene per person, each arising from a separate gamete. Therefore, the total number of spontaneous mutations is
(2 × 0.2 × 10⁵ genes) × (7 × 10⁹ humans) × (10⁻⁵ mutations) = (0.4 × 10⁵) × (7 × 10⁹) × (10⁻⁵) mutations = 2.8 × 10⁹ mutations in the population
(2.8 × 10⁹ mutations)/(0.2 × 10⁵ genes) = 14 × 10⁴ mutations per gene in the population

Problems and Discussion Questions

1. HOW DO WE KNOW? In this chapter, we focused on how gene mutations arise and how cells repair DNA damage. In particular, we discussed spontaneous and induced mutations, DNA repair methods, and transposable elements. Based on your knowledge of these topics, answer several fundamental questions:
(a) How do we know that mutations occur spontaneously?
(b) How do we know that certain chemicals and wavelengths of radiation induce mutations in DNA?
(c) How do we know that DNA repair mechanisms detect and correct the majority of spontaneous and induced mutations?
2. CONCEPT QUESTION Review the Chapter Concepts list on page 273. These concepts relate to how gene mutations occur, their phenotypic effects, and how mutations can be repaired. The first four concepts focus on the effects of gene mutations in diploid organisms. Write a short essay describing how these concepts would apply, or not apply, to a haploid organism such as E. coli.
3. Distinguish between spontaneous and induced mutations. Give some examples of mutagens that cause induced mutations.
4. Why would a mutation in a somatic cell of a multicellular organism escape detection?
5. Why is a random mutation more likely to be deleterious than beneficial?
6. Why are organisms that have a haploid life cycle valuable tools for mutagenesis studies?
7. What is meant by a conditional mutation?
8. Describe a tautomeric shift and how it may lead to a mutation.
9. Contrast and compare the mutagenic effects of deaminating agents, alkylating agents, and base analogs.
10. Why are frameshift mutations likely to be more detrimental than point mutations, in which a single pyrimidine or purine has been substituted?
11. In which phases of the cell cycle would you expect double-strand break repair and nonhomologous end joining to occur, and why?
12. DNA damage brought on by a variety of natural and artificial agents elicits a wide variety of cellular responses. In addition to the activation of DNA repair mechanisms, there can be activation of pathways leading to apoptosis (programmed cell death) and cell-cycle arrest. Why would apoptosis and cell-cycle arrest often be part of a cellular response to DNA damage?
13. Distinguish between proofreading and mismatch repair.
14. How would you expect the misincorporation of bases by a DNA polymerase to change if the relative ratios of the dNTPs were A = T = G, with a fivefold excess of C?
15. A chemist has synthesized a novel chemical, which he suspects to be a potential mutagen. Name and explain a popular test that can be used to test the mutagenicity of this product in bacteria.
16. What genetic defects result in the disorder xeroderma pigmentosum (XP) in humans? How do these defects create the phenotypes associated with the disorder?
17. In a bacterial culture in which all cells are unable to synthesize leucine (leu−), a potent mutagen is added, and the cells are allowed to undergo one round of replication. At that point, samples are taken, a series of dilutions is made, and the cells are plated on either minimal medium or minimal medium containing leucine. The first culture condition (minimal medium) allows the growth of only leu+ cells, while the second culture condition (minimal medium with leucine added) allows the growth of all cells. The results of the experiment are as follows:

Culture Condition     Dilution   Colonies
Minimal medium        10⁻¹       18
Minimal + leucine     10⁻⁷       6

What is the rate of mutation at the locus associated with leucine biosynthesis?
18. DNA mismatch repair is a mechanism of DNA repair that has been observed in E. coli. Give a list of genes in E. coli, mutations in which can adversely affect DNA mismatch repair. Give a list of equivalent genes in humans.
19. A number of different types of mutations in the HBB gene can cause human β-thalassemia, a disease characterized by various levels of anemia. Many of these mutations occur within introns or in upstream noncoding sequences. Explain why mutations in these regions often lead to severe disease, although they may not directly alter the coding regions of the gene.
20. Some mutations that lead to diseases such as Huntington disease are caused by the insertion of trinucleotide repeats. Describe how the process of DNA replication could lead to expansions of trinucleotide repeat regions.
21. In maize, a Ds or Ac transposon can cause mutations in genes at or near the site of transposon insertion. It is possible for these elements to transpose away from their original site, causing a reversion of the mutant phenotype. In some cases, however, even more severe phenotypes appear, due to events at or near the mutant allele. What might be happening to the transposon or the nearby gene to create more severe mutations?
22. Presented here are hypothetical findings from studies of heterokaryons formed from seven human xeroderma pigmentosum cell strains:

       XP1  XP2  XP3  XP4  XP5  XP6  XP7
XP1     -
XP2     -    -
XP3     -    -    -
XP4     +    +    +    -
XP5     +    +    +    +    -
XP6     +    +    +    +    -    -
XP7     +    +    +    +    -    -    -

Note: “+” = complementation; “-” = no complementation.

These data are measurements of the occurrence or nonoccurrence of unscheduled DNA synthesis in the fused heterokaryon. None of the strains alone shows any unscheduled DNA synthesis. What does unscheduled DNA synthesis represent? Which strains fall into the same complementation groups? How many different groups are revealed based on these data? What can we conclude about the genetic basis of XP from these data?
23. Cystic fibrosis (CF) is a severe autosomal recessive disorder in humans that results from a chloride ion channel defect in epithelial cells. More than 500 mutations have been identified in the 24 exons of the responsible gene (CFTR, or cystic fibrosis transmembrane regulator), including dozens of different missense mutations, frameshift mutations, and splice-site defects. Although all affected CF individuals demonstrate chronic obstructive lung disease, there is variation in whether or not they exhibit pancreatic enzyme insufficiency (PI). Speculate as to which types of mutations are likely to give rise to less severe symptoms of CF, including only minor PI. Some of the 300 sequence alterations that have been detected within the exon regions of the CFTR gene do not give rise to cystic fibrosis. Taking into account your knowledge of the genetic code, gene expression, protein function, and mutation, describe why this might be so.
24. Electrophilic oxidants are known to create the modified base named 7,8-dihydro-8-oxoguanine (oxoG) in DNA. Whereas guanine base-pairs with cytosine, oxoG base-pairs with either cytosine or adenine.
(a) What are the sources of reactive oxidants within cells that cause this type of base alteration?
(b) Drawing on your knowledge of nucleotide chemistry, draw the structure of oxoG, and, below it, draw guanine. Opposite guanine, draw cytosine, including the hydrogen bonds that allow these two molecules to base-pair. Does the structure of oxoG, in contrast to guanine, provide any hint as to why it base-pairs with adenine?
(c) Assume that an unrepaired oxoG lesion is present in the helix of DNA opposite cytosine. Predict the type of mutation that will occur following several rounds of replication.
(d) Which DNA repair mechanisms might work to counteract an oxoG lesion? Which of these is likely to be most effective?
25. Skin cancer carries a lifetime risk nearly equal to that of all other cancers combined. Following is a graph (modified from Kraemer, 1997, Proc. Natl. Acad. Sci. (USA) 94: 11–14) depicting the age of onset of skin cancers in patients with or without XP, where cumulative percentage of skin cancer is plotted against age. The non-XP curve is based on 29,757 cancers surveyed by the National Cancer Institute, and the curve representing those with XP is based on 63 skin cancers from the Xeroderma Pigmentosum Registry.
Graph: cumulative percentage of skin cancer (0 to 100) plotted against age in years (0 to 80), with one curve for XP patients and one for non-XP patients.
(a) Provide an overview of the information contained in the graph.
(b) Explain why individuals with XP show such an early age of onset.
26. The initial discovery of IS elements in bacteria revealed the presence of an element upstream (5′) of three genes controlling galactose metabolism. All three genes were affected simultaneously, although there was only one IS insertion. Offer an explanation as to why this might occur.
27. Suppose you are studying a DNA repair system, such as nucleotide excision repair, in vitro. By mistake, you add DNA ligase from a tube that has already expired. What would be the result?
28. It has been noted that most transposons in humans and other organisms are located in noncoding regions of the genome, such as introns, pseudogenes, and stretches of particular types of repetitive DNA. There are several ways to interpret this observation. Describe two possible interpretations. Which interpretation do you favor? Why?
29. Two related forms of muscular dystrophy, Duchenne muscular dystrophy (DMD) and Becker muscular dystrophy (BMD), are both recessive, X-linked, single-gene conditions caused by point mutations, deletions, and insertions in the dystrophin gene. Each mutated form of dystrophin is one allele. Of the two diseases, DMD is much more severe. Given your knowledge of mutations, the genetic code, and translation, propose an explanation for why the two disorders differ greatly in severity.

15 Regulation of Gene Expression

CHAPTER CONCEPTS
■ Expression of genetic information is regulated by intricate regulatory mechanisms that control transcription, mRNA stability, translation, and posttranslational modifications.
■ In prokaryotes, genes that encode proteins with related functions tend to be organized in clusters and are often under coordinated control. Such clusters, including their associated regulatory sequences, are called operons.
■ Transcription within operons is either inducible or repressible and is often regulated by the metabolic substrate or end product of the pathway.
■ Eukaryotic gene regulation is more complex than prokaryotic gene regulation.
■ The organization of eukaryotic chromatin in the nucleus plays a role in regulating gene expression. Chromatin must be remodeled to provide access to regulatory DNA sequences within it.
■ Eukaryotic transcription initiation requires the presence of transcription regulators at enhancer sites and general transcription complexes at promoter sites.
■ Eukaryotic gene expression is also regulated at posttranscriptional steps, including splicing of pre-mRNA, mRNA stability, translation, and posttranslational processing.

Chromosome territories in an interphase chicken cell nucleus. Each chromosome is stained with a different-colored probe.

In previous chapters, we described how DNA is organized into genes, how genes store genetic information, and how this information is expressed through the processes of transcription and translation.
We now consider one of the most fundamental questions in molecular genetics: How is gene expression regulated? It is clear that not all genes are expressed at all times in all situations. For example, some proteins in the bacterium E. coli are present in only a few molecules per cell (10 or fewer), whereas others, such as ribosomal proteins and the many proteins involved in the glycolytic pathway, are present in as many as 100,000 copies per cell. Although many prokaryotic gene products are present continuously at low levels, the level of these products can increase dramatically when required. In multicellular eukaryotes, differential gene expression is also essential, not only to allow appropriate and rapid responses to their environments, but also as the basis for embryonic development and adult organ function.
The activation and repression of gene expression are part of a delicate balancing act for both prokaryotic and eukaryotic organisms. Expression of a gene at the wrong time, in the wrong cell type, or in abnormal amounts can lead to a deleterious phenotype, cancer, or cell death, even when the gene itself is normal.
In this chapter, we will explore the ways in which prokaryotic and eukaryotic organisms regulate gene expression. We will describe some of the fundamental components of gene regulation, including the cis-acting DNA elements and trans-acting factors that regulate transcription initiation. We will then explain how these components interact with each other and with other factors such as activators, repressors, and chromatin proteins. We will also consider the roles that posttranscriptional mechanisms play in the regulation of eukaryotic gene expression. Please note that some of the topics discussed in this chapter are explored in greater depth later in the text (see Special Topic Chapter 1, Epigenetics, and Special Topic Chapter 2, Emerging Roles of RNA).
15.1 Prokaryotes Regulate Gene Expression in Response to Both External and Internal Conditions
Not only do bacteria respond metabolically to changes in their environment, but they also regulate gene expression in order to synthesize products required for a variety of normal cellular activities, including DNA replication, recombination, repair, and cell division. In the following sections, we will focus on prokaryotic gene regulation at the level of transcription, which is the predominant level of regulation in prokaryotes. Keep in mind, however, that posttranscriptional regulation also occurs in bacteria. We will defer discussion of posttranscriptional gene-regulatory mechanisms to subsequent sections dealing with eukaryotic gene expression.
The idea that microorganisms regulate the synthesis of gene products is not a new one. As early as 1900, it was shown that when lactose (a galactose- and glucose-containing disaccharide) is present in the growth medium of yeast, the organisms synthesize enzymes required for lactose metabolism. When lactose is absent, the enzymes are not manufactured. Soon thereafter, investigators were able to generalize that bacteria also adapt to their environment, producing certain enzymes only when specific chemical substrates are present. These enzymes are referred to as inducible enzymes, reflecting the role of the substrate, which serves as the inducer in enzyme production. In contrast, those enzymes that are produced continuously, regardless of the chemical makeup of the environment, are called constitutive enzymes.
More recent investigation has revealed a contrasting system whereby the presence of a specific molecule inhibits gene expression. This is usually true for molecules that are end products of anabolic (biosynthetic) pathways. For example, the amino acid tryptophan can be synthesized by bacterial cells. If a sufficient supply of tryptophan is present in the environment or culture medium, it is energetically inefficient for the organism to synthesize the enzymes necessary for tryptophan production. A mechanism has evolved whereby tryptophan plays a role in repressing transcription of genes that encode the appropriate biosynthetic enzymes. In contrast to the inducible system controlling lactose metabolism, the system governing tryptophan expression is said to be repressible.
Regulation, whether it is inducible or repressible, may be under either negative or positive control. Under negative control, gene expression occurs unless it is shut off by some form of regulator molecule. In contrast, under positive control, transcription occurs only if a regulator molecule directly stimulates RNA production. In theory, either type of control or a combination of the two can govern inducible or repressible systems.
15.2 Lactose Metabolism in E. coli Is Regulated by an Inducible System
Beginning in 1946, the studies of Jacques Monod (with later contributions by Joshua Lederberg, François Jacob, and André Lwoff) revealed genetic and biochemical insights into the mechanisms of lactose metabolism in bacteria. These studies explained how gene expression is repressed when lactose is absent but induced when it is available. In the presence of lactose, concentrations of the enzymes responsible for lactose metabolism increase rapidly from a few molecules to thousands per cell. The enzymes responsible for lactose metabolism are thus inducible, and lactose serves as the inducer.
In prokaryotes, genes that code for enzymes with related functions (in this case, the genes involved with lactose metabolism) tend to be organized in clusters on the bacterial chromosome. In addition, transcription of these genes is often under the coordinated control of a single transcription regulatory region. The location of this regulatory region is almost always on the same DNA molecule and upstream of the gene cluster it controls. We refer to this type of regulatory region as a cis-acting site. Cis-acting regulatory regions bind molecules that control
transcription of the gene cluster. Such molecules are called trans-acting molecules. Actions at the cis-acting regulatory site determine whether the genes are transcribed into RNA and thus whether the corresponding enzymes or other protein products are synthesized from the mRNA. Binding of a trans-acting molecule at a cis-acting site can regulate the gene cluster either negatively (by turning off transcription) or positively (by turning on transcription of genes in the cluster). In this section, we discuss how transcription of such bacterial gene clusters is coordinately regulated.
ESSENTIAL POINT
Research on the lac operon in E. coli pioneered our understanding of gene regulation in bacteria.
[Figure 15–1 diagram: the lac operon. A regulatory region containing the repressor gene (I), promoter (P), and operator (O) is followed by the structural genes lacZ (β-galactosidase gene), lacY (permease gene), and lacA (transacetylase gene).]
FIGURE 15–1 A simplified overview of the genes and regulatory units involved in the control of lactose metabolism. (This region of DNA is not drawn to scale.) A more detailed model is described later in this chapter.
Structural Genes
As illustrated in Figure 15–1, three genes and an adjacent regulatory region constitute the lactose, or lac, operon. Together, the entire gene cluster functions in an integrated fashion to provide a rapid response to the presence or absence of lactose.
Genes coding for the primary structure of the enzymes are called structural genes. There are three structural genes in the lac operon. The lacZ gene encodes β-galactosidase, an enzyme whose role is to convert the disaccharide lactose to the monosaccharides glucose and galactose (Figure 15–2). This conversion is essential if lactose is to serve as an energy source in glycolysis. The second gene, lacY, encodes the amino acid sequence of permease, an enzyme that facilitates the entry of lactose into the bacterial cell. The third gene, lacA, codes for the enzyme transacetylase. Although its physiological role is still not completely clear, it may be involved in the removal of toxic by-products of lactose digestion from the cell.
To study the genes encoding these three enzymes, researchers isolated numerous mutants, each of which eliminated the function of one of the enzymes. These mutants were first isolated and studied by Joshua Lederberg. Mutant cells that fail to produce active β-galactosidase (lacZ⁻) or permease (lacY⁻) are unable to use lactose as an energy source and are collectively known as lac⁻ mutants. Mapping studies by Lederberg established that all three genes are closely linked, or contiguous to one another, on the bacterial chromosome, in the order Z–Y–A (see Figure 15–1).
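The rule for the lac⁻ phenotype stated above (losing either β-galactosidase or permease prevents lactose use) can be captured in a short Python sketch. It is illustrative only: the LacStructuralGenes class, its Boolean fields, and the genotype helper are hypothetical names. Treating lacA⁻ cells as still able to use lactose is an assumption of the sketch, because the text names only lacZ⁻ and lacY⁻ as lac⁻ mutations.

```python
from dataclasses import dataclass

# Gene order on the chromosome, as established by Lederberg's mapping (Z-Y-A).
LAC_GENE_ORDER = ("lacZ", "lacY", "lacA")


@dataclass(frozen=True)
class LacStructuralGenes:
    """Hypothetical genotype record: True = functional allele, False = loss of function."""

    lacZ: bool = True  # beta-galactosidase: cleaves lactose into glucose + galactose
    lacY: bool = True  # permease: facilitates entry of lactose into the cell
    lacA: bool = True  # transacetylase: role unclear; ASSUMED not required for lactose use

    def can_use_lactose(self) -> bool:
        # A cell is lac+ only if it makes active beta-galactosidase AND permease;
        # losing either one (lacZ- or lacY-) gives the lac- phenotype.
        return self.lacZ and self.lacY

    def genotype(self) -> str:
        # Render the genotype in chromosomal order, e.g. "lacZ+ lacY- lacA+".
        return " ".join(
            f"{name}{'+' if getattr(self, name) else '-'}" for name in LAC_GENE_ORDER
        )


if __name__ == "__main__":
    for cell in (
        LacStructuralGenes(),
        LacStructuralGenes(lacZ=False),
        LacStructuralGenes(lacY=False),
        LacStructuralGenes(lacA=False),
    ):
        print(cell.genotype(), "->", "lac+" if cell.can_use_lactose() else "lac-")
```

Run as a script, it prints one line per genotype, for example "lacZ+ lacY- lacA+ -> lac-". One Boolean per gene is the simplest encoding that still shows why mutations in two different genes produce the same lac⁻ phenotype.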
All three genes are transcribed as a single unit, resulting in a polycistronic mRNA (Figure 15–3). This results in the coordinated regulation of all three genes, since a single messenger RNA is simultaneously translated into all three gene products.
[Figure 15–2 structures: lactose + H₂O, in a reaction catalyzed by β-galactosidase, yields galactose + glucose.]
FIGURE 15–2 The catabolic conversion of the disaccharide lactose into its monosaccharide units, galactose and glucose.
[Figure 15–3 diagram: the structural genes lacZ, lacY, and lacA are transcribed into one polycistronic mRNA; ribosomes moving along the mRNA translate it into β-galactosidase, permease, and transacetylase.]
FIGURE 15–3 The structural genes of the lac operon are transcribed into a single polycistronic mRNA, which is translated simultaneously by several ribosomes into the three enzymes encoded by the operon.
FIGURE 15–4 The gratuitous inducer isopropylthiogalactoside (IPTG).
The Discovery of Regulatory Mutations
How does lactose stimulate transcription of the lac operon and induce the synthesis of the related enzymes? A partial answer comes from studies using gratuitous inducers, chemical analogs of lactose such as the sulfur analog isopropylthiogalactoside (IPTG), shown in Figure 15–4. Gratuitous inducers behave like natural inducers, but they do not serve as substrates for the enzymes that are subsequently synthesized. What, then, is the role of lactose in gene regulation?
The answer to this question required the study of a class of mutants called constitutive mutants. In cells bearing constitutive mutations, enzymes are produced regardless of the presence or absence of lactose. Studies of the constitutive mutation lacI⁻ mapped the mutation to a site on the bacterial chromosome close to, but distinct from, the lacZ, lacY, and lacA genes. This mutation defined the lacI gene, which is appropriately called a repressor gene. Another set of constitutive mutations that produce effects identical to those of lacI⁻ occurs in a region immediately adjacent to the structural genes. This class of mutations, designated lacOᶜ, occurs in the operator region of the operon. In both types of constitutive mutants, the enzymes are produced continuously, inducibility is eliminated, and gene regulation is lost.
The Operon Model: Negative Control
Around 1960, Jacob and Monod proposed a scheme involving negative control called the operon model, whereby a group of genes is regulated and expressed together as a unit. As we saw in Figure 15–1, the lac operon consists of the Z, Y, and A structural genes, as well as the adjacent sequences of DNA referred to as the operator region. They argued that the lacI gene regulates the transcription of the structural genes by producing a repressor molecule, and that the repressor is allosteric, meaning that it reversibly interacts with another molecule, causing both a conformational change in the repressor’s three-dimensional shape and a change in its chemical activity. Figure 15–5 illustrates the components of the lac operon as well as the action of the lac repressor in the presence and absence of lactose.
Jacob and Monod suggested that the repressor normally binds to the DNA sequence of the operator region. When it does so, it inhibits the action of RNA polymerase, effectively repressing the transcription of the structural genes [Figure 15–5(b)]. However, when lactose is present, the sugar binds to the repressor molecule and causes an allosteric conformational change. This change renders the repressor incapable of interacting with operator DNA [Figure 15–5(c)]. In the absence of the repressor–operator interaction, RNA polymerase transcribes the structural genes, and the enzymes necessary for lactose metabolism are produced. Because transcription occurs only when the repressor fails to bind to the operator region, regulation is said to be under negative control.
To summarize, the operon model invokes a series of molecular interactions between proteins, inducers, and DNA to explain the efficient regulation of structural gene expression. In the absence of lactose, the enzymes encoded by the genes are not needed, and expression of genes encoding these enzymes is repressed. When lactose is present, it indirectly induces the transcription of the structural genes by interacting with the repressor.* If all lactose is metabolized, none is available to bind to the repressor, which is again free to bind to operator DNA and repress transcription.
Both the I⁻ and Oᶜ constitutive mutations interfere with these molecular interactions, allowing continuous transcription of the structural genes. In the case of the I⁻ mutant, seen in Figure 15–6(a), the repressor protein is altered or absent and cannot bind to the operator region, so the structural genes are always transcribed. In the case of the Oᶜ mutant [Figure 15–6(b)], the nucleotide sequence of the operator DNA is altered and will not bind with a normal repressor molecule. The result is the same: the structural genes are always transcribed.
Genetic Proof of the Operon Model
The operon model leads to three major predictions that can be tested to determine its validity: (1) the I gene produces a diffusible product; (2) the O region is involved in regulation but does not produce a product; and (3) the O region must be adjacent to the structural genes in order to regulate transcription. The construction of partially diploid
bacteria allows us to assess these assumptions, particularly those that predict trans-acting regulatory molecules. For example, as introduced previously (see Chapter 8), the F plasmid may contain chromosomal genes, in which case it is designated F′. When an F⁻ cell acquires such a plasmid, it contains its own chromosome plus one or more additional genes present in the plasmid. This creates a host cell, called a merozygote, that is diploid for those genes. The use of such a plasmid makes it possible, for example, to introduce an I⁺ gene into a host cell whose genotype is I⁻, or to introduce an O⁺ region into a host cell of genotype Oᶜ. The Jacob–Monod operon model predicts how regulation should be affected in such cells. Adding an I⁺ gene to an I⁻ cell should restore inducibility, because the normal wild-type repressor, which is a trans-acting factor, would be produced by the inserted I⁺ gene. Adding an O⁺ region to an Oᶜ cell should have no effect on constitutive enzyme production, since regulation depends on the presence of an O⁺ region immediately adjacent to the structural genes; that is, O⁺ is a cis-acting regulator.
* Technically, allolactose, an isomer of lactose, is the inducer. When lactose enters the bacterial cell, some of it is converted to allolactose by the β-galactosidase enzyme.
[Figure 15–5 diagrams: (a) Components: repressor gene (I), promoter (P), operator (O), leader (L), and structural genes Z, Y, A; the repressor protein has an operator-binding site and a lactose-binding site; RNA polymerase. (b) I⁺O⁺Z⁺Y⁺A⁺ (wild type), no lactose present, repressed: the repressor binds to the operator, blocking transcription, so no enzymes are made. (c) I⁺O⁺Z⁺Y⁺A⁺ (wild type), lactose present, induced: the operator-binding region of the repressor is altered when bound to lactose, so no binding occurs; transcription proceeds, and the mRNA is translated into enzymes.]
FIGURE 15–5 The components of the wild-type lac operon (a) and the response of the lac operon to the absence (b) and presence (c) of lactose. The leader (L) sequence encodes a short region of mRNA that is 5′ of the AUG translation start codon and is not translated.
The results of these experiments are shown in Table 15.1, where Z represents the structural genes. The inserted genes are listed after the designation F′. In both cases described here, the Jacob–Monod model is upheld (part B of Table 15.1). Part C shows the reverse experiments, where either an I⁻ gene or an Oᶜ region is added to cells of normal inducible genotypes. As the model predicts, inducibility is maintained in these partial diploids.
Another prediction of the operon model is that certain mutations in the I gene should have the opposite effect of I⁻. That is, instead of being constitutive because the repressor can’t bind the operator, mutant repressor molecules should be produced that cannot interact with the inducer, lactose. As a result, the repressor would always bind to the operator sequence, and the structural genes would be permanently repressed (Figure 15–7). If this were the case, the presence of an additional I⁺ gene would have little or no effect on repression. In fact, such a mutation, Iˢ, was discovered wherein the operon is “super-repressed,” as shown in part D of Table 15.1. An additional I⁺ gene does not effectively relieve repression of gene activity. These observations are consistent with the idea that the repressor contains separate DNA-binding domains and inducer-binding domains. The binding of lactose to the inducer-binding domain causes an allosteric change in the DNA-binding domain.
[Figure 15–6 diagrams: (a) I⁻O⁺Z⁺Y⁺A⁺ (mutant repressor gene), no lactose present, constitutive: the operator-binding region of the repressor is altered, so no binding occurs and transcription proceeds. (b) I⁺OᶜZ⁺Y⁺A⁺ (mutant operator gene), no lactose present, constitutive: the nucleotide sequence of the operator is altered, so no binding occurs and transcription proceeds.]
FIGURE 15–6 The response of the lac operon in the absence of lactose when a cell bears either the I⁻ mutation (a) or the Oᶜ mutation (b).
TABLE 15.1 A Comparison of Gene Activity (+ or −) in the Presence or Absence of Lactose for Various E. coli Genotypes
                            β-Galactosidase activity
    Genotype                Lactose present   Lactose absent
A.  I⁺O⁺Z⁺                  +                 −
    I⁺O⁺Z⁻                  −                 −
    I⁻O⁺Z⁺                  +                 +
    I⁺OᶜZ⁺                  +                 +
B.  I⁻O⁺Z⁺/F′I⁺             +                 −
    I⁺OᶜZ⁺/F′O⁺             +                 +
C.  I⁺O⁺Z⁺/F′I⁻             +                 −
    I⁺O⁺Z⁺/F′Oᶜ             +                 −
D.  IˢO⁺Z⁺                  −                 −
    IˢO⁺Z⁺/F′I⁺             −                 −
Note: In parts B to D, most genotypes are partially diploid, containing an F factor plus attached genes (F′).
ESSENTIAL POINT
Genes involved in the metabolism of lactose are coordinately regulated by a negative control system that responds to the presence or absence of lactose.
[Figure 15–7 diagram: IˢO⁺Z⁺Y⁺A⁺ (mutant repressor gene), lactose present, repressed: the lactose-binding region of the repressor is altered, so it cannot bind lactose; the repressor remains bound to the operator, blocking transcription.]
FIGURE 15–7 The response of the lac operon in the presence of lactose in a cell bearing the Iˢ mutation.
Isolation of the Repressor
Although Jacob and Monod’s operon theory succeeded in explaining many aspects of genetic regulation in prokaryotes, the nature of the repressor molecule was not known when their landmark paper was published in 1961. While they had assumed that the allosteric repressor was a protein, RNA was also a candidate because activity of the molecule required the ability to bind to DNA. A single E. coli cell contains no more than ten or so molecules of the lac repressor; therefore, direct chemical identification of ten molecules in a population of millions of proteins and RNAs in a single cell presented a tremendous challenge. Nevertheless, in 1966, Walter Gilbert and Benno
Müller-Hill reported the isolation of the lac repressor. Once the repressor was purified, it was shown to have various characteristics of a protein. The isolation of the repressor thus confirmed the operon model, which had been put forward strictly on genetic grounds.
15–1 The lac Z, Y, and A structural genes are transcribed as a single polycistronic mRNA; however, each structural gene contains its own initiation and termination signals essential for translation. Predict what will happen when cells growing in the presence of lactose contain a deletion of one nucleotide (a) early in the Z gene and (b) early in the A gene.
HINT: This problem requires you to combine your understanding of the genetic expression of the lac operon, the genetic code, frameshift mutations, and termination of translation. The key to its solution is to consider the effect of the loss of one nucleotide within a polycistronic mRNA.
15.3 The Catabolite-Activating Protein (CAP) Exerts Positive Control over the lac Operon
As we discussed previously, the role of β-galactosidase is to cleave lactose into its components, glucose and galactose. However, for galactose to be used by the cell, it also must be converted to glucose. What if the cell found itself in an environment that contained ample amounts of both lactose and glucose?
Given that glucose is the preferred carbon source for E. coli, it would not be energetically efficient for a cell to induce transcription of the lac operon, make β-galactosidase, and metabolize lactose, since what it really needs, glucose, is already present. As we shall see next, a molecule called the catabolite-activating protein (CAP) helps activate expression of the lac operon, but this activation fails when glucose is present. This glucose-dependent inhibition of expression is called catabolite repression.
To understand CAP and its role in regulation, let’s backtrack for a moment. When the lac repressor is bound to the inducer, RNA polymerase transcribes the lac operon structural genes. As stated earlier in the text (see Chapter 12), transcription is initiated as a result of the binding that occurs between RNA polymerase and the nucleotide sequence of the promoter region, found upstream (5′) from the initial coding sequences. Within the lac operon, the promoter is located upstream of the lac operator region (O) (see Figure 15–1). Careful examination has revealed that RNA polymerase binding is never very efficient unless CAP is also present to facilitate the process.
The mechanism is summarized in Figure 15–8. In the absence of glucose and under inducible conditions, CAP exerts positive control by binding to the CAP site, facilitating RNA polymerase binding at the lac operon promoter and thus transcription. Therefore, for maximal transcription of the structural genes, the repressor must be bound by lactose (so as not to repress lac operon transcription), and CAP must be bound to the CAP-binding site. What role does glucose play in inhibiting CAP binding?
The answer involves still another molecule, cyclic adenosine monophosphate (cAMP), upon which CAP binding is dependent. In order to bind to the lac operon promoter, CAP must be bound to cAMP. The level of cAMP is itself dependent on an enzyme, adenyl cyclase, which catalyzes the conversion of ATP to cAMP.
[Figure 15–8 diagrams: (a) Glucose absent: as cAMP levels increase, cAMP binds to CAP, causing an allosteric transition; the CAP–cAMP complex binds the CAP-binding site in the promoter region, RNA polymerase binds the polymerase site, and transcription and translation of the structural genes occur. (b) Glucose present: cAMP levels decrease, CAP cannot bind efficiently, RNA polymerase seldom binds, and transcription and translation are diminished.]
FIGURE 15–8 Catabolite repression. (a) In the absence of glucose, cAMP levels increase, resulting in the formation of a cAMP–CAP complex, which binds to the CAP site of the promoter, stimulating transcription. (b) In the presence of glucose, cAMP levels decrease, cAMP–CAP complexes are not formed, and transcription is not stimulated.
The role of glucose in catabolite repression is to inhibit the activity of adenyl cyclase, causing a decline in the level of cAMP in the cell. Under this condition, CAP cannot form the cAMP–CAP complex that is essential to the positive control of transcription of the lac operon.
The structures of CAP and cAMP–CAP have been examined by using X-ray crystallography. CAP is a dimer that binds adjacent regions of a specific nucleotide sequence of the DNA making up the lac promoter. The cAMP–CAP complex, when bound to DNA, bends the DNA, causing it to assume a new conformation. Binding studies in solution further clarify the mechanism of gene activation. Alone, neither cAMP–CAP nor RNA polymerase has a strong affinity to bind to lac
promoter DNA, nor does either molecule have a strong affinity to bind to the other. However, when both are together in the presence of the lac promoter DNA, a tightly bound complex is formed, an example of what is called cooperative binding.
The control conferred by cAMP–CAP provides another illustration of how the regulation of one small group of genes can be fine-tuned by several simultaneous influences. In contrast to the negative regulation conferred by the lac repressor, the action of cAMP–CAP constitutes positive regulation. Thus, a combination of positive and negative regulatory mechanisms determines transcription levels of the lac operon. Catabolite repression involving CAP has also been observed for other inducible operons, including those controlling the metabolism of galactose and arabinose.
15–2 Predict the level of gene expression of the lac operon, as well as the status of the lac repressor and the CAP protein, when bacterial growth media contain the following sugars: (a) no lactose or glucose, (b) lactose but no glucose, (c) glucose but no lactose, (d) both lactose and glucose.
HINT: This problem asks you to combine your knowledge of lac operon regulation with your understanding of how catabolite repression affects this regulation. The key to its solution is to keep in mind that regulation involving lactose is a negative control system, while regulation involving glucose and catabolite repression is a positive control system.
ESSENTIAL POINT
The catabolite-activating protein (CAP) exerts positive control over lac gene expression by interacting with RNA polymerase at the lac promoter and by responding to the levels of cyclic AMP in the bacterial cell.
15.4 The Tryptophan (trp) Operon in E. coli Is a Repressible Gene System
Although inducible gene regulation had been known for some time, it was not until 1953 that Monod and colleagues discovered a repressible system. Studies on the biosynthesis of the essential
amino acid tryptophan revealed that, if tryptophan is present in sufficient quantity in the growth medium, the enzymes necessary for its synthesis (such as tryptophan synthase) are not produced. It is energetically advantageous for bacteria to repress expression of genes involved in tryptophan synthesis when ample tryptophan is present in the growth medium.
Further investigation showed that enzymes encoded by five contiguous genes on the E. coli chromosome are involved in tryptophan synthesis. These genes are part of an operon and, in the presence of tryptophan, all are coordinately repressed, and none of the enzymes is produced. Because of the great similarity between this repression and the induction of enzymes for lactose metabolism, Jacob and Monod proposed a model of gene regulation resembling that of the lac system (Figure 15–9).
The model suggests the presence of a normally inactive repressor that alone cannot interact with the operator region of the operon. However, the repressor is an allosteric molecule that can bind to tryptophan. When tryptophan is present, the resultant complex of repressor and tryptophan attains a new conformation that binds to the operator, repressing transcription. Thus, when tryptophan, the end product of this anabolic pathway, is present, the operon is repressed and enzymes are not made. Since the regulatory complex inhibits transcription of the operon, this repressible system is under negative control. And as tryptophan participates in repression, it is referred to as a corepressor in this regulatory scheme.
Evidence for the trp Operon
Support for the concept of a repressible operon is based primarily on the isolation of two distinct categories of constitutive mutations. The first class, trpR⁻, maps at a considerable distance from the structural genes. This locus represents the gene coding for the repressor. Presumably, the mutation either inhibits the interaction of the repressor with tryptophan or inhibits repressor formation entirely.
Whichever the case, no repression is present in cells with the trpR⁻ mutation. As expected, if the trpR⁺ gene encodes a functional repressor molecule, the presence of a copy of this gene will restore repressibility.
The second constitutive mutant is analogous to the Oᶜ mutant of the lactose operon because it maps immediately adjacent to the structural genes. Furthermore, the addition of a wild-type operator gene into mutant cells (as an external element) does not restore repression. This is predictable if the mutant operator, which must be present in cis, no longer interacts with the repressor–tryptophan complex.
The entire trp operon has now been well defined, as shown in Figure 15–9. Five contiguous structural genes (trpE, D, C, B, and A) are transcribed as a polycistronic mRNA directing translation of the enzymes that catalyze the biosynthesis of tryptophan. As in the lac operon, a promoter region (trpP) represents the binding site for RNA polymerase, and an operator region (trpO) is the binding site for the repressor. In the absence of repressor binding, transcription initiates within the overlapping trpP–trpO region and proceeds along a leader sequence of 162 nucleotides that precedes the first structural gene (trpE). Within that leader sequence, still another regulatory site exists, called an attenuator, which we describe in the next section of this chapter. As we shall see, the attenuator is also an integral part of the control mechanism of the operon.
ESSENTIAL POINT
Unlike the inducible lac operon, the trp operon is repressible. In the presence of tryptophan, the repressor–tryptophan complex binds to the regulatory region of the trp operon and represses transcription initiation.
EVOLVING CONCEPT OF THE GENE
The groundbreaking work of Jacob, Monod, and Lwoff in the early 1960s, which established the operon model for the regulation of gene expression in bacteria, expanded the concept of the gene to include noncoding regulatory sequences that are present upstream (5′) from the coding region. In
bacterial operons, the transcription of several contiguous structural genes whose products are involved in the same biochemical pathway is regulated by a single set of regulatory sequences.
[Figure 15–9 diagrams: (a) Components: the repressor gene (trpR); the regulatory region, consisting of the promoter (trpP), operator (trpO), and leader (L) containing the attenuator (A); and the structural genes trpE, trpD, trpC, trpB, and trpA. The repressor protein has a binding site for tryptophan (the corepressor). (b) Tryptophan absent: the repressor alone cannot bind to the operator, so transcription proceeds and a polycistronic mRNA is made. (c) Tryptophan present: the repressor binds to tryptophan, causing an allosteric transition; the repressor–tryptophan complex binds to the operator, and transcription is blocked.]
FIGURE 15–9 (a) The components involved in the regulation of the tryptophan operon. Regulatory conditions are depicted that involve either (b) activation or (c) repression of the structural genes. In the absence of tryptophan, an inactive repressor is made that cannot bind to the operator (O), thus allowing transcription to proceed. In the presence of tryptophan, tryptophan binds to the repressor, causing an allosteric transition to occur. This complex binds to the operator region, leading to repression of the operon.
15.5 Alterations to RNA Secondary Structure Also Contribute to Prokaryotic Gene Regulation
In the preceding sections of this chapter, we focused on gene regulation brought about by DNA-binding regulatory proteins that interact with promoter and operator regions of the genes to be regulated. These regulatory proteins, such as the lac repressor and the CAP protein, act to decrease or increase transcription initiation from their target promoters by affecting the binding of RNA polymerase to the promoter. Gene regulation in prokaryotes can also occur through the interactions of regulatory molecules with specific regions of a nascent mRNA, after transcription has
been initiated. The binding of these regulatory molecules alters the secondary structure of the mRNA, leading to premature transcription termination or repression of translation. We will discuss two types of regulation by RNA secondary structure: attenuation and riboswitches. Both types help to fine-tune prokaryotic gene regulation and are used in addition to regulation of transcription initiation.

Transcription Attenuation

Charles Yanofsky, Kevin Bertrand, and their colleagues defined the mechanisms of bacterial attenuation. They observed that, when tryptophan is present and the trp operon is repressed, initiation of transcription still occurs at a low level but is subsequently terminated at a point about 140 nucleotides along the transcript. They called this process attenuation, as it further diminishes expression of the operon. In contrast, when tryptophan is absent or present in very low concentrations, transcription is initiated but is not subsequently terminated, instead continuing beyond the leader sequence into the structural genes. The site involved in attenuation is located 115 to 140 nucleotides into the leader sequence and is referred to as the attenuator (see Figure 15–9).
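In Yanofsky's model, the switch between termination near the attenuator and read-through into the structural genes depends on one event. Either ribosomes translating the leader stall at its two UGG (tryptophan) codons, or they do not. The Python sketch below reduces that logic to a yes/no rule. It is a hypothetical illustration, not a kinetic model. LEADER is a made-up leader open reading frame with two tandem UGG codons, not the real trpL sequence. The names leader_codons, attenuation_outcome, and charged_trna_trp are likewise invented for this example.

```python
# Toy model of trp attenuation (simplified Yanofsky model, not a kinetic simulation).

STOP_CODONS = {"UAA", "UAG", "UGA"}
TRP_CODON = "UGG"

# Hypothetical leader ORF: 14 sense codons, including two tandem UGG (Trp) codons,
# followed by a UGA stop. Illustrative only; not the actual E. coli trpL sequence.
LEADER = "".join([
    "AUG", "AAA", "GCA", "AUU", "UUC", "GUA", "CUG",
    "AAA", "GGU", "UGG", "UGG", "CGC", "ACU", "UCC", "UGA",
])


def leader_codons(mrna: str) -> list[str]:
    """In-frame codons from the first AUG up to, but not including, the first stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def attenuation_outcome(mrna: str, charged_trna_trp: bool) -> str:
    """Report which leader hairpin forms when the leader is translated."""
    for codon in leader_codons(mrna):
        if codon == TRP_CODON and not charged_trna_trp:
            # Ribosome stalls at a UGG codon: the terminator cannot form,
            # so the antiterminator hairpin forms instead.
            return "antiterminator: transcription continues into the structural genes"
    # Ribosome translates the whole leader: the terminator hairpin forms.
    return "terminator: transcription stops just beyond the attenuator"


print(attenuation_outcome(LEADER, charged_trna_trp=True))
print(attenuation_outcome(LEADER, charged_trna_trp=False))
```

With charged tRNA available, the sketch predicts termination. When it is withheld, the ribosome stops at the first UGG and the antiterminator outcome is returned.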
Yanofsky and colleagues presented a model to explain how attenuation occurs and is regulated. The initial DNA sequence that is transcribed gives rise to an mRNA sequence that has the potential to fold into two mutually exclusive stem-loop structures referred to as “hairpins.” In the presence of excess tryptophan, the mRNA hairpin that is formed behaves as a terminator structure, and transcription is almost always terminated prematurely, just beyond the attenuator. On the other hand, if tryptophan is scarce, an alternative mRNA hairpin referred to as the antiterminator hairpin is formed. Transcription proceeds past the antiterminator hairpin region, and the entire mRNA is subsequently produced.

A key point in Yanofsky’s model is that the leader transcript must be translated in order for the antiterminator hairpin to form. The leader transcript contains two triplets (UGG) that encode tryptophan, and these are present just downstream of the initial AUG sequence that signals the initiation of translation by ribosomes. When adequate tryptophan is present, charged tRNA^Trp is present in the cell. As a result, ribosomes translate these UGG triplets, proceed through the attenuator, and allow the terminator hairpin to form. The terminator hairpin signals RNA polymerase to terminate transcription prematurely, and the structural genes of the operon are not transcribed. If cells are starved of tryptophan, charged tRNA^Trp is unavailable. As a result, ribosomes “stall” during translation of the UGG triplets. The presence of ribosomes in this region of the mRNA interferes with the formation of the terminator hairpin but allows the formation of the antiterminator hairpin within the leader transcript. As a result, transcription proceeds, leading to expression of the entire set of structural genes.

Many other bacterial operons use attenuation to control gene expression. These include operons that encode enzymes involved in the biosynthesis of amino acids such
as threonine, histidine, leucine, and phenylalanine. As with the trp operon, attenuation occurs in a leader sequence that contains an attenuator region.

Riboswitches

Since the elucidation of attenuation in the trp operon, numerous cases of gene regulation that also depend on alternative forms of mRNA secondary structure have been documented. These involve what are called riboswitches, which are mRNA sequences (or elements) present in the 5′-untranslated region (5′-UTR) upstream from the coding sequences. These elements are capable of binding with small molecule ligands, such as metabolites, whose synthesis or activity is controlled by the genes encoded by the mRNA. Such binding causes a conformational change in one domain of the riboswitch element, which induces another change at a second RNA domain, most often creating a transcription terminator structure. This terminator structure interfaces directly with the transcriptional machinery and shuts it down.

Riboswitches can recognize a broad range of ligands, including amino acids, purines, vitamin cofactors, amino sugars, and metal ions, among others. They are widespread in bacteria. In Bacillus subtilis, for example, approximately percent of this bacterium’s genes are regulated by riboswitches. They are also found in archaea, fungi, and plants, and may be present in animals as well.

The two important domains within a riboswitch are the ligand-binding site, called the aptamer, and the expression platform, which is capable of forming the terminator structure. Figure 15–10 illustrates the principles involved in riboswitch control. The 5′-UTR of an mRNA is shown on the left side of the figure in the absence of the ligand (metabolite). RNA polymerase has transcribed the unbound ligand-binding site, and in the default conformation, the expression platform adopts an antiterminator conformation. Thus, transcription continues through the expression platform and into the coding region. On the right side of the figure, the presence of the ligand on the ligand-binding site induces an alternative conformation in the expression platform, creating the terminator conformation. RNA polymerase is effectively blocked and transcription ceases.

[Figure 15–10. Left: RNA polymerase and the riboswitch in the antiterminator conformation. Center: the ligand binds, inducing conformational changes. Right: the terminator conformation.]

FIGURE 15–10 Illustration of the mechanism of riboswitch regulation of gene expression, where the default position (left) is in the antiterminator conformation. Upon binding by the ligand, the mRNA adopts the terminator conformation (right). Transcription is terminated.

We will discuss other recent findings involving RNA-mediated gene expression later in the text (see Section 15.12 and Special Topic Chapter 2: Emerging Roles of RNA).

ESSENTIAL POINT
Attenuation and riboswitches regulate gene expression by inducing alterations to mRNA secondary structure, leading to premature termination of transcription.

15.6 Eukaryotic Gene Regulation Differs from That in Prokaryotes

[Figure 15–11. In the nucleus: regulation of chromatin remodeling (chromatin), regulation of transcription (DNA to pre-mRNA, the primary transcript), and regulation of splicing and processing (capped, polyadenylated mRNA). Regulation of transport through the nuclear pore. In the cytoplasm: translational regulation at the ribosome, degradation of mRNA, and protein modifications of the protein product.]

FIGURE 15–11 Regulation can occur at any stage in the expression of genetic material in eukaryotes. All these forms of regulation affect the degree to which a gene is expressed.

Virtually all cells in a multicellular eukaryotic organism contain a complete genome; however, only a subset of genes is expressed in any particular cell type. For example, some white blood cells express genes encoding certain immunoglobulins, allowing these cells to synthesize antibodies that defend the organism from infection and foreign agents. However, skin, kidney, and liver cells do not express
immunoglobulin genes. Pancreatic islet cells synthesize and secrete insulin in response to the presence of blood sugars; however, they do not manufacture immunoglobulins. In addition, they do not synthesize insulin when it is not required. Eukaryotic cells, as part of multicellular organisms, do not grow solely in response to the availability of nutrients. Instead, they regulate their growth and division to occur at appropriate places in the body and at appropriate times during development. The loss of gene regulation that controls normal cell growth and division may lead to developmental defects or cancer.

Eukaryotes employ a wide range of mechanisms for altering the expression of genes. In contrast to prokaryotic gene regulation, which occurs primarily at the level of transcription initiation, regulation of gene expression in eukaryotes can occur at many different levels. These include the initiation of transcription, mRNA modifications and stability, and the synthesis, modification, and stability of the protein product (Figure 15–11). Several features of eukaryotic cells make it possible for them to use more types of gene regulation than are possible in prokaryotic cells:

• Eukaryotic cells contain a much greater amount of DNA than prokaryotic cells. This DNA is associated with histones and other proteins to form highly compact chromatin structures within an enclosed nucleus. Eukaryotic cells modify this structural organization in order to influence gene expression.
• The mRNAs of most eukaryotic genes must be spliced, capped, and polyadenylated prior to transport from the nucleus. Each of these processes can be regulated in order to influence the numbers and types of mRNAs available for translation.
• Genetic information in eukaryotes is carried on many chromosomes (rather than just one), and these chromosomes are enclosed within a double-membrane-bound nucleus. After transcription, transport of RNAs into the cytoplasm can be regulated in order
to modulate the availability of mRNAs for translation.
• Eukaryotic mRNAs can have a wide range of half-lives (t1/2). In contrast, the majority of prokaryotic mRNAs decay very rapidly. Rapid turnover of mRNAs allows prokaryotic cells to respond quickly to environmental changes. In eukaryotes, the complement of mRNAs in each cell type can be more subtly manipulated by altering mRNA decay rates over a larger range.
• In eukaryotes, translation rates can be modulated, as well as the way proteins are processed, modified, and degraded.

In the following sections, we examine some of the major ways in which eukaryotic gene expression is regulated. As most eukaryotic genes are regulated, at least in part, at the transcriptional level, we will emphasize transcriptional control. In addition, we will limit our discussion to regulation of genes transcribed by RNA polymerase II. As we previously described in Chapter 12, three different RNA polymerases transcribe eukaryotic genes. RNA polymerase II transcribes all mRNAs and some small nuclear RNAs, whereas RNA polymerases I and III transcribe ribosomal RNAs, some small nuclear RNAs, and transfer RNAs. Transcription by each of these RNA polymerases is regulated differently, with RNA polymerase II having the most diverse and complex mechanisms.

15.7 Eukaryotic Gene Expression Is Influenced by Chromatin Modifications

Two structural features of eukaryotic genes distinguish them from the genes of prokaryotes. First, eukaryotic genes are situated on chromosomes that occupy a distinct location within the cell: the nucleus. This sequestering of genetic information in a discrete compartment allows the proteins that directly regulate transcription to be kept apart from those involved with translation and other aspects of cellular metabolism. Second, as described earlier in the text (see Chapter 11), eukaryotic DNA is combined with histones and nonhistone proteins to form chromatin. Chromatin’s basic structure is characterized by repeating units called
nucleosomes that are wound into 30-nm fibers, which in turn form other, even more compact structures. The presence of these compact chromatin structures is inhibitory to many processes, including transcription, replication, and DNA repair. In this section, we outline some of the ways in which eukaryotic cells modify chromatin in order to regulate gene expression.

Chromosome Territories and Transcription Factories

Recent research has revealed that the interphase nucleus is not a bag of tangled chromosome arms but has a highly organized structure. In the interphase nucleus, each chromosome occupies a discrete domain called a chromosome territory and stays separate from other chromosomes. Channels between chromosomes contain little or no DNA and are called interchromosomal domains. Transcriptionally active genes appear to be located at the edges of chromosome territories next to interchromosomal domain channels. Scientists hypothesize that this organization may bring actively expressed genes into closer association with transcription factors, or with other actively expressed genes, thereby facilitating their coordinated expression.

Another feature within the nucleus, the transcription factory, may also contribute to regulating gene expression. Transcription factories are nuclear sites at which most RNA polymerase II transcription occurs. These sites also contain the majority of active RNA polymerase and other transcription factors. It is not yet clear whether the formation of transcription factories is a prerequisite or a consequence of transcription initiation; however, by concentrating transcription proteins and actively transcribed genes in specific locations in the nucleus, the cell may enhance the expression of these genes.

Histone Modifications and Nucleosomal Chromatin Remodeling

Chromatin modification is an important step in gene regulation. Chromatin modification appears to be a prerequisite for transcription of some eukaryotic genes, although it can occur simultaneously
with transcription of other genes. Chromatin can be modified in two general ways. The first involves changes to nucleosomes, and the second involves modifications to DNA. In this subsection, we will discuss changes to the nucleosomal component of chromatin. In the next subsection, we present DNA modifications, specifically DNA methylation.

The tight association of DNA with nucleosomes and other chromatin-binding proteins inhibits access of the DNA to the proteins involved in many functions, including transcription. This inhibitory structure is often referred to as “closed” chromatin. Before transcription can be initiated within nucleosomal chromatin, the structure of chromatin must become “open” to transcription regulatory factors and enzymes such as RNA polymerases.

Nucleosomal chromatin can be modified in three ways. The first involves changes in nucleosome composition that can affect gene transcription. For example, most nucleosomes contain the normal histone H2A. Some gene promoter regions may be flanked by nucleosomes containing variant histones, such as H2A.Z. The presence of the H2A.Z variant within these nucleosomes affects nucleosome mobility and positioning on DNA. As a result, a gene promoter associated with these variant nucleosomes may be either transcriptionally activated or repressed, depending on the nucleosome position.

A second mechanism of chromatin alteration involves histone modification. Histone modification involves the covalent bonding of functional groups onto the N-terminal tails of histone proteins. The most common histone modifications are the addition of acetyl, methyl, or phosphate groups to amino acids in histone tails. Acetyl groups are added mainly to lysines, methyl groups to lysines and arginines, and phosphate groups mainly to serines and threonines. Acetylation decreases the positive charge on histones, resulting in a reduced affinity of the histone for DNA. In turn, this may assist the formation of open chromatin conformations, which would allow the binding of transcription regulatory
proteins to DNA. Histone acetylation is catalyzed by histone acetyltransferase enzymes (HATs). In some cases, HATs are recruited to genes by the presence of certain transcription activator proteins that bind to transcription regulatory regions. In other cases, transcription activator proteins themselves have HAT activity. Of course, what can be opened can also be closed. In that case, histone deacetylases (HDACs) remove acetyl groups from histone tails. HDACs can be recruited to genes by the presence of certain repressor proteins on regulatory regions.

The third mechanism of chromatin alteration is chromatin remodeling, which involves the repositioning or removal of nucleosomes on DNA, brought about by chromatin remodeling complexes. Chromatin remodeling complexes are large multi-subunit complexes that use the energy of ATP hydrolysis to move and rearrange nucleosomes along the DNA. Repositioned nucleosomes make regions of the chromosome accessible to transcription regulatory proteins, such as transcription activators and RNA polymerase II. One of the best-studied remodeling complexes is the SWI/SNF complex. Remodelers such as SWI/SNF can act in several different ways (Figure 15–12). They may loosen the attachment between histones and DNA, resulting in the nucleosome sliding along the DNA and exposing regulatory regions. Alternatively, they may loosen the DNA strand from the nucleosome core, or they may cause reorganization of the internal nucleosome components. In all cases, the DNA is left transiently exposed to association with transcription factors and RNA polymerase. Like HATs, chromatin remodeling complexes can be recruited to DNA by transcription activator proteins that are bound to specific regions of DNA. The interactions of remodeling complexes are also affected by the presence or absence of histone modifications.

[Figure 15–12. (a) Alteration of DNA–histone contacts: the chromatin remodeler hydrolyzes ATP to ADP, and sliding exposes DNA. (b) Alteration of the DNA path: DNA is pulled off the nucleosome. (c) Remodeling of the nucleosome core particle: a nucleosome dimer forms.]

FIGURE 15–12 Three ways by which chromatin remodelers, such as the SWI/SNF complex, alter the association of nucleosomes with DNA. (a) The DNA–histone contacts may be loosened, allowing the nucleosomes to slide along the DNA, exposing DNA regulatory regions. (b) The path of the DNA around a nucleosome core particle may be altered. (c) Components of the core nucleosome particle may be rearranged, resulting in a modified nucleosome structure.

DNA Methylation

Another type of change in chromatin that plays a role in gene regulation is the addition or removal of methyl groups to or from bases in DNA. DNA methylation most often involves cytosine. In the genome of any given eukaryotic species, approximately percent of the cytosine residues are methylated. However, the extent of methylation can be tissue specific and can vary from less than percent to more than percent of cytosine residues.

Evidence of a role for methylation in eukaryotic gene expression is based on a number of observations. First, an inverse relationship exists between the degree of methylation and the degree of expression. Large transcriptionally inert regions of the genome, such as the inactivated X chromosome in mammalian female cells, are often heavily methylated. Second, methylation patterns are tissue specific and, once established, are heritable for all cells of that tissue. It appears that proper patterns of DNA methylation are essential for normal mammalian development. Undifferentiated embryonic cells that are not able to methylate DNA die when they are required to differentiate into specialized cell types.

Perhaps the most direct evidence for the role of methylation in gene expression comes from studies using base analogs. The nucleotide 5-azacytidine can be incorporated into DNA in place of cytidine during DNA replication. This analog cannot be methylated,
causing the undermethylation of the sites where it is incorporated. The incorporation of 5-azacytidine into DNA changes the pattern of gene expression and stimulates expression of alleles on inactivated X chromosomes. In addition, the presence of 5-azacytidine in DNA can induce the expression of genes that would normally be silent in certain differentiated cells.

How might methylation affect gene regulation? Data from in vitro studies suggest that methylation can repress transcription by inhibiting the binding of transcription factors to DNA. Methylated DNA may also recruit repressive chromatin remodeling complexes to gene-regulatory regions.

ESSENTIAL POINT
Eukaryotic gene regulation at the level of chromatin may involve gene-specific chromatin remodeling, histone modifications, or DNA modifications. These modifications may either allow or inhibit access of promoters and enhancers to transcription factors, resulting in increased or decreased levels of transcription initiation.

15–3 Cancer cells often have abnormal patterns of chromatin modifications. In some cancers, the DNA repair genes MLH1 and BRCA1 are hypermethylated on their promoter regions. Explain how this abnormal methylation pattern could contribute to cancer.
HINT: This problem involves an understanding of the types of genes that are mutated in cancer cells. The key to its solution is to consider how methylation affects gene expression of cancer-related genes.

15.8 Eukaryotic Transcription Regulation Requires Specific Cis-Acting Sites

As in prokaryotes, eukaryotic transcription regulation is controlled by trans-acting regulatory proteins that bind to specific cis-acting DNA sequences located in and around eukaryotic genes. Although these cis-acting sequences do not, by themselves, regulate gene transcription, they are essential because they position regulatory proteins in regions where those proteins can act to stimulate or repress transcription of the associated gene. In this section, we will discuss some of these cis-acting DNA sequences, including promoters, enhancers, and silencers.

Promoters

A promoter is a region of DNA that binds one or more proteins that regulate transcription initiation. Promoters are located immediately adjacent to the genes they regulate. They may be up to several hundred nucleotides in length and specify where transcription begins and the direction of transcription along the DNA. Within promoters are a number of promoter elements: short nucleotide sequences that bind specific regulatory factors. There are two subcategories within eukaryotic promoters. First, the core promoter determines the accurate initiation of transcription by RNA polymerase II. Second, proximal promoter elements are those that modulate the efficiency of basal levels of transcription.

Recent bioinformatic research reveals that there is a great deal of diversity in eukaryotic core promoters in terms of both their structures and functions. Core promoters are now thought to be either focused or dispersed. Focused promoters specify transcription initiation at a single specific nucleotide (the transcription start site). In contrast, dispersed promoters direct initiation from a number of weak transcription start sites located over a 50- to 100-nucleotide region (Figure 15–13). Focused transcription initiation is the major type of initiation for most genes of lower eukaryotes, but for only about 30 percent of vertebrate genes. Focused promoters are usually associated with genes whose transcription levels are highly regulated, whereas dispersed promoters are associated with genes that are transcribed constitutively.

[Figure 15–13. (a) Focused promoter: gene with one major transcript. (b) Dispersed promoter: gene with multiple transcripts.]

FIGURE 15–13 Focused and dispersed promoters. Focused promoters (a) specify one specific transcription initiation site. Dispersed promoters (b) specify weak transcription initiation at multiple start site positions over an approximately 100-bp region. Transcription start sites and the directions of transcription are indicated with arrows.

Little is known about the DNA elements that make up dispersed promoters. These promoters are usually found within CG-rich regions, suggesting that chromatin modifications may influence initiation from these promoters. Some data suggest that dispersed promoters contain the same types of elements as focused promoters; however, these elements contain multiple mismatches to the element consensus sequences, perhaps accounting for the low levels of transcription initiation from these types of promoters.

Much more is known about the structure of focused promoters. These promoters are made up of one or more DNA sequence elements, as summarized in Figure 15–14. Each of these elements is found in only some core promoters, with no element being a universal component of all focused promoters.

[Figure 15–14. Core-promoter elements between -40 and +40 relative to the transcription start site (+1): BRE (-38 to -32), TATA (-30 to -24), BRE (-23 to -17), Inr (-2 to +4), MTE (+18 to +27), and DPE (+28 to +33).]

FIGURE 15–14 Core-promoter elements found in focused promoters. Core-promoter elements are usually located between -40 and +40 nucleotides, relative to the transcription start site, indicated as +1. None of these elements is universal, and a core promoter may contain only one, or several, of these elements. BRE is the TFIIB recognition element, TATA is the TATA box, Inr is the initiator element, MTE is the motif ten element, and DPE is the downstream promoter element.

The Inr element encompasses the transcription start site, from approximately nucleotides -2 to +4, relative to the start site. In humans, the Inr consensus sequence is YYAN(A/T)YY (where Y indicates any pyrimidine nucleotide and N indicates any nucleotide). The transcription start site is the A residue, designated +1. The TATA box element is located at approximately -30 relative to the transcription start site and has the consensus sequence TATA(A/T)AAR (where R indicates any purine nucleotide). The
BRE is found in some core promoters at positions either immediately upstream or downstream from the TATA box. The MTE and DPE sequence motifs are located downstream of the transcription start site, at approximately +18 to +27 and +28 to +32, respectively.

In addition to core-promoter elements, many promoters also contain proximal-promoter elements located upstream of the TATA box and BRE. Proximal-promoter elements act along with the core-promoter elements to increase the levels of basal transcription. For example, the CAAT box is a common proximal-promoter element. It has the consensus sequence CAAT or CCAAT and is usually located about 70 to 80 base pairs upstream from the start site. Mutational analysis suggests that CAAT boxes (when present) are critical to the promoter’s ability to initiate transcription. Mutations on either side of this element have no effect on transcription, whereas mutations within the CAAT sequence dramatically lower the rate of transcription. Figure 15–15 summarizes the transcriptional effects of mutations in the CAAT box and other promoter elements. The GC box is another element often found in proximal promoter regions and has the consensus sequence GGGCGG. It is located, in one or more copies, at about position -110.

Enhancers and Silencers

Although eukaryotic promoter elements are essential for basal or low levels of transcription initiation, more dramatic changes in transcription initiation require the presence of other sequence elements known as enhancers and silencers. Like promoters, enhancers are cis regulators because they act only on genes located on the same DNA molecule. However, unlike promoters, enhancers can be located on either side of a gene, at some distance from the gene, or even within the gene. Enhancers are necessary for achieving the maximum level of transcription. In addition, enhancers are responsible for time- and tissue-specific gene expression. Thus, there is some degree of analogy between enhancers and operator
regions in prokaryotes. However, enhancers are more complex in both structure and function. Several features distinguish promoters from enhancers:

• Position: an enhancer is not restricted to a fixed location; it functions whether it is upstream, downstream, or within the gene it regulates.
• Orientation: an enhancer can be inverted without significant effect on its action.
• Context: if an enhancer is experimentally moved adjacent to a gene elsewhere in the genome, or if an unrelated gene is placed near an enhancer, the transcription of the newly adjacent gene is enhanced.

Another type of cis-acting transcription regulatory element, the silencer, acts upon eukaryotic genes to repress the level of transcription initiation. Silencers, like enhancers, are short DNA sequence elements that affect the rate of transcription initiated from an associated promoter. They often act in tissue- or temporal-specific ways to control gene expression.

ESSENTIAL POINT
Eukaryotic transcription regulation requires gene-specific promoter, enhancer, and silencer elements. The presence of these cis-acting regulatory sites can affect transcription in tissue- and temporal-specific ways.

15.9 Eukaryotic Transcription Initiation Is Regulated by Transcription Factors That Bind to Cis-Acting Sites

Eukaryotic promoters, enhancers, and silencers influence transcription initiation by acting as binding sites for transcription regulatory proteins. These transcription regulatory proteins, known as transcription factors, can have diverse and complicated effects on transcription. Some transcription factors increase the levels of transcription initiation and are known as activators, whereas others reduce transcription levels and are known as repressors. Some transcription factors are expressed in tissue-specific ways, regulating their target genes for tissue-specific levels of expression. In addition, some transcription
factors are expressed in cells only at certain times during development or in response to external physiological signals. In some cases, a transcription factor that binds to a cis-acting site and regulates a certain gene may be present in a cell, and may even bind to its appropriate cis-acting site, but will become active only when modified structurally (for example, by phosphorylation or by binding to a coactivator such as a hormone). These modifications to transcription factors can also be regulated in tissue- or temporal-specific ways.

In addition, different transcription factors may compete for binding to a given DNA sequence or to one of two overlapping sequences. In these cases, transcription factor concentrations and the strength with which each factor binds to DNA will dictate which factor binds. The same site may also bind different factors in different tissues. Finally, multiple transcription factors that bind to several different enhancers and promoter elements within a gene-regulatory region can interact with each other to fine-tune the levels and timing of transcription initiation.

[Figure 15–15. Relative transcription level plotted along the β-globin promoter, with the GC, CCAAT, and TATA elements and the initiation site labeled.]

FIGURE 15–15 Summary of the effects on transcription levels of different point mutations in the promoter region of the β-globin gene. Each line represents the level of transcription produced in a separate experiment by a single-nucleotide mutation (relative to wild-type) at a particular location. Dots represent nucleotides for which no mutation was obtained. Note that mutations within specific elements of the promoter have the greatest effects on the level of transcription.

The Human Metallothionein IIA Gene: Multiple Cis-Acting Elements and Transcription Factors

The human metallothionein IIA gene (hMTIIA) provides an example of how one gene can be transcriptionally regulated through the interplay of multiple promoter and enhancer elements and the transcription factors that bind to them. The product of the hMTIIA gene is a protein that binds to heavy metals such as
zinc and cadmium, thereby protecting cells from the toxic effects of high levels of these metals. The gene is expressed at low levels in all cells but is transcriptionally induced to high levels of expression when cells are exposed to heavy metals and steroid hormones such as glucocorticoids.

The cis-acting regulatory elements controlling transcription of the hMTIIA gene include promoter, enhancer, and silencer elements (Figure 15–16). Each cis-acting element is a short DNA sequence that has specificity for binding to one or more transcription factors. The hMTIIA gene contains the promoter elements TATA box and start site, which specify the start of transcription. The proximal promoter element, GC, binds the SP1 factor, which is present in most eukaryotic cells and stimulates transcription at low levels in most cells. Basal levels of expression are also regulated by the BLE (basal element) and ARE (AP factor response element) regions. These cis-elements bind the activator proteins 1, 2, and 4 (AP1, AP2, and AP4), which are present in various levels in different cell types and can be activated in response to extracellular growth signals. The BLE contains overlapping binding sites for the AP1 and AP4 factors, providing some degree of selectivity in how these factors stimulate transcription of hMTIIA when bound to the BLE in different cell types.

[Figure 15–16. The hMTIIA regulatory region from about -250 to the transcription start site, with sites and the factors that bind them: GRE (glucocorticoid receptor), AREs (AP2), MREs (MTF1), BLE (AP1 and AP4), GC (SP1), TATA (TFIID), Inr, and the repressor PZ120 over the start region.]

FIGURE 15–16 The human metallothionein IIA gene promoter and enhancer regions, containing multiple cis-acting regulatory sites. The transcription factors controlling both basal and induced levels of MTIIA transcription, and their binding sites, are indicated below the gene and are described in the text.

High levels of transcription are conferred
by the presence of the enhancers MRE (metal response element) and GRE (glucocorticoid response element) The metal-inducible transcription factor (MTF1) binds to the MRE in response to the presence of heavy metals The glucocorticoid receptor protein binds to the GRE, but only when the receptor protein is also bound to the glucocorticoid steroid hormone The glucocorticoid receptor is normally located in the cytoplasm of the cell However, when glucocorticoid hormone enters the cytoplasm, it binds to the receptor and causes a conformational change that allows the receptor to enter the nucleus, bind to the GRE, and stimulate hMTIIA gene transcription In addition to induction, transcription of the hMTIIA gene can be repressed by the actions of the repressor protein PZ120, which binds over the transcription start region The presence of multiple regulatory elements and transcription factors that bind to them allows the hMTIIA gene to be transcriptionally induced or repressed in response to subtle changes in both extracellular and intracellular conditions ES S E NT I A L PO I N T Transcription factors influence transcription rates by binding to cis-acting regulatory sites within or adjacent to a gene promoter 15.10 Activators and Repressors Interact with General Transcription Factors and Affect Chromatin Structure We have now discussed the first steps in eukaryotic transcription regulation: first, chromatin must be remodeled and modified to allow transcription proteins to bind to their specific cis-acting sites; second, transcription factors bind to cis-acting sites and bring about positive and negative effects on the transcription initiation rate—often in response to extracellular signals or in tissue- or time-specific ways The next question is, how these cis-acting regulatory elements and their DNA-binding factors act to influence transcription initiation? 
To answer this question, we must first discuss how eukaryotic RNA polymerase II and its basal transcription factors assemble at promoters.
Formation of the Transcription Pre-Initiation Complex
A number of proteins called general transcription factors are needed to initiate both basal and enhanced levels of transcription. These proteins assemble at the promoter in a specific order, forming a transcriptional pre-initiation complex (PIC) that in turn provides a platform for RNA polymerase to recognize and bind to the promoter. We will restrict our discussion of PIC formation to focused promoters with TATA boxes, the type of promoter for which the most information is available. The general transcription factors and their interactions with the core promoter and RNA polymerase II are outlined in Figure 15–17.
The first step in the formation of a PIC is the binding of TFIID to the TATA box of the core promoter. TFIID is a multi-subunit complex that contains TBP (TATA Binding Protein) and approximately 13 proteins called TAFs (TBP Associated Factors). As its name implies, TBP binds to the TATA box. In addition, a subset of TAFs binds to Inr elements, as well as to DPEs and MTEs. TFIIA interacts with TFIID and assists the binding of TFIID to the core promoter. Once TFIID has made contact with the core promoter, TFIIB binds to BREs on one or both sides of the TATA box. Once TFIID and TFIIB have bound the core promoter, the other general transcription factors interact with RNA polymerase II and help recruit it to the promoter. The fully formed PIC mediates the unwinding of promoter DNA at the start site and the transition of RNA polymerase II from transcription initiation to elongation. On many promoters of higher eukaryotes, RNA polymerase II remains paused about 50 bp downstream of the transcription start site, awaiting signals that release it into transcription elongation. In other gene promoters of higher eukaryotes, and in all promoters in yeast, RNA polymerase II immediately leaves the promoter region and proceeds down the DNA template in an elongation complex. Several of the general transcription factors, specifically TFIID, TFIIE, TFIIH, and Mediator, remain on the core promoter to help set up the next PIC.
Figure 15–17 The assembly of general transcription factors required for the initiation of transcription by RNA polymerase II (RNAP II).
Mechanisms of Transcription Activation and Repression
Researchers have proposed several models to explain how transcription activators and repressors bring about changes to RNA polymerase II transcription. In most cases, these models involve the formation of DNA loops that bring distant enhancer or silencer elements into close physical contact with the promoter regions of the genes that they regulate.
In one model of transcription activation and repression, DNA looping may deliver activators, repressors, and general transcription factors to the vicinity of promoters that must be activated or repressed. In this recruitment model, enhancer and silencer elements act as donors that increase the concentrations of important regulatory proteins at gene promoters. By enhancing the rate of PIC assembly or stability, or by accelerating the release of RNA polymerase II from a promoter, transcription activators bound at enhancers may stimulate the rate of transcription initiation. In order to make contact with promoter-bound factors, activators are thought to interact with other proteins called coactivators, forming a complex known as an enhanceosome (Figure 15–18). Enhanceosomes may directly contact the PIC through subunits of Mediator and TFIID. In a similar way, repressors bound at silencer elements may decrease the rate of PIC assembly and the release of RNA polymerase II.
In a second model, DNA looping may result in chromatin alterations that either stimulate or repress transcription of target genes. Chromatin remodeling complexes or chromatin modifiers, once delivered to the vicinity of a promoter, may open or close the promoter to interactions with general transcription factors and RNA polymerase II, or may inhibit the release of paused RNA polymerase II from pre-initiation complexes.
A third model of transcription activation and repression states that enhancer or silencer looping may relocate a target gene to a nuclear region that is favorable or inhibitory to transcription. This nuclear relocation model would be consistent with the presence of transcription factories, regions of the nucleus that contain concentrations of RNA polymerase II and transcription regulatory factors.
Figure 15–18 Formation of DNA loops allows factors that bind to an enhancer (or silencer) at a distance from a promoter to interact with general transcription factors and RNA polymerase II (RNAP II) in the pre-initiation complex and to regulate the level of transcription.
ESSENTIAL POINT Transcription factors act by enhancing or repressing the association of general transcription factors at the promoter. They may also assist in chromatin remodeling and the relocation of a target gene to specific nuclear sites.
15–4 The hormone estrogen converts the estrogen receptor (ER) protein from an inactive molecule to an active transcription factor. The ER binds to cis-acting sites that act as enhancers, located near the promoters of a number of genes. In some tissues, the presence of estrogen appears to activate transcription of ER-target genes, whereas in other tissues, it appears to repress transcription of those same genes. Offer an explanation as to
how this may occur.
HINT: This problem involves an understanding of how transcription enhancers and repressors work. The key to its solution is to consider the many ways that trans-acting factors can interact at enhancers to bring about changes in transcription initiation.
15.11 Posttranscriptional Gene Regulation Occurs at Many Steps from RNA Processing to Protein Modification
Although transcriptional control is a major type of gene regulation in eukaryotes, posttranscriptional regulation plays an equal, and in some cases more significant, role. Modification of eukaryotic nuclear RNA transcripts prior to translation includes the removal of noncoding introns, the precise splicing together of the remaining exons, and the addition of a cap at the mRNA’s 5′ end and a poly-A tail at its 3′ end. The messenger RNA is then exported to the cytoplasm, where it is translated and degraded. Each of the mRNA processing steps can be regulated to control the quantity of functional mRNA available for synthesis of a protein product. In addition, the rate of translation, as well as the stability and activity of protein products, can be regulated. We will examine several mechanisms of posttranscriptional gene regulation that are especially important in eukaryotes: alternative splicing, mRNA stability, translation, and protein stability.
Alternative Splicing of mRNA
Alternative splicing can generate different forms of mRNA from identical pre-mRNA molecules, so that expression of one gene can give rise to a number of proteins with similar or different functions. Changes in splicing patterns can have many different effects on the translated protein. For example, they can alter the protein’s enzymatic activity, receptor-binding capacity, or localization in the cell.
Figure 15–19 presents an example of alternative splicing of the pre-mRNA transcribed from the calcitonin/calcitonin gene-related peptide gene (CT/CGRP gene). In thyroid cells, the CT/CGRP primary transcript is spliced in such a way that the mature mRNA contains the first four exons only. In these cells, the exon 4 polyadenylation signal is used to process the mRNA and add the poly-A tail. This mRNA is translated into the calcitonin peptide, a 32-amino acid peptide hormone that is involved in regulating calcium. In the brain and peripheral nervous system, the CT/CGRP primary transcript is spliced to include exons 5 and 6, but not exon 4. In these cells, the exon 6 polyadenylation site is recognized. The CGRP mRNA encodes a 37-amino acid peptide with hormonal activities in a wide range of tissues. Through alternative splicing, two peptide hormones with different structures, locations, and functions are synthesized from the same gene.
Figure 15–19 Alternative splicing of the CT/CGRP gene transcript. The primary transcript, which is shown in the middle of the diagram, contains six exons. The primary transcript can be spliced into two different mRNAs, both containing the first three exons but differing in their final exons. The CT mRNA contains exon 4, with polyadenylation occurring at the end of the fourth exon. The CGRP mRNA contains exons 5 and 6, and polyadenylation occurs at the end of exon 6. The CT mRNA is produced in thyroid cells. After translation, the resulting protein is processed into the calcitonin peptide. In contrast, the CGRP mRNA is produced in neuronal cells, and after translation, its protein product is processed into the CGRP peptide.
Even more complex alternative splicing patterns occur in some genes, such as the example in Figure 15–20. Alternative splicing increases the number of proteins that can be made from each gene. As a result, the number of proteins that an organism can make (its proteome) is not the same as the number of genes in the genome, and protein diversity can exceed gene number by an order of magnitude.
Figure 15–20 Alternative splicing of the Dscam gene mRNA. The Dscam gene encodes a protein that guides axon growth during development. Each mRNA will contain one of the 12 possible exons for exon 4 (red), one of the 48 possible exons for exon 6 (blue), one of the 33 possible exons for exon 9 (green), and one of the 2 possible exons for exon 17 (yellow). Counting all possible combinations of these exons (12 × 48 × 33 × 2), the Dscam gene could encode 38,016 different versions of the DSCAM protein. (© 2000 Elsevier)
Alternative splicing is found in all metazoans but is especially common in vertebrates, including humans. It has been estimated that at least two-thirds of protein-coding genes in the human genome can undergo alternative splicing. Thus, humans can produce several hundred thousand different proteins (or perhaps more) from the approximately 20,000 genes in the haploid genome.
Mutations that affect regulation of splicing contribute to several genetic disorders. One of these disorders, myotonic dystrophy (DM), provides an example of how defects in alternative RNA splicing can lead to a wide range of symptoms. Myotonic dystrophy is the most common form of adult muscular dystrophy, affecting 1 in 8000 individuals. It is an autosomal dominant disorder that occurs in two forms, DM1 and DM2. Both of these diseases show a wide range of symptoms, including muscle wasting, myotonia (difficulty relaxing muscles), insulin resistance, behavioral and cognitive defects, and cardiac muscle problems. DM1 is caused by the expansion of the trinucleotide repeat CTG in the 3′-untranslated region of the DMPK gene. In unaffected individuals, the DMPK gene contains between 5 and 35 copies of the CTG repeat sequence, whereas in DM1 patients, the gene contains between 150 and 2000 copies. The severity of the symptoms is directly related to the number of
copies of the repeat sequence. DM2 is caused by an expansion of the repeat sequence CCTG within the first intron of the ZNF9 gene. Affected individuals may have up to 11,000 copies of the repeat sequence in the ZNF9 intron. In DM2, the severity of symptoms is not related to the number of repeats.
Recently, scientists have discovered that DM1 and DM2 are caused not by changes in the protein products of the DMPK or ZNF9 genes, but by the toxic effects of their repeat-containing RNAs. These RNAs accumulate and form inclusions within the nucleus. In the case of ZNF9, only the CCUG repeat itself accumulates in the nucleus, as the remainder of the intron is degraded after splicing of the mRNA. It appears that the accumulated RNA fragments bind to, and sequester, proteins that would normally be involved in regulating the alternative splicing patterns of a large number of other RNAs. These RNAs include those whose products are required for the proper functioning of muscle and neural tissue. So far, scientists have discovered over 20 genes that are inappropriately spliced in the muscle, heart, and brain of DM1 patients. Often, fetal splicing patterns persist in DM1 and DM2 patients, and the normal transitions to adult splicing patterns are lacking. Such defects in the regulation of RNA splicing are known as spliceopathies.
Control of mRNA Stability
The steady-state level of an mRNA is its amount in the cell, as determined by a combination of the rate at which the gene is transcribed and the rate at which the mRNA is degraded. The steady-state level determines the amount of mRNA that is available for translation. All mRNA molecules are degraded at some point after their synthesis, but the lifetime of an mRNA, defined in terms of its half-life, or t1/2, can vary widely between different mRNAs and can be regulated in response to the needs of the cell. Some mRNAs are degraded within minutes after their synthesis, whereas others can remain stable for hours, months, or even years (in
the case of mRNAs stored in oocytes).
Regulation of mRNA stability is often linked with the process of translation. Several observations demonstrate this link between translation and mRNA stability. First, most mRNA molecules become stable in cells that are treated with translation inhibitors. Second, the presence of premature stop codons in the body of an mRNA, as well as premature translation termination, causes rapid degradation of mRNAs. Third, many of the ribonucleases and mRNA-binding proteins that affect mRNA stability associate with ribosomes.
Another way that an mRNA’s half-life can be altered is through specific RNA sequence elements that recruit degrading or stabilizing complexes. One well-studied mRNA stability element is the adenosine-uracil rich element (ARE), a stretch of ribonucleotides consisting mainly of A and U. These AU-rich elements are usually located in the 3′-untranslated regions of mRNAs that have short, regulated half-lives. These ARE-containing mRNAs encode proteins that are involved in cell growth or transcription control and whose abundance must be rapidly modulated. In cells that are not growing or that require low levels of gene expression, specific complexes bind to the ARE elements of these mRNA molecules, bringing about shortening of the poly-A tail and rapid mRNA degradation. It is estimated that approximately 10 percent of mammalian mRNAs contain these instability elements.
Translational and Posttranslational Controls
In some cases, the translation of an mRNA can be regulated by the extent of the cell’s requirement for the gene product. In other cases, the stability of a protein can be modulated, or the protein can be modified after translation to change its structure and affect its activity. An example of regulated stability and modification is that of the p53 protein. The p53 protein is essential to protect normal cells from the effects of DNA
damage and other stresses. It is a transcription factor that increases the transcription of a number of genes whose products are involved in cell-cycle arrest, DNA repair, and programmed cell death. Under normal conditions, the levels of p53 protein are extremely low in cells, and the p53 that is present is inactive. When cells suffer DNA damage or metabolic stress, the amount of p53 protein increases dramatically, and p53 becomes an active transcription factor. The changes in the abundance and activity of p53 are due to a combination of increased protein stability and modifications to the protein.
In unstressed cells, p53 is bound by another protein called Mdm2. The Mdm2 protein binds to the p53 protein, blocking its ability to induce transcription. In addition, Mdm2 adds ubiquitin residues onto the p53 protein. Ubiquitin is a small protein that tags other proteins for degradation by proteolytic enzymes, so the presence of ubiquitin on p53 results in p53 degradation. When cells are stressed, Mdm2 and p53 become modified by phosphorylation and acetylation, resulting in the release of Mdm2 from p53. As a consequence, p53 proteins become stable, the levels of p53 increase, and the protein is able to act as a transcription factor. An added level of control is that p53 induces the transcription of the Mdm2 gene. Hence, the presence of active p53 triggers a negative feedback loop that creates more Mdm2 protein, which rapidly returns p53 to its rare and inactive state.
ESSENTIAL POINT Posttranscriptional gene regulation can involve alternative splicing of nascent RNA, changes in mRNA stability, translational control, and posttranslational modifications. These mechanisms may alter the type, quantity, or activity of a gene’s protein product.
15.12 RNA-Induced Gene Silencing Controls Gene Expression in Several Ways
In the last decade, the discovery that small noncoding RNA (sncRNA) molecules control gene expression has given rise to a new field of
research. First discovered in plants, sncRNAs are now known to regulate gene expression in the cytoplasm of plants, animals, and fungi by repressing translation and triggering the degradation of mRNAs. This form of sequence-specific posttranscriptional regulation is known as RNA interference (RNAi). More recently, sncRNAs have been shown to act in the nucleus to alter chromatin structure and bring about repression of transcription. Together, these phenomena are known as RNA-induced gene silencing. Later in the text (see Special Topic Chapter 2, Emerging Roles of RNA), we present a comprehensive description of gene regulation by various types of RNA molecules, including sncRNAs.
Recent studies are demonstrating that RNA-induced gene-silencing mechanisms operate during normal development and control the expression of batteries of genes involved in tissue-specific cellular differentiation. In addition, scientists have discovered that abnormal activities of sncRNAs contribute to the occurrence of cancers, diabetic complications, and heart disease.
Geneticists are applying RNAi as a powerful research tool. RNAi technology allows investigators to create specific single-gene defects without having to induce inherited gene mutations. RNAi-mediated gene silencing is relatively specific and inexpensive, and it allows scientists to rapidly analyze gene function.
In addition to its use in laboratory research, RNAi is being developed as a potential pharmaceutical agent. In theory, any disease caused by overexpression of a specific gene, or even by normal expression of an abnormal gene product, could be attacked by therapeutic RNAi. Viral infections are obvious targets, and scientists have had promising results using RNAi in tissue cultures to reduce the severity of infection by several types of viruses, such as HIV, influenza, and polio. In animal models, siRNA molecules have successfully treated virus infections, eye diseases, cancers, and inflammatory bowel disease. New as it is, the science of
RNAi holds powerful promise for molecular medicine. The uses of RNAi in therapeutics are also discussed in the Genetics, Technology, and Society feature in Chapter 12 on p. 250.
GENETICS, TECHNOLOGY, AND SOCIETY
Quorum Sensing: Social Networking in the Bacterial World
For decades, scientists regarded bacteria as independent single-celled organisms, incapable of cell-to-cell communication. However, recent research is revealing that many bacteria can regulate gene expression and coordinate group behavior through a form of communication termed quorum sensing. Through this process, bacteria send and receive chemical signals called autoinducers that relay information about population size. When the population size reaches a “quorum” (defined in the business world as the minimum number of members of an organization that must be present to conduct business), the autoinducers regulate gene expression in a way that benefits the group as a whole. Quorum sensing has been described in more than 70 species of bacteria, and its uses range from controlling bioluminescence in marine bacteria to regulating the expression of virulence factors in pathogenic bacteria. Our understanding of quorum sensing has altered our perceptions of prokaryotic gene regulation and is leading to the development of practical applications, including new antibiotic drugs.
Quorum sensing was discovered in the 1960s, during research on the bioluminescent bacterium Vibrio fischeri, which lives in a symbiotic relationship with the Hawaiian bobtail squid, Euprymna scolopes. While hunting for food at night, the squid uses light emitted by the V. fischeri present in its light organ to illuminate the ocean floor and to counter the shadows created by moonlight that would otherwise act as a beacon for the squid’s predators. In return, the bacteria gain a protected, nutrient-rich environment in the squid’s light organ. During the day, the bacteria do not glow as a result
of the squid’s ability to reduce the concentration of bacteria in its light organ, which in turn prevents expression of the bacterial luciferase (lux) operon.
What turns the bacteria’s lux genes on in response to high cell density and off in response to low cell density? In V. fischeri, the responsible autoinducer is a secreted homoserine lactone molecule. At a critical population size, these molecules accumulate, are taken up by bacteria within the population, and regulate the lux operon by binding directly to transcription factors that stimulate lux gene expression.
Since the discovery of quorum sensing in Vibrio fischeri, scientists have identified similar microbial communication systems in other bacteria, including significant human pathogens such as pseudomonal, staphylococcal, and streptococcal species. As many as 15 percent of bacterial genes may be regulated by quorum sensing.
Quorum sensing molecules may also mediate communication among members of different species. In 1994, Bonnie Bassler and her colleagues at Princeton University discovered an autoinducer molecule in the marine bacterium Vibrio harveyi that was also present in many diverse types of bacteria. This molecule, autoinducer-2 (AI-2), has the potential to mediate “quorum-sensing cross talk” between species and thus serve as a universal language for bacterial communication. Because the accumulation of AI-2 is proportional to cell number, and because the structure of AI-2 may vary slightly between different species, the current hypothesis is that AI-2 can transmit information about both the cell density and the species composition of a bacterial community.
Pathogenic bacteria use quorum sensing to regulate the expression of genes whose products help these bacteria invade a host and avoid immune system detection. For example, Vibrio cholerae, the causative agent of cholera, uses AI-2 and an additional species-specific autoinducer to activate the genes controlling the production of cholera toxin.
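The density-sensing logic described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not a model of any measured system: it assumes that autoinducer concentration is simply proportional to cell density (as the text notes for AI-2) and that lux expression follows a steep, switch-like Hill function of autoinducer level. The parameter names `secretion_rate`, `threshold`, and `hill_n`, and their default values, are hypothetical.

```python
"""Toy quorum-sensing switch (hypothetical parameters, for illustration only)."""


def autoinducer_level(cell_density: float, secretion_rate: float = 1.0) -> float:
    """Assumption: each cell secretes autoinducer at a fixed rate, so level scales with density."""
    return secretion_rate * cell_density


def lux_expression(autoinducer: float, threshold: float = 100.0, hill_n: int = 4) -> float:
    """Fraction of maximal lux expression; a Hill function gives a sharp on/off response.

    When autoinducer equals threshold, the output is exactly 0.5.
    """
    return autoinducer**hill_n / (threshold**hill_n + autoinducer**hill_n)


def is_glowing(cell_density: float) -> bool:
    """Treat the lux operon as 'on' once expression reaches half of its maximum."""
    return lux_expression(autoinducer_level(cell_density)) >= 0.5


if __name__ == "__main__":
    for density in (1, 10, 100, 1000):
        level = lux_expression(autoinducer_level(density))
        state = "glowing" if is_glowing(density) else "dark"
        print(f"density {density:>5}: lux expression {level:.4f} ({state})")
```

Raising `hill_n` makes the transition sharper, mimicking the all-or-none behavior the text describes at a critical population size. Real quorum-sensing circuits involve additional regulation that this sketch deliberately omits.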
Pseudomonas aeruginosa, the Gram-negative bacterium that often infects cystic fibrosis patients, uses quorum sensing to regulate the production of elastase, a protease that disrupts the respiratory epithelium and interferes with ciliary function. P. aeruginosa also uses autoinducers to control the production of biofilms, tough protective shells that resist host defenses and make treatment with antibiotics nearly impossible. Other bacteria determine cell density through quorum sensing to delay the production of toxic substances until the colony is large enough to overpower the host’s immune system and establish an infection.
Because many bacteria rely on quorum sensing to regulate disease-causing genes, therapeutics that block quorum sensing may help combat infections. Research into these potential therapies is now in progress, and several are approaching the clinical trial phase. Thus, what began as a fascinating observation in the glowing squid has launched an exciting era of research in bacterial genetics that may one day prove of great clinical significance.
Your Turn
Take time, individually or in groups, to answer the following questions. Investigate the references and links to help you understand the mechanisms and potential uses of quorum sensing in bacteria.
1. Inhibitors of quorum sensing molecules have potential as antibacterial agents. What are some ways in which quorum sensing inhibitors could work to combat bacterial infections? Have any of these therapeutics reached clinical trials?
A recent review of quorum sensing therapeutics can be found in Njoroge, J., and Sperandio, V. 2009. Jamming bacterial communication: New approaches for the treatment of infectious diseases. EMBO Mol. Med. 1(4): 201–210.
2. Regulation of bacterial gene expression by autoinducer molecules involves several different mechanisms. Describe these mechanisms and how each could be used as a target for the control of bacterial infections.
The mechanisms by which autoinducers regulate gene expression are summarized in Asad, S., and Opal, S.M. 2008. Bench-to-bedside review: Quorum sensing and the role of cell-to-cell communication during invasive bacterial infection. Critical Care 12: 236–247.
3. Quorum sensing systems are also capable of detecting and responding to chemical signals given off by host cells. Explain how this works and how this might benefit pathogenic bacteria.
A review article dealing with interkingdom communication and quorum sensing can be found at Wagner, V.E., et al. 2006. Quorum sensing: dynamic response of Pseudomonas aeruginosa to external signals. Trends Microbiol. 14(2): 55–58.
CASE STUDY A mysterious muscular dystrophy
A man in his early 30s suddenly developed weakness in his hands and neck, followed a few weeks later by burning muscle pain, all symptoms of late-onset muscular dystrophy. His internist ordered genetic tests to determine whether he had one of the inherited muscular dystrophies, focusing on Becker muscular dystrophy, myotonic dystrophy Type I, and myotonic dystrophy Type II. These tests were designed to detect mutations in the related dystrophin, DMPK, and ZNF9 genes. The testing ruled out Becker muscular dystrophy. While awaiting the results of the DMPK and ZNF9 gene tests, the internist explained that the possible mutations were due to expanded tri- and tetranucleotide repeats, but not in the protein-coding portion of the genes. She went on to say that the resulting disorders were due not to changes in the encoded proteins, which
appear to be normal, but instead to altered RNA splicing patterns, whereby the RNA splicing remnants containing the nucleotide repeats disrupt normal splicing of the transcripts of other genes. This discussion raises several interesting questions about the diagnosis and genetic basis of the disorders.
1. What is alternative splicing, where does it occur, and how could disrupting it affect the expression of the affected gene(s)?
2. What role might the expanded tri- and tetranucleotide repeats play in the altered splicing?
3. How does this contrast with other types of muscular dystrophy, such as Becker muscular dystrophy and Duchenne muscular dystrophy?
INSIGHTS AND SOLUTIONS
1. A theoretical operon (theo) in E. coli contains several structural genes encoding enzymes that are involved sequentially in the biosynthesis of an amino acid. Unlike the lac operon, in which the repressor gene is separate from the operon, the gene encoding the regulator molecule is contained within the theo operon. When the end product (the amino acid) is present, it combines with the regulator molecule, and this complex binds to the operator, repressing the operon. In the absence of the amino acid, the regulatory molecule fails to bind to the operator, and transcription proceeds. Characterize this operon, then consider the following mutations, as well as the situation in which the wild-type gene is present along with the mutant gene in partially diploid cells (F′):
(a) Mutation in the operator region
(b) Mutation in the promoter region
(c) Mutation in the regulator gene
In each case, will the operon be active or inactive in transcription, assuming that the mutation affects the regulation of the theo operon?
Compare each response with the equivalent situation of the lac operon.
Solution: The theo operon is repressible and under negative control. When there is no amino acid present in the medium (or the environment), the product of the regulatory gene cannot bind to the operator region, and transcription proceeds under the direction of RNA polymerase. The enzymes necessary for synthesis of the amino acid are produced, as is the regulator molecule. If the amino acid is present, or becomes present after sufficient synthesis occurs, the amino acid binds to the regulator, forming a complex that interacts with the operator region, causing repression of transcription of the genes within the operon. The theo operon is similar to the tryptophan system, except that the regulator gene is within the operon rather than separate from it. Therefore, in the theo operon, the regulator gene is itself regulated by the presence or absence of the amino acid.
(a) As in the lac operon, a mutation in the theo operator region inhibits binding of the repressor complex, and transcription occurs constitutively. The presence of an F′ plasmid bearing the wild-type allele would have no effect, since it is not adjacent to the structural genes.
(b) A mutation in the theo promoter region would likely inhibit binding of RNA polymerase and therefore inhibit transcription. This would also happen in the lac operon. A wild-type allele present in an F′ plasmid would have no effect.
(c) A mutation in the theo regulator gene, as in the lac system, may inhibit either its binding to the corepressor (the amino acid) or its binding to the operator. In both cases, transcription will be constitutive because the theo system is repressible: both cases result in the failure of the regulator to bind to the operator, allowing transcription to proceed. In the lac system, by contrast, a repressor that fails to bind the inducer (allolactose, derived from lactose) would permanently repress the system. The addition of a wild-type allele would restore repressibility, provided that this gene was transcribed
constitutively.
Regulatory sites for eukaryotic genes are usually located within a few hundred nucleotides of the transcription start site, but they can be located up to several kilobases away. DNA sequence-specific binding assays have been used to detect and isolate protein factors present at low concentrations in nuclear extracts. In these experiments, short DNA molecules containing DNA-binding sequences are attached to material that is packed into a glass column, and nuclear extracts are passed over the column. The idea is that if proteins that specifically bind to the DNA sequence are present in the nuclear extract, they will bind to the DNA, and they can be recovered from the column after all other nonbinding material has been washed away. Once a DNA-binding protein has been isolated and identified, the problem is to devise a general method for screening cloned libraries for the genes encoding the DNA-binding factors. Determining the amino acid sequence of the protein and constructing synthetic oligonucleotide probes are time consuming and useful for only one factor at a time. Knowing the strong affinity for binding between the protein and its DNA-recognition sequence, how would you screen for genes encoding binding factors?
Solution: Several general strategies have been developed, and one of the most promising was devised by Steve McKnight's laboratory at the Fred Hutchinson Cancer Center. The cDNA isolated from cells expressing the binding factor is cloned into the lambda vector λgt11. Plaques of this library, containing proteins derived from expression of cDNA inserts, are adsorbed onto nitrocellulose filters and probed with double-stranded radioactive DNA corresponding to the binding site. If a fusion protein corresponding to the binding factor is present, it will bind to the DNA probe. After the unbound probe is washed off, the filter is subjected to autoradiography, and the plaques corresponding to the DNA-binding proteins can be identified. An added advantage of this strategy is that the filters can be recycled by washing the bound DNA from them before reuse. This ingenious procedure is similar to the colony-hybridization and plaque-hybridization procedures described for library screening in Chapter 17, and it provides a general method for isolating genes encoding DNA-binding factors.
Problems and Discussion Questions
HOW DO WE KNOW? In this chapter, we have focused on how prokaryotic and eukaryotic organisms regulate the expression of genetic information. In particular, we discussed both transcriptional and posttranscriptional gene regulation. Based on your knowledge of these topics, answer several fundamental questions:
(a) How do we know that bacteria regulate the expression of certain genes in response to the environment?
(b) How do we know that bacterial gene clusters are often coordinately regulated by a regulatory region that must be located adjacent to the cluster?
(c) What led researchers to conclude that a trans-acting repressor molecule regulates the lac operon?
(d) How do we know that promoters and enhancers regulate transcription of eukaryotic genes?
(e) How do we know that DNA methylation plays a role in the regulation of eukaryotic gene expression?
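As a study aid, the cis/trans reasoning in the theo operon solution (Insights and Solutions, above) can be condensed into a small Boolean model. This is a sketch under stated assumptions, not a kinetic model. `TheoCopy` and `transcribed` are hypothetical names invented here, and each operon copy is reduced to three on/off properties. The regulator protein is treated as available whenever some copy carries a working regulator gene behind a working promoter. This ignores the fact that, in theo, the regulator gene is itself shut off during repression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TheoCopy:
    """One copy of the hypothetical theo operon (chromosomal or on an F′ plasmid)."""

    promoter_ok: bool = True   # RNA polymerase can bind (cis-acting)
    operator_ok: bool = True   # regulator-amino acid complex can bind (cis-acting)
    regulator_ok: bool = True  # regulator gene encodes a working protein


def transcribed(copies, amino_acid_present):
    """Return a list saying whether each copy's genes are transcribed."""
    # Assumption: the regulator protein acts in trans and is available if any
    # copy carries a working regulator gene behind a working promoter.
    regulator_available = any(c.regulator_ok and c.promoter_ok for c in copies)
    states = []
    for operon_copy in copies:
        if not operon_copy.promoter_ok:
            states.append(False)  # no polymerase binding, so no transcription
            continue
        # Repression needs the amino acid (corepressor), a working regulator
        # somewhere in the cell, and a working operator on this same copy.
        repressed = (
            amino_acid_present and regulator_available and operon_copy.operator_ok
        )
        states.append(not repressed)
    return states


wild_type = TheoCopy()
print(transcribed([wild_type], amino_acid_present=True))  # [False]
print(transcribed([TheoCopy(operator_ok=False), wild_type], True))  # [True, False]
print(transcribed([TheoCopy(regulator_ok=False), wild_type], True))  # [False, False]
```

The three calls illustrate three situations. The wild-type operon is repressed when the amino acid is present. An operator mutation stays constitutive even alongside a wild-type copy, because the operator acts only in cis. A wild-type copy restores repression to a regulator mutant, because the regulator protein acts in trans.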
CONCEPT QUESTION Review the Chapter Concepts list on p. 296. These all relate to the regulation of gene expression in prokaryotes and eukaryotes. Write a brief essay that discusses why you think gene-regulatory systems evolved in bacteria, and why genes related to common functions are found together in operons.
Describe which enzymes are required for lactose and tryptophan metabolism in bacteria when lactose and tryptophan, respectively, are (a) present and (b) absent.
Contrast positive versus negative regulation of gene expression.
Describe the role of the repressor in an inducible system and in a repressible system.
Both attenuation and riboswitches rely on changes in the secondary structure of the leader regions of mRNA to regulate gene expression. Compare and contrast the specific mechanisms in these two types of regulation.
For the lac genotypes shown in the accompanying table, predict whether the structural gene (Z) is constitutive, permanently repressed, or inducible in the presence of lactose.
Genotype | Constitutive | Repressed | Inducible
I+ O+ Z+ | | | X
I− O+ Z+ | | |
I+ O^C Z+ | | |
I− O+ Z+/F′I+ | | |
I+ O^C Z+/F′O^C | | |
I^S O+ Z+ | | |
I^S O+ Z+/F′I+ | | |
For the genotypes and conditions (lactose present or absent) shown in the accompanying table, predict whether functional enzymes, nonfunctional enzymes, or no enzymes are made.
Genotype | Condition | Functional Enzyme Made | Nonfunctional Enzyme Made | No Enzyme Made
I+ O+ Z+ | No lactose | | | X
I+ O^C Z+ | Lactose | | |
I− O+ Z | No lactose | | |
I− O+ Z | Lactose | | |
I− O+ Z+/F′I+ | No lactose | | |
I+ O^C Z+/F′O+ | Lactose | | |
I+ O+ Z−/F′I+ O+ Z+ | Lactose | | |
I− O+ Z−/F′I+ O+ Z+ | No lactose | | |
I^S O+ Z+/F′O+ | No lactose | | |
I+ O^C Z+/F′O+ Z+ | Lactose | | |
The locations of numerous lacI− and lacI^S mutations have been determined within the DNA sequence of the lacI gene. Among these, lacI− mutations were found to occur in the 5′-upstream region of the gene, while lacI^S mutations were found to occur farther downstream in the gene. Are the locations of the two types of mutations within the gene consistent with what
is known about the function of the repressor that is the product of the lacI gene?
Explain why catabolite repression is used in regulating the lac operon, and describe how it fine-tunes β-galactosidase synthesis.
10 Describe experiments that would confirm whether or not two transcription regulatory molecules act through the mechanism of cooperative binding.
11 Predict the level of genetic activity of the lac operon, as well as the status of the lac repressor and the CAP protein, under the cellular conditions listed in the accompanying table.
[Table: conditions (a) to (d), each specifying the presence (+) or absence (−) of lactose and of glucose]
12 Predict the effect on the inducibility of the lac operon of a mutation that disrupts the function of (a) the crp gene, which encodes the CAP protein, and (b) the CAP-binding site within the promoter.
13 Describe the role of attenuation in the regulation of tryptophan biosynthesis.
14 Imagine that a new operon is discovered in a certain microorganism. The promoter sequence, the operator sequence, and the structural gene are represented by P, O, and S, respectively. Products of two other genes, X and Y, help in the regulation of this operon. It is induced by a molecule A. Its function is very similar to that of the lac operon; that is, the S gene product (enzyme) helps in metabolizing a molecule X. From the data provided in the accompanying table, explain how the X and Y gene products help in the regulation of this operon.
Genotype | In the presence of A | In the absence of A
P+ O+ S+ X+ Y+ | enzyme is produced | no enzyme is produced
P+ O+ S+ X− Y+ | enzyme is produced | enzyme is produced
P+ O+ S+ X+ Y− | no enzyme is produced | no enzyme is produced
P+ O+ S+ X− Y− | enzyme is produced | enzyme is produced
P+ O− S+ X+ Y+ | enzyme is produced | enzyme is produced
P+ O+ S+ X− Y−/F′X+ | no enzyme is produced | no enzyme is produced
P+ O+ S+ X− Y−/F′Y+ | enzyme is produced | enzyme is produced
P+ O+ S+ X− Y−/F′X+ Y+ | enzyme is produced | no enzyme is produced
15 A bacterial operon is responsible for production of the biosynthetic enzymes needed to make the theoretical amino acid tisophane (tis). The operon is regulated by a separate gene, R, deletion of which causes the loss of enzyme synthesis. In the wild-type condition, when tis is present, no enzymes are made; in the absence of tis, the enzymes are made. Mutations in the operator gene (O−) result in repression regardless of the presence of tis. Is the operon under positive or negative control? Propose a model for (a) repression of the genes in the presence of tis in wild-type cells and (b) the mutations.
16 A marine bacterium is isolated and is shown to contain an inducible operon whose genetic products metabolize oil when it is encountered in the environment. Investigation demonstrates that the operon is under positive control and that there is a reg gene whose product interacts with an operator region (o) to regulate the structural genes, designated sg. In an attempt to understand how the operon functions, a constitutive mutant strain and several partial diploid strains were isolated and tested, with the results shown here:
Host Chromosome | F′ Factor | Phenotype
wild type | none | inducible
wild type | reg gene from mutant strain | inducible
wild type | operon from mutant strain | constitutive
mutant strain | reg gene from wild type | constitutive
Draw all possible conclusions about the mutation as well as the nature of regulation of the operon. Is the constitutive mutation in the trans-acting reg element or in the cis-acting o operator element?
17 What is the mechanism by which the chemical 5-azacytidine enhances gene expression?
18 List and define the levels of eukaryotic gene regulation discussed in this chapter.
19 What are the subcategories within eukaryotic promoters? How do enhancers and silencers differ from promoters?
20 What are transcription factors? What cis-acting elements do they bind?
21 Compare the control of gene regulation in eukaryotes and prokaryotes at the level of initiation of transcription. How do the regulatory mechanisms work? What are the similarities and differences in these two types of organisms in terms of the specific components of the regulatory mechanisms? Address how the differences or similarities relate to the biological context of the control of gene expression.
22 Many eukaryotic promoter regions contain CAAT boxes with consensus sequences CAAT or CCAAT approximately 70 to 80 bases upstream from the transcription start site. How might one determine the influence of CAAT boxes on the transcription rate of a given gene?
23 What is RNA-induced gene silencing in eukaryotes? How do sncRNAs affect gene regulation, and how are they currently used in research and medicine?
24 Although it is customary to consider transcriptional regulation in eukaryotes as resulting from the positive or negative influence of different factors binding to DNA, a more complex picture is emerging. For instance, researchers have described the action of a transcriptional repressor (Net) that is regulated by nuclear export (Ducret et al., 1999. Mol. Cell. Biol. 19: 7076–7087). Under neutral conditions, Net inhibits transcription of target genes; however, when phosphorylated, Net stimulates transcription of target genes. When stress conditions exist in a cell (for example, from ultraviolet light or heat shock), Net is excluded from the nucleus, and target genes are transcribed. Devise a model (using diagrams) that provides a consistent explanation of these three conditions.
25 DNA methylation is commonly associated with a reduction of transcription. The following data come from a study of the impact of the location and extent of DNA methylation on gene activity in human cells. A bacterial gene, luciferase, was cloned next to eukaryotic promoter fragments that were methylated to various degrees in vitro. The chimeric plasmids were then introduced into tissue culture cells,
and the luciferase activity was assayed. These data compare the degree of expression of luciferase with differences in the location of DNA methylation (Irvine et al., 2002. Mol. Cell. Biol. 22: 6689–6696). What general conclusions can be drawn from these data?
DNA Segment | Patch Size of Methylation (kb) | Number of Methylated CpGs | Relative Luciferase Expression
Outside transcription unit (0–7.6 kb away) | 0.0 | 0 | 490X
Outside transcription unit (0–7.6 kb away) | 2.0 | 100 | 290X
Outside transcription unit (0–7.6 kb away) | 3.1 | 102 | 250X
Outside transcription unit (0–7.6 kb away) | 12.1 | 593 | 2X
Inside transcription unit | 0.0 | 0 | 490X
Inside transcription unit | 1.9 | 108 | 80X
Inside transcription unit | 2.4 | 134 | 5X
Inside transcription unit | 12.1 | 593 | 2X
26 The interphase nucleus appears to be a highly structured organelle with chromosome territories, interchromosomal compartments, and transcription factories. In cultured human cells, researchers have identified approximately 8000 transcription factories per cell, each containing an average of eight tightly associated RNA polymerase II molecules actively transcribing RNA. If each RNA polymerase II molecule is transcribing a different gene, how might such a transcription factory appear? Provide a simple diagram that shows eight different genes being transcribed in a transcription factory, and include the promoters, structural genes, and nascent transcripts in your presentation.
27 It has been estimated that at least two-thirds of human genes produce alternatively spliced mRNA isoforms. In some cases, incorrectly spliced RNAs lead to human pathologies. Scientists have examined human cancer cells for splice-specific changes and found that many of the changes disrupt tumor-suppressor gene function (Xu and Lee, 2003. Nucl. Acids Res. 31: 5635–5643). In general, what would be the effects of splicing changes on these RNAs and on tumor-suppressor gene function? How might loss of splicing specificity be associated with cancer?
16 The Genetics of Cancer
CHAPTER CONCEPTS
■■ Cancer is characterized by genetic defects in fundamental aspects of cellular function, including DNA repair, chromatin modification, cell-cycle regulation, apoptosis, and signal transduction.
■■ Most cancer-causing mutations occur in somatic cells; only about percent of cancers have a hereditary component.
■■ Mutations in cancer-related genes lead to abnormal proliferation and loss of control over how cells spread and invade surrounding tissues.
■■ The development of cancer is a multistep process requiring mutations in genes controlling many aspects of cell proliferation and metastasis.
■■ Cancer cells show high levels of genomic instability, leading to the accumulation of multiple mutations, some in cancer-related genes.
■■ DNA methylation and histone modifications play significant roles in the development of cancers.
■■ Mutations in proto-oncogenes and tumor-suppressor genes contribute to the development of cancers.
■■ Cancer-causing viruses and environmental agents contribute to the development of human cancers.
Colored scanning electron micrograph of two prostate cancer cells in the final stages of cell division (cytokinesis). The cells are still joined by strands of cytoplasm.
Cancer is the leading cause of death in Western countries. It strikes people of all ages, and one out of three people will experience a cancer diagnosis sometime in his or her lifetime. Each year, more than million cases of cancer are diagnosed in the United States, and more than 500,000 people die from the disease. Over the last 30 years, scientists have discovered that cancer is a genetic disease at the somatic cell level, characterized by the presence of gene products derived from mutated or abnormally expressed genes. The combined effects of numerous abnormal gene products lead to the uncontrolled growth and spread of cancer cells. Although some mutated cancer genes may be inherited, most are created within somatic cells that then divide and form tumors.
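In practice, inherited (germ-line) mutations are told apart from those created in somatic cells by comparing a tumor's DNA with normal DNA from the same patient. The Python sketch below illustrates only that bookkeeping, under strong simplifying assumptions. The three sequences are short, already aligned, and error-free, and `classify_variants` is a hypothetical helper, not a real sequencing tool. Real studies align millions of sequencing reads and use statistical variant callers.

```python
def classify_variants(reference, normal, tumor):
    """Split differences from a reference into germline and somatic calls.

    Assumes the three sequences are already aligned, equal in length, and
    free of sequencing errors. Returns two lists of (position, ref, alt).
    """
    if not (len(reference) == len(normal) == len(tumor)):
        raise ValueError("sequences must be aligned and of equal length")
    germline, somatic = [], []
    for pos, (ref, n_base, t_base) in enumerate(zip(reference, normal, tumor)):
        if n_base != ref:
            # Already present in normal cells: treated as inherited (germ line).
            germline.append((pos, ref, n_base))
        elif t_base != ref:
            # Present only in the tumor: arose in a somatic cell lineage.
            somatic.append((pos, ref, t_base))
    return germline, somatic


reference = "ACGTACGTAC"
normal = "ACGTACGAAC"  # differs from the reference at position 7
tumor = "ACCTACGAAC"  # differs at positions 2 and 7
germline, somatic = classify_variants(reference, normal, tumor)
print("germline:", germline)  # [(7, 'T', 'A')]
print("somatic:", somatic)  # [(2, 'G', 'C')]
```

In this example, the change at position 7 is found in the normal cells and is treated as inherited. The change at position 2 appears only in the tumor and is treated as somatic.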
Completion of the Human Genome Project and numerous large-scale rapid DNA sequencing studies have opened the door to a wealth of new information about the mutations that trigger a cell to become cancerous. This new understanding of cancer genetics is also leading to new gene-specific treatments, some of which are now entering clinical trials. Some scientists predict that gene-targeted therapies will replace chemotherapies within the next 25 years.
The goal of this chapter is to highlight our current understanding of the nature and causes of cancer. As we will see, cancer is a genetic disease that arises from the accumulation of mutations in genes controlling many basic aspects of cellular function. Please note that some of the topics discussed in this chapter are explored in greater depth later in the text (see Special Topic Chapter 1: Epigenetics and Special Topic Chapter 4: Genomics and Personalized Medicine).
What Is Cancer?
16.1 Cancer Is a Genetic Disease at the Level of Somatic Cells
Perhaps the most significant development in understanding the causes of cancer is the realization that cancer is a genetic disease. Genomic alterations that are associated with cancer range from single-nucleotide substitutions to large-scale chromosome rearrangements, amplifications, and deletions (Figure 16–1). However, unlike other genetic diseases, cancer is caused by mutations that arise predominantly in somatic cells. Only about percent of cancers are associated with germ-line mutations that increase a person's susceptibility to certain types of cancer. Another important difference between cancers and other genetic diseases is that cancers rarely arise from a single mutation in a single gene; instead, they arise from the accumulation of many mutations. The mutations that lead to cancer affect multiple cellular functions, including repair of DNA damage, cell division, apoptosis, cellular differentiation, migratory behavior, and cell–cell contact.
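The claim that cancer requires several independent mutations to accumulate in one cell can be made concrete with a deliberately simplified multi-hit calculation. The sketch below assumes, hypothetically, that each of k required mutations strikes a given cell lineage independently at a constant rate. The rate is an illustrative value, not a measured figure, and `prob_all_hits` is a name invented for this example. The model ignores cell division, selection, and clonal expansion. The block also checks the chapter's own arithmetic, from the discussion of cancer as a multistep process: about 10⁻⁶ mutations per gene per division, over about 10¹⁶ divisions, gives about 10¹⁰ mutations per gene.

```python
import math

# Arithmetic quoted in the chapter: ~1e-6 mutations per gene per cell division,
# times ~1e16 cell divisions per lifetime, gives ~1e10 mutations per gene.
mutations_per_gene = 1e-6 * 1e16


def prob_all_hits(rate_per_year, years, k):
    """Probability that one cell lineage carries all k required mutations by a
    given age, assuming each mutation arises independently at a constant rate."""
    p_single = 1.0 - math.exp(-rate_per_year * years)  # chance one specific hit occurred
    return p_single ** k


rate = 1e-3  # hypothetical per-mutation rate per lineage per year
for k in (1, 2, 5):
    ratio = prob_all_hits(rate, 70, k) / prob_all_hits(rate, 35, k)
    print(f"k = {k}: risk at age 70 is {ratio:.1f} times the risk at age 35")
```

With one required mutation, doubling age roughly doubles the risk. With five, it multiplies the risk roughly 30-fold. This steep rise with age is the pattern a multistep model predicts.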
Clinically, cancer is defined as a large number of complex diseases, up to a hundred, that behave differently depending on the cell types from which they originate and the types of genetic alterations that occur within each cancer type. Cancers vary in their ages of onset, growth rates, invasiveness, prognoses, and responsiveness to treatments. However, at the molecular level, all cancers exhibit common characteristics that unite them as a family.
All cancer cells share two fundamental properties: (1) abnormal cell growth and division (proliferation), and (2) defects in the normal restraints that keep cells from spreading and colonizing other parts of the body (metastasis). In normal cells, these functions are tightly controlled by genes that are expressed appropriately in time and place. In cancer cells, these genes are either mutated or expressed inappropriately.
It is this combination of uncontrolled cell proliferation and metastatic spread that makes cancer cells dangerous. When a cell simply loses genetic control over cell growth, it may grow into a multicellular mass, a benign tumor. Such a tumor can often be removed by surgery and may cause no serious harm. However, if cells in the tumor also have the ability to break loose, enter the bloodstream, invade other tissues, and form secondary tumors (metastases), they become malignant. Malignant tumors are often difficult to treat and may become life threatening. As we will see later in the chapter, there are multiple steps and genetic mutations that convert a benign tumor into a dangerous malignant tumor.
ESSENTIAL POINT
Cancer cells show two fundamental properties: abnormal cell proliferation and a propensity to spread and invade other parts of the body (metastasis).
The Clonal Origin of Cancer Cells
FIGURE 16–1 (a) Spectral karyotype of a normal cell. (b) Karyotype of a cancer cell showing translocations, deletions, and aneuploidy, characteristic
features of cancer cells.
Although malignant tumors may contain billions of cells, and may invade and grow in numerous parts of the body, all cancer cells in the primary and secondary tumors are clonal, meaning that they originated from a common ancestral cell that accumulated specific cancer-causing mutations. This is an important concept in understanding the molecular causes of cancer and has implications for its diagnosis.
Numerous data support the concept of cancer clonality. For example, reciprocal chromosomal translocations are characteristic of many cancers, including leukemias and lymphomas (two cancers involving white blood cells). Cancer cells from patients with Burkitt lymphoma show reciprocal translocations between chromosome 8 (with translocation breakpoints at or near the c-myc gene) and chromosomes 2, 14, or 22 (with translocation breakpoints at or near one of the immunoglobulin genes). Each Burkitt lymphoma patient exhibits unique breakpoints in his or her c-myc and immunoglobulin gene DNA sequences; however, all lymphoma cells within that patient contain identical translocation breakpoints. This demonstrates that all cancer cells in each case of Burkitt lymphoma arise from a single cell, and this cell passes on its genetic aberrations to its progeny.
Another demonstration that cancer cells are clonal is their pattern of X-chromosome inactivation. As explained earlier in the text (see Chapter 5), female humans are mosaic, with some cells containing an inactivated paternal X chromosome and other cells containing an inactivated maternal X chromosome. X-chromosome inactivation occurs early in development and takes place at random. Within one female individual, all cancer cells within a tumor, both primary and metastatic, contain the same inactivated X chromosome. This supports the concept that all the cancer cells in that patient arose from a common ancestral cell.
ESSENTIAL POINT
Cancers are clonal, meaning that all cells within a tumor originate from a single cell that contained a number of mutations.
The Cancer Stem Cell Hypothesis
A concept that is related to the clonal origin of cancer cells is that of the cancer stem cell. Many scientists now believe that most of the cells within tumors do not proliferate. Those that proliferate and give rise to all the cells within the tumor are known as cancer stem cells. Stem cells are undifferentiated cells that have the capacity for self-renewal, a process in which the stem cell divides asymmetrically, creating one daughter cell that goes on to differentiate into a mature cell type and one that remains a stem cell. The cancer stem cell hypothesis contrasts with the random, or stochastic, model, which predicts that every cell within a tumor has the potential to form a new tumor.
Although scientists still actively debate the existence of cancer stem cells, evidence is accumulating that cancer stem cells exist, at least in some tumors. Cancer stem cells have been identified in leukemias as well as in solid tumors of the brain, breast, colon, ovary, pancreas, and prostate. It is still not clear what fraction of any tumor is composed of cancer stem cells. For example, human acute myeloid leukemias contain fewer than one cancer stem cell in 10,000. In contrast, some solid tumors may contain as many as 40 percent cancer stem cells.
Scientists are also not sure about the origins of cancer stem cells. They may arise from normal adult stem cells within a tissue, or they may be created from more differentiated somatic cells that acquire properties similar to stem cells after accumulating numerous mutations and changes to chromatin structure.
Cancer as a Multistep Process, Requiring Multiple Mutations
Although we know that cancer is a genetic disease initiated by mutations that lead to uncontrolled cell proliferation and metastasis, a single mutation is not sufficient to transform a normal cell into a tumor-forming,
malignant cell. If it were sufficient, then cancer would be far more prevalent than it is. In humans, mutations occur spontaneously at a rate of about 10⁻⁶ mutations per gene, per cell division, mainly due to the intrinsic error rates of DNA replication. Because there are approximately 10¹⁶ cell divisions in a human body during a lifetime, a person might suffer up to 10¹⁰ mutations per gene somewhere in the body during his or her lifetime. However, only about one person in three will suffer from cancer.
The phenomenon of age-related cancer is another indication that cancer develops from the accumulation of several mutagenic events in a single cell. The incidence of most cancers rises exponentially with age. If a single mutation were sufficient to convert a normal cell to a malignant one, then cancer incidence would appear to be independent of age. The age-related incidence of cancer suggests that many independent mutations, occurring randomly and with a low probability, are necessary before a cell is transformed into a malignant cancer cell.
Another indication that cancer is a multistep process is the delay that occurs between exposure to carcinogens (cancer-causing agents) and the appearance of the cancer. For example, an incubation period of five to eight years separated exposure of people to the radiation of the atomic explosions at Hiroshima and Nagasaki and the onset of leukemias.
The multistep nature of cancer development is supported by the observation that cancers often develop in progressive steps, beginning with mildly aberrant cells and progressing to cells that are increasingly tumorigenic and malignant. Each step in tumorigenesis (the development of a malignant tumor) appears to be the result of two or more genetic alterations that release the cells progressively from the controls that normally operate on proliferation and malignancy. This observation suggests that the progressive genetic alterations that create a cancer cell confer selective advantages to the
cell and are propagated through cell divisions during the creation of tumors.
The progressive nature of cancer is illustrated by the development of colorectal cancer.
FIGURE 16–2 Steps in the development of colorectal cancers. [Figure: normal colonic epithelium progresses to a small adenoma (patient age 30–50 years), a large adenoma (40–60 years), and a carcinoma (50–70 years); the genes and pathways labeled along the top are APC, Kras, PI3K, TGF-β, and cell-cycle/apoptosis genes.] Some of the genes that acquire driver mutations and cause the progressive development of colorectal cancer are shown at the top of the figure. These driver mutations accumulate over time and can take 40 years or more to result in the formation of a malignant tumor.
Colorectal cancers are known to proceed through several clinical stages that are characterized by the stepwise accumulation of genetic defects in several genes (Figure 16–2). The first step is the conversion of a normal epithelial cell into a small cluster of cells known as an adenoma or polyp. This step requires inactivating mutations in the adenomatous polyposis coli (APC) gene, a gene that encodes a protein involved in the normal differentiation of intestinal cells. The APC gene is a tumor-suppressor gene, which will be discussed later in the chapter. The resulting adenoma grows slowly and is considered benign.
The second step in the development of colorectal cancer is the acquisition of a second genetic alteration in one of the cells within the small adenoma. This is usually a mutation in the Kras gene, a gene whose product is normally involved with regulating cell growth. The mutations in Kras that contribute to colorectal cancer cause the Kras protein to become constitutively active, resulting in unregulated cell division. The cell containing the APC and Kras mutations grows and expands to form a larger intermediate adenoma of approximately cm in diameter, in a process known as clonal expansion. The cells of the original small adenoma (containing the APC mutation) are now vastly outnumbered by cells
containing the two mutations.
The third step, which transforms a large adenoma into a malignant tumor (carcinoma), requires several more waves of clonal expansion triggered by the acquisition of defects in several genes, including p53, PI3K, and TGF-β. The products of these genes control several important aspects of normal cell growth and division, such as apoptosis, growth signaling, and cell-cycle regulation, all of which we will discuss in more detail later in the chapter. The resulting carcinoma is able to grow further and invade the underlying tissues of the colon. A few cells within the carcinoma acquire one or more new mutations that allow them to break free of the tumor, migrate to other parts of the body, and form metastases.
Driver Mutations and Passenger Mutations
Scientists are now applying some of the recent advances in DNA sequencing in order to identify all of the somatic mutations that occur during the development of a cancer cell. These studies compare the DNA sequences of genomes from cancer cells and normal cells derived from the same patient. Data from these studies are revealing that tens of thousands of somatic mutations can be present in cancer cells. Solid tumors such as those of the colon or breast may contain as many as 70 mutated genes. Some other cancers, such as lung cancer and melanomas, may contain several hundred mutations.
Researchers believe that only a handful of mutations in each tumor, called driver mutations, give a growth advantage to a tumor cell. The remainder of the mutations may be acquired over time, perhaps as a result of the increased levels of DNA damage that accumulate in cancer cells, but these mutations make no direct contribution to the cancer phenotype. These are known as passenger mutations. The total number of driver mutations that occur in any particular cancer is small, between and. As we will discover in subsequent sections of this chapter, the genes that acquire driver mutations that lead to cancer (called oncogenes and
tumor-suppressor genes) are those that control a large number of essential cellular functions. We will now investigate these fundamental processes, the genes that control them, and how mutations in these genes may lead to cancer.
ESSENTIAL POINT
The development of cancer is a multistep process, requiring mutations in several cancer-related genes.
16.2 Cancer Cells Contain Genetic Defects Affecting Genomic Stability, DNA Repair, and Chromatin Modifications
Many researchers believe that the fundamental defect in cancer cells is a derangement of the cells' normal ability to repair DNA damage. This loss of genomic integrity leads to a general increase in the mutation rate for every gene in the genome, including those whose products control aspects of cell proliferation, programmed cell death, and metastasis. The high level of genomic instability seen in cancer cells is known as the mutator phenotype. In addition, recent research has revealed that cancer cells contain aberrations in the types and locations of chromatin modifications, particularly DNA and histone methylation patterns.
Genomic Instability and Defective DNA Repair
Genomic instability in cancer cells is characterized by the presence of gross defects such as translocations, aneuploidy, chromosome loss, DNA amplification, and chromosome deletions. Often cancer cells show specific chromosomal defects that are used to diagnose the type and stage of the cancer. For example, leukemic white blood cells from patients with chronic myelogenous leukemia (CML) bear a specific translocation, in which the C-ABL gene on chromosome 9 is translocated into the BCR gene on chromosome 22. This translocation creates a structure known as the Philadelphia chromosome (Figure 16–3). The BCR-ABL fusion gene codes for a chimeric BCR-ABL protein. The normal ABL protein is a protein kinase that acts within signal transduction
pathways, transferring growth factor signals from the external environment to the nucleus. The BCR-ABL protein is an abnormal signal transduction molecule in CML cells, which stimulates these cells to proliferate even in the absence of external growth signals.
In keeping with the concept of the cancer mutator phenotype, a number of inherited cancers are caused by defects in genes that control DNA repair. For example, xeroderma pigmentosum (XP) is a rare hereditary disorder that is characterized by extreme sensitivity to ultraviolet light and other carcinogens. Patients with XP often develop skin cancer. Cells from patients with XP are defective in nucleotide excision repair, with mutations appearing in any one of seven genes whose products are necessary to carry out DNA repair. XP cells are impaired in their ability to repair DNA lesions such as thymine dimers induced by UV light. The relationship between XP and genes controlling nucleotide excision repair is also described earlier in the text (see Chapter 14).
FIGURE 16–3 A reciprocal translocation involving the long arms of chromosomes 9 and 22 results in the formation of a characteristic chromosome, the Philadelphia chromosome, which is associated with chronic myelogenous leukemia (CML). The t(9;22) translocation results in the fusion of the C-ABL proto-oncogene on chromosome 9 with the BCR gene on chromosome 22. The fusion protein is a powerful hybrid molecule that allows cells to escape control of the cell cycle, contributing to the development of CML. [Figure: normal chromosome 9 with C-ABL at q34.1 and normal chromosome 22 with BCR at q11.2 undergo translocation t(9;22), producing the Philadelphia chromosome carrying the BCR-ABL fusion.]
Another example is hereditary nonpolyposis colorectal cancer (HNPCC), which is caused by mutations in genes controlling DNA repair. HNPCC is an autosomal dominant syndrome, affecting about one in every 200 to 1000 people. Patients affected by HNPCC have an increased risk of developing colon, ovarian, uterine, and kidney cancers. Cells from patients with
HNPCC show higher than normal mutation rates and genomic instability. At least eight genes are associated with HNPCC, and four of these genes control aspects of DNA mismatch repair. Inactivation of any of these four genes (MSH2, MSH6, MLH1, and MLH3) causes a rapid accumulation of genome-wide mutations and the subsequent development of cancers. The observation that hereditary defects in genes controlling nucleotide excision repair and DNA mismatch repair lead to high rates of cancer supports the idea that the mutator phenotype is a significant contributor to the development of cancer.

Chromatin Modifications and Cancer Epigenetics

The field of cancer epigenetics is providing new perspectives on the genetics of cancer. Epigenetics is the study of factors that affect gene expression but do not alter the nucleotide sequence of DNA. DNA methylation and histone modifications such as acetylation and phosphorylation are examples of epigenetic modifications. The effects of chromatin modifications and epigenetic factors on gene expression and hereditary disease are discussed in more detail later in the text (see Special Topic Chapter 1, Epigenetics).

Cancer cells contain altered DNA methylation patterns. Overall, there is much less DNA methylation in cancer cells than in normal cells. At the same time, the promoters of some genes are hypermethylated in cancer cells. These changes are thought to release transcriptional repression over the bulk of genes that would be silent in normal cells, including cancer-causing genes, while repressing transcription of genes that regulate normal cellular functions such as DNA repair and cell-cycle control.

Histone modifications are also disrupted in cancer cells. Genes that encode histone acetylases, deacetylases, methyltransferases, and demethylases are often mutated or aberrantly expressed in cancer cells. The large number of epigenetic abnormalities in tumors has prompted
some scientists to speculate that there may be more epigenetic defects in cancer cells than there are gene mutations. In addition, because epigenetic modifications are reversible, it may be possible to treat cancers using epigenetic-based therapies.

ESSENTIAL POINT
Cancer cells show high rates of mutation, chromosomal abnormalities, genomic instability, and abnormal patterns of chromatin modifications.

16–1 In chronic myelogenous leukemia (CML), leukemic blood cells can be distinguished from other cells of the body by the presence of a functional BCR-ABL hybrid protein. Explain how this characteristic provides an opportunity to develop a therapeutic approach to the treatment of CML.
HINT: This problem asks you to imagine a therapy that is based on the unique genetic characteristics of CML leukemic cells. The key to its solution is to remember that the BCR-ABL fusion protein is found only in CML white blood cells and that this unusual protein has a specific function thought to contribute directly to the development of CML. To help you answer this problem, you may wish to learn more about the cancer drug Gleevec (see http://www.cancer.gov/cancertopics/druginfo/imatinibmesylate).

16.3 Cancer Cells Contain Genetic Defects Affecting Cell-Cycle Regulation

One of the fundamental aberrations in all cancer cells is a loss of control over cell proliferation. Although some cells, such as epidermal cells of the skin or blood-forming cells in the bone marrow, continue to grow and divide throughout an organism’s lifetime, most cells in adult multicellular organisms remain in a nondividing, quiescent, and differentiated state. Differentiated cells are those that are specialized for specific functions, such as photoreceptor cells of the retina or muscle cells of the heart. Normal regulation of cell proliferation involves a large number of gene products that control steps in the cell cycle, programmed cell death, and the response of cells to external growth signals. In cancer cells,
many of the genes that control these functions are mutated or aberrantly expressed, leading to uncontrolled cell proliferation. In this section, we will review steps in the cell cycle, some of the genes that control the cell cycle, and how these genes, when mutated, lead to cancer.

The Cell Cycle and Signal Transduction

As we learned earlier in the text (see Chapter 2), the cellular events that occur in sequence from one cell division to the next make up the cell cycle. Cells in the G0 phase of the cell cycle can often be stimulated to reenter the cell cycle by external growth signals. These signals are delivered to the cell by molecules such as growth factors and hormones that bind to cell-surface receptors, which then relay the signal from the plasma membrane to the cytoplasm. The process of transmitting growth signals from the external environment to the cell nucleus is known as signal transduction. Ultimately, signal transduction initiates a program of gene expression that propels the cell out of G0 and back into the cell cycle.

Cancer cells often have defects in signal transduction pathways. Sometimes, abnormal signal transduction molecules send continuous growth signals to the nucleus even in the absence of external growth signals. An example of abnormal signal transduction due to mutations in the ras gene is described in Section 16.4. In addition, malignant cells may not respond to external signals from surrounding cells that would normally inhibit cell proliferation within a mature tissue.

Cell-Cycle Control and Checkpoints

In normal cells, progress through the cell cycle is tightly regulated, and each step must be completed before the next step can begin. There are at least three distinct points in the cell cycle at which the cell monitors external signals and internal equilibrium before proceeding to the next stage: the G1/S, G2/M, and M checkpoints. At the G1/S checkpoint, the cell monitors its size and determines whether its DNA has been damaged. If
the cell has not achieved an adequate size, or if the DNA has been damaged, further progress through the cell cycle is halted until these conditions are corrected. If cell size and DNA integrity are normal, the G1/S checkpoint is traversed, and the cell proceeds to S phase.

The second important checkpoint is the G2/M checkpoint, where physiological conditions in the cell are monitored prior to mitosis. If DNA replication or repair of any DNA damage has not been completed, the cell cycle arrests until these processes are complete.

The third major checkpoint occurs during mitosis and is called the M checkpoint. At this checkpoint, both the successful formation of the spindle-fiber system and the attachment of spindle fibers to the kinetochores associated with the centromeres are monitored. If spindle fibers are not properly formed or attachment is inadequate, mitosis is arrested.

In addition to regulating the cell cycle at checkpoints, the cell controls progress through the cell cycle by means of two classes of proteins: cyclins and cyclin-dependent kinases (CDKs). The cell accumulates and destroys cyclin proteins in a precise pattern during the cell cycle (Figure 16–4). When a cyclin is present, it binds to a specific CDK, triggering activity of the CDK/cyclin complex. The CDK/cyclin complex then selectively phosphorylates and activates other proteins that in turn bring about the changes necessary to advance the cell through the cell cycle. For example, in G1 phase, CDK4/cyclin D complexes activate proteins that stimulate transcription of genes whose products (such as DNA polymerase δ and DNA ligase) are required for DNA replication during S phase. Another CDK/cyclin complex, CDK1/cyclin B, phosphorylates a number of proteins that bring about the events of early mitosis, such as nuclear membrane breakdown, chromosome condensation, and cytoskeletal reorganization. Mitosis can only be completed, however, when cyclin B is degraded and the protein phosphorylations characteristic of M phase are reversed. Although a large number of different protein kinases exist in cells, only a few are involved in cell-cycle regulation.

FIGURE 16–4 Relative expression times and amounts of cyclins (D1, D2, E, A, and B) during the phases of the cell cycle (G1, S, G2, M).

Mutation or misexpression of any of the genes controlling the cell cycle can contribute to the development of cancer. For example, if genes that control the G1/S or G2/M checkpoints are mutated, the cell may continue to grow and divide without repairing DNA damage. As these cells continue to divide, they accumulate mutations in genes whose products control cell proliferation or metastasis. Similarly, if genes that control progress through the cell cycle, such as those that encode the cyclins, are expressed at the wrong time or at incorrect levels, the cell may grow and divide continuously and may be unable to exit the cell cycle into G0. The result in both cases is that the cell loses control over proliferation and is on its way to becoming cancerous.

Control of Apoptosis

As already described, if DNA replication, repair, or chromosome assembly is defective, normal cells halt their progress through the cell cycle until the condition is corrected. This reduces the number of mutations and chromosomal abnormalities that accumulate in normal proliferating cells. However, if DNA or chromosomal damage is so severe that repair is impossible, the cell may initiate a second line of defense: a process called apoptosis, or programmed cell death. Apoptosis is a genetically controlled process whereby the cell commits suicide. Besides its role in preventing cancer, apoptosis is also initiated during normal multicellular development in order to eliminate certain cells that do not contribute to the final adult organism. The steps in apoptosis are the same for damaged cells and for cells being eliminated
during development: nuclear DNA becomes fragmented, internal cellular structures are disrupted, and the cell dissolves into small spherical structures known as apoptotic bodies. In the final step, the apoptotic bodies are engulfed by the immune system’s phagocytic cells. A series of proteases called caspases are responsible for initiating apoptosis and for digesting intracellular components. By removing damaged cells, programmed cell death reduces the number of mutations that are passed to the next generation of cells, including those in cancer-causing genes.

Some of the same genes that control cell-cycle checkpoints can trigger apoptosis. These genes are mutated in many cancers. As a result of the mutation or inactivation of these checkpoint genes, the cell is unable to repair its DNA or undergo apoptosis. This inability leads to the accumulation of even more mutations in genes that control growth, division, and metastasis.

ESSENTIAL POINT
Cancer cells have defects in cell-cycle progression, checkpoint controls, and programmed cell death.

16.4 Proto-oncogenes and Tumor-Suppressor Genes Are Altered in Cancer Cells

Two general categories of genes are mutated or misexpressed in cancer cells: the proto-oncogenes and the tumor-suppressor genes (Table 16.1). Proto-oncogenes encode transcription factors that stimulate expression of other genes, signal transduction molecules that stimulate cell division, and cell-cycle regulators that move the cell through the cell cycle. Their products are important for normal cell functions, especially cell growth and division. When normal cells become quiescent and cease division, they repress the expression of most proto-oncogenes or modify the activities of their products. In cancer cells, one or more proto-oncogenes are altered in such a way that the activities of their products cannot be regulated in a normal fashion. This is sometimes due to mutations that result in an abnormal protein product. In other cases,
proto-oncogenes may be overexpressed or expressed at an incorrect time, either because of mutations within gene-regulatory regions such as enhancer elements or because of alterations in chromatin structure that affect gene expression. If a proto-oncogene is continually in an “on” state, its product may constantly stimulate the cell to divide. When a proto-oncogene is mutated or abnormally expressed and contributes to the development of cancer, it is known as an oncogene, a cancer-causing gene. Oncogenes are proto-oncogenes that have experienced a gain-of-function alteration. As a result, only one allele of a proto-oncogene needs to be mutated or misexpressed in order to contribute to cancer. Hence, oncogenes confer a dominant cancer phenotype.

Tumor-suppressor genes are genes whose products normally regulate cell-cycle checkpoints or initiate the process of apoptosis. In normal cells, proteins encoded by tumor-suppressor genes halt progress through the cell cycle in response to DNA damage or growth-suppression signals from the extracellular environment. When tumor-suppressor genes are mutated or inactivated, cells are unable to respond normally to cell-cycle checkpoints, or are unable to undergo programmed cell death if DNA damage is extensive. This leads to the accumulation of more mutations and the development of cancer. When both alleles of a tumor-suppressor gene are inactivated through mutation or epigenetic modifications, and other changes in the cell keep it growing and dividing, cells may become tumorigenic.

The following are examples of proto-oncogenes and tumor-suppressor genes that contribute to cancer when mutated or abnormally expressed. Approximately 400 oncogenes and tumor-suppressor genes are now known, and more will likely be discovered as cancer research continues.

The ras Proto-oncogenes

Some of the most frequently mutated genes in human tumors are those in the ras gene family. These genes are mutated in more than 30 percent of human tumors. The ras gene family encodes
signal transduction molecules that are associated with the cell membrane and regulate cell growth and division. Ras proteins normally transmit signals from the cell membrane to the nucleus, stimulating the cell to divide in response to external growth factors. Ras proteins alternate between an inactive (switched off) state, in which they bind guanosine diphosphate (GDP), and an active (switched on) state, in which they bind guanosine triphosphate (GTP). Mutations that convert the ras proto-oncogene to an oncogene prevent the Ras protein from hydrolyzing GTP to GDP and hence freeze the Ras protein in its “on” conformation, constantly stimulating the cell to divide.

TABLE 16.1 Some Proto-oncogenes and Tumor-Suppressor Genes

Proto-oncogene | Normal Function | Alteration in Cancer | Associated Cancers
c-myc | Transcription factor; regulates cell cycle, differentiation, apoptosis | Translocation, amplification, point mutations | Lymphomas, leukemias, lung cancer, many types
c-kit | Tyrosine kinase, signal transduction | Mutation | Sarcomas
RARα | Hormone-dependent transcription factor, differentiation | Chromosomal translocations with PML gene, fusion product | Acute promyelocytic leukemia
Cyclins | Bind to CDKs, regulate cell cycle | Gene amplification, overexpression | Lung, esophagus, many types

Tumor-Suppressor | Normal Function | Alteration in Cancer | Associated Cancers
RB1 | Cell-cycle checkpoints, binds E2F | Mutation, deletion, inactivation by viral oncogene products | Retinoblastoma, osteosarcoma, many types
APC | Cell–cell interaction | Mutation | Colorectal cancers, brain, thyroid
p53 | Transcription regulation | Mutation, deletion, viruses | Many types
BRCA1, BRCA2 | DNA repair | Point mutations | Breast, ovarian, prostate cancers

The p53 Tumor-Suppressor Gene

The most frequently mutated gene in human cancers, mutated in more than 50 percent of all cancers, is the p53 gene. This gene encodes a transcription factor that represses or stimulates transcription of
more than 50 different genes. Normally, the p53 protein is continuously synthesized but is rapidly degraded and is therefore present in cells at low levels. Several types of cellular stress bring about rapid increases in the nuclear levels of activated p53 protein. These include chemical damage to DNA, double-stranded breaks in DNA induced by ionizing radiation, and the presence of DNA-repair intermediates generated by exposure of cells to ultraviolet light.

The p53 protein initiates several different responses to DNA damage, including cell-cycle arrest followed by DNA repair, and apoptosis if the DNA cannot be repaired. These responses are accomplished by p53 acting as a transcription factor that stimulates or represses the expression of genes involved in each response. In normal cells, p53 can arrest the cell cycle at the G1/S and G2/M checkpoints, as well as retard the progression of the cell through S phase. It accomplishes this by inhibiting cyclin/CDK complexes and regulating the transcription of other genes involved in these phases of the cell cycle.

The p53 protein can also instruct a damaged cell to commit suicide by apoptosis. It does so by activating the transcription of genes whose products control this process. In cancer cells that lack sufficient p53, these gene products are not synthesized and apoptosis may not occur. Cells lacking functional p53 are unable to arrest at cell-cycle checkpoints or to enter apoptosis in response to DNA damage. As a result, they move unchecked through the cell cycle, regardless of the condition of the cell’s DNA. Cells lacking p53 have high mutation rates and accumulate the types of mutations that lead to cancer. Because of the importance of the p53 gene to genomic integrity, it is often referred to as the “guardian of the genome.”

The RB1 Tumor-Suppressor Gene

The loss or mutation of the RB1 (retinoblastoma 1) tumor-suppressor gene contributes to the development of many cancers, including those of the breast, bone, lung, and bladder.
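As noted earlier for tumor-suppressor genes, RB1 loses its function in a cell only when both alleles are inactivated. A simple probability sketch can show why inheriting one mutant copy raises risk so sharply. The Python below is a minimal toy model, not the textbook's analysis, and it rests on loudly simplified assumptions: susceptible cells act independently, every wild-type allele is inactivated with the same small probability, and nothing else matters (no cell-division dynamics, timing, or additional mutations). This framing is usually called Knudson's two-hit hypothesis. The function prob_tumor and the values N_CELLS and P_HIT are hypothetical and were chosen only for illustration.

```python
import math


def prob_tumor(n_cells: int, p_hit: float, hits_needed: int) -> float:
    """Chance that at least one of n_cells ends up with no working RB1 allele.

    Toy two-hit model (assumption, not from the text): each remaining
    wild-type allele in a cell is inactivated independently with
    probability p_hit, and cells behave independently of one another.
    """
    # A cell that still needs k hits is fully inactivated with probability p^k.
    per_cell = p_hit ** hits_needed
    # 1 - (1 - q)^N, written with log1p/expm1 so tiny q values stay accurate.
    return -math.expm1(n_cells * math.log1p(-per_cell))


# Hypothetical, illustrative parameters (not measured values).
N_CELLS = 1_000_000  # susceptible retinal precursor cells
P_HIT = 2e-6         # per-allele somatic inactivation probability

hereditary = prob_tumor(N_CELLS, P_HIT, hits_needed=1)  # one mutant allele inherited
sporadic = prob_tumor(N_CELLS, P_HIT, hits_needed=2)    # both alleles hit somatically

print(f"hereditary: {hereditary:.3f}")
print(f"sporadic:   {sporadic:.2e}")
print(f"ratio:      {hereditary / sporadic:,.0f}")
```

Run as is, the hereditary line prints 0.865 and the sporadic line prints 4.00e-06. A single inherited hit makes the per-cell probability scale with p instead of p², and that is the contrast the retinoblastoma discussion below draws. P_HIT was picked only so that the carrier value lands near the 85 percent figure quoted there. Real tumor development is far less tidy than this sketch.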
The RB1 gene was originally identified as a result of studies on retinoblastoma, an inherited disorder in which tumors develop in the eyes of young children. Retinoblastoma occurs with a frequency of about 1 in 15,000 individuals. In the familial form of the disease, individuals inherit one mutated allele of the RB1 gene and have an 85 percent chance of developing retinoblastomas, as well as an increased chance of developing other cancers. All somatic cells of patients with hereditary retinoblastoma contain one mutated allele of the RB1 gene. However, retinoblastoma develops only when the second, normal allele of the RB1 gene is lost or mutated in certain retinal cells. In individuals who do not have this hereditary condition, retinoblastoma is extremely rare, because it requires at least two separate somatic mutations in a retinal cell to inactivate both copies of the RB1 gene.

The retinoblastoma protein (pRB) is a tumor-suppressor protein that controls the G1/S cell-cycle checkpoint. The pRB protein is found in the nuclei of all cell types at all stages of the cell cycle. However, its activity varies throughout the cell cycle, depending on its phosphorylation state. When cells are in the G0 phase of the cell cycle, the pRB protein is nonphosphorylated and binds to transcription factors such as E2F, inactivating them (Figure 16–5). When the cell is stimulated by growth factors, it enters G1 and approaches S phase. Throughout the G1 phase, the pRB protein becomes phosphorylated by the CDK4/cyclin D1 complex. Phosphorylated pRB releases its bound regulatory proteins. When E2F and other regulators are released by pRB, they are free to induce the expression of over 30 genes whose products are required for the transition from G1 into S phase. After cells traverse S, G2, and M phases, pRB reverts to a nonphosphorylated state, binds to regulatory proteins such as E2F, and keeps them sequestered until they are required for the next cell cycle. In normal quiescent cells, the presence of the pRB protein prevents passage into S phase. In many cancer cells, including retinoblastoma cells, both copies of the RB1 gene are defective, inactive, or absent, and progression through the cell cycle is not regulated.

FIGURE 16–5 During G0 and early G1, pRB interacts with and inactivates transcription factor E2F. As the cell moves from G1 to S phase, a CDK4/cyclin D1 complex forms and adds phosphate groups to pRB. As pRB becomes phosphorylated, E2F is released and becomes transcriptionally active, allowing the cell to pass through S phase. Phosphorylation of pRB is transitory; as CDK/cyclin complexes are degraded and the cell moves through the cell cycle to early G1, pRB phosphorylation declines, allowing pRB to reassociate with E2F.

16–2 People with a genetic condition known as Li–Fraumeni syndrome inherit one mutant copy of the p53 gene. These people have a high risk of developing a number of different cancers, such as breast cancer, leukemia, bone cancer, adrenocortical tumors, and brain tumors. Explain how mutations in one cancer-related gene can give rise to such a diverse range of tumors.
HINT: This problem involves an understanding of how tumor-suppressor genes regulate cell growth and behavior. The key to its solution is to consider which cellular functions are regulated by the p53 protein and how the absence of p53 could affect each of these functions. Also, read about loss of heterozygosity in Section 16.6.

16.5 Cancer Cells Metastasize and Invade Other Tissues

As discussed at the beginning of this chapter, uncontrolled growth alone is insufficient to create a malignant and life-threatening cancer. Cancer cells must also become malignant, acquiring the ability to disengage from the original tumor site, to enter the blood or lymphatic system,
to invade surrounding tissues, and to develop into secondary tumors. In order to leave the site of the primary tumor and invade other tissues, tumor cells must dissociate from the primary tumor and secrete proteases that digest components of the extracellular matrix and basal lamina. The extracellular matrix and basal lamina are composed of proteins and carbohydrates. They surround and separate body tissues, form the scaffold for tissue growth, and inhibit the migration of cells.

The ability to invade the extracellular matrix is also a property of some normal cell types. For example, implantation of the embryo in the uterine wall during pregnancy requires cell migration across the extracellular matrix. In addition, white blood cells reach sites of infection by penetrating capillary walls. The mechanisms of invasion are probably similar in these normal cells and in cancer cells. The difference is that, in normal cells, the invasive ability is tightly regulated, whereas in tumor cells, this regulation has been lost.

Once cancer cells have disengaged from the primary tumor and traversed tissue barriers, they enter the blood or lymphatic system and may become lodged in microvessels of other tissues. At this point, the cells may undergo a second round of invasion to enter the new tissue and grow into new (metastatic) tumors. Only a small percentage of circulating cancer cells, about 0.01 percent, survive to establish metastatic tumors. Other important features of metastatic cells are increased cell motility, the capacity to stimulate new blood vessel formation, and the ability to escape detection by the host’s immune system.

Metastasis is controlled by a large number of gene products, including cell-adhesion molecules, cytoskeleton regulators, and proteolytic enzymes. For example, epithelial tumors have a lower than normal level of the E-cadherin glycoprotein, which is responsible for cell–cell adhesion in normal tissues. Also,
proteolytic enzymes such as metalloproteinases are present at higher than normal levels in many highly malignant tumors. For example, breast cancer cells that metastasize to bone abnormally express the metalloproteinase gene MMP1, whereas those that spread to the lungs overexpress the MMP1 and MMP2 genes. The aggressiveness of a tumor has been shown to correlate positively with the levels of proteolytic enzymes it expresses. In addition, malignant cells are not susceptible to the normal controls conferred by regulatory molecules such as tissue inhibitors of metalloproteinases (TIMPs).

ESSENTIAL POINT
The ability of cancer cells to metastasize requires defects in gene products that control a number of functions such as cell adhesion, proteolysis, and tissue invasion.

16.6 Predisposition to Some Cancers Can Be Inherited

Although the vast majority of human cancers are sporadic, a small fraction (approximately … percent) have a hereditary or familial component. Some of these inherited forms of cancer are listed in Table 16.2. Most inherited cancer-susceptibility alleles occur in tumor-suppressor genes. Although they are transmitted in a Mendelian dominant fashion, they are not sufficient in themselves to trigger development of a cancer. At least one further somatic mutation, in the other copy of the gene, must occur in order to drive a cell toward tumorigenesis. In addition, mutations in still other genes are usually necessary to fully express the cancer phenotype.

As mentioned earlier, inherited mutations in the RB1 gene predispose individuals to developing various cancers. Although the normal somatic cells of these patients are heterozygous for the RB1 mutation, cells within their tumors contain mutations in both copies of the gene. The phenomenon whereby the second, wild-type allele is mutated in a tumor is known as loss of heterozygosity. Although loss of heterozygosity is an essential first step in the expression of these inherited cancers, further mutations in other proto-oncogenes, tumor-suppressor genes, or chromatin-modifying genes are necessary for the tumor cells to become fully malignant.

TABLE 16.2 Some Inherited Predispositions to Cancer

Tumor Predisposition Syndrome | Gene Affected
Early-onset familial breast cancer | BRCA1
Familial adenomatous polyposis | APC
Familial melanoma | CDKN2
Gorlin syndrome | PTCH1
Hereditary nonpolyposis colon cancer | MSH2, …
Li–Fraumeni syndrome | p53
Multiple endocrine neoplasia, type 1 | MEN1
Multiple endocrine neoplasia, type 2 | RET
Neurofibromatosis, type 1 | NF1
Neurofibromatosis, type 2 | NF2
Retinoblastoma | RB1
Von Hippel–Lindau syndrome | VHL
Wilms tumor | WT1

The development of hereditary colon cancer illustrates how an inherited mutation in one allele of a gene contributes only one step in the multistep pathway leading to malignancy. In Section 16.1, we described how colorectal cancers develop through the accumulation of mutations in several genes, leading to a stepwise clonal expansion of cells and the development of carcinomas. Although the vast majority of colorectal cancers are sporadic, about 1 percent of cases result from a genetic predisposition to cancer known as familial adenomatous polyposis (FAP). In FAP, individuals inherit one mutant copy of the APC (adenomatous polyposis coli) gene located on the long arm of chromosome 5. Mutations include deletions, frameshifts, and point mutations. The normal function of the APC gene product is to act as a tumor suppressor controlling growth and differentiation. The presence of a heterozygous APC mutation causes the epithelial cells of the colon to partially escape cell-cycle control, and the cells divide to form small clusters of cells called polyps or adenomas. People who are heterozygous for this condition develop hundreds to thousands of colon and rectal polyps early in life. Although it is not necessary for the second allele of the APC gene to be mutated in polyps at this stage, in the majority of cases,
the second APC allele becomes mutant in a later stage of cancer development. The remaining steps in the development of colorectal carcinoma follow the same order as that shown in Figure 16–2.

ESSENTIAL POINT
Inherited mutations in cancer-susceptibility genes are not sufficient to trigger cancer. Other somatic mutations in proto-oncogenes or tumor-suppressor genes are necessary for the development of hereditary cancers.

16–3 Although tobacco smoking is responsible for a large number of human cancers, not all smokers develop cancer. Similarly, some people who inherit mutations in the tumor-suppressor genes p53 or RB1 never develop cancer. Explain these observations.
HINT: This problem asks you to consider the reasons why only some people develop cancer as a result of environmental factors or mutations in tumor-suppressor genes. The key to its solution is to consider the steps involved in the development of cancer and the number of abnormal functions in cancer cells. Also, consider how genetics may affect DNA repair functions.

16.7 Viruses and Environmental Agents Contribute to Human Cancers

It is thought that, worldwide, about 15 percent of human cancers are associated with viruses. This makes virus infection the second largest risk factor for cancer, after tobacco smoking. The most significant contributors to virus-induced human cancers are listed in Table 16.3. Like other risk factors for cancer, including hereditary predisposition to certain cancers, virus infection alone is not sufficient to trigger human cancers. Other factors, including DNA damage or the accumulation of mutations in one or more of a cell’s oncogenes and tumor-suppressor genes, are required to move a cell down the multistep pathway to cancer.

TABLE 16.3 Human Viruses Associated with Cancers

Virus | Type | Associated Cancers
Epstein-Barr virus (EBV) | DNA virus | Burkitt lymphoma, nasopharyngeal carcinoma, Hodgkin lymphoma
Hepatitis B virus (HBV) | DNA virus | Hepatocellular carcinoma
Hepatitis C virus (HCV) | RNA virus | Hepatocellular carcinoma, non-Hodgkin lymphoma
Human papilloma viruses 16, 18 (HPV16, 18) | DNA virus | Cervical cancer, anogenital cancers, oral cancers
Human T-cell lymphotropic virus type 1 (HTLV-1) | Retrovirus | Adult T-cell leukemia and lymphoma
Human immunodeficiency virus type 1 (HIV-1) | Retrovirus | Immune suppression, leading to cancers

In addition to viruses, environmental agents also contribute to the development of cancer. Any substance or event that damages DNA has the potential to be carcinogenic. Our environment, both natural and human-made, contains abundant carcinogens. These include chemicals, radiation, and chronic infections. Perhaps the most significant carcinogen in our environment is tobacco smoke, which contains at least 60 chemicals that interact with DNA and cause mutations. Epidemiologists estimate that about 30 percent of human cancer deaths are associated with cigarette smoking. Smokers have a 20-fold increased risk of developing lung cancer, which kills more than one million people worldwide each year.

Diet is often implicated in the development of cancer. Consumption of red meat and animal fat is associated with some cancers, such as colon, prostate, and breast cancer. These substances may contribute to carcinogenesis by stimulating cell division through hormones or through the creation of carcinogenic chemicals during cooking. Alcohol may cause inflammation of the liver and contribute to liver cancer.

Although most people perceive the human-made, industrial environment to be a highly significant contributor to cancer, it may account for only a small percentage of total cancers, and only in special situations. Some of the most mutagenic agents, and hence potentially the most carcinogenic, are natural substances and natural processes. For example, aflatoxin, a component of a mold that grows on peanuts and corn, is one of the most carcinogenic chemicals known. Most chemical carcinogens, such as nitrosamines, are components of synthetic substances and are
found in some preserved meats; however, many are naturally occurring. For example, natural pesticides and antibiotics found in plants may be carcinogenic, and the human body itself creates alkylating agents in the acidic environment of the gut. Nevertheless, these observations do not diminish the serious cancer risks to specific populations who are exposed to human-made carcinogens such as synthetic pesticides or asbestos.

DNA lesions brought about by natural radiation (X rays, ultraviolet light), dietary substances, and substances in the external environment account for the majority of environmentally caused mutations that lead to cancer. In addition, normal metabolism creates oxidative end products that can damage DNA, proteins, and lipids. It is estimated that the human body suffers about 10,000 damaging DNA lesions per day due to the actions of oxygen free radicals. DNA repair enzymes deal successfully with most of this damage; however, some damage may lead to permanent mutations.

The process of DNA replication itself is mutagenic. Hence, substances such as growth factors or hormones that stimulate cell division are ultimately mutagenic and perhaps carcinogenic. Chronic inflammation due to infection also stimulates tissue repair and cell division, so DNA lesions accumulate during replication. These mutations may persist, particularly if cell-cycle checkpoints are compromised due to mutations or inactivation of tumor-suppressor genes such as p53 or RB1.

GENETICS, TECHNOLOGY, AND SOCIETY
Breast Cancer: The Double-Edged Sword of Genetic Testing

The prospect of using genetics to prevent and cure a wide range of diseases is exciting. However, in our enthusiasm, we often forget that these new technologies still have significant limitations and profound ethical complexities. The story of genetic testing for breast cancer illustrates how we must temper our high expectations with respect for uncertainty.

Breast cancer is the most common cancer among
women. A woman’s lifetime risk of developing breast cancer is about 12 percent. Each year, more than 200,000 new cases are diagnosed in the United States. Breast cancer is not limited to women; about 1400 men are also diagnosed with the disease each year.

Approximately 5 to 10 percent of breast cancers are familial, a category defined by the early onset of the disease and the appearance of several cases of breast or ovarian cancer among near blood relatives. In 1994, two genes were identified that show linkage to familial breast cancers: BRCA1 and BRCA2. These two genes encode tumor-suppressor proteins that are involved in repairing damaged DNA. Women who inherit germ-line mutations in BRCA1 have an approximately 60 percent chance of developing breast cancer and a 39 percent chance of developing ovarian cancer. Those who inherit mutations in BRCA2 have an approximately 45 percent chance of developing breast cancer and a 15 percent chance of developing ovarian cancer. In men, mutations in these two genes lead to increased risks of both breast and prostate cancers.

BRCA1 and BRCA2 genetic tests are available, and these detect over 2000 different mutations that are known to occur within the coding regions of these genes. Many patients at risk for familial breast cancer opt to undergo genetic testing. These patients feel that test results could motivate them to take steps to prevent breast or ovarian cancers, guide them in childbearing decisions, and provide information concerning the risk to close relatives. But all these potential benefits are fraught with uncertainties.

A woman whose BRCA test results are negative may feel relieved and assume that she is not subject to familial breast cancer. However, her risk of developing breast cancer is still 12 percent (the population risk), and she should continue to monitor herself for the disease. Also, a negative BRCA genetic test does not eliminate the possibility that she carries an inherited mutation in another
gene that increases breast cancer risk, or that her BRCA1 or BRCA2 gene mutations exist in regions of the genes that are inaccessible to current genetic tests.

A woman whose test results are positive faces difficult choices. Her treatment options consist of close monitoring, prophylactic mastectomy or oophorectomy (removal of breasts and ovaries, respectively), or taking prophylactic drugs such as tamoxifen. Prophylactic surgery reduces her risk but does not eliminate it, as cancers can still occur in tissues that remain after surgery. Drugs such as tamoxifen are helpful but have serious side effects.

Genetic tests also affect the patient’s entire family. People often experience fear, anxiety, and guilt on learning that they are carriers of a genetic disease. Confidentiality is also a major concern. Patients fear that their genetic test results may be leaked to insurance companies or employers, jeopardizing their prospects for jobs or affordable health and life insurance.

The unanswered scientific and ethical questions about BRCA1 and BRCA2 genetic testing are many and important. As we develop genetic tests for more and more diseases over the next few decades, our struggle with these kinds of questions will continue.

Your Turn

Take time, individually or in groups, to answer the following questions. Investigate the references and links to help you understand some of the issues that surround genetic testing for breast cancer.

1. New genomics research is rapidly identifying genes linked to human diseases, including breast cancer. How many genes are now thought to be involved in familial breast cancer? Are genetic tests available to detect mutations in these genes?
Search for recent scientific data on breast cancer susceptibility genes by using the PubMed Web site (http://www.ncbi.nlm.nih.gov/pubmed), as described in the Exploring Genomics feature in Chapter

2. Certain ethnic groups have a higher than average prevalence of mutations in BRCA1 and BRCA2. Recently, Israel proposed a national screening program to detect BRCA1 and BRCA2 mutations among all women in the Ashkenazi Jewish population. What would be the scientific and ethical pros and cons of conducting such a wide population screen, rather than restricting genetic testing to people from high-risk families?
Read about this topic in the New York Times article at www.nytimes.com/2013/11/27/health/in-israel-a-push-to-screen-for-cancer-gene-leaves-many-conflicted.html

3. If your family was at risk for familial breast cancer, would you opt to take the BRCA1 and BRCA2 genetic tests? What actions would you take if you received a positive test result? How would you feel about such a result?
Two helpful sources are:
(a) Surbone, A. 2001. Ethical implications of genetic testing for breast cancer susceptibility. Crit. Rev. in Onc./Hem. 40: 149–157.
(b) BRCA1 and BRCA2: Cancer Risk and Genetic Testing. National Cancer Institute Fact Sheet. http://www.cancer.gov/cancertopics/factsheet/Risk/BRCA

CASE STUDY

Screening for cancer can save lives. A woman who was a heavy smoker visited her doctor for a cervical smear test to screen for cancer. When the cytology results came, a large number of abnormal precancerous cells were found on her cervix. These were successfully removed under a local anesthetic, using a technique called large loop excision of the transformation zone (LLETZ), which cut away the precancerous area. A follow-up cervical smear test six months later showed only normal cells.

1. What might have happened if the woman had not bothered to go for the screening?
2. What can the woman do to reduce the risk of cervical and other cancers in the future?
3. What other risk factors are closely linked to cervical cancer?

INSIGHTS AND SOLUTIONS

1. In disorders such as retinoblastoma, a mutation in one allele of the RB1 gene can be inherited from the germ line, causing an autosomal dominant predisposition to the development of eye tumors. To develop tumors, a somatic mutation in the second copy of the RB1 gene is necessary, indicating that the mutation itself acts as a recessive trait. Given that the first mutation can be inherited, in what ways can a second mutational event occur?

Solution: In considering how this second mutation arises, we must look at several types of mutational events, including changes in nucleotide sequence and events that involve whole chromosomes or chromosome parts. Retinoblastoma results when both copies of the RB1 locus are lost or inactivated. With this in mind, you must first list the phenomena that can result in a mutational loss or the inactivation of a gene.

One way the second RB1 mutation can occur is by a nucleotide alteration that converts the remaining normal RB1 allele to a mutant form. This alteration can occur through a nucleotide substitution or through a frameshift mutation caused by the insertion or deletion of nucleotides during replication.

A second mechanism involves the loss of the chromosome carrying the normal allele. This event would take place during mitosis, resulting in chromosome 13 monosomy and leaving the mutant copy of the gene as the only RB1 allele. This mechanism does not necessarily involve loss of the entire chromosome; deletion of the long arm (RB1 is on 13q) or an interstitial deletion involving the RB1 locus and some surrounding material would have the same result.

Alternatively, a chromosome aberration involving loss of the normal copy of the RB1 gene might be followed by duplication of the chromosome carrying the mutant allele. Two copies of chromosome 13 would be restored to the cell, but
the normal RB1 allele would not be present.

Finally, a recombination event followed by chromosome segregation could produce a homozygous combination of mutant RB1 alleles.

2. Proto-oncogenes can be converted to oncogenes in a number of different ways. In some cases, the proto-oncogene itself becomes amplified up to hundreds of times in a cancer cell. An example is the cyclin D1 gene, which is amplified in some cancers. In other cases, the proto-oncogene may be mutated in a limited number of specific ways, leading to alterations in the gene product’s structure. The ras gene is an example of a proto-oncogene that becomes oncogenic after suffering point mutations in specific regions of the gene. Explain why these two proto-oncogenes (cyclin D1 and ras) undergo such different alterations in order to convert them into oncogenes.

Solution: The first step in solving this question is to understand the normal functions of these proto-oncogenes and to think about how either amplification or mutation would affect each of these functions. The cyclin D1 protein regulates progression of the cell cycle from G1 into S phase by binding to CDK4 and activating this kinase. The cyclin D1/CDK4 complex phosphorylates a number of proteins, including pRB, which in turn activate other proteins in a cascade that results in transcription of genes whose products are necessary for DNA replication in S phase. The simplest way to increase the activity of cyclin D1 would be to increase the number of cyclin D1 molecules available for binding to the cell’s endogenous CDK4 molecules. This can be accomplished by several mechanisms, including amplification of the cyclin D1 gene. In contrast, a point mutation in the cyclin D1 gene would most likely interfere with the ability of the cyclin D1 protein to bind to CDK4; hence, mutations within the gene would probably repress cell-cycle progression rather than stimulate it.

The ras gene product is a signal transduction protein that operates as an on/off switch in response to external stimulation by growth factors. It does so by binding either GTP (the “on” state) or GDP (the “off” state). Oncogenic mutations in the ras gene occur in specific regions that impair the ability of the Ras protein to hydrolyze its bound GTP to GDP and so return to the “off” state. Oncogenic Ras proteins are locked in the “on” conformation, bound to GTP. In this way, they constantly stimulate the cell to divide. An amplification of the ras gene would simply provide more molecules of normal Ras protein, which would still be capable of on/off regulation. Hence, simple amplification of ras would be less likely to be oncogenic.

Problems and Discussion Questions

1. HOW DO WE KNOW? In this chapter, we focused on cancer as a genetic disease. In particular, we discussed the relationship between cancer, the cell cycle, and mutations in proto-oncogenes and tumor-suppressor genes. Based on your knowledge of these topics, answer several fundamental questions:
(a) How do we know that malignant tumors arise from a single cell that contains mutations?
(b) How do we know that cancer development requires more than one mutation?
(c) How do we know that cancer cells contain defects in DNA repair?
2. CONCEPT QUESTION Review the Chapter Concepts list on page 321. These concepts relate to the multiple ways in which genetic alterations lead to the development of cancers. The sixth concept states that DNA methylation and histone modifications contribute to the genetic alterations leading to cancer. Write a short essay describing how these changes in cancer cells contribute to the development of cancers.
3. What is the relationship between signal transduction and cellular proliferation?
4. How do normal cells and cancer cells differ in terms of cell-cycle regulation?
5. Describe kinases and cyclins. How do they interact to cause cells to move through the cell cycle?
6. What is the role of the retinoblastoma protein in cell-cycle regulation? Is the retinoblastoma gene a tumor-suppressor gene or an oncogene?
7. Can cancer be inherited or infectious?
8. What is apoptosis, and under what circumstances do cells undergo this process?
9. Define tumor-suppressor genes. Why is a mutation in a single copy of a tumor-suppressor gene expected to behave as a recessive gene?
10. A genetic variant of the retinoblastoma protein, called PSM-RB (phosphorylation site mutated RB), cannot be phosphorylated by the CDK4/cyclin D1 complex. Explain why PSM-RB is said to have a constitutive growth-suppressing action on the cell cycle.
11. Part of the Ras protein is associated with the plasma membrane, and part extends into the cytoplasm. How does the Ras protein transmit a signal from outside the cell into the cytoplasm? What happens in cases where the ras gene is mutated?
12. If a cell suffers damage to its DNA while in S phase, how can this damage be repaired before the cell enters mitosis?
13. Distinguish between oncogenes and proto-oncogenes. In what ways can proto-oncogenes be converted to oncogenes?
14. Of the two classes of genes associated with cancer, tumor-suppressor genes and oncogenes, mutations in which group can be considered gain-of-function mutations? In which group are the loss-of-function mutations? Explain.
15. How do translocations such as the Philadelphia chromosome contribute to cancer?
16. Given that cancers can be environmentally induced and that some environmental factors are the result of lifestyle choices such as smoking, sun exposure, and diet, what percentage of the money spent on cancer research do you think should be devoted to research and education on preventing cancer rather than on finding cancer cures?
17. What are the most significant environmental agents that contribute to human cancers?
18. Explain the role of p53 protein in protecting normal cells against cancer. With respect to this protein and its function, explain how a normal cell turns cancerous.
19. What is loss of heterozygosity, and how does this process contribute to the development of cancers?
20. List the causative agents of DNA lesions in the human body that can lead to cancer.
21. Radiotherapy (treatment with ionizing radiation) is one of the most effective current cancer treatments. It works by damaging DNA and other cellular components. In which ways could radiotherapy control or cure cancer, and why does radiotherapy often have significant side effects?
22. Genetic tests that detect mutations in the BRCA1 and BRCA2 tumor-suppressor genes are widely available. These tests reveal a number of mutations in these genes, mutations that have been linked to familial breast cancer. Assume that a young woman in a suspected breast cancer family takes the BRCA1 and BRCA2 genetic tests and receives negative results. That is, she does not test positive for the mutant alleles of BRCA1 or BRCA2. Can she consider herself free of risk for breast cancer?
23. Explain the connection between DNA methylation and cancer.
24. While all cancer cells are proliferative, only some become malignant. Explain this statement.
25. As part of a cancer research project, you have discovered a gene that is mutated in many metastatic tumors. After determining the DNA sequence of this gene, you compare the sequence with those of other genes in the human genome sequence database. Your gene appears to code for an amino acid sequence that resembles sequences found in some serine proteases. Conjecture how your new gene might contribute to the development of highly invasive cancers.
26. A study by Bose and colleagues (1998. Blood 92: 3362–3367) and a previous study by Biernaux and others (1996. Bone Marrow Transplant 17: (Suppl. 3) S45–S47) showed that BCR-ABL fusion gene transcripts can be detected in 25 to 30 percent of healthy adults who do not develop chronic myelogenous leukemia (CML). Explain how these individuals can carry a fusion gene that is transcriptionally active and yet not develop CML.
27. Those who inherit a mutant allele of the RB1 gene are at risk for developing a bone cancer called osteosarcoma. You suspect that in these cases, osteosarcoma requires a mutation in the second RB1 allele, and you have cultured some osteosarcoma cells and obtained a cDNA clone of a normal human RB1 gene. A colleague sends you a research paper revealing that a strain of cancer-prone mice develops malignant tumors when injected with osteosarcoma cells, and you obtain these mice. Using these three resources, what experiments would you perform to determine (a) whether osteosarcoma cells carry two RB1 mutations, (b) whether osteosarcoma cells produce any pRB protein, and (c) whether the addition of a normal RB1 gene will change the cancer-causing potential of osteosarcoma cells?
28. The following table shows neutral polymorphisms found in control families (those with no increased frequency of breast and ovarian cancer). Examine the data in the table and answer the following questions:
(a) What is meant by a neutral polymorphism?
(b) What is the significance of this table in the context of examining a family or population for BRCA1 mutations that predispose an individual to cancer?
(c) Is the PM2 polymorphism likely to result in a neutral missense mutation or a silent mutation?
(d) Answer part (c) for the PM3 polymorphism.

Neutral Polymorphisms in BRCA1
                                          Frequency in Control Chromosomes*
Name   Codon Location   Base in Codon†    A      C      G      T
PM1    317                                152    0      10     0
PM6    878                                0      55     0      100
PM7    1190                               109    0      53     0
PM2    1443                               0      115    0      58
PM3    1619                               116    0      52     0

* The number of chromosomes with a particular base at the indicated polymorphic site (A, C, G, or T) is shown.
† Position 1, 2, or 3 of the codon.

17 Recombinant DNA Technology

CHAPTER CONCEPTS
■■ Recombinant DNA technology creates combinations of DNA sequences from different sources.
■■ A common application of recombinant DNA technology is to clone a DNA segment of interest.
■■ Specific DNA segments are inserted into vectors to create recombinant DNA molecules that are transferred into eukaryotic or prokaryotic host cells, where the recombinant DNA replicates as the host cells divide.
■■ DNA libraries are collections of cloned DNA and were historically used to isolate specific genes.
■■ DNA segments can be quickly amplified millions of times using the polymerase chain reaction (PCR).
■■ DNA, RNA, and proteins can be analyzed using a range of molecular techniques.
■■ Sequencing reveals the nucleotide composition of DNA, and major improvements in sequencing technologies have rapidly advanced many areas of modern genetics research, particularly genomics.
■■ Gene knockout methods and transgenic animals have become invaluable for studying gene function in vivo.
■■ Recombinant DNA technology has revolutionized our ability to investigate
the genomes of diverse species and has led to the modern revolution in genomics.

A researcher examines an agarose gel containing separated DNA fragments stained with the DNA-binding dye ethidium bromide and visualized under ultraviolet light.

Researchers of the mid- to late 1970s developed various techniques to create, replicate, and analyze recombinant DNA molecules, which are DNA created by joining together pieces of DNA from different sources. The methods used to copy or clone DNA, called recombinant DNA technology and often known as “gene splicing” in the early days, marked a major advance in research in molecular biology and genetics, allowing scientists to isolate and study specific DNA sequences. For their contributions to the development of this technology, Daniel Nathans, Hamilton Smith, and Werner Arber were awarded the 1978 Nobel Prize in Physiology or Medicine.

The power of recombinant DNA technology is astonishing, enabling geneticists to identify and isolate a single gene or DNA segment of interest from a genome. Through cloning, large quantities of identical copies of this specific DNA molecule can be produced. These identical copies, or clones, can then be manipulated for numerous purposes, including conducting research on the structure and organization of the DNA, studying gene expression, studying protein products to understand their structure and function, and producing important commercial products from the protein encoded by a gene. The fundamental techniques involved in recombinant DNA technology subsequently led to the field of genomics, enabling scientists to sequence and analyze entire genomes.

Note that some of the topics discussed in this chapter are explored in greater depth later in the text (see Special Topic Chapters 3 (DNA Forensics), 5 (Genetically Modified Foods), and 6 (Gene Therapy)). In this chapter, we review basic methods of recombinant DNA technology used to isolate, replicate, and analyze DNA.

17.1 Recombinant DNA Technology Began with Two Key Tools: Restriction Enzymes and DNA Cloning Vectors

Although natural genetic processes such as crossing over produce recombined DNA molecules, the term recombinant DNA is generally reserved for molecules produced by artificially joining DNA obtained from different sources. We begin our discussion of recombinant DNA technology by considering two important tools used to construct and amplify recombinant DNA molecules: DNA-cutting enzymes called restriction enzymes, and DNA cloning vectors. The use of restriction enzymes and cloning vectors was largely responsible for advancing the field of molecular biology, because a wide range of laboratory techniques are based on recombinant DNA technology.

Restriction Enzymes Cut DNA at Specific Recognition Sequences

Restriction enzymes are produced by bacteria as a defense mechanism against infection by bacteriophage. They restrict or prevent viral infection by degrading the DNA of invading viruses. More than 3500 restriction enzymes have been identified, and over 250 are commercially produced and available for use by researchers.

A restriction enzyme recognizes and binds to DNA at a specific nucleotide sequence called a recognition sequence or restriction site (Figure 17–1). The enzyme then cuts both strands of the DNA within that sequence by cleaving the phosphodiester backbone of DNA. Scientists commonly refer to this as “digestion” of DNA. The usefulness of restriction enzymes in cloning derives from their ability to accurately and reproducibly cut genomic DNA into fragments. Restriction enzymes represent sophisticated molecular scissors for cutting DNA into fragments of desired sizes.

Restriction sites occur at essentially random positions in a genome, so the actual fragment sizes produced by digestion with a given restriction enzyme vary with the number and location of sites throughout a DNA sample. Recognition sequences exhibit a form of symmetry described as a
palindrome: the nucleotide sequence reads the same on both strands of the DNA when read in the 5′ to 3′ direction. Each restriction enzyme recognizes its particular recognition sequence and cuts the DNA in a characteristic cleavage pattern. The most common recognition sequences are four or six nucleotides long, but some contain eight or more nucleotides. Enzymes such as EcoRI and HindIII make offset cuts in the DNA strands, producing fragments with single-stranded overhanging ends called cohesive ends (or “sticky” ends). Others, such as AluI and BalI, cut both strands at the same nucleotide pair, producing DNA fragments with double-stranded ends called blunt-end fragments. Four common restriction enzymes, their restriction sites, and source microbes are shown in Figure 17–1.

FIGURE 17–1 Common restriction enzymes, with their recognition sequence, DNA cutting patterns, and sources. Arrows indicate the location in the DNA cut by each enzyme.
Enzyme   Source microbe                   Recognition sequence   Ends produced
HindIII  Haemophilus influenzae Rd        A↓AGCTT                Cohesive
BamHI    Bacillus amyloliquefaciens H     G↓GATCC                Cohesive
AluI     Arthrobacter luteus              AG↓CT                  Blunt
Sau3AI   Staphylococcus aureus 3A         ↓GATC                  Cohesive

One of the first restriction enzymes to be identified was isolated from Escherichia coli strain R and was designated EcoRI. DNA fragments produced by EcoRI digestion (Figure 17–2) have cohesive ends because they can base-pair with complementary single-stranded ends on other DNA fragments cut using EcoRI. When mixed together, single-stranded ends of DNA fragments from different sources cut with the same restriction enzyme can anneal, or stick together, by hydrogen bonding of complementary base pairs in the single-stranded ends. Adding the enzyme DNA ligase to the DNA fragments seals the phosphodiester backbone of DNA, covalently joining the fragments together to form recombinant DNA molecules (Figure 17–2). Recall the role of DNA ligase in DNA replication, discussed earlier in the text (see Chapter 10).

FIGURE 17–2 DNA from different sources is cleaved with EcoRI and mixed to allow annealing. The enzyme DNA ligase forms phosphodiester bonds between these fragments to create an intact recombinant DNA molecule. Panels: cleavage of 5′-G-A-A-T-T-C-3′ / 3′-C-T-T-A-A-G-5′ with EcoRI yields fragments with complementary sticky ends; annealing allows recombinant DNA molecules to form by complementary base pairing, but the two strands are not covalently bonded, as indicated by the nicks in the DNA backbone; DNA ligase seals the nicks in the DNA backbone, covalently bonding the two strands.

Scientists often use restriction enzymes that create cohesive ends, because the overhanging ends make cloning less technically challenging. Blunt-end ligation is more technically challenging because it is not facilitated by hydrogen bonding between overhangs. Its advantage is that any blunt end can be joined to any other, so a scientist can ligate fragments digested by different blunt-end-generating enzymes.

ESSENTIAL POINT
Recombinant DNA technology was made possible by the discovery of proteins called restriction enzymes, which cut DNA at specific sequences, producing fragments that can be joined with other DNA fragments to form recombinant DNA molecules.

DNA Vectors Accept and Replicate DNA Molecules to Be Cloned

Scientists recognized that DNA fragments produced by restriction-enzyme digestion could be copied or cloned if they had a technique for replicating the fragments. The second key tool that allowed DNA cloning was the development of cloning vectors, DNA molecules that accept DNA fragments and replicate these fragments when vectors are introduced into host cells. Many different vectors are available for cloning. Vectors differ in the host cells they can enter and replicate in, and in the size of DNA fragment inserts they can carry, but most DNA vectors have several key properties:
• A vector contains several restriction sites that allow insertion of the DNA fragments to be cloned.
• Vectors must be capable of replicating in host cells to allow for independent replication of the vector DNA and any DNA fragment it carries.
• To distinguish host cells that have taken up vectors from host cells that have not, the vector contains a selectable marker gene (usually an antibiotic resistance gene or a gene that encodes a protein which produces a visible product, such as color or fluorescent light).
• Most vectors incorporate specific sequences that allow for sequencing inserted DNA.

Bacterial Plasmid Vectors

Genetically modified bacterial plasmids were the first vectors developed, and they are still widely used for cloning. Plasmid cloning vectors were derived from naturally occurring plasmids. Recall from Chapter that plasmids are extrachromosomal, double-stranded DNA molecules that replicate independently from the chromosomes within bacterial cells [Figure 17–3(a)]. Plasmids have been extensively modified by genetic engineering to serve as cloning vectors. Many commercially prepared plasmids are readily available with a range of useful features [Figure 17–3(b)].

Plasmids are introduced into bacteria by the process of transformation (see Chapter 8). Two main techniques are widely used for bacterial transformation. One approach involves treating cells with calcium ions and using a brief heat shock to pulse DNA into cells. The other technique, called electroporation, uses a brief but high-intensity pulse of electricity to move DNA into bacterial cells. Only one or a few plasmids generally enter a bacterial host cell by transformation. Because plasmids have an origin of replication (ori) that allows for plasmid replication, many plasmids can increase their copy
number to produce several hundred copies in a single host cell. These plasmids greatly enhance the number of DNA clones that can be produced.

FIGURE 17–3 (a) A color-enhanced electron micrograph of plasmids isolated from E. coli. (b) A diagram of a typical DNA cloning plasmid. Labeled features: a multiple cloning site (HindIII, SphI, PstI, SalI, AccI, HincII, XbaI, BamHI, SmaI, XmaI, KpnI, BanII, SstI, and EcoRI sites), the lacZ gene, DNA sequencing primer sites, an ampicillin resistance gene (ampR), and an origin of replication (ori).

Plasmid vectors have also been genetically engineered to contain a number of restriction sites for commonly used restriction enzymes in a region called the multiple cloning site. Multiple cloning sites allow scientists to clone a range of different fragments generated by many commonly used restriction enzymes.

Cloning DNA with a plasmid generally begins by cutting both the plasmid DNA and the DNA to be cloned with the same restriction enzyme (Figure 17–4). Typically, the plasmid is cut once within the multiple cloning site to produce a linear vector. DNA restriction fragments from the DNA to be cloned are added to the linearized vector in the presence of DNA ligase. Sticky ends of the DNA fragments anneal, joining the DNA to be cloned and the plasmid. DNA ligase then creates phosphodiester bonds that seal nicks in the DNA backbone, producing recombinant DNA, which is then introduced into bacterial host cells by transformation. Once inside the cell, plasmids replicate quickly to produce multiple copies.

However, when cloning DNA using plasmids, not all plasmids will incorporate the DNA to be cloned. For example, a plasmid cut with a particular restriction enzyme can close back on itself (self-ligation) if its cut ends rejoin. Such nonrecombinant plasmids are not wanted. Also, during transformation, not all host cells will take up plasmids. Therefore it is important that bacterial cells containing recombinant DNA can be readily identified in a
cloning experiment One way this is accomplished is through the use of selectable marker genes described earlier Genes that provide resistance to antibiotics such as ampicillin and genes such as the lacZ gene are very effective selectable marker genes Figure 17–5 provides an example of how these genes can be used to select for and identify bacteria containing recombinant plasmids This process is referred to as “blue-white” screening for a reason that will soon become obvious In blue-white screening a plasmid is used that contains the lacZ gene incorporated into the multiple cloning site The lacZ gene encodes the enzyme b-galactosidase, which, as you learned earlier in the text (see Chapter 15), is used to cleave the disaccharide lactose into its component monosaccharides glucose and galactose Blue-white screening takes advantage of the enzymatic activity of b-galactosidase Using this approach, one can easily identify transformed bacterial cells containing recombinant or nonrecombinant plasmids If a DNA fragment is inserted anywhere in the multiple cloning site, the lacZ gene is disrupted and will not produce functional copies of b-galactosidase Transformed bacteria in this experiment are plated on agar plates that contain an antibiotic—ampicillin in this case Nontransformed bacteria cannot grow well on these plates because they not have the amp R gene and so the ampicillin kills these cells These agar plates also contain a substance called X-gal (technically 5-bromo-4-chloro-3-indolylb-D-galactopyranoside) X-gal is similar to lactose in structure It is a substrate for b-galactosidase, and when it is 342 17 R E COMBI NANT D NA TE CHN OLOG Y Host-cell chromosome DNA to be cloned is cut with the same restriction enzyme Plasmid vector is removed from bacterial cell and cut with a restriction enzyme The two DNAs are ligated to form a recombinant molecule Introduction into bacterial host cells by transformation Cells carrying recombinant plasmids can be selected by 
cleaved by β-galactosidase, it turns blue. As a result, bacterial cells carrying nonrecombinant plasmids (those that have self-ligated and thus do not contain inserted DNA) have a functional lacZ gene and produce β-galactosidase, which cleaves X-gal in the medium, and these colonies turn blue. In contrast, recombinant bacteria with plasmids containing an inserted DNA fragment form white colonies on X-gal medium, because the plasmids in these cells do not produce functional β-galactosidase (Figure 17–5). Bacteria in these white colonies are clones of each other: genetically identical cells with copies of recombinant plasmids. White colonies can be transferred to flasks of bacterial culture broth and grown in large quantities, after which it is relatively easy to isolate and purify recombinant plasmids from these cells.

FIGURE 17–4 Cloning with a plasmid vector involves cutting both the plasmid and the DNA to be cloned with the same restriction enzyme. The DNA to be cloned is ligated into the vector and transferred to a bacterial host for replication. Bacterial cells carrying plasmids with DNA inserts can be identified by selection (for example, by plating on medium containing antibiotics and color indicators such as X-gal; refer to Figure 17–5) and then isolated. The cloned DNA is then recovered from the bacterial host for further analysis.

Plasmids are still the workhorses for many applications of recombinant DNA technology, but they have a major limitation: because they are small, they can accept inserted pieces of DNA only up to about 25 kilobases (kb) in size, and most plasmids can accept only substantially smaller pieces. Therefore, as recombinant DNA technology has developed and it has become desirable to clone large pieces of DNA, other vectors have been developed, primarily for their ability to accept larger pieces of DNA and because they can be used with host cells other than bacteria.

Other Types of Cloning Vectors

Phage vector systems were among the earliest
vectors used in addition to plasmids. These included genetically modified strains of bacteriophage λ. Phage vectors were popular for quite some time because they can carry inserts up to 45 kb, more than twice as long as the DNA inserts in most plasmid vectors. DNA fragments are ligated into the phage vector to produce recombinant λ vectors that are subsequently packaged into phage protein heads in vitro and introduced into bacterial host cells growing on petri plates. Inside the bacteria, the vectors replicate and form many copies of infective phage, each of which carries a DNA insert. As they reproduce, the phage lyse their bacterial host cells, forming the clear spots known as plaques (described in Chapter 8), from which phage can be isolated and the cloned DNA can be recovered.

FIGURE 17–5 In blue-white screening, DNA inserted into the multiple cloning site of a plasmid disrupts the lacZ gene, so that bacteria containing recombinant DNA are unable to metabolize X-gal, resulting in white colonies that allow direct identification of bacterial colonies carrying cloned DNA inserts. The panels show the multiple cloning site of a plasmid carrying lacZ and an ampicillin resistance gene (ampR) being cut with a restriction enzyme; the DNA to be cloned being cut with the same enzyme; DNA ligase joining the two to create a recombinant plasmid in which the insert disrupts lacZ; and transformed bacteria grown on media with ampicillin and X-gal, where bacteria with nonrecombinant plasmids are blue and bacteria with recombinant plasmids are white. The photo of a petri dish shows the growth of bacterial cells after uptake of recombinant plasmids: cells in blue colonies contain vectors without cloned DNA inserts, whereas cells in white colonies contain vectors carrying DNA inserts.

Bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs) are two other examples of vectors that can be used to clone large fragments of DNA. For example, the mapping and analysis of large eukaryotic genomes such as the human genome required cloning vectors that could carry very large DNA fragments, such as segments of an entire chromosome. BACs are essentially very large but low-copy-number plasmids (typically one or two copies per bacterial cell) that can accept DNA inserts in the 100- to 300-kb range. Like natural chromosomes, a YAC has telomeres at each end, origins of replication, and a centromere. Yeast chromosomes range in size from 230 kb to over 1900 kb, making it possible to clone DNA inserts from 100 to 1000 kb in YACs.

Unlike the vectors described so far, expression vectors are designed to ensure mRNA expression of a cloned gene, with the purpose of producing many copies of the gene’s encoded protein in a host cell. Expression vectors are available for prokaryotic and eukaryotic host cells and contain the appropriate sequences to initiate both transcription and translation of the cloned gene. For many research applications that involve studies of protein structure and function, producing a recombinant protein in bacteria (or other host cells) and purifying the protein is a routine approach, although it is not always easy to express a protein that maintains its biological function. The biotechnology industry also relies heavily on expression vectors to produce commercially valuable protein products from cloned genes, a topic we will discuss later in the text (see Chapter 19).

Introducing genes into plants is a common application that can be done in many ways, and we will discuss aspects of genetic engineering of food plants later in the text (see Special Topic Chapter 5: Genetically Modified Foods). One widely used approach to inserting genes into plant cells involves the soil bacterium Rhizobium radiobacter, which infects plant cells and produces tumors (called crown galls) in many species of plants. Formerly Agrobacterium
tumefaciens, this bacterium was renamed based on genomic analysis. Rhizobium contains a plasmid called the Ti (tumor-inducing) plasmid. Restriction sites in Ti plasmids can be used to insert foreign DNA, and recombinant vectors are introduced into Rhizobium by transformation. Tumor-inducing genes are removed from the Ti plasmid vector so that the recombinant vector does not cause tumor formation. Rhizobium containing recombinant DNA is mixed with plant cells (not all types of plant cells can be infected by Rhizobium). Once inside the cell, the plasmid is integrated into a chromosome of the host cell. Plant cells carrying a recombinant Ti plasmid can be grown in tissue culture, where certain compounds in the culture medium stimulate the formation of roots and shoots and, eventually, of a mature plant carrying a foreign gene.

ESSENTIAL POINT Vectors replicate autonomously in host cells and facilitate the cloning and manipulation of newly created recombinant DNA molecules.

17–1 An ampicillin-resistant, tetracycline-resistant plasmid, pBR322, is cleaved with PstI, which cleaves within the ampicillin resistance gene. The cut plasmid is ligated with PstI-digested Drosophila DNA to prepare a genomic library, and the mixture is used to transform E. coli K12.
(a) Which antibiotic should be added to the medium to select cells that have incorporated a plasmid?
(b) If recombinant cells were plated on medium containing ampicillin or tetracycline and on medium with both antibiotics, on which plates would you expect to see growth of bacteria containing plasmids with Drosophila DNA inserts?
(c) How can you explain the presence of colonies that are resistant to both antibiotics?
[Figure: plasmid pBR322, showing the ampicillin resistance gene, the tetracycline resistance gene, and BamHI, PvuII, PstI, and SalI sites]

HINT: This problem involves an understanding of antibiotic selectable marker genes in plasmids and of antibiotic selection for identifying bacteria transformed with recombinant plasmid DNA. The key to its solution is to recognize that inserting foreign DNA into the plasmid vector disrupts one of the antibiotic resistance genes in the plasmid.

17.2 DNA Libraries Are Collections of Cloned Sequences

Cloning DNA into vectors, particularly plasmids, produces only relatively small DNA segments, representing just a single gene or even a portion of a gene. In our discussion of cloning so far, we have described how DNA can be inserted into vectors and cloned, a relatively straightforward process, but we have not discussed how one determines which particular DNA sequence has been cloned. Simply cutting DNA and inserting it into vectors does not tell you which genes or sequences have been cloned. During the first several decades of DNA cloning, scientists created DNA libraries, which are collections of cloned DNA. Depending on how a library is constructed, it may contain genes and noncoding regions of DNA. Generally, there are two main types of libraries: genomic DNA libraries and complementary DNA (cDNA) libraries.

Genomic Libraries

Ideally, a genomic library consists of many overlapping fragments of the genome, with at least one copy of every DNA sequence in an organism’s genome, which together span the entire genome. In making a genomic library, DNA is extracted from cells or tissues and cut randomly with restriction enzymes, and the resulting fragments are inserted into vectors using the techniques discussed in the previous section. Since some vectors (such as plasmids) can carry only a few thousand base pairs of inserted DNA, choosing a vector so that the library contains the whole genome in the smallest number of clones is an important consideration. Because genomic DNA is the
foreign DNA introduced into vectors, genomic libraries contain both coding and noncoding segments of DNA, such as introns, and a vector in the library may contain more than one gene or only a portion of a gene.

As you will learn later in the text (see Chapter 18), whole-genome shotgun cloning approaches (see Figure 18–1) and new sequencing methodologies are increasingly replacing traditional genomic DNA libraries, because they allow one to sequence an entire genomic DNA sample without inserting DNA fragments into vectors and cloning them in host cells. Later in the text (see Chapter 18), we will also consider how DNA sequence analysis using bioinformatics allows one to identify protein-coding and noncoding sequences in cloned DNA.

Complementary DNA (cDNA) Libraries

Complementary DNA (cDNA) libraries offer certain advantages over genomic libraries and continue to be a useful methodology for gene cloning. This is primarily because a cDNA library contains DNA copies made from mRNA molecules isolated from cultured cells or a tissue sample. cDNA is complementary to the nucleotide sequence of the mRNA, so unlike a genomic library, which contains all of the DNA in a genome (both gene coding and noncoding sequences), a cDNA library contains only expressed genes. As a result, cDNA libraries have been particularly useful for identifying and studying genes expressed in certain cells or tissues under certain conditions: for example, during development, cell death, cancer, and other biological processes. One can also use these libraries to compare the genes expressed in normal and diseased tissues. For instance, this approach has been used to identify genes involved in cancer formation, such as genes that contribute to progression from a normal cell to a cancer cell and genes involved in cancer cell metastasis (spreading).

Preparation of a cDNA library is shown in Figure 17–6. These libraries provide a snapshot of the genes that were transcriptionally active in a tissue at a particular time, because the relative amount of each cDNA in a library reflects the relative amount of the corresponding mRNA isolated from the tissue and used to make the library. Because cDNA libraries provide a catalog of the genes active in a cell at a specific time, they have been very valuable tools for scientists isolating and studying genes in particular tissues.

FIGURE 17–6 Producing cDNA from mRNA. Because most eukaryotic mRNAs have a poly-A tail at the 3′ end, a short oligo(dT) molecule annealed to this tail serves as a primer for the enzyme reverse transcriptase. Reverse transcriptase uses the mRNA as a template to synthesize a complementary DNA strand (cDNA), forming an mRNA/cDNA double-stranded duplex. The mRNA is partially digested with the enzyme RNase H, producing gaps in the RNA strand. The 3′ ends of the remaining RNA serve as primers for DNA polymerase I, which synthesizes a second DNA strand, and DNA ligase seals the remaining gaps. The result is a double-stranded cDNA molecule that can be cloned into a suitable vector.

Specific Genes Can Be Recovered from a Library by Screening

Genomic and cDNA libraries often consist of several hundred thousand different DNA clones, much as a large library may hold many books but only a few of interest to your studies in genetics. Locating a specific gene of interest therefore means finding one or a few clones among a very large collection.
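The size of such a library can be estimated. A common rule of thumb, the Clarke–Carbon formula (not given in this text), is N = ln(1 − P) / ln(1 − f), where N is the number of clones, P is the desired probability that any particular sequence is present in at least one clone, and f is the fraction of the genome carried by a single insert. The Python sketch below assumes random, equal-sized inserts; the genome size of 3.0 × 10⁹ bp and the insert sizes are illustrative round numbers chosen from the ranges given earlier in this chapter, not values from any particular library.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Estimate how many clones a genomic library needs so that any given
    sequence is present in at least one clone with the stated probability.

    Assumes random, equal-sized inserts (Clarke-Carbon estimate):
        N = ln(1 - P) / ln(1 - f), with f = insert size / genome size.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert size must be positive and smaller than the genome")
    fraction = insert_bp / genome_bp  # share of the genome carried by one clone
    # log1p keeps precision when the fraction is tiny, as it is for whole genomes
    n = math.log1p(-probability) / math.log1p(-fraction)
    return math.ceil(n)


if __name__ == "__main__":
    genome = 3.0e9  # illustrative round number for a human-sized genome
    # Illustrative insert sizes within the ranges given in the text for each vector
    for vector, insert in [("plasmid", 20_000), ("BAC", 200_000), ("YAC", 1_000_000)]:
        print(f"{vector:8s} {insert:>9,} bp inserts: {clones_needed(genome, insert):>9,} clones")
```

Under these assumptions, 20-kb plasmid inserts require roughly 690,000 clones for 99% coverage, in line with the "several hundred thousand" clones mentioned above, whereas 200-kb BAC inserts require roughly 69,000. This is one reason vectors that accept larger inserts were valuable for cloning large genomes.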
To find a specific gene, we need to identify and isolate only the clone or clones containing that gene. We must also determine whether a given clone contains all or only part of the gene we are studying. Several methods allow us to sort through a library and isolate specific genes of interest; this approach is called library screening. The choice of method often depends on the information available about the gene being sought.

The process for screening a library with a probe is shown in Figure 17–7. Probes are often used to screen a library to recover clones of a specific gene. A probe is any DNA or RNA sequence that is complementary to some part of a cloned sequence present in the library, namely the target gene or sequence to be identified. A probe must be labeled, or tagged, in some way so that it can be detected. When a probe is used in a hybridization reaction, it binds to any complementary DNA sequences present in one or more clones. Probes can be labeled with radioactive isotopes, or, increasingly these days, with nonradioactive compounds that undergo chemical or color reactions to indicate the location of a specific clone in a library.

Probes are derived from a variety of sources; related genes isolated from another species can often be used if enough of the DNA sequence is conserved. For example, genes from rats, mice, or even Drosophila that share conserved sequence similarity with human genes can be used as probes to identify human genes during library screening.

ESSENTIAL POINT DNA libraries are collections of cloned DNA that can be screened to identify and isolate specific sequences of interest.

As we have discussed here, libraries enable scientists to clone DNA and then identify individual genes in the library.
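The base-pairing logic behind probe screening can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each clone is represented by the sequence of one strand (the clone names and sequences below are invented for illustration), a clone counts as "hybridizing" if either of its strands contains a stretch complementary to the probe, and the hypothetical `max_mismatches` parameter is a crude stand-in for the imperfect pairing that lets a conserved gene from another species serve as a probe. Real hybridization also depends on temperature, salt concentration, and probe length, none of which are modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_site(strand: str, site: str, max_mismatches: int) -> bool:
    """True if some window of `strand` matches `site` with at most max_mismatches differences."""
    k = len(site)
    for start in range(len(strand) - k + 1):
        window = strand[start:start + k]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


def hybridizing_clones(clones: dict[str, str], probe: str, max_mismatches: int = 0) -> list[str]:
    """Return IDs of clones that a single-stranded probe would bind to.

    Each clone is given as one strand (5' to 3'). After denaturation, the probe can
    pair with this strand (if it carries the probe's reverse complement) or with the
    opposite strand (if this strand carries the probe sequence itself).
    """
    probe = probe.upper()
    site_on_this_strand = reverse_complement(probe)
    hits = []
    for clone_id, strand in clones.items():
        strand = strand.upper()
        if (contains_site(strand, site_on_this_strand, max_mismatches)
                or contains_site(strand, probe, max_mismatches)):
            hits.append(clone_id)
    return hits


# Hypothetical library and probe, invented for illustration
library = {
    "clone_1": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "clone_2": "TTTTTTTTTTAAAAAAAAAA",
    "clone_3": "CCCTACAATGGCCCCC",
    "clone_4": "AAGGCCATAGTAAA",  # contains a stretch that differs from the probe at one position
}
probe = "GGCCATTGTA"

print(hybridizing_clones(library, probe))                    # ['clone_1', 'clone_3']
print(hybridizing_clones(library, probe, max_mismatches=1))  # ['clone_1', 'clone_3', 'clone_4']
```

Clone 1 carries the probe sequence itself, so the probe pairs with its opposite strand; clone 3 carries the probe's reverse complement, so the probe pairs with the strand shown. Clone 4 is detected only when one mismatch is tolerated, loosely mirroring how a related gene from another species can still be picked up.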
FIGURE 17–7 Screening a library to recover a specific gene. The library, present in bacteria on petri plates, is overlaid with a DNA-binding membrane (such as nylon), and colonies are transferred to the membrane. Colonies on the membrane are lysed, and the DNA is denatured to single strands. The membrane is placed in a heat-sealed hybridization bag along with buffer and a labeled single-stranded DNA probe. During incubation, the probe forms a double-stranded hybrid with any complementary sequences on the membrane. The membrane is removed from the bag and washed to remove excess probe. Hybrids are detected by placing a piece of X-ray film over the membrane and exposing it for a short time, or by chemiluminescence detection and image capture using a digital camera. Colonies containing the cloned DNA that hybridized to the probe are identified from the orientation of the spots. Cells are picked from this colony on the original plate for growth and further analysis.

Cloning DNA from libraries is still a technique with valuable applications. However, as you will learn later in the text (see Chapter 18), the basic methods of recombinant DNA technology were the foundation for the development of powerful techniques for whole-genome cloning and sequencing, which led to the genomics era of modern genetics and molecular biology. Genomic
techniques, in which entire genomes are sequenced without creating libraries, are replacing many traditional recombinant DNA approaches that cloned or identified one or a few genes at a time.

17.3 The Polymerase Chain Reaction Is a Powerful Technique for Copying DNA

Cloning DNA using vectors and host cells is labor intensive and time consuming. In 1986, another technique, called the polymerase chain reaction (PCR), became available to the scientific community. This advance revolutionized recombinant DNA methodology and further accelerated the pace of biological research. The significance of the method was underscored by the award of the 1993 Nobel Prize in Chemistry to Kary Mullis, who developed the technique.

PCR is a rapid method of DNA cloning that extends the power of recombinant DNA research and in many cases eliminates the need to use host cells for cloning. PCR is also a method of choice for many applications, whether in molecular biology, human genetics, evolution, development, conservation, or forensics. By copying a specific DNA sequence through a series of in vitro reactions, PCR can amplify target DNA sequences that are initially present in very small quantities in a population of other DNA molecules.

When using PCR to clone DNA, the double-stranded target DNA to be amplified is placed in a tube with DNA polymerase, Mg²⁺ (an important cofactor for DNA polymerase), and the four deoxyribonucleoside triphosphates. In addition, some information about the nucleotide sequence of the target DNA is required. This sequence information is used to synthesize two oligonucleotide primers: short (typically about 20 nt long) single-stranded DNA sequences, each complementary to the 3′ end of the target region on one of the two strands. When added to a sample of double-stranded DNA that has been denatured into single strands, the primers bind to complementary nucleotides flanking the
sequence to be cloned. DNA polymerase can then extend the 3′ end of each primer to synthesize second strands of the target DNA. Therefore, one complete reaction process, called a cycle, doubles the number of DNA molecules in the reaction (Figure 17–8). Repetition of the process produces large numbers of copies of the DNA very quickly. If desired, the PCR products can be cloned into plasmid vectors for further use.

Most routine PCR applications involve a cycle of three reaction steps:

1. Denaturation: The double-stranded DNA to be amplified is denatured into single strands by heating to 92–95°C for about a minute. The DNA can come from many sources, including genomic DNA, mummified remains, fossils, or forensic samples such as blood, semen, or hair.

2. Hybridization/Annealing: The temperature of the reaction is lowered to between 45°C and 65°C, which causes the primers to bind (a step also called hybridization or annealing) to the denatured, single-stranded DNA. The primers serve as starting points for DNA polymerase to synthesize new DNA strands complementary to the target DNA. Factors such as primer length, the base composition of the primers (GC-rich primers are more thermally stable than AT-rich primers), and whether all bases in a primer are complementary to bases in the target sequence are among the primary considerations when selecting a hybridization temperature for an experiment.

3. Extension: The reaction temperature is adjusted to between 65°C and 75°C, and DNA polymerase uses the primers as starting points to synthesize new DNA strands by adding nucleotides to the 3′ ends of the primers, in the 5′ to 3′ direction.

PCR is a chain reaction because the number of new DNA strands is doubled in each cycle, and the new strands, along with the old strands, serve as templates in the next cycle. Each cycle takes only a few minutes and can be repeated immediately, so that within a few hours, 25 to 30 cycles result in over a million-fold increase in the amount of
DNA (Figure 17–8). This process is automated by instruments called thermocyclers, or simply PCR machines, that can be programmed to carry out a predetermined number of cycles.

A key requirement for PCR is the type of DNA polymerase used. Multiple PCR cycles involve repetitive heating and cooling of samples, which eventually causes heat denaturation and loss of activity of most proteins. PCR reactions therefore rely on thermostable forms of DNA polymerase capable of withstanding multiple heating and cooling cycles without significant loss of activity. PCR became a major tool when DNA polymerase was isolated from Thermus aquaticus, a bacterium living in the hot springs of Yellowstone National Park. Called Taq polymerase, this enzyme is capable of tolerating extreme temperature changes and was the first thermostable polymerase used for PCR.

FIGURE 17–8 In the polymerase chain reaction (PCR), the target DNA is denatured into single strands (92–95°C); each strand is then annealed to short, complementary primers (45–65°C). DNA polymerase extends the primers in the 5′ to 3′ direction (65–75°C), using the single-stranded DNA as a template. The result after one round of replication is a doubling of DNA molecules, creating two newly synthesized double-stranded DNA molecules; the second cycle produces four. Repeated cycles of PCR can quickly amplify the original DNA sequence more than a millionfold (20 cycles increase the number of DNA copies by about 10⁶). Note: shown here is a relatively short sequence of DNA being amplified. Typically, much longer segments of DNA are used for PCR, and the primers bind somewhere within the DNA molecule rather than so close to its ends.

PCR-based DNA cloning has several advantages over library cloning approaches. PCR is rapid and can be carried out in a few hours, rather than the days required for making and screening DNA libraries. PCR is also very sensitive and amplifies specific DNA sequences from vanishingly small DNA samples, including the DNA in a single cell. This feature of PCR is invaluable in several kinds of applications, including genetic testing, forensics, and molecular paleontology. With carefully designed primers, DNA samples that have been partially degraded, contaminated with other materials, or embedded in a matrix (such as amber) can be recovered and amplified when conventional cloning would be difficult or impossible. A wide variety of PCR-based techniques are variations of the basic technique described here.

Limitations of PCR

Although PCR is a valuable technique, it does have limitations. First, some information about the nucleotide sequence of the target DNA must be known in order to synthesize primers. In addition, even minor contamination of the sample with DNA from other sources can cause problems. For example, cells shed from a researcher’s skin can contaminate samples gathered from a crime scene, making it difficult to obtain accurate results. PCR reactions must always be performed with carefully designed and appropriate controls. Also, PCR typically cannot amplify particularly long segments of DNA: DNA polymerase in a PCR reaction extends primers only for relatively short distances and does not continue processively until it reaches the other end of long template strands. Because of this characteristic, scientists often use PCR to amplify pieces of DNA that are several hundred to several thousand nucleotides in length,
which is fine for most routine applications.

Applications of PCR

PCR has been one of the most widely used techniques in genetics and molecular biology for over 20 years. PCR and its variations have a great many applications; in short, PCR is one of the most versatile techniques in modern genetics.

As you will learn later in the text (see Chapter 19), gene-specific primers provide a way of using PCR to screen for mutations involved in genetic disorders, allowing the location and nature of a mutation to be determined quickly. Primers can be designed to distinguish between target sequences that differ by only a single nucleotide. This makes it possible to synthesize allele-specific probes for genetic testing; thus, PCR is important for diagnosing genetic disorders. PCR is also a key diagnostic methodology for detecting bacteria and viruses (such as hepatitis viruses or HIV) in humans, and pathogenic bacteria such as E. coli and Staphylococcus aureus in contaminated food.

PCR techniques are particularly advantageous when studying samples from single cells, fossils, or a crime scene, where a single hair or even a saliva-moistened postage stamp is the source of the DNA. Later in the text (see Special Topic Chapter 3: DNA Forensics), we will discuss how PCR is used in human identification, including identification of remains, and in forensic applications. Using PCR, researchers can also explore uncharacterized DNA regions adjacent to known regions and even sequence DNA.

Reverse transcription PCR (RT-PCR) is a powerful methodology for studying gene expression, that is, mRNA production by cells or tissues. In RT-PCR, RNA is isolated from the cells or tissues to be studied, and reverse transcriptase is used to generate double-stranded cDNA molecules, as described earlier for the preparation of cDNA libraries. This reaction is followed by PCR to amplify the cDNA with a set of primers specific for the gene of interest. Amplified cDNA fragments are then separated and visualized on an agarose
gel. Because the amount of amplified cDNA in RT-PCR depends on the relative number of mRNA molecules in the starting reaction, RT-PCR can be used to evaluate relative levels of gene expression in different samples. The amplified cDNA can also be inserted into plasmid vectors, which are replicated to produce a cDNA library. RT-PCR is more sensitive than conventional cDNA preparation and is a powerful tool for identifying mRNAs that may be present in only one or two copies per cell.

Finally, one of the most valuable modern PCR techniques is quantitative real-time PCR (qPCR), or simply real-time PCR. This approach makes it possible to determine the amount of PCR product made during an experiment, which enables researchers to quantify amplification reactions as they occur, in “real time,” without having to run a gel.

ESSENTIAL POINT PCR allows DNA to be amplified, or copied, without cloning and is a rapid and sensitive method with wide-ranging applications.

17–2 You have just created the world’s first genomic library from the African okapi, a relative of the giraffe. No genes from this genome have been previously isolated or described. You wish to isolate the gene encoding the oxygen-transporting protein β-globin from the okapi library. This gene has been isolated from humans, and its nucleotide and amino acid sequences are available in databases. Using the information available about the human β-globin gene, what two strategies can you use to isolate this gene from the okapi library?
HINT: This problem asks you to design PCR primers to amplify the β-globin gene from a species for which you have just made a genomic library. The key to its solution is to remember that you have sequence data for the human β-globin gene at your disposal, and to consider that PCR experiments require primers that bind to complementary bases in the DNA to be amplified.

For more practice, see Problems 15 and 16.

17.4 Molecular Techniques for Analyzing DNA

In addition to cloning and PCR methods, a wide range of molecular techniques are available to geneticists, molecular biologists, and almost anyone who does research involving DNA and RNA, particularly those who study the structure, expression, and regulation of genes. In the following sections, we consider some of the most commonly used molecular methods that provide information about the organization and function of cloned sequences. Throughout later sections of the text, you will see these and other techniques discussed in the context of specific applications in modern genetics.

Restriction Mapping

Historically, one of the first steps in characterizing a DNA clone was the construction of a restriction map. A restriction map establishes the number of, order of, and distances between restriction-enzyme cleavage sites along a cloned segment of DNA, thus providing information about the length of the cloned insert and the location of restriction-enzyme cleavage sites within the clone. The data the maps provide can be used to reclone fragments of a gene or to compare its internal organization with that of other cloned sequences.

Before DNA sequencing and bioinformatics became widespread, restriction maps were created experimentally by cutting DNA with different restriction enzymes and separating the DNA fragments by gel electrophoresis (refer to Figure 9–20 and see the chapter opening photo). The pattern of fragments generated can then be interpreted to determine the location of restriction sites for
different enzymes. Because of advances in DNA sequencing and bioinformatics, restriction maps are now created simply by using software to identify restriction-enzyme cutting sites in sequenced DNA. The Exploring Genomics exercise in this chapter involves a Web site, Webcutter, which is commonly used for generating restriction maps. Restriction maps were an important way of characterizing cloned DNA and could be constructed in the absence of any other information about the DNA, including whether or not it encodes a gene or has other functions. In the Human Genome Project, restriction maps of the human genome were important for digesting the genome into pieces that could be sequenced.

Nucleic Acid Blotting

Several of the techniques described in this chapter rely on hybridization between complementary nucleic acid (DNA or RNA) molecules. One of the most widely used methods for detecting such hybrids is called Southern blotting (after Edwin Southern, who devised it). The Southern blot method can be used to identify which clones in a library contain a given DNA sequence and to characterize the size of the fragments. Southern blots can also be used to identify fragments carrying specific genes in genomic DNA digested with a restriction enzyme.

Southern blotting has two components: separation of DNA fragments by gel electrophoresis and hybridization of the fragments using labeled probes (Figure 17–9). Gel electrophoresis can be used to characterize the number of fragments produced by restriction digestion of relatively small pieces of DNA and to estimate their molecular weights. However, restriction-enzyme digestion of large genomes, such as the human genome with more than 3 billion nucleotides, produces so many different fragments that they run together on a gel as a continuous smear. In these cases, specific fragments are identified in the next step: hybridization characterizes the DNA sequences present in the fragments. To produce
Figure 17–10, researchers cut samples of genomic DNA with several restriction enzymes. The agarose gel electrophoresis pattern of fragments obtained for each restriction enzyme is shown in Figure 17–10(a). A Southern blot of this gel is illustrated in Figure 17–10(b). The probe hybridized to complementary sequences, identifying the fragments of interest.

Southern blotting led to the development of other blotting approaches. RNA blotting was subsequently called Northern blot analysis, or simply Northern blotting, and, following a naming scheme based on the points of a compass, a related blotting technique involving proteins is known as Western blotting. Western blotting is a widely used technique for analyzing proteins. Thus part of the historical significance of Southern blotting is that it led to the development of other blotting methods that are key tools for studying nucleic acids and proteins.

Prior to the development of RT-PCR and real-time PCR, Northern blotting was a common approach used to study gene expression. To determine whether a gene is actively being expressed in a given cell or tissue type, Northern blotting probes for the presence of mRNA complementary to a cloned gene. To do this, mRNA is extracted from a specific cell or tissue type and separated by gel electrophoresis. The RNA is then transferred to a membrane, as in Southern blotting, and the membrane is exposed to a labeled single-stranded DNA or RNA probe derived from a cloned copy of the gene. If mRNA complementary to the probe is present, the complementary sequences will hybridize and be detected as a band on the film. Northern blots provide information about the expression of specific genes and are used to study patterns of gene expression in embryonic tissues, cancer, and genetic disorders. Northern blots also detect alternatively spliced mRNAs (multiple types of transcripts derived from a single gene) and can be used to derive other information about transcribed mRNAs, such as the size
of a gene’s mRNA transcripts and, by measuring band density, the amount of mRNA expressed by a gene. Northern blots are occasionally still used to study RNA expression, but because PCR-based techniques are faster and more sensitive than blotting methods, techniques such as RT-PCR are often the preferred approach, particularly for measuring changes in gene expression.

Finally, as noted earlier in the text (see Chapter 9), fluorescence in situ hybridization, or FISH, is a powerful tool that involves hybridizing a probe directly to a chromosome or RNA without blotting (see Figure 9–18 and Figures 20–8 and 20–9). FISH can be carried out with isolated chromosomes on a slide or directly in situ in tissue sections or entire organisms, particularly when embryos are used for studies in developmental genetics (Figure 17–11). For example, in developmental studies one can identify which cell types in an embryo express different genes during specific stages of development. Variations of the FISH technique are also used to produce spectral karyotypes, in which individual chromosomes can be detected using probes labeled with dyes that fluoresce at different wavelengths (see chapter opening photograph and Figure 16–1).

[Figure 17–9 panels: lane 1, DNA size markers; lane 2, DNA cut with restriction enzyme A; lane 3, DNA cut with restriction enzyme B. On the developed blot, all size markers appear because they are labeled; in lanes 2 and 3, only the bands that hybridize with the probe are visible.]

FIGURE 17–9 In the Southern blotting technique, samples of the DNA to be probed are cut with restriction enzymes, and the fragments are separated by gel electrophoresis. The pattern of fragments is visualized and photographed under ultraviolet illumination. The gel is placed in an alkaline solution to denature the DNA into single strands; then it is placed on a sponge wick that is in contact with a buffer solution and covered with a DNA-binding membrane. Layers of paper towels or blotting paper are placed on top of the membrane and held in place with a weight. Capillary action draws the buffer through the gel, transferring the DNA fragments from the gel to the membrane. The single-stranded DNA fragments on the membrane are hybridized to a labeled DNA probe. The membrane is washed to remove excess probe and overlaid with a piece of X-ray film for autoradiography, or chemiluminescence from the probe is detected with a digital camera. The hybridized fragments show up as bands on the X-ray film.

FIGURE 17–10 (a) Agarose gel stained with ethidium bromide to show DNA fragments. (b) Exposed X-ray film of a Southern blot prepared from the gel in part (a). Only those bands containing DNA sequences complementary to the probe show hybridization.

ESSENTIAL POINT
DNA and RNA can be analyzed through a variety of methods that involve hybridization techniques.

17.5 DNA Sequencing Is the Ultimate Way to Characterize DNA at the Molecular Level

In a sense, cloned DNA, from a single gene to an entire genome, is completely characterized at the molecular level only when its nucleotide sequence is known. The ability to sequence DNA has greatly enhanced our understanding of genome organization and increased our knowledge of gene structure, function, and mechanisms of regulation.
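Once a sequence is known, the result of a hybridization experiment such as a Southern blot can be predicted before any gel is run. The sketch below is a simplified, hypothetical illustration rather than a standard tool: it digests a linear sequence at every copy of one recognition site and reports the sizes of the fragments that contain the probe sequence or its reverse complement, which stands in for hybridization. It assumes perfect base pairing, ignores probes that straddle a cut site, and uses a made-up sequence; the function names are illustrative only.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def digest(seq: str, site: str, offset: int) -> list[str]:
    """Cut a linear sequence wherever `site` occurs.

    `offset` is where the enzyme cuts, counted from the first base of the
    site (1 for EcoRI, G^AATTC). Only the top strand is searched, which is
    enough for palindromic sites such as EcoRI's.
    """
    seq = seq.upper()
    # The lookahead (?=...) reports every start position, even overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def southern_bands(seq: str, site: str, offset: int, probe: str) -> list[int]:
    """Sizes (bp) of digest fragments that a labeled probe would detect.

    Simplification: a fragment "hybridizes" only if it contains the whole
    probe sequence on either strand; partial matches are ignored.
    """
    targets = (probe.upper(), reverse_complement(probe))
    return [len(f) for f in digest(seq, site, offset)
            if any(t in f for t in targets)]


# Hypothetical 28-bp "genome" with two EcoRI sites.
genome = "CCCCGAATTCATGCATGCAAGAATTCGG"
print([len(f) for f in digest(genome, "GAATTC", 1)])   # [5, 16, 7]
print(southern_bands(genome, "GAATTC", 1, "TTGCATGC"))  # [16]
```

On a real blot, the smear of all fragments corresponds to the output of `digest`, while only the fragment sizes returned by `southern_bands` would appear as bands. The probe in the usage line matches the genome only through its reverse complement, mirroring the fact that a denatured probe can pair with either strand.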
FIGURE 17–11 In situ hybridization of a zebrafish embryo 48 hours after fertilization showing expression of atp2a1 mRNA, which encodes a muscle-specific calcium pump. The probe revealing atp2a1 expression produces dark blue staining. Notice that this staining is restricted to muscle cells surrounding the developing spinal cord of the embryo.

Historically, the most commonly used method of DNA sequencing was developed by Fred Sanger and his colleagues and is known as dideoxynucleotide chain-termination sequencing, or simply Sanger sequencing. In this technique, a double-stranded DNA molecule whose sequence is to be determined is converted to single strands that are used as templates for synthesizing a series of complementary strands. The DNA to be sequenced is mixed with a primer that is complementary to the target DNA or vector, and DNA polymerase and the four deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) are added to each tube. The key to the Sanger technique is the addition of a small amount of a modified nucleotide (Figure 17–12) called a dideoxynucleotide (abbreviated ddNTP). Notice that dideoxynucleotides have a 3′ hydrogen instead of a 3′ hydroxyl group. Dideoxynucleotides are called chain-termination nucleotides because they lack the 3′ oxygen required to form a phosphodiester bond with another nucleotide. Thus, when ddNTPs are included in a reaction, the polymerase occasionally inserts a dideoxynucleotide instead of a deoxyribonucleotide into a growing DNA strand as synthesis takes place. Because the dideoxynucleotide has no 3′-OH group, it cannot form a bond with the next nucleotide, and DNA synthesis terminates because DNA polymerase cannot add new nucleotides to a ddNTP. The Sanger reaction takes advantage of this key modification. For example, in Figure 17–12, notice that the shortest fragment generated is a sequence that has added ddCTP to the 3′ end of the primer, and the chain has terminated. Over time as the reaction
proceeds, a ddNTP will eventually be inserted at every location in the newly synthesized DNA, so that each strand synthesized differs in length by one nucleotide and is terminated by a ddNTP. This allows the DNA fragments to be separated by gel electrophoresis, and their order can then be used to determine the sequence.

When the Sanger technique was first developed, four separate reaction tubes, each with a single different ddNTP (ddATP, ddCTP, ddGTP, or ddTTP), were used. These reactions typically used either a radioactively labeled primer or a radioactively labeled ddNTP for analysis of the sequence following polyacrylamide gel electrophoresis and autoradiography. This approach involved large polyacrylamide gels in which each reaction was loaded on a separate lane, and the ladder-like banding patterns revealed by autoradiography were read to determine the sequence. This original approach could typically read several hundred bases per reaction. Read length (the amount of sequence generated in a single reaction) and run output (the total amount of sequence generated in a sequencing run, effectively read length times the number of reactions an instrument can run in a given period of time) have become hot areas for innovation in sequencing technology.

In the past 20 years, modifications of the Sanger technique led to technologies that allowed sequencing reactions to occur in a single tube. As shown in Figure 17–12, each of the four ddNTPs is labeled with a different-colored fluorescent dye. These reactions were carried out in PCR-like fashion using cycling reactions that permit greater read and run capabilities. The reaction products were separated through a single, ultrathin-diameter polyacrylamide tube gel called a capillary gel (capillary gel electrophoresis). As DNA fragments move through the gel, they are scanned with a laser. The laser stimulates the fluorescent dye on each DNA fragment, and the dye for each ddNTP emits a different wavelength of light. Emitted light is captured by a detector that amplifies the signal and feeds it into a computer, which converts the light patterns into a DNA sequence; the resulting trace is technically called an electropherogram or chromatograph. The data are represented as a series of colored peaks, each corresponding to one nucleotide in the sequence.

[Figure 17–12 panels: reaction components (dNTPs, DNA template, primer, DNA polymerase, and a small amount of ddNTPs with fluorochromes); structures of a deoxynucleotide (dNTP, 3′-OH) and a dideoxynucleotide (ddNTP, 3′-H); primer extension and chain termination on the template strand; separation of DNA fragments by capillary gel electrophoresis, with a laser and detector reading each ddNTP's fluorescence into a computer; and the resulting chromatograph.]

FIGURE 17–12 Computer-automated DNA sequencing using the chain-termination (Sanger) method. (1) A primer is annealed to a sequence adjacent to the DNA being sequenced (usually near the multiple cloning site of a cloning vector). (2) A reaction mixture is added to the primer–template combination. This includes DNA polymerase, the four dNTPs, and small molar amounts of dideoxynucleotides (ddNTPs) labeled with fluorescent dyes. All four ddNTPs are added to the same tube, and during primer extension, all possible lengths of chains are produced. During primer extension, the polymerase occasionally (randomly) inserts a ddNTP instead of a dNTP, terminating the synthesis of the chain because the ddNTP does not have the OH group needed to attach the next nucleotide. Over the course of the reaction, all possible termination sites will have a ddNTP inserted. The products of the reaction are added to a single lane on a capillary gel, and the bands are read by a detector and imaging system. This process is computer automated, and robotic machines sequence several hundred thousand nucleotides in a 24-hour period and then store and analyze the data automatically. The sequence is obtained by extension of the primer and is read from the newly synthesized strand. Thus in this case, the sequence obtained begins with 5′-CTAGACATG-3′.

Since the early 1990s, DNA sequencing has largely been performed with computer-automated, Sanger-reaction-based technology. Such systems generate relatively large amounts of sequence data. Computer-automated sequencers can achieve read lengths of approximately 1000 bp with about 99.999 percent accuracy for about $0.50 per kb. Automated DNA sequencers often contain multiple capillary gels (as many as 96) that are several feet long and can process several thousand bases of sequence, so many of these instruments make it possible to generate over a million bp of sequence in a day!
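The chain-termination logic and the read-length and run-output arithmetic described above can be illustrated with a short simulation. This is a hypothetical sketch, not instrument software. It assumes that a ddNTP is eventually incorporated at every position, so each product length appears exactly once; it represents each product as a (length, dye-labeled ddNTP) pair; and it reads the sequence by sorting products from shortest to longest, the order in which they reach a capillary detector. The template stretch is written 3′→5′ and was chosen so the readout matches the 5′-CTAGACATG-3′ example in the Figure 17–12 caption. The primer length of 9 and the 12 runs per day are arbitrary assumed values, not figures from the text.

```python
import random

# Base that DNA polymerase places opposite each template base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_products(template_3_to_5: str,
                               primer_len: int) -> list[tuple[int, str]]:
    """Simulate a single-tube reaction with four dye-labeled ddNTPs.

    The template is given 3'->5', starting at the base just past the
    primer's 3' end. Each product is (length in nt, ddNTP at its 3' end);
    one product per termination site is assumed.
    """
    products = []
    for i, template_base in enumerate(template_3_to_5):
        ddntp = PAIR[template_base]          # terminating base at this site
        products.append((primer_len + i + 1, ddntp))
    random.shuffle(products)                 # products are unordered in the tube
    return products


def read_capillary(products: list[tuple[int, str]]) -> str:
    """Shortest fragments reach the detector first; each dye color
    reports its ddNTP, giving the new strand 5'->3'."""
    return "".join(ddntp for _, ddntp in sorted(products))


def daily_output_bp(capillaries: int, read_length_bp: int,
                    runs_per_day: int) -> int:
    """Run output = read length x number of reads completed."""
    return capillaries * read_length_bp * runs_per_day


products = chain_termination_products("GATCTGTAC", primer_len=9)
print(read_capillary(products))        # CTAGACATG
print(daily_output_bp(96, 1000, 12))   # 1152000 (12 runs/day is assumed)
```

Because sorting by length restores the order of the terminating bases, the readout is the same no matter how the products were shuffled. With 96 capillaries and roughly 1000-bp reads, about a dozen runs per day already exceeds a million base pairs, which shows why run output, not read length alone, sets daily throughput.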
These systems became essential for the rapidly accelerating progress of the Human Genome Project.

Sequencing Technologies Have Progressed Rapidly

Since Fred Sanger was awarded part of the 1980 Nobel Prize in Chemistry (which he shared with Walter Gilbert and Paul Berg) for sequencing technology, DNA sequencing technologies have undergone an incredible evolution that has dramatically improved sequencing capabilities, and new innovations continue to appear quickly. Sanger sequencing approaches (particularly computer-automated instruments based on capillary electrophoresis) still have their place in routine applications, such as sequencing a relatively short piece of DNA amplified by PCR. When it comes to sequencing entire genomes, however, Sanger sequencing technologies are outdated. The costs of Sanger sequencing are relatively high compared to newer technologies, and Sanger sequencing output, even with computer-automated DNA sequencing, is simply not high enough to support the growing demand for genomic data. This demand is being driven in large part by personalized genomics (see Chapter 19) and the desire to reveal the genetic basis of human diseases, which will require tens of thousands of individual genome sequences.

As we will discuss later in the text (see Chapter 19), a race was on to develop sequencing technologies for the complete and accurate sequencing of an individual human genome for $1000. Several sequencing companies claim that they have technologies that can sequence entire genomes for $1000, but scientists do not agree on what level of accuracy and other factors should be expected at this price. In mid-2015, the National Human Genome Research Institute estimated a full cost of ∼$4200 to sequence a human genome in less than one day. Nonetheless, there is every indication that technology will allow for rapid, accurate, and routine sequencing of genomes for $1000 or less relatively
soon, thanks to the development of next-generation sequencing (NGS) technologies. NGS technologies dispense with the Sanger technique and capillary electrophoresis methods (first-generation sequencing) in favor of sophisticated parallel formats (simultaneous reaction formats) that synthesize DNA from tens of thousands of identical strands simultaneously and then use state-of-the-art fluorescence imaging techniques to detect newly synthesized strands and average sequence data across the many molecules being sequenced. NGS technologies are providing an unprecedented capacity for generating massive amounts of DNA sequence data rapidly and at dramatically reduced costs per base. The demand for NGS within the research community and challenges such as the $1000 genome have led to an intense race among many companies eager to produce NGS methods. As a result, there are at least five NGS technologies.

Next-generation sequencing started around 2005. Some of the first instruments were capable of producing as much data as 50 capillary electrophoresis systems and were up to 200 times faster and cheaper than conventional Sanger approaches. NGS instruments generally produce short read lengths of ∼200–400 bp, and these snippets of sequence are then stitched together using software to produce a coherent, complete genome.

Shortly after NGS methods were commercialized, companies were announcing progress on third-generation sequencing (TGS). TGS methods are based on strategies that sequence a single molecule of single-stranded DNA, and at least four different approaches are being explored. Recently, TGS was used to sequence the genomes of five strains of Vibrio cholerae involved in a cholera outbreak in Haiti in less than an hour. Identifying the Vibrio strains genetically allowed rapid action to successfully treat the outbreak with antibiotics to which these bacteria were not resistant. Most TGS technologies still have somewhat high error rates, about 15 percent errors
in the sequence generated. At around $750,000, these technologies also remain very expensive.

The genomics research community has embraced NGS and third-generation sequencing technologies. Which approaches will eventually emerge as the sequencing methods of choice for the long term is unclear, but what is clear is that the landscape of sequencing capabilities has dramatically changed for the better, and never before have scientists had the ability to generate so much sequence data so quickly. Through 2006, new sequencing technologies were cutting sequencing costs in half about every two years. And keep an eye on the $1000 genome: there is every reason to believe sequencing technology will get us there soon. Our overview of DNA sequencing in this chapter is a good introduction to the detailed discussion of genomics and many related topics found later in the text (see Chapter 18).

ESSENTIAL POINT
DNA sequencing technologies are changing rapidly. Next-generation and third-generation methods produce large amounts of fairly accurate sequence data in a short time.

17.6 Creating Knockout and Transgenic Organisms for Studying Gene Function

Recombinant DNA technology has also made it possible to directly manipulate genes in organisms in ways that allow scientists to learn about gene function in vivo. These approaches enable scientists to create genetically engineered plants and animals for research and for commercial applications. We conclude this chapter by briefly discussing gene knockout technology and the creation of transgenic animals as examples of gene-targeting methods that have revolutionized research in genetics. In selected chapters of the book, the Modern Approaches to Understanding Gene Function boxes have highlighted examples of specific research projects involving gene-targeting approaches.

Gene Targeting and Knockout Animal Models

The concept behind gene targeting is to manipulate a
specific allele, locus, or base sequence to learn about the functions of a gene of interest. In the 1980s, scientists devised a gene-targeting technique for creating gene knockout (often abbreviated as KO) organisms, specifically mice. The pioneers of knockout technology, Dr. Mario Capecchi of the University of Utah and colleagues Oliver Smithies of the University of North Carolina, Chapel Hill, and Sir Martin Evans of Cardiff University, UK, received the 2007 Nobel Prize in Physiology or Medicine for developing this technique.

The fundamental purpose of creating a knockout is to disrupt or eliminate a specific gene or genes of interest and then ask, “What happens?” If physical, behavioral, biochemical, or other metabolic changes are observed in the KO animal, then one can begin to infer that the gene of interest plays some role in the observed phenotypes. Thus one of the most valuable reasons for creating a KO is to learn about gene function. The KO techniques developed in mice led to similar technologies for making KOs in zebrafish, rats, pigs, fruit flies, and many other organisms, including plants.

Knockout mice have revolutionized research in genetics, molecular biology, and biomedicine in many ways. Scientists have used knockout methods to create thousands of knockout organisms that have advanced our understanding of gene function, created animal models for many human diseases, and enabled scientists to make transgenic animals (which we will discuss below). Applications of KO technology have also provided the foundation for gene-targeting approaches in gene therapy that we discuss later in the text (see Special Topic Chapter 6, Gene Therapy).

Generally, generating a KO mouse or a transgenic mouse is a very labor-intensive project that can take several years of experiments and crosses and a significant budget to complete. However, once a KO mouse is made, assuming it is fertile, a colony of mice can be maintained; often KO
mice are shared around the world so that other researchers can work with them. Many companies will produce KO mice for researchers. It is also possible to make double-knockout animals (DKOs) and even triple-knockout animals (TKOs). This approach is typically used when scientists want to study the functional effects of disrupting two or three genes thought to be involved in a related mechanism or pathway.

[Figure 17–13 panels: designing the targeting vector (neoR inserted into the target gene); transforming ES cells from an agouti mouse with the targeting vector and selecting cells for recombination; microinjecting ES cells into the inner cell mass of a blastocyst from a black-colored mouse; transferring the blastocyst into a pseudopregnant foster mother, followed by the birth of chimeras; breeding a chimeric mouse to a black mouse to create mice heterozygous (+/-) for the gene knockout; and breeding heterozygous mice to produce mice homozygous (-/-) for the gene knockout.]

FIGURE 17–13 A basic strategy for producing a knockout mouse.

A KO animal can be made in several ways (Figure 17–13). Here we outline a strategy for making KO mice, but the same basic methods apply when making most KO animals. The DNA sequence of the gene to be targeted for KO must be known. Scientists also need some sequence information about the noncoding sequences that flank the gene at its location in the genome. A targeting vector is then constructed. The purpose of the targeting vector is to provide a segment of DNA that can be introduced into cells, where it undergoes homologous recombination with the gene of interest (the target gene) to disrupt or replace that gene, thereby rendering it nonfunctional. The targeting vector contains a copy of the gene of interest that has been mutated by inserting a large segment of foreign DNA, essentially a large insertion mutation. This foreign DNA will disrupt the reading frame of the target gene so that if the gene is transcribed into mRNA and translated into protein, it
will produce a nonfunctional protein. To help scientists determine whether the targeting vector has been properly introduced into the genome, the insertion sequence typically contains a marker gene. The example shown in Figure 17–13 uses a marker sequence for neomycin resistance (neoR). Neomycin is an antibiotic that blocks protein synthesis in both prokaryotic and eukaryotic cells, and its role will become apparent momentarily. The gene for green fluorescent protein (GFP), the lacZ gene that we discussed earlier in this chapter, and other genes are also sometimes used as markers. These markers make KO or transgenic organisms easy to identify visually, and you will see an example of a GFP transgenic animal later in this section.

There are several ways to introduce the targeting vector into cells. One popular approach involves using electroporation to deliver the vector into embryonic stem (ES) cells grown in culture. The ES cells are harvested from the inner cell mass of a mouse embryo at the blastocyst stage. Alternatively, the targeting vector is injected directly into the blastocyst in the hope that it will enter ES cells in the inner cell mass. Sometimes it is possible to make KOs by isolating newly fertilized eggs from a female mouse (or a female of the desired animal species) and microinjecting the targeting vector DNA directly into the nucleus of the egg or into one of the pronuclei of a fertilized egg [Figure 17–14(a)].

Randomly, in a very small percentage of ES cells that have taken in the targeting vector, the endogenous enzyme recombinase will catalyze homologous recombination between the targeting vector and the sequence of the gene of interest. In the few recombinant ES cells that are created, the targeting vector will usually replace the original gene on only one of the two homologous chromosomes. Recombinant ES cells can be selected by treating cultured ES cells with a reagent that kills cells lacking the targeting vector. For
the example shown in Figure 17–13, neomycin is added to cultured ES cells. Cells containing the targeting vector are resistant to neomycin, but nonrecombinant ES cells die. Recombinant ES cells are then injected into a mouse embryo at the blastocyst stage, where they will be incorporated into the inner cell mass of the blastocyst. The blastocyst is then placed into the uterus of a surrogate mother mouse, sometimes called a pseudopregnant mouse (a female mouse that has been mated with a sterile male). The pseudopregnant mouse offers a uterus that is receptive to implantation of the blastocyst containing the targeting vector. From the implanted embryos, the surrogate will give birth to mice that are chimeras: some cells in their body arise from KO stem cells, and others arise from stem cells of the donor blastocyst.

FIGURE 17–14 (a) Microinjecting DNA into a fertilized egg to create a knockout or a transgenic mouse. A fertilized egg is held by a suction or holding pipette (seen below the egg), and a microinjection needle delivers cloned DNA into the nucleus of the egg. (b) On the left is a null (-/-) mouse lacking both functional copies of the obese (ob) gene, which produces a peptide hormone called leptin. The mouse on the right is wild type (+/+) for the ob gene. The ob knockout mouse weighs almost five times as much as its wild-type sibling. Naturally occurring mutations in the human Ob gene create weight disorders for affected individuals.

Researchers screen for mice that contain the targeting vector by obtaining DNA from a sample of tail tissue, purifying the DNA, and performing PCR to verify that the targeting vector sequence is present in the animal’s genome. As long as the targeting vector DNA is present in germ cells, the sequence can be passed on to offspring of these mice, but typically the F1 generation KO mice produced this way are heterozygous (+/-) for the gene of
interest and the targeting vector and not homozygous for the KO. Matings between heterozygous F1 siblings can then be used to generate homozygous KO animals (on average, one-quarter of the offspring of a +/- × +/- cross are -/-). These animals are referred to as null mice and given a -/- designation because they lack wild-type copies of the targeted gene of interest. As mentioned at the beginning of this section, KO animal models serve invaluable roles for learning about gene function, and they continue to be essential for biomedical research on disease genes [see Figure 17–14(b)].

Despite all of the work that goes into trying to produce a KO organism, sometimes viable offspring are never born because the KO results in embryonic lethality: knocking out a gene that is important during embryonic development may kill the embryo before it has a chance to fully develop. Typically, researchers will examine embryos from the pseudopregnant mouse to determine at what stage of embryonic development the embryos are dying. This examination often reveals specific organ defects that can also be informative about the function of the KO gene.

If null mice for a particular gene of interest cannot be derived by traditional KO approaches, a conditional knockout can often provide a way to study such a gene. Conditional knockouts allow one to control when a target gene is disrupted, for example at a particular time in the animal’s development. If disrupting a target gene causes embryonic lethality, a conditional KO can allow an animal to progress through development and be born before the target gene is disrupted. Another advantage of conditional KOs is that target genes can also be turned off in a particular tissue or organ instead of in the entire animal.

Gene-Editing Methods

Gene-editing methods allow researchers to create changes in a specific sequence to remove, correct, or replace a defective gene or parts of a gene. Gene editing is based on using different nucleases to create breaks in the genome in a sequence-specific manner. Later in the text (see Special Topic Chapter 6, Gene Therapy), we discuss how gene-editing methods with transcription activator-like effector nucleases (TALENs) and zinc finger nucleases (ZFNs) can be used for gene therapy. These approaches have also been used for the in vivo genetic engineering of mice, rats, and other species for targeted modifications of specific genes. We also mention CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats) as a powerful new tool for gene-targeting experiments.

Making a Transgenic Animal: The Basics

Transgenic animals (Figure 17–15), also sometimes called knock-in animals, express or often overexpress a particular gene of interest (the transgene). The method of creating transgenic animals is conceptually simple, and many of the steps are similar to those involved in making a KO. Moreover, as is true when making KOs, there are species-specific challenges associated with creating transgenics. Many of the prevailing techniques used to make transgenics were also developed in mice.

Conceptually, introducing a transgene into an organism involves steps similar to making a KO. But instead of trying to disrupt a target gene, a vector with the transgene is created to undergo recombination into the host-cell genome. In some applications, tissue-specific promoter sequences can be used so that the transgene is expressed only in specific tissues. For example, tissue-specific promoters are used in the biotechnology industry to express specific recombinant products in milk for subsequent purification. It is often easier to make a transgenic than a KO because the vector just needs to be incorporated somewhere in the host genome (ideally in a noncoding region), often not at a particular locus as is necessary when making a KO.

The vector with the transgene can be put into ES cells or injected directly into embryos. Then, in a relatively small percentage of embryos or eggs, the transgenic DNA becomes inserted into the egg cell genome by recombination due to the action of naturally occurring DNA recombinases. After this stage, the rest of the process is similar to making a KO: embryos are placed into pseudopregnant females, and resulting crosses are used to derive mice that are homozygous for the transgene. In a transgenic, the transgene is often overexpressed in order to study its effects on the appearance and functions of mice.

There are many variations and purposes for making transgenics. Transgenic animals overexpressing certain genes, expressing human genes or genes from a different species, and expressing mutant genes are among the examples of transgenics that are valuable models for basic and applied research to understand gene function. As we will consider later in the text (see Chapter 19), transgenic animals and plants are also created in order to produce commercially valuable biotechnology products. Later in the text (see Special Topic Chapter 5, Genetically Modified Foods), you will learn about examples of transgenic food crops.

FIGURE 17–15 Transgenic mice incorporating the GFP gene from jellyfish enable scientists to tag particular genes with GFP to track gene activity, including activity in subsequent generations of mice generated from these transgenics. This procedure can be very valuable for examining transgenerational effects on gene expression, including epigenetic changes.

ESSENTIAL POINT
Gene-targeting methods to create knockout animals and transgenic animals are widely used, valuable approaches for studying gene function in vivo.

EXPLORING GENOMICS
Manipulating Recombinant DNA: Restriction Mapping and Designing PCR Primers

As you learned in this chapter, restriction enzymes are sophisticated “scissors” that molecular biologists use to cut DNA, and they are routinely used in genetics and molecular biology laboratories for recombinant DNA experiments. A wide variety of online tools assist scientists working with restriction enzymes and
manipulating recombinant DNA for different applications. Here we explore Webcutter, a site that makes recombinant DNA experiments much easier.

Exercise I – Creating a Restriction Map in Webcutter
Suppose you had cloned and sequenced a gene and you wanted to design a probe approximately 600 bp long that could be used to analyze expression of this gene in different human tissues by Northern blot analysis. Not too long ago, you would have had two main ways to approach this task. You could digest the cloned DNA with whatever restriction enzymes were in your freezer, then run agarose gels and develop restriction maps in the hope of identifying cutting sites that would give you the fragment size you wanted. Or you could scan the sequence by eye, looking for restriction sites of interest: a very time-consuming and eye-straining effort! Internet sites such as Webcutter take the guesswork out of developing restriction maps and make it relatively easy to design experiments for manipulating recombinant DNA. In this exercise, you will use Webcutter to create a restriction map of human DNA with the enzymes EcoRI, BamHI, and PstI.
Access Webcutter at http://rna.lundberg.gu.se/cutter2/.
Go to the Companion Web site for Concepts of Genetics and open the Exploring Genomics exercise for this chapter. Copy the sequence of cloned human DNA found there and paste it into the text box in Webcutter.
Scroll down to “Please indicate which enzymes to include in the analysis.” Click the button indicating “Use only the following enzymes.” Select the restriction enzymes EcoRI, BamHI, and PstI from the list provided, then click “Analyze sequence.” (Note: Use the command, control, or shift key to select multiple restriction enzymes.)
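Webcutter performs this scan for you, but the underlying idea is simple enough to sketch. The Python below is a minimal local stand-in, not Webcutter's own code. It assumes a linear (not circular) sequence, lists only the three enzymes used in this exercise with their standard top-strand cut positions (EcoRI G^AATTC, BamHI G^GATCC, PstI CTGCA^G), and reports cut coordinates and fragment lengths. The names `ENZYMES`, `cut_positions`, `fragment_sizes`, and `restriction_map` are hypothetical, chosen for illustration.

```python
# Hypothetical enzyme table: recognition site and the index within the site
# (top strand, 5' to 3') after which the enzyme cuts.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "PstI": ("CTGCAG", 5),   # CTGCA^G
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """0-based top-strand cut coordinates (each cut falls just before this index)."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)  # keep scanning past this site
    return cuts


def fragment_sizes(length: int, cuts: list[int]) -> list[int]:
    """Fragment lengths for a linear molecule cut at the given positions."""
    bounds = [0] + sorted(set(cuts)) + [length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def restriction_map(seq: str) -> dict[str, tuple[list[int], list[int]]]:
    """Map each enzyme name to (cut positions, fragment sizes)."""
    table = {}
    for name, (site, offset) in ENZYMES.items():
        cuts = cut_positions(seq, site, offset)
        table[name] = (cuts, fragment_sizes(len(seq), cuts))
    return table


if __name__ == "__main__":
    demo = "AAAAGAATTCAAAAAAGGATCCAAAA"  # made-up 26-bp test sequence
    for enzyme, (cuts, sizes) in restriction_map(demo).items():
        print(enzyme, "cuts at", cuts, "fragments", sizes)
```

Because all three recognition sites are palindromic, scanning one strand finds every site. The fragment lengths ignore the short single-stranded overhangs, which is adequate for choosing a fragment near 600 bp but is a simplification.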
After examining the results provided by Webcutter, create a table showing the number of cutting sites for each enzyme and the fragment sizes that would be generated by digesting with each enzyme. Draw a restriction map indicating the cutting sites for each enzyme, the distances between sites, and the total size of this piece of human DNA.

Exercise II – Designing a Recombinant DNA Experiment
Now that you have created a restriction map of your piece of human DNA, you need to ligate the DNA into a plasmid DNA vector that you can use to make your probe. To do this, you will need to determine which restriction enzymes would be best suited for cutting both the plasmid and the human DNA. Referring back to the Companion Web site and the Exploring Genomics exercise for this chapter, copy the plasmid DNA sequence from Exercise II into the text box in Webcutter and identify cutting sites for the same enzymes you used in Exercise I. Then answer the following questions:
a. What is the total size of the plasmid DNA analyzed in Webcutter?
b. Which enzyme(s) could be used in a recombinant DNA experiment to ligate the plasmid to the largest DNA fragment from the human gene? Briefly explain your answer.
c. What size recombinant DNA molecule will be created by ligating these fragments?
d. Draw a simple diagram showing the cloned DNA inserted into the plasmid and indicate the restriction-enzyme cutting site(s) used to create this recombinant plasmid.

As you prepare to carry out this subcloning experiment, you find that the expiration dates on most of your restriction enzymes have long since passed. Rather than run an experiment with old enzymes, you decide to purchase new enzymes. Fortunately, a site called REBASE®: The Restriction Enzyme Database can help you. Over 300 restriction enzymes are commercially available rather inexpensively, but scientists are always looking for ways to stretch their research budgets as far as possible. REBASE is excellent for locating enzyme suppliers and enzyme specifics, particularly if you need to work with an enzyme that you are unfamiliar with. Visit REBASE® at http://rebase.neb.com/rebase/rebase.html to identify companies that sell the restriction enzyme(s) you need for this experiment.

CASE STUDY
Should we worry about recombinant DNA technology?
Early in the 1970s, when recombinant DNA research was first developed, scientists realized that there might be unforeseen dangers, and after a self-imposed moratorium on all such research, they developed and implemented a detailed set of safety protocols for the construction, storage, and use of genetically modified organisms. These guidelines then formed the basis of regulations adopted by the federal government. Over time, safer methods were developed, and these stringent guidelines were gradually relaxed or, in many cases, eliminated altogether. Now, however, the specter of bioterrorism has refocused attention on the potential misuses of recombinant DNA technology. For example, individuals or small groups might use the information in genome databases coupled with recombinant DNA technology to construct or reconstruct agents of disease, such as the smallpox virus or the deadly influenza virus.
1. Do you think that the question of recombinant DNA research regulation by universities and
corporations should be revisited to monitor possible bioterrorist activity?
2. Should free access to genetic databases, including genome, gene, and protein sequences, be continued, or should it be restricted to individuals who have been screened and approved for such access?
3. Forty years after its development, the use of recombinant DNA technology is widespread and is found even in many middle school and high school biology courses. Are there some aspects of gene splicing that might be dangerous in the hands of an amateur?

INSIGHTS AND SOLUTIONS
The recognition sequence for the restriction enzyme Sau3AI is GATC (see Figure 17–1); in the recognition sequence for the enzyme BamHI (GGATCC), the four internal bases are identical to the Sau3AI sequence. The single-stranded ends produced by the two enzymes are identical. Suppose you have a cloning vector that contains a BamHI recognition sequence and you also have foreign DNA that was cut with Sau3AI.
(a) Can this DNA be ligated into the BamHI site of the vector, and if so, why?
(b) Can the DNA segment cloned into this sequence be cut from the vector with Sau3AI? With BamHI? What potential problems do you see with the use of BamHI?
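Part (b) turns on how often ligation happens to recreate a complete BamHI site at each junction. Assuming that each base flanking the Sau3AI-cut insert is equally likely to be A, C, G, or T, a brute-force enumeration of all 16 combinations of the two flanking bases checks the probability argument. This Python sketch is illustrative only; the function names `junctions` and `fraction_excisable` are hypothetical.

```python
from itertools import product

BAMHI_SITE = "GGATCC"


def junctions(base_inside_left: str, base_inside_right: str) -> tuple[str, str]:
    """Top-strand sequence (5' to 3') across each vector/insert junction.

    Left junction: the vector's G, then the insert's GATC end, then the
    insert base just inside it. Right junction: the insert's last base,
    then the vector's GATCC.
    """
    left = "GGATC" + base_inside_left
    right = base_inside_right + "GATCC"
    return left, right


def fraction_excisable() -> float:
    """Fraction of flanking-base pairs that recreate BamHI sites at both junctions."""
    combos = list(product("ACGT", repeat=2))
    both = sum(
        1
        for left_base, right_base in combos
        if all(site == BAMHI_SITE for site in junctions(left_base, right_base))
    )
    return both / len(combos)


if __name__ == "__main__":
    print(fraction_excisable())  # 0.0625, i.e., 1 of 16 combinations
```

Only the pair (C on the left, G on the right) restores both sites, which is where the product 0.25 × 0.25 comes from. Real flanking sequences need not be random, so this is an expectation, not a guarantee for a particular clone.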
Solution: (a) DNA cut with Sau3AI can be ligated into the vector’s BamHI cutting site because the single-stranded ends generated by the two enzymes are identical. (b) The DNA can be cut from the vector with Sau3AI because the recognition sequence for this enzyme (GATC) is maintained on each side of the insert. Recovering the cloned insert with BamHI is more problematic. In the ligated vector, the conserved sequences are GGATC (left) and GATCC (right). The correct base for recognition by BamHI will follow the conserved sequence (to produce GGATCC on the left) only about 25 percent of the time, and the correct base will precede the conserved sequence (to produce GGATCC on the right) about 25 percent of the time as well. Thus, BamHI will be able to cut the insert from the vector only about 6 percent of the time (0.25 × 0.25 = 0.0625).

[Figure: the Sau3AI-cut insert ligated into the BamHI site of the vector; the top strand reads GGATC at the left junction and GATCC at the right junction.]

Problems and Discussion Questions
HOW DO WE KNOW? In this chapter we focused on how specific DNA sequences can be copied, identified, characterized, and sequenced. At the same time, we found many opportunities to consider the methods and reasoning underlying these techniques. From the explanations given in the chapter, what answers would you propose to the following fundamental questions?
(a) In a recombinant DNA cloning experiment, how can we determine whether DNA fragments of interest have been incorporated into plasmids and, once host cells are transformed, which cells contain recombinant DNA?
(b) When using DNA libraries to clone genes, what combination of techniques is used to identify a particular gene of interest?
(c) What steps make PCR a chain reaction that can produce millions of copies of a specific DNA molecule in a matter of hours without using host cells?
(d) How has DNA sequencing technology evolved in response to the emerging needs of genome scientists?
CONCEPT QUESTION Review the Chapter Concepts list on page 338. All of these refer to recombinant DNA methods and applications. Write a short essay or sketch a diagram that provides an overview of how recombinant DNA techniques help geneticists study genes.
What roles do restriction enzymes, vectors, and host cells play in recombinant DNA studies?
What role does DNA ligase perform in a DNA cloning experiment? How does the action of DNA ligase differ from the function of restriction enzymes?
The human insulin gene contains a number of sequences that are removed in the processing of the mRNA transcript. In spite of the fact that bacterial cells cannot excise these sequences from mRNA transcripts, explain how a gene like this can be cloned into a bacterial cell and produce insulin.
Although many cloning applications involve introducing recombinant DNA into bacterial host cells, many other cell types are also used as hosts for recombinant DNA. Why?
You want to perform restriction digestion on a specific segment of DNA containing the nucleotide sequence shown below. Which restriction enzyme will you use for this experiment? Give the complementary sequence of the identified restriction site. (Consult Figure 17–1 for a list of restriction sites.)
TCTGTGAGAATTCCTAGGTA
Restriction sites are palindromic; that is, they read the same in the 5′ to 3′ direction on each strand of DNA. What is the advantage of having restriction sites organized this way?
List the advantages and disadvantages of using plasmids as cloning vectors. What advantages do BACs and YACs provide over plasmids as cloning vectors?
What are the advantages of using a restriction enzyme whose recognition site is relatively rare? When would you use such enzymes?
10 The introduction of genes into plants is a common practice that has generated not only a host of genetically modified foodstuffs but also significant worldwide controversy. Interestingly, a tumor-inducing plasmid is often used to produce genetically modified plants. Is the use of a tumor-inducing plasmid the source of such controversy?
11 What is a cDNA library, and for what purpose can it be used?
12 If you performed a PCR experiment starting with only one copy of double-stranded DNA, approximately how many DNA molecules would be present in the reaction tube after 15 cycles of amplification?
13 In a control experiment, a plasmid containing a HindIII recognition sequence within a kanamycin resistance gene is cut with HindIII, re-ligated, and used to transform E. coli K12 cells. Kanamycin-resistant colonies are selected, and plasmid DNA from these colonies is subjected to electrophoresis. Most of the colonies contain plasmids that produce single bands that migrate at the same rate as the original intact plasmid. A few colonies, however, produce two bands, one of original size and one that migrates much higher in the gel. Diagram the origin of this slow band as a product of ligation.
14 What advantages do cDNA libraries provide over genomic DNA libraries? Describe cloning applications where the use of a genomic library is necessary to provide information that a cDNA library cannot.
15 To create a cDNA library, cDNA can be inserted into vectors and cloned. In the analysis of cDNA clones, it is often difficult to find clones that are full length; that is, many clones are shorter than the mature mRNA molecules from which they are derived. Why is this so?
16 List the steps involved in screening a genomic library. What must be known before starting such a procedure? What are the potential problems with such a procedure, and how can they be overcome or minimized?
17 What is quantitative real-time PCR (qPCR)?
Describe what happens during a qPCR reaction and how it is quantified.
18 We usually think of enzymes as being most active at around 37°C, yet in PCR the DNA polymerase is subjected to multiple exposures to relatively high temperatures and seems to function appropriately at 70–75°C. What is special about the DNA polymerizing enzymes typically used in PCR?
19 How do next-generation sequencing (NGS) and third-generation sequencing (TGS) differ from Sanger sequencing?
20 What is the difference between a knockout animal and a transgenic animal?
21 One complication of making a transgenic animal is that the transgene may integrate at random into the coding region, or the regulatory region, of an endogenous gene. What might be the consequences of such random integrations? How might this complicate genetic analysis of the transgene?
22 When disrupting a mouse gene by knockout, why is it desirable to breed mice until offspring homozygous (−/−) for the knockout target gene are obtained?
23 What techniques can scientists use to determine if a particular transgene has been integrated into the genome of an organism?
18 Genomics, Bioinformatics, and Proteomics

CHAPTER CONCEPTS
■ Genomics applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze genomes.
■ Disciplines in genomics encompass several areas of study, including structural and functional genomics, comparative genomics, and metagenomics, and have led to an “omics” revolution in modern biology.
■ Bioinformatics merges information technology with biology and mathematics to store, share, compare, and analyze nucleic acid and protein sequence data.
■ The Human Genome Project has greatly advanced our understanding of the organization, size, and function of the human genome.
■ Since completion of the Human Genome Project, a new era of genomics studies is providing deeper insights into the human genome.
■ Comparative genomics analysis has revealed similarities and differences in genome size and organization.
■ Metagenomics is the study of genomes from environmental samples and is valuable for identifying microbial genomes.
■ Transcriptome analysis provides insight into patterns of gene expression and gene-regulatory activity of a genome.
■ Proteomics focuses on the protein content of cells and on the structures, functions, and interactions of proteins.
■ Systems biology approaches attempt to uncover complex interactions among genes, proteins, and other cellular components.

[Chapter-opening figure: alignment comparing DNA sequence for the leptin gene (LEP) from dogs (top) and humans (bottom).] Vertical lines and shaded boxes indicate identical bases. LEP encodes a hormone that functions to suppress appetite. This type of analysis is a common application of bioinformatics and a good demonstration of comparative genomics.

The term genome, meaning the complete set of DNA in a single cell of an organism, was coined at a time when geneticists began to turn from the study of individual genes to a focus on the larger picture. In 1977, as recombinant DNA-based techniques were developed, Fred Sanger and colleagues began the field of genomics, the study of genomes, by using a newly developed method of DNA sequencing to sequence the 5400-nucleotide genome of the virus φX174. Other viral genomes were sequenced in short order, but even this technology was slow and labor-intensive, limiting its use to small genomes. During the next three decades, the development of computer-automated DNA sequencing methods made it possible to consider sequencing the larger and more complex genomes of eukaryotes, including the human genome. The development of recombinant DNA technologies coupled with the advent of new, powerful DNA sequencing methods and bioinformatics is responsible for rapidly
accelerating the field of genomics. Genomic technologies have developed so quickly that modern biological research is currently experiencing a genomics revolution.

In this chapter, we will examine basic technologies used in genomics and then discuss examples of genome data and different disciplines of genomics. We will also discuss transcriptome analysis, the study of genes expressed in a cell or tissue (the “transcriptome”), and proteomics, the study of proteins present in a cell or tissue. The chapter concludes with a brief look at systems biology, a new area of contemporary biology that incorporates and integrates genomics, transcriptome analysis, and proteomics data. Later in the text (see Chapter 19) we will continue our discussion of genomics by describing many modern applications of recombinant DNA and genomic technologies. Please note that some of the topics discussed in this chapter are explored in greater depth in later chapters (see Special Topic Chapter 1—Epigenetics, Special Topic Chapter 2—Emerging Roles of RNA, and Special Topic Chapter 4—Genomics and Personalized Medicine).

18.1 Whole-Genome Sequencing Is a Widely Used Method for Sequencing and Assembling Entire Genomes

As discussed earlier in the text (see Chapter 17), recombinant DNA technology made it possible to generate DNA libraries that could be used to identify, clone, and sequence specific genes of interest. But a primary limitation of library screening and even of most polymerase chain reaction (PCR) approaches is that they typically can identify only relatively small numbers of genes at a time. Genomics allows the sequencing of entire genomes. Structural genomics focuses on sequencing genomes and analyzing nucleotide sequences to identify genes and other important sequences such as gene-regulatory regions.

The most widely used strategy for sequencing and assembling an entire genome involves variations of a method called whole-genome sequencing (WGS), also known as shotgun cloning or shotgun sequencing. In simple terms, this technique is analogous to you and a friend taking your respective copies of this genetics textbook and randomly ripping the pages into strips about … to … inches long. Each chapter represents a chromosome, and all of the letters in the entire book are the “genome.” Then you and your friend would go through the painstaking task of comparing the pieces of paper to find places that match: overlapping sentences, that is, areas where there are similar sentences on different pieces of paper. Eventually, in theory, many of the strips containing matching sentences would overlap in ways that you could use to reconstruct the pages and assemble the order of the entire text.

Figure 18–1 shows a basic overview of WGS. First, an entire chromosome is cut into short, overlapping fragments, either by mechanically shearing the DNA in various ways (such as excessive heat treatment or sonication, in which sonic energy is used to break DNA) or by using restriction enzymes to cleave the DNA at different locations. For simplicity, here we present a basic example of DNA digestion using restriction enzymes, although nonenzymatic approaches for shearing DNA are increasingly being used. Different restriction enzymes can be used so that chromosomes are cut at different sites; or sometimes, partial digests of DNA using the same restriction enzyme are used. With partial digests, DNA is incubated with restriction enzymes for only a short period of time, so that not every site in a particular sequence is cut to completion by an individual enzyme. Restriction digests of whole chromosomes generate thousands to millions of overlapping DNA fragments. For example, a 6-bp cutter such as EcoRI creates about 700,000 fragments when used to digest the human genome!

FIGURE 18–1 An overview of whole-genome sequencing (WGS) and assembly. [Figure panels: genomic DNA cut into multiple overlapping fragments by digestion with different restriction enzymes (EcoRI, BamHI); fragments aligned based on identical DNA sequences to create an assembly of contiguous fragments, or “contigs”; overlapping sequenced fragments aligned using computer programs to assemble an entire chromosome.] This approach shows one strategy that involves using restriction enzymes to digest genomic DNA into overlapping fragments, which are then sequenced and aligned using bioinformatics to identify overlapping fragments based on sequence identity. Notice that EcoRI digestion of the portion of DNA depicted here produces two fragments (contigs 1, 2–4), whereas digestion with BamHI produces three fragments (contigs 1–2, 3, 4).

In the next section, we will discuss the importance of bioinformatics to genomics. One of the earliest bioinformatics applications to be developed for genomic purposes was the use of algorithm-based software programs for creating a DNA-sequence alignment, in which similar sequences of bases are lined up for comparison. Alignment identifies overlapping sequences, allowing scientists to reconstruct their order in a chromosome. Because these overlapping fragments are adjoining segments that collectively form one continuous DNA molecule within a chromosome, they are called contiguous fragments, or “contigs.” Figure 18–2 shows an example of contig alignment and assembly for a portion of a human chromosome. For simplicity, this figure shows relatively short sequences for each contig, which in actuality would be much longer. The figure is also simplified in that, in actual alignments, assembled sequences do not always overlap only at their ends.

FIGURE 18–2 DNA-sequence alignment of contigs on a human chromosome. [Figure: the 5′ to 3′ sequences of contigs 1, 2, and 3; the sequence alignments between contigs 1 and 2 and between contigs 2 and 3; and the assembled sequence of a partial segment of the chromosome based on alignment of the three contigs.] Single-stranded DNA for three different contigs from the chromosome is shown in blue, red, or green. In reality, contig alignment involves fragments that are several thousand bases in length. Alignment of the three contigs allows a portion of the chromosome to be assembled. Alignment of all contigs for a particular chromosome would result in assembly of a completely sequenced chromosome.

The whole-genome shotgun sequencing method was developed by J. Craig Venter and colleagues at The Institute for Genomic Research (TIGR). In 1995, TIGR scientists used this approach to sequence the 1.83-million-bp genome of the bacterium Haemophilus influenzae. This was the first completed genome sequence from a free-living organism, and it demonstrated “proof of concept” that shotgun sequencing could be used to sequence an entire genome. Even after the genome for H. influenzae was sequenced, many scientists were skeptical that a shotgun approach would work on the larger genomes of eukaryotes. Now shotgun approaches are the predominant method for sequencing genomes.

Cutting a genome into contigs is not particularly difficult; however, a primary hurdle that had to be overcome to advance whole-genome sequencing was the question of how to
sequence millions or billions of base pairs in a timely and cost-effective way. This was a major challenge for scientists working on the Human Genome Project (Section 18.4). The major technological breakthrough that made genomics possible was the development of computer-automated sequencers. As we discussed in Chapter 17, next- and third-generation sequencers now enable genome scientists to produce sequence nearly 50,000 times faster than sequencers in 2000, with greater output, improved accuracy, and reduced cost.

ESSENTIAL POINT
Whole-genome shotgun sequencing enables scientists to assemble sequence maps of entire genomes.

18.2 DNA Sequence Analysis Relies on Bioinformatics Applications and Genome Databases

Genomics necessitated the rapid development of bioinformatics, the use of computer hardware and software and mathematics applications to organize, share, and analyze data related to gene structure, gene sequence and expression, and protein structure and function. However, even before whole-genome sequencing projects had been initiated, a large amount of sequence information from a range of different organisms was accumulating as a result of gene cloning by recombinant DNA techniques. Scientists around the world needed databases that could be used to store, share, and obtain the maximum amount of information from protein and DNA sequences. Thus, bioinformatics software was already being used to compare and analyze DNA sequences and to create private and public databases. Once genomics emerged as a new approach for analyzing DNA, however, bioinformatics became even more important than before. Today, it is a dynamic area of biological research, providing new career opportunities for anyone interested in merging an understanding of biological data with information technology, mathematics, and statistical analysis.

Among the most common applications of bioinformatics are to compare DNA sequences, as in contig
alignment; to identify genes in a genomic DNA sequence; to find gene-regulatory regions, such as promoters and enhancers; to identify structural sequences, such as telomeric sequences, in chromosomes; to predict the amino acid sequence of a putative polypeptide encoded by a cloned gene sequence; to analyze protein structure and predict protein functions on the basis of identified domains and motifs; and to deduce evolutionary relationships between genes and organisms on the basis of sequence information.

High-throughput DNA sequencing techniques were developed nearly simultaneously with the expansion of the Internet. As genome data accumulated, many DNA-sequence databases became freely available online. Databases are essential for archiving and sharing data with other researchers and with the public. One of the largest genomic databases, GenBank, is maintained by the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, and is the largest publicly available database of DNA sequences. GenBank shares and acquires data from databases in Japan and Europe; it contains more than 1.5 trillion bases of sequence data from over 100,000 species; and it doubles in size roughly every 14–18 months!
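Contig alignment, the first application in the list above, comes down to finding where the end of one fragment matches the start of another. The Python below is a minimal greedy sketch, not the algorithm used by production assemblers. It assumes error-free fragments read from the same strand, counts only exact suffix-to-prefix matches of at least 10 bases (an arbitrary threshold) as real overlaps, and repeatedly merges the pair with the longest overlap. The three sequences are the contigs shown in Figure 18–2; the function names are hypothetical.

```python
# Contigs 1-3 from Figure 18-2, written 5' to 3' on the same strand.
CONTIG_1 = "ATTTTTTTTGTATTTTTAATAGAGACGAGGTGTCACCATGTTGGACAGGCTGGTCTCGAACTCCTGACCTCAGGTGATCTGCCC"
CONTIG_2 = "GGTCTCGAACTCCTGACCTCAGGTGATCTGCCCACCTCAGCCTCCCAAAGTGCTGGATTACAAGCATGAGCCACCACTCCCAGGC"
CONTIG_3 = "GAGCCACCACTCCCAGGCTTTATTTTCTATTTTTTAATTACAGCCATCCTAGTGAATGTGAAGTAGTATCTCACTGAGGTTTTGATTT"


def overlap(a: str, b: str, min_len: int = 10) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(contigs: list[str], min_len: int = 10) -> list[str]:
    """Repeatedly merge the ordered pair of fragments with the longest exact overlap."""
    frags = list(contigs)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: leave the remaining fragments separate
        merged = frags[best_i] + frags[best_j][best_k:]  # keep the shared bases once
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)] + [merged]
    return frags


if __name__ == "__main__":
    print(overlap(CONTIG_1, CONTIG_2), overlap(CONTIG_2, CONTIG_3))  # 33 18
    assembled = greedy_assemble([CONTIG_3, CONTIG_1, CONTIG_2])  # deliberately shuffled
    print(len(assembled), assembled[0])
```

On these contigs the sketch finds a 33-base overlap between contigs 1 and 2 and an 18-base overlap between contigs 2 and 3, and returns a single assembled sequence. Real assemblers must also cope with sequencing errors, fragments from the opposite strand, and repeated sequences, which a greedy exact-match approach handles poorly.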
The Human Genome Nomenclature Committee, supported by the NIH, establishes rules for assigning names and symbols to newly cloned human genes. As sequences are identified and genes are named, each sequence deposited into GenBank is provided with an accession number that scientists can use to access and retrieve that sequence for analysis. The NCBI is an invaluable source of public-access databases and bioinformatics tools for analyzing genome data. You have already been introduced to NCBI and GenBank through several Exploring Genomics exercises. In Exploring Genomics for this chapter, you will use NCBI and GenBank to compare and align contigs in order to assemble a chromosome segment.

Annotation to Identify Gene Sequences
One of the fundamental challenges of genomics is that, although genome projects generate tremendous amounts of DNA sequence information, these data are of little use until they have been analyzed and interpreted. Genome projects accumulate nucleotide sequences, and then scientists have to make sense of those sequences. Thus, after a genome has been sequenced and compiled, scientists are faced with the task of identifying gene-regulatory sequences and other sequences of interest in the genome so that gene maps can be developed. This process, called annotation, relies heavily on bioinformatics, and a wealth of different software tools is available to carry it out.

One initial approach to annotating a sequence is to compare the newly sequenced genomic DNA to the known sequences already stored in various databases. The NCBI provides access to BLAST (Basic Local Alignment Search Tool), a very popular software application for searching through banks of DNA and protein sequence data. Using BLAST, we can compare a segment of genomic DNA to sequences throughout major databases such as GenBank to identify portions that align with or are the same as existing sequences. Figure 18–3 shows a representative example of a sequence alignment based on a BLAST search. Here a
280-bp chromosome 12 contig from the rat was used to search a mouse database to determine whether a sequence in the rat contig matched a known gene in mice. Notice that the rat contig (the query sequence in the BLAST search) aligned with base pairs 174,612 to 174,891 of mouse chromosome 8. The accession number for the mouse chromosome sequence, NT_039455.6, is indicated at the top of the figure. BLAST searches calculate a similarity score, also called the identity value. It is the number of identical matches between the aligned sequences divided by the total number of bases aligned. Gaps, which indicate bases missing from one of the two sequences, are usually ignored in calculating similarity scores. The aligned rat and mouse sequences were 93 percent similar and showed no gaps in the alignment. This mouse sequence on chromosome 8 is known to contain an insulin receptor gene, which encodes a protein that binds the hormone insulin. It is therefore highly likely that the rat contig sequence also contains an insulin receptor gene. We will return to the topic of similarity in Sections 18.3 and 18.6. There we consider how similarity between gene sequences can be used to infer function and to identify evolutionarily related genes through comparative genomics.

[Figure 18–3 alignment not reproduced. Subject: ref|NT_039455.6|Mm8_39495_36 Mus musculus chromosome 8 genomic contig, strain C57BL/6J. Features in this part of subject sequence: insulin receptor. Score = 418 bits (226), Expect = 2e-114, Identities = 262/280 (93%), Gaps = 0/280 (0%). Query bases 1–280 align with subject bases 174,891–174,612.]

FIGURE 18–3 BLAST results showing a 280-base sequence of a chromosome 12 contig from rats (Rattus norvegicus, the “query”) aligned with a portion of chromosome 8 from mice (Mus musculus, the “subject”) that contains a partial sequence for the insulin receptor gene. Vertical lines indicate exact matches. The rat contig sequence was used as a query sequence to search a mouse database in GenBank. Notice that the two sequences show 93 percent identity, strong evidence that this rat contig sequence contains a gene for the insulin receptor.

Hallmark Characteristics of a Gene Sequence Can Be Recognized during Annotation

A major limitation of this approach to annotation is that it only works if similar gene sequences are already in a database. Fortunately, it is not the only way to identify genes. Whether the genome under study is from a eukaryote or a prokaryote, bioinformatics software can search for several hallmark characteristics of protein-coding genes (Figure 18–4). We discussed many of these characteristics of a “typical” gene earlier in the text (see Chapters 12 and 15). For instance, identifiable sequences such as promoters, enhancers, and silencers mark the gene-regulatory regions found upstream of genes. Recall from earlier in the text (see Chapter 15) that TATA box, GC box, and CAAT box sequences are often present in the promoter region of eukaryotic genes. Recall also that splice sites between exons and introns contain a predictable sequence: most introns begin with GT and end with AG. Such splice-site sequences are important for determining intron and exon boundaries. Interestingly, current estimates indicate that only a small percentage of human genes are transcribed from a single, linear stretch of DNA that does not contain any introns.

[Figure 18–4 diagram of a protein-coding gene. Labeled features: regulatory element (e.g., enhancers, silencers); promoter (e.g., TATA box, CAAT box, GC box); transcription and translation initiation sites; 5′ UTR (exon); exons containing codons; introns bounded by a 5′ splice site (GT) and a 3′ splice site (AG); translation termination site; 3′ UTR (exon); polyadenylation site.]

FIGURE 18–4 Characteristics of a protein-coding gene that can be used during annotation to identify a gene in an unknown sequence of genomic DNA. Most eukaryotic genes are organized into coding segments (exons) and noncoding segments (introns). To determine whether a genome sequence contains a gene, annotation must distinguish introns from exons and must also identify gene-regulatory sequences such as promoters and enhancers, untranslated regions (UTRs), and gene termination sequences.

Annotation is intended to reveal identifiable features that provide clues to the presence of a protein-coding gene. For example, protein-coding genes contain one or more open reading frames (ORFs). An ORF is a sequence of triplet nucleotides that, after transcription and mRNA splicing, is translated into the amino acid sequence of a protein. ORFs typically begin with an initiation sequence, usually ATG, which is transcribed into the AUG start codon of an mRNA molecule. They end with a termination sequence, TAA, TAG, or TGA, which corresponds to the stop codon UAA, UAG, or UGA in mRNA. Downstream elements are also important for annotation (Figure 18–4). These include termination sequences and the well-defined sequences at the end of a gene, where a polyadenylation sequence signals the addition of a poly-A tail to the 3′ end of an mRNA transcript. Annotation can sometimes be a little easier for prokaryotic genes than for eukaryotic genes because prokaryotic genes have no introns.
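The identity value described for Figure 18–3 is easy to compute once two sequences are aligned. The sketch below is a minimal Python illustration and is not BLAST's own code. It assumes a gapless alignment of two equal-length sequences. It uses the rat query and mouse subject sequences from Figure 18–3, with the stray spaces from the printed figure removed. The function name `percent_identity` is hypothetical.

```python
# Minimal percent-identity sketch for a gapless pairwise alignment.
# QUERY (rat) and SUBJECT (mouse) are the aligned sequences from Figure 18-3;
# SUBJECT runs from mouse base 174,891 down to 174,612.
QUERY = (
    "CAGGCCATCCCGAAAGCGAAGATCCCTTGAAGAGGTGGGCAATGTGACAGCCACTACACC"
    "CACACTTCCAGATTTTCCCAACATCTCCTCCACCATCGCGCCCACAAGCCACGAAGAGCA"
    "CAGACCATTTGAGAAAGTAGTAAACAAGGAGTCACTTGTCATCTCTGGCCTGAGACACTT"
    "CACTGGGTACCGCATTGAGCTGCAGGCATGCAATCAGGACTCCCCAGAAGAGAGGTGCAG"
    "CGTGGCTGCCTACGTCAGTGCCCGGACCATGCCTGAAGGT"
)
SUBJECT = (
    "CAGGCCATCCCGAAAGCGAAGATCCCTTGAAGAGGTGGGGAATGTGACAGCCACCACACT"
    "CACACTTCCAGATTTCCCCAACGTCTCCTCTACCATTGTGCCCACAAGTCAGGAGGAGCA"
    "CAGGCCATTTGAGAAAGTGGTGAACAAGGAGTCACTTGTCATCTCTGGCCTGAGACACTT"
    "CACTGGGTACCGCATTGAGCTGCAGGCATGCAATCAAGATTCCCCAGATGAGAGGTGCAG"
    "TGTGGCTGCCTACGTCAGTGCCCGGACCATGCCTGAAGGT"
)


def percent_identity(query: str, subject: str) -> tuple[int, int, float]:
    """Return (identical positions, alignment length, percent identity).

    Assumes the two strings are already aligned position by position
    and contain no gap characters.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must be the same length")
    if "-" in query or "-" in subject:
        raise ValueError("this sketch handles gapless alignments only")
    # Count positions where the two aligned bases are identical.
    identical = sum(q == s for q, s in zip(query, subject))
    return identical, len(query), 100.0 * identical / len(query)


if __name__ == "__main__":
    identical, length, pct = percent_identity(QUERY, SUBJECT)
    # 262/280 is about 93.6; truncating to a whole number gives the 93% shown.
    print(f"Identities = {identical}/{length} ({int(pct)}%)")
```

Counting identical positions gives 262 of 280, which agrees with the Identities line in Figure 18–3. The bit score and Expect value in real BLAST output come from BLAST's scoring model, not from this simple count.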
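The ORF definition above can be turned into a small scanner. An ORF here is an ATG start codon followed, in the same reading frame, by TAA, TAG, or TGA. This is a toy Python sketch, not a gene-prediction program. It reads only the strand it is given, in its three forward reading frames, and it ignores introns and splice sites. The `min_codons` cutoff and the function name `find_orfs` are hypothetical choices for illustration. Real annotation tools also scan the reverse complement and combine many of the signals shown in Figure 18–4.

```python
# Toy open-reading-frame (ORF) scanner for one DNA strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return ORFs as (start, end, sequence), 0-based and end-exclusive.

    Each ORF starts at an ATG and ends with the first in-frame stop codon,
    which is included. Only the three forward reading frames are scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon from the start codon to the first stop.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return sorted(orfs)


if __name__ == "__main__":
    # Hypothetical sequence with one ORF in frame 2 and one in frame 0.
    for start, end, orf in find_orfs("GGATGAAACCCTGAGATGTTTTAA"):
        print(start, end, orf)
```

For the hypothetical input, the scanner reports `ATGAAACCCTGA` at positions 2 to 14 and `ATGTTTTAA` at positions 15 to 24. After each stop codon, scanning resumes downstream, so an ATG nested inside an ORF is not reported separately.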
Gene-prediction programs are used to annotate sequences. These programs search for many of the features described above and have become invaluable applications of bioinformatics.

18–1 In a sequence encompassing 99.4 percent of the euchromatic regions of human chromosome 1, Gregory et al. (Gregory, S. G. et al., Nature, 441:315–321, 2006) identified 3141 genes. (a) How does one identify a gene within a raw sequence of bases in DNA? (b) What features of a genome are used to verify likely gene assignments? (c) Chromosome 1 contains approximately 8 percent of the human genome. Assuming that there are approximately 20,000 genes, would you consider chromosome 1 to be “gene rich”?

HINT: This problem involves a basic understanding of bioinformatics and gene annotation approaches to determine how potential gene sequences can be identified in a stretch of sequenced DNA. For more practice, see Problem 11.

ESSENTIAL POINT Bioinformatics can be used for sequence annotation to identify protein-coding sequences and noncoding sequences such as regulatory elements.

18.3 Genomics Attempts to Identify Potential Functions of Genes and Other Elements in a Genome

As the term suggests, functional genomics is the study of gene functions, based on the RNAs or proteins that genes encode. It also studies the functions of other components of the genome, such as gene-regulatory elements. Functional genomics can involve experimental approaches to confirm or refute computational predictions about genome functions, such as the number of protein-coding genes. It also considers how genes are expressed and how gene expression is regulated.

Predicting Gene and Protein Functions by Sequence Analysis

One approach to assigning functions to genes is to use sequence similarity searches, as described in the previous section. Programs such as BLAST are used to search through databases to find alignments between the newly sequenced genome and genes that have already
been identified, either in the same or in different species. You were introduced to this approach for predicting gene function in Figure 18–3. There, sequence similarity to the mouse gene identified a gene in a rat contig as the insulin receptor gene.

Inferring gene function from similarity searches is based on a relatively simple idea. Suppose a genome sequence shows statistically significant similarity to the sequence of a gene whose function is known. It is then likely that the genome sequence encodes a protein with a similar or related function. Similarity searches can also often identify homologous genes, which are genes that are evolutionarily related. After the human genome was sequenced, many of its ORFs were identified as protein-coding genes because they aligned with related genes of known function in other species. As an example, Figure 18–5 compares portions of the human leptin gene (LEP) with its homolog in mice (ob/Lep). These two genes are over 85 percent identical in sequence. The leptin gene was first discovered in mice (recall our discussion about leptin knockout mice in Chapter 17). The match between the LEP-containing DNA sequence in humans and the mouse homolog sequence confirms the identity and leptin-coding function of this gene in human genomic DNA.

If homologous genes in different species are thought to have descended from a gene in a common ancestor, the genes are known as orthologs. In Section 18.6 we will consider the globin gene family. Mouse and human α-globin genes are orthologs that evolved from a common ancestor. Homologous genes in the same species are called paralogs. The genes for the α- and β-globin subunits in humans are paralogs that arose from a gene-duplication event. Paralogs often have similar or identical functions.

[Figure 18–5: partial aligned sequences of the human LEP gene (upper row of each pair) and the mouse ob/Lep gene (lower row); sequence rows not reproduced.]

FIGURE 18–5 Comparison of the human LEP and mouse ob/Lep genes. Partial sequences for these homologs are shown with the human LEP gene on top and the mouse ob/Lep gene sequence below it. Notice from the number of identical nucleotides, indicated by vertical lines, that the nucleotide sequence for these two genes is very similar. Gaps are indicated by horizontal dashes.

Predicting Function from Structural Analysis of Protein Domains and Motifs

When a gene sequence is used to predict a polypeptide sequence, the polypeptide can be analyzed for specific structural domains and motifs. A DNA sequence may encode identifiable protein domains, such as ion channels, membrane-spanning regions, DNA-binding regions, and secretion and export signals, along with other structural features of a polypeptide. Identifying these domains can in turn be used to predict protein function. Recall from earlier in the text (see Chapter 15), for example, that the structures of many DNA-binding proteins have characteristic patterns, or motifs, such as the helix-turn-helix, leucine zipper, or zinc-finger motifs. Bioinformatics software can often search for these motifs easily, and finding one in a sequence is a common strategy for inferring the possible functions of a protein.

ESSENTIAL POINT Functional genomics predicts gene function based on sequence analysis.

18.4 The Human Genome Project Revealed Many Important Aspects of Genome Organization in Humans

Now that you have a general idea of the basic strategies used for analyzing a genome, let’s look at the largest genomics project completed to date. The Human Genome Project (HGP) was a coordinated international effort to determine the sequence of the human genome and to identify all the genes it contains. It has produced a plethora of information, much of which is still being analyzed and interpreted. The many genomes sequenced so far make one point clear: humans and all other species share a common set of genes essential for cellular function and reproduction. This shared set confirms that all living organisms arose from a common ancestor.

Origins of the Project

The publicly funded Human Genome Project began in 1990 under the direction of James Watson, the co-discoverer of the double-helix structure of DNA. Eventually the public project was led by Dr. Francis Collins, who had previously led a research team that identified mutations in the CFTR gene as the cause of cystic fibrosis. In the United States, the Collins-led HGP was coordinated by the Department of Energy and the National Center for Human Genome Research, a division of the National Institutes of Health. It established a 15-year plan with a proposed budget of $3 billion. The plan had three aims: to identify all human genes, originally thought to number between 80,000 and 100,000; to sequence and map them all; and to sequence the approximately 3 billion base pairs thought to comprise the 24 chromosomes (22 autosomes, plus X and Y) in humans. Other primary goals of the HGP included the following:

• To establish functional categories for all human genes
• To analyze genetic variations between humans, including the identification of single-nucleotide polymorphisms (SNPs)
• To map and sequence the genomes of several model organisms used in experimental genetics, including E. coli, S. cerevisiae, C. elegans, D. melanogaster, and M. musculus (mouse)
• To develop new sequencing technologies, such as high-throughput computer-automated sequencers, in order to facilitate genome analysis
• To disseminate genome information among both scientists and the general public

Lastly, to deal with the impact that genetic information would have on society,
the HGP set up the ELSI program (standing for Ethical, Legal, and Social Implications). ELSI was to consider the ethical, legal, and social issues arising from the HGP and to ensure that personal genetic information would be safeguarded and not used in discriminatory ways.

As the HGP grew into an international effort, scientists in 18 countries were involved in the project. Much of the work was carried out by the International Human Genome Sequencing Consortium, involving nearly 3000 scientists working at 20 centers in six countries (China, France, Germany, Great Britain, Japan, and the United States).

In 1999, a privately funded human genome project led by J. Craig Venter at Celera Genomics (aptly named from a word meaning “swiftness”) was announced. Celera’s goal was to use whole-genome shotgun sequencing and computer-automated high-throughput DNA sequencers to sequence the human genome more rapidly than the HGP. The public project had proposed using a clone-by-clone approach to sequence the genome. Recall that Venter and colleagues had proven the potential of shotgun sequencing in 1995 when they completed the genome for H. influenzae. Celera’s announcement set off an intense competition between the two teams, which both aspired to be first with the human genome sequence. Scientists from the public project then began to use high-throughput sequencers and whole-genome sequencing strategies as well. As a result of this contest, the HGP eventually finished ahead of schedule and under budget.

Major Features of the Human Genome

In June 2000, the leaders of the public and private genome projects met at the White House with President Clinton and jointly announced the completion of a draft sequence of the human genome. In February 2001, they each published an analysis covering about 96 percent of the euchromatic region of the genome. The public project sequenced euchromatic portions of the genome 12 times and set a quality control standard of a 0.01
percent error rate for their sequence. Although this error rate may seem very low, it still allows about 600,000 errors in the human genome sequence. Celera sequenced certain areas of the genome more than 35 times when compiling the genome. The remaining work to complete the genome had three parts. Researchers filled in gaps clustered around centromeres, telomeres, and repetitive sequences; regions rich in GC base pairs can be particularly hard to sequence and interpret. They also corrected misaligned segments and re-sequenced portions of the genome to ensure accuracy. In 2003, genome sequencing and error fixing were deemed sufficient to meet the international project’s definition of completion. That definition required fewer than 1 error per 10,000 nucleotides and coverage of 95 percent of the gene-containing portions of the genome. Yet even at the time of “completion” there were still some 350 gaps in the sequence that continued to be worked on. And of course the HGP did not sequence the genome of every person on Earth. The assembled genomes largely consist of haploid genomes pooled from different individuals. They therefore provide a reference genome that represents the major, common elements of a human genome widely shared among human populations.

Examples of major features of the human genome are summarized in Table 18.1. As you can see in this table, many unexpected observations have provided us with major new insights. The genome is not static!
Genome variations, including the abundance of repetitive sequences scattered throughout the genome, confirm that the genome is indeed dynamic. They reveal many evolutionary examples of sequences that have changed in structure and location.

TABLE 18.1 Major Features of the Human Genome
• The human genome contains 3.1 billion nucleotides, but protein-coding sequences make up less than 2 percent of the genome.
• The genome sequence is ∼99.9 percent similar in individuals of all nationalities. SNPs and copy number variations (CNVs) account for genome diversity from person to person.
• The genome is dynamic. At least 50 percent of the genome is derived from transposable elements, such as SINEs, LINEs, and Alu sequences, retrotransposons, and other repetitive DNA sequences.
• The human genome contains approximately 20,000 protein-coding genes, far fewer than the originally predicted number of 80,000–100,000 genes.
• The average size of a human gene is ∼25 kb, including gene-regulatory regions, introns, and exons. On average, mRNAs produced by human genes are ∼3000 nt long.
• Many human genes produce more than one protein through alternative splicing. Human cells can therefore produce a much larger number of proteins (perhaps as many as 200,000) from only ∼20,000 genes.
• More than 50 percent of human genes show a high degree of sequence similarity to genes in other organisms; however, more than 40 percent of the genes identified have no known molecular function.
• Human genes created by duplication events are evident in gene families.
• Genes are not uniformly distributed on the 24 human chromosomes. Gene-rich clusters are separated by gene-poor “deserts” that account for 20 percent of the genome. These deserts correlate with G bands seen in stained chromosomes. Chromosome 19 has the highest gene density, and chromosome 13 and the Y chromosome have the lowest gene densities.
• Chromosome 1 contains the largest number of genes, and the Y chromosome contains the smallest number.
• Human genes are larger and contain more and larger introns than genes in the genomes of invertebrates, such as Drosophila. The largest known human gene encodes dystrophin, a muscle protein. This gene, associated in mutant form with muscular dystrophy, is 2.5 Mb in length (Chapter 14), larger than many bacterial chromosomes. Most of this gene is composed of introns.
• The number of introns in human genes ranges from 0 (in histone genes) to 234 (in the gene for titin, which encodes a muscle protein).

In many ways, the HGP has revealed just how little we know about our genome. Two of the biggest surprises discovered by the HGP were that less than 2 percent of the genome codes for proteins and that there are only around 20,000 protein-coding genes. Recall that the number of genes had originally been estimated to be about 100,000, based in part on a prediction that human cells produce about 100,000 proteins. At least half of the genes show sequence similarity to genes shared by many other organisms. As you will learn in Section 18.7, a majority of human genes are similar in sequence to genes from closely related species such as chimpanzees.

There is still no consensus among scientists worldwide about the exact number of human genes. One reason is that it is unclear whether many of the presumed genes produce functional proteins. Genome scientists continue to annotate the genome. As mentioned earlier, functional genomics studies play important roles in testing whether computational predictions about the number of protein-coding and non–protein-coding genes are accurate. The number of genes is much lower than the number of predicted proteins in part because many genes code for multiple proteins through alternative splicing. Recall from earlier in the text (see Chapter 12) that alternative splicing patterns can generate multiple mRNA molecules, and thus
multiple proteins, from a single gene, through different combinations of intron–exon splicing arrangements. Initial estimates suggested that over 50 percent of human genes undergo alternative splicing to produce multiple transcripts and multiple proteins. Recent studies suggest that ∼94–95 percent of human pre-mRNAs contain multiple exons. These pre-mRNAs are processed into multiple transcripts and potentially multiple different protein products. Clearly, alternative splicing produces a far greater diversity of proteins than simple predictions based on the number of genes in the human genome would suggest.

Functional categories have been assigned for human genes primarily on three bases (Figure 18–6). The first is functions determined previously, for example from recombinant DNA cloning of human genes and from known mutations involved in human diseases. The second is comparison to known genes and predicted protein sequences from other species. The third is predictions based on annotation and analysis of protein functional domains and motifs. Although functional categories and assignments continue to be revised, the functions of over 40 percent of human genes remain unknown. Genome scientists face many challenges, including determining human gene functions and deciphering the complexities of gene-expression regulation and gene interaction. Uncovering the relationships between human genes and phenotypes is another.

[Figure 18–6 pie chart of functional categories (number of genes, percent of total), with groups labeled signal transduction, nucleic acid binding, and enzyme: molecular function unknown (12,809, 41.7%); nucleic acid enzyme (2308, 7.5%); transcription factor (1850, 6.0%); receptor (1543, 5.0%); miscellaneous (1318, 4.3%); hydrolase (1227, 4.0%); select regulatory molecule (988, 3.2%); proto-oncogene (902, 2.9%); cytoskeletal structural protein (876, 2.8%); kinase (868, 2.8%); oxidoreductase (656, 2.1%); transferase (610, 2.0%); cell adhesion (577, 1.9%); transporter (533, 1.7%); extracellular matrix (437, 1.4%); ion channel (406, 1.3%); motor (376, 1.2%); signaling molecule (376, 1.2%); intracellular transporter (350, 1.1%); synthase and synthetase (313, 1.0%); structural protein of muscle (296, 1.0%); immunoglobulin (264, 0.9%); transfer/carrier protein (203, 0.7%); isomerase (163, 0.5%); chaperone (159, 0.5%); lyase (117, 0.4%); viral protein (100, 0.3%); ligase (56, 0.2%); select calcium-binding protein (34, 0.1%).]

FIGURE 18–6 A representation of the functional categories to which genes in the human genome have been assigned on the basis of similarity to proteins of known function. Among the most common genes are those involved in nucleic acid metabolism (7.5 percent of all genes identified), transcription factors (6.0 percent), receptors (5 percent), hydrolases (4 percent), protein kinases (2.8 percent), and cytoskeletal structural proteins (2.8 percent). A total of 12,809 predicted proteins (41 percent) have unknown functions, indicative of the work that is still needed to fully decipher our genome.

Individual Variations in the Human Genome

The HGP has also shown that in all humans, regardless of racial and ethnic origins, the genomic sequence is approximately 99.9 percent the same. As we discuss in other chapters, most genetic differences between humans result from single-nucleotide polymorphisms (SNPs) and copy number variations (CNVs). Recall that SNPs are single-base changes in the genome, and many SNP variants are associated with disease conditions. For example, SNPs cause sickle-cell anemia and some cases of cystic fibrosis. Later in the text (see Chapter 19), we will examine how SNPs can be detected and used for diagnosis and treatment of disease.

After the draft sequence of the human genome was completed, it initially appeared that most genetic variations between individuals (the 0.1 percent differences) were due to SNPs. SNPs are important contributors to genome variation. However, structural differences that we discussed earlier in the text (see Chapter 11), such as deletions, duplications, inversions, and CNVs, which can span millions of
bp of DNA, play much more important roles in genome variation than previously thought. Recall from earlier in the text (see Chapter 11 and earlier chapters) that CNVs are duplications or deletions of relatively large sections of DNA, on the order of several hundred or several thousand base pairs. Many of the CNVs that vary the most among genomes appear to be at least 1 kilobase in size. Most human DNA is present in two copies per cell, one from each parent. CNVs, by contrast, are segments of DNA that are duplicated or deleted, so individuals inherit different numbers of copies of those segments. In some cases CNVs are major deletions removing entire genes; other deletions affect gene function by shifting the reading frame. Duplicated CNV sequences can cause overexpression of a particular gene. Yet many deleted and duplicated CNVs do not produce clearly identifiable phenotypes. Current estimates of the number of CNVs in an individual genome range from about 12 CNVs to perhaps 4–5 dozen per person. Some studies estimate that there may be as many as 1500 kilobase-scale CNVs in the human genome. Other studies claim there are more than 1.5 million deletions of less than 100 bp that contribute to genome variation between individuals.

Accessing the Human Genome Project on the Internet

It is now possible to access databases and other sites on the Internet that display maps for all human chromosomes. You will visit a number of these databases in Exploring Genomics exercises. Figure 18–7(a) displays a partial gene map for chromosome 12 taken from an NCBI database called Map Viewer. You may already have used Map Viewer for the Exploring Genomics exercises earlier in the text (see Chapter 7). This image shows an ideogram, or cytogenetic map, of chromosome 12. To the right of the ideogram is a column showing the contigs (arranged vertically) that were aligned to sequence this chromosome. The Hs UniG column displays a histogram representation of gene density
on chromosome 12. Notice that relatively few genes are located near the centromere. Gene symbols, loci, and gene names (by description) are provided for selected genes; in this figure only 20 genes are shown. When accessing these maps on the Internet, one can magnify, or zoom in on, each region of the chromosome, revealing all genes mapped to a particular area. You can see that most of the genes listed here have been assigned descriptions based on the functions of their products. Some encode transmembrane proteins, some encode enzymes such as kinases, and some encode receptors, including several involved in olfaction. Other genes are described in terms of hypothetical products. They are presumed to be genes based on the presence of ORFs, but their function remains unknown [Figure 18–7(a)].

The HGP’s most valuable contribution will perhaps be the identification of disease genes and the development of new treatment strategies as a result. Thus, extensive maps have been developed for genes implicated in human disease conditions. The disease gene map of chromosome 21 shown in Figure 18–7(b) indicates genes involved in amyotrophic lateral sclerosis (ALS), Alzheimer disease, cataracts, deafness, and several different cancers. Later in the text (see Chapter 19) we discuss what the HGP means for identifying genes involved in human genetic diseases. That chapter also covers disease diagnosis, detection, and gene therapy applications.

ESSENTIAL POINT The Human Genome Project revealed many surprises about human genetics. These include the small number of genes, the high degree of DNA sequence similarity between individuals and between humans and other species, and the fact that many genes encode multiple proteins.

18.5 After the Human Genome Project: What Is Next?
The Human Genome Project and the development of genomics techniques have been largely responsible for launching a new era of biological research: the era of “omics.” It seems that every year, more areas of biological research are being described as having an omics connection. Some examples of “omics” are
• proteomics: the analysis of all the proteins in a cell or tissue
• metabolomics: the analysis of metabolites and the enzymatic pathways involved in cell metabolism
• glycomics: the analysis of the carbohydrates of a cell or tissue
• toxicogenomics: the analysis of the effects of toxic chemicals on genes, including mutations created by toxins and changes in gene expression caused by toxins
• metagenomics: the analysis of genomes of organisms collected from the environment
• nutrigenomics: understanding the relationships between genes and diet
• pharmacogenomics: the development of customized medicine based on a person’s genetic profile for a particular condition
• transcriptomics: the analysis of all expressed genes in a cell or tissue

We will consider several of these genomics disciplines in other parts of this chapter.

[Figure 18–7 (a) Map Viewer display for chromosome 12: ideogram with bands 12p13.33 through 12q24.33, a megabase scale (10M to 130M), a contig column (NT_009759, NT_009714, NT_029419, NT_019546, NT_009775, NT_009755, NT_024477), an Hs UniG gene-density column, and symbol, locus, and description columns for 20 selected genes (for example, FBXL14, NTF3, CDKN1B, PRKAG1, OR6C65, MIP, TRHDE, TBX3). (b) Partial disease-gene map of chromosome 21 (about 50 million bases), listing conditions such as Alzheimer disease (APP-related), amyotrophic lateral sclerosis, long QT syndrome, homocystinuria, congenital cataract, autosomal recessive deafness, acute myeloid leukemia, Down syndrome (critical region), progressive myoclonic epilepsy, holoprosencephaly, and breast cancer.]

FIGURE 18–7 (a) A gene map for chromosome 12 from the NCBI database Map Viewer. (b) Partial map of disease genes on human chromosome 21. Maps such as this depict genes thought to be involved in human genetic disease conditions.

Since completion of a reference sequence of the human genome, studies have continued at a very rapid pace. For example, the HGP has given rise to many other major areas of human genome research. These include cancer genome projects and analysis of the epigenome. Epigenome work includes a Human Epigenome Project that is creating hundreds of maps of epigenetic changes in different cell and tissue types and evaluating potential roles of epigenetics in complex diseases. Another area is the characterization of SNPs (the International HapMap Project) and CNVs for their role in genome variation, disease, and pharmacogenomics applications. We have discussed aspects of a cancer genome project (Cancer Genome Atlas Project) earlier in the text (see Chapter 16). The epigenome is covered in depth later in the text (see Special Topic Chapter 1, Epigenetics). SNPs and pharmacogenomics are discussed later as well (see Special Topic Chapter 4, Genomics and Personalized Medicine). Here we consider several examples of genome research that are extensions of the HGP.

ESSENTIAL POINT Genomics has led to a number of other related “omics” disciplines that are rapidly changing how modern biologists study DNA, RNA, and proteins and many aspects of cell function.

Personal Genome Projects and Personal Genomics

As we discussed earlier in this chapter and earlier in the text (see Chapter 17), new sequencing technologies have greatly reduced the cost of DNA sequencing. These technologies generate longer sequence reads at higher speeds with greater accuracy. Expectations are high for further cost reductions and technological advances (see Figure 18–8). These expectations led several companies to propose WGS for individual people, a personal genomics approach, and sparked competition. Two programs funded by the National Institutes of Health challenged scientists to develop sequencing technologies to complete a human genome for $1000 by 2014 (see the “Genetics, Technology, and Society” essay in Chapter 19. Since 2005, the cost of DNA sequencing has dropped from about $1000 per megabase to ten cents per megabase!).

[Figure 18–8 graph, years 2000–2012: billions of base pairs of whole-genome shotgun sequence stored in international public databases (0 to 300) and cost per million base pairs of sequence (log scale, $10,000 to $1). Annotations mark sequencing technologies and milestone genomes. The technologies are automated Sanger sequencing; 454 pyrosequencing, considered the first “next-generation” technique; sequencing by synthesis (e.g., Solexa, now Illumina); sequencing by ligation in SOLiD instruments; and third-generation sequencing (e.g., Helicos BioSciences, Pacific Biosciences, Oxford Nanopore, Ion Torrent). The milestones are the first drafts of two composite haploid human genomes, completion of the Human Genome Project, the diploid genome of J. Craig Venter, the genome of James Watson, and later individual and cancer genomes.]

FIGURE 18–8 Human genome sequence explosion. Sequencing costs have steadily declined since 2000 due to innovations in sequencing technology. As a result, notice that the amount of whole-genome shotgun sequencing data stored in public databases, which include data on several individual genomes, has dramatically increased.

In 2012, Life Technologies announced that their Ion Proton technology was used to sequence a genome for $1000. It can be debated whether the $1000 mark represents only the cost of reagents or the actual cost once sequence preparation, labor, and analysis of the genome are taken into account. Other scientists have also questioned whether the reported accuracy and completeness of sequence coverage justify claiming that the $1000 genome threshold has been reached. As you will learn later in the text (see Chapter 19), having somebody such as a geneticist analyze genome data and consider how genome variations may affect a person’s health takes a lot of time and money. So even if the cost of sequencing a person’s genome is less than $1000, interpreting genome data for medical treatment may cost hundreds of thousands of dollars.

Pursuit of the $1000 genome signaled that DNA sequencing may eventually be affordable enough for individuals to consider acquiring a readout of their own genetic blueprint. James D. Watson, together with Francis Crick, discovered the structure of DNA. His genome was the focus of “Project Jim” by the Connecticut company 454 Life Sciences, which wanted to sequence the genome of a high-profile person. The company decided that the co-discoverer of DNA structure and the first director of the U.S. Human Genome Project should be that person. Human genome pioneer J. Craig
Venter Institute and deposited into GenBank in May 2007. George Church of Harvard and his colleagues started a Personal Genome Project (PGP) and recruited volunteers to provide DNA for individual genome sequencing on the understanding that the genome data will be made publicly available. Church’s genome has been completed and is available online. The concept of a personal genome project raises the obvious question: would you have your genome sequenced for $10,000, $1000, or even for free?

Since the Watson and Venter genomes were completed, the first complete genome sequence of an individual “ancient” human was reported in 2010: a Palaeo-Eskimo, sequenced from ∼4000-year-old permafrost-preserved hair. This work recovered about 78 percent of the diploid genome and revealed many interesting SNPs, a portion of which had not been previously reported. As of 2014, more than 30,000 individual human genomes have been sequenced.

Exome Sequencing
The focus of personal genome projects has shifted toward exome sequencing, the sequencing of the 180,000 exons in a person’s genome. This can be done at a cost of less than $1000 with 100× coverage. Exome sequencing reveals mutations involved in disease by focusing only on exons, the protein-coding segments of the genome. Of course, a limitation of this approach is that it fails to identify mutations in gene-regulatory regions that influence gene expression.

As an example, a group of scientists called the 1000 Genomes Project Consortium reported on the genomes of 1092 individuals from 14 populations representing Europe, East Asia, sub-Saharan Africa, and the Americas. Whole-genome and exome-sequencing data revealed more than 38 million SNPs and many other structural variations, such as CNVs. One interpretation of this work is that it reveals clear variation among individuals and associates particular diseases with geographic or ancestral background. Thus sequencing genomes of individuals from diverse
populations can help us better understand the spectrum of human genetic variation and learn the causes of genetic diseases across diverse groups. We will come back to the topic of exome sequencing later in the text (see Chapter 19), when we discuss genetic testing.

Another particularly beneficial aspect of personal genome projects is the insight they are providing into genome variation. The HGP combined samples from different individuals to create a reference sequence for a haploid genome. Personal genome projects sequence a diploid genome; consequently, such projects indicate that haploid genome comparisons may underestimate the extent of genome variation between individuals by five-fold or more. For example, when Venter’s genome was analyzed, millions of variations were found between his maternal and paternal chromosomes alone. From what we are learning about personal genomes, genome variation between individuals may be closer to 0.5 percent than 0.1 percent. In a 3-billion-bp genome this is a significant difference: roughly 15 million rather than 3 million differing base pairs. Integrating data from complete genomes of individuals from different ethnic groups will also be of great value in evolutionary genetics to address fundamental questions about human diversity, ancestry, and migration patterns.

In a related matter, PGPs are revealing that there can be significant mosaicism in human somatic cells; that is, the cells of an individual person do not all contain identical genomes. Because of the sophistication of WGS methods, mosaicism for SNPs and CNVs has been found in skin, brain, blood, and stem cells from the same individual. We are only beginning to understand the frequency and effects of genetic mosaicism on health and disease. Later in the text (see Chapter 19), we will consider how various approaches to personal genomics can be used for genetic testing.

ESSENTIAL POINT
Personal genome sequencing and exome sequencing will provide insight into individual variations in genomes and have tremendous potential for diagnosis and treatment of genetic diseases.

Encyclopedia of DNA Elements (ENCODE) Project
In 2003, a few months after the announcement that the human genome had been sequenced, a group of about three dozen research teams around the world began the Encyclopedia of DNA Elements (ENCODE) Project. A main goal of ENCODE was to use both experimental approaches and bioinformatics to identify and analyze functional elements of the genome, such as transcriptional start sites, promoters, and enhancers, which regulate the expression of human genes. Recall from our previous discussions that only a relatively small percentage (less than 2 percent) of the human genome codes for proteins. So what are all of the other bases in the genome doing? ENCODE focused not on genes but on all of the other sequences, much of which has commonly been referred to as “junk” DNA. We know that such sequences are important for chromosome structure, the regulation of gene expression, and other roles. Just because these sequences do not code for protein does not mean that they are unimportant. Non–protein-coding sequences are discussed in greater detail later in the text (see Special Topic Chapter 2: Emerging Roles of RNA).

ENCODE studied gene expression in 147 different cell types because genome activity differs from cell to cell. After about a decade of research and a cost of $288 million, in 2012 a group of 30 research papers was published revealing the major findings of the ENCODE project. Highlights of what ENCODE revealed include the following:
• The majority, ∼80 percent, of the human genome is considered functional. This is partly because large segments of the genome are transcribed into RNA, and most of these RNAs do not encode proteins. These various RNAs include tRNAs, rRNAs, and miRNAs. For example, at least 13,000 sequences specify long noncoding RNAs (lncRNAs), and other reports suggest there may be over 17,000 lncRNAs. It may turn out that noncoding RNA sequences will outnumber protein-coding genes.
• There are 20,687 protein-coding genes in the human genome.
• A total of 11,224 sequences are characterized as pseudogenes, previously thought to be inactive in all individuals. Some of these are inactive in most individuals but occasionally active in certain cell types of some individuals, which may eventually warrant their reclassification as active, transcribed genes rather than pseudogenes.
• The functional sequences also include gene-regulatory regions: ∼70,000 promoter regions and nearly 400,000 enhancer regions.
• SNPs associated with disease are enriched within noncoding functional elements of the genome, often residing near protein-coding genes.

The ENCODE findings have broadly defined the functional roles of the genome to include encoding proteins or noncoding RNAs and displaying biochemical properties such as binding regulatory proteins that influence transcription or chromatin structure. A relatively large body of geneticists and other scientists do not agree with ENCODE’s definition of functional sequences. One reason cited is that ENCODE did not adequately address many of the repetitive sequences in the genome, such as transposons, LINEs, and SINEs, or other sequences such as telomeres and centromeres. There has also been significant debate about the value of ENCODE, given the cost of the project. But research teams are already using information from ENCODE
to identify risk factors for certain diseases, with the hope of developing appropriate cures and treatments.

EVOLVING CONCEPT OF A GENE
Based on the work of the ENCODE project, we now know that DNA sequences previously thought of as “junk DNA,” which do not encode proteins, are nonetheless often transcribed into what we call noncoding RNA (ncRNA). Since the functions of some of these RNAs are now being determined, we must consider whether the concept of the gene should be expanded to include DNA sequences that encode ncRNAs. At this writing, there is no consensus, but it is important for you to be aware of these current findings as you develop your final interpretation of a gene.

The Human Microbiome Project
In 2007 the National Institutes of Health announced plans for the Human Microbiome Project (HMP), a $170 million project to complete the genomes of an estimated 600–1000 microorganisms (bacteria, viruses, and yeast) that live on and inside humans. Microorganisms outnumber human cells by about 10 to 1. Many microbes, such as E. coli in the digestive tract, have important roles in human health, and of course other microbes make us ill. The HMP has several major goals, including:
• Determining if individuals share a core human microbiome
• Understanding whether changes in the microbiome can be correlated with changes in human health
• Developing new methods, including bioinformatics tools, to support analysis of the microbiome
• Addressing ethical, legal, and social implications raised by human microbiome research
Does this sound familiar?
Recall that addressing ethical, legal, and social issues was a goal of the HGP. The HMP has involved about 200 scientists at 80 institutions. In 2012 a series of papers was published summarizing recent findings from the HMP. The HMP analyzed 15 body sites from males and 18 sites from females from 242 healthy individuals in the United States and applied WGS to the genomes of the microbes and viruses present at these sites. Each person was sampled up to three times over nearly two years. Researchers used bioinformatics to compare the microbial and viral genome sequences obtained with sequences in publicly available databases. In addition to WGS analysis, 16S rRNA gene sequences in particular were used to compare bacterial samples. More than 2000 microbial sequences isolated from the human body have been sequenced to date, and the HMP has amassed more than 1000 times the sequencing data generated by the Human Genome Project.

What concepts have we formulated about the human microbiome so far?
• Sequence data from the HMP have identified an estimated 81 to 99 percent of the microbes and viruses distributed among body areas in human males and females.
• As many as 1000 bacterial strains may be present in each person.
• An estimated 10,000 bacterial species may be part of the human microbiome.
• The microbiome starts at birth: babies pick up bacteria from their mothers’ microbiome.
• A surprise to HMP scientists is that the microbiome can be substantially different from person to person. Also, sequences for disease-causing bacteria are present in everyone’s microbiome.
• In the human gut, for example, although the microbiome differs from person to person, it remains relatively stable over time in individuals.
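To make the idea of comparing sequences against reference databases more concrete, here is a minimal Python sketch. It is not the HMP pipeline: it swaps in a simple stand-in technique, scoring the fraction of short overlapping subsequences (k-mers) that a query shares with each reference (the Jaccard index) and reporting the best-scoring reference. The function names, the k-mer length of 8, and the sequences and taxon names are hypothetical, made up for illustration; real 16S rRNA studies rely on alignment tools and curated reference databases.

```python
def kmers(seq: str, k: int = 8) -> set:
    """Return the set of overlapping k-mers (length-k substrings) in a DNA string."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared k-mers divided by all distinct k-mers (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(query: str, references: dict, k: int = 8) -> tuple:
    """Return (name, score) for the reference whose k-mer set is most similar to the query."""
    query_kmers = kmers(query, k)
    scores = {name: jaccard(query_kmers, kmers(seq, k)) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical reference fragments: made-up strings, not real 16S rRNA genes.
    references = {
        "taxon_A": "ATGCGTACGTTAGCATCGGATCCGTTAGCAAGTCGAT",
        "taxon_B": "TTGACCGGTAAGCTTCAGGCATTCCGAGGTACCAATG",
    }
    read = "TACGTTAGCATCGGATCCGTTAGCA"  # a made-up read copied from taxon_A
    print(best_match(read, references))
```

Because the read was copied from taxon_A, every one of its k-mers also occurs in taxon_A, so taxon_A receives the highest score. Using sets of k-mers trades alignment detail for speed, which is why k-mer methods are often used as a fast first pass before more careful comparison.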
There is no single “reference” human microbiome to which people can be compared. Microbial diversity varies greatly from individual to individual, and each person develops a personalized microbiome. For instance, comparing sequences of the microbiomes from two healthy people of equivalent age reveals microbiomes that can be quite different. There are, however, similarities in certain parts of the body, with signature bacteria and characteristic genes associated with a particular location in the body. Knowledge about the personalized nature of the microbiome will be valuable for improving human health and medicine, which in the future may include microbiome-specific therapeutic drugs. Scientists are trying to establish criteria for a healthy microbiome, which is expected to help determine, for example, how bacteria help maintain normal health, how antibiotics can disturb a person’s microbiome, and why certain individuals are susceptible to certain diseases, especially chronic conditions such as psoriasis, irritable bowel syndrome, and potentially even obesity.

Related to this project, a team of researchers at the University of California, Los Angeles, analyzed DNA sequences from 101 college students, 49 of whom had acne and 52 of whom did not. Over 1000 strains of Propionibacterium acnes (P. acnes) were isolated. Using WGS and bioinformatics, researchers clustered these strains into ten strain types (groups of related strains). Six of these types were more common among acne-prone students, and one type appeared repeatedly in skin samples from students without acne. Sequence analysis of the types associated with acne revealed gene clusters that may contribute to the skin disease. Further analysis of these strain types may help dermatologists develop new drugs targeted at killing acne-causing strains of P. acnes.

No Genome Left Behind and the Genome 10K Plan
Without question, new sequencing technologies that have been developed as a result of the HGP are an important part of the
transformational effect the HGP has had on modern biology. In the late 1990s, a room full of sequencers and several million dollars were required to sequence the 97-Mb genome of C. elegans. As a sign of modern times in the world of genomics, two sequencers and $500,000 recently produced a reasonably complete draft of the 750-Mb cod genome in a month! Recent headline-grabbing genomes that have been completed include:
• the tomato, which has 31,760 genes, more genes than humans!
• the potato, a vegetable that shares 92 percent of its DNA with tomatoes, a fruit
• chickpea, the second most widely grown legume after the soybean
• the red-spotted newt, which has a genome of almost 10 billion base pairs!

Modern sequencing technologies are prompting some to consider the question, “What would you do if you could sequence everything?” Partners around the world, including genome scientists and museum curators, have proposed sequencing 10,000 vertebrate genomes, the Genome 10K plan. Shortly after the HGP finished, the National Human Genome Research Institute (NHGRI) assembled a list of mammals and other vertebrates as priorities for genome sequencing, in part because of their potential benefit for learning about the human genome through comparative genomics. Genome 10K will also provide insight into genome evolution and speciation.

Stone-Age Genomics
In yet another example of how genomics has taken over areas of DNA analysis, a number of labs around the world are analyzing “ancient” DNA. These so-called stone-age genomics studies are generating fascinating data from minuscule amounts of ancient DNA obtained from bone and other tissues, such as hair, that are tens of thousands to about 700,000 years old, and they often involve samples from extinct species. Analyses of DNA from a 2400-year-old Egyptian mummy, bison, mosses, platypus, mammoths, Pleistocene-age cave bears and polar bears, coelacanths, and Neanderthals are some of
the most prominent examples of stone-age genomics. In 2013, scientists reported the oldest complete genome sequence generated to date. It came from a 700,000-year-old bone fragment from an ancient horse uncovered from the frozen ground in the Yukon Territory of Canada. This result is interesting in part because evolutionary biologists have used these genomic data to estimate that the ancestors of modern horses branched off from other animal lineages around 4 million years ago, about twice as long ago as prior estimates.

In 2005, researchers from McMaster University in Canada and Pennsylvania State University published about 13 million bp from a 27,000-year-old woolly mammoth. This study revealed ∼98.5 percent sequence identity between mammoths and African elephants. Subsequent studies by other scientists have used whole-genome shotgun sequencing of mitochondrial and nuclear DNA from Siberian mammoths to provide data on the mammoth genome. These studies suggest that the mammoth genome differs from that of the African elephant by as little as 0.6 percent. These studies are also great demonstrations of how stable DNA can be under the right conditions, particularly when frozen. In the future, it may be possible to produce complete genome sequences from samples that are several million years old.

Perhaps even more intriguing are similarities that have been revealed between the mammoth and human genomes. For example, when the gene sequences from human chromosomes were aligned with sequences from the mammoth genome, approximately 50 percent of mammoth genes showed sequence alignment with human genes on autosomes. In Section 18.6 we will discuss recent work on the Neanderthal genome. Obtaining the genome of a human ancestor this old was previously unimaginable, and this work is providing new insights into our understanding of human evolution.

ESSENTIAL POINT
Since completion of the Human Genome Project, human genome research has focused on individual human genomes (personalized genomics) and
other efforts such as the Human Microbiome Project.

18.6 Comparative Genomics Analyzes and Compares Genomes from Different Organisms
As of 2014, over 4400 whole genomes have been sequenced, including many model organisms and a number of viruses. About 200 of the completed genomes are from eukaryotes. This is quite extraordinary progress in a relatively short time span. Among these organisms are bacteria such as E. coli; yeast (Saccharomyces cerevisiae), the first eukaryotic genome to be sequenced; the nematode roundworm (Caenorhabditis elegans); the thale cress plant (Arabidopsis thaliana); mice (Mus musculus); zebrafish (Danio rerio); and of course Drosophila. In the past few years, genomes for chimpanzees, dogs, chickens, gorillas, sea urchins, honey bees, pigs, pufferfish, rice, and wheat have all been sequenced. These studies have demonstrated not only significant differences in genome organization between prokaryotes and eukaryotes but also many similarities between the genomes of nearly all species. Organisms use similar gene sets for basic cellular functions, such as DNA replication, transcription, and translation. These genetic relationships are the rationale for using model organisms to study inherited human disorders, the effects of the environment on genes, and interactions of genes in complex diseases, such as cardiovascular disease, diabetes, neurodegenerative conditions, and behavioral disorders.

In this section we discuss interesting aspects of genomes in selected organisms. Comparative genomics compares the genomes of different organisms to answer questions about genetics and other aspects of biology. It is a field with many research and practical applications, including gene discovery and the development of model organisms to study human diseases. It also incorporates the study of gene and genome evolution and the relationship between organisms and
their environment. Comparative genomics can reveal genetic differences and similarities between organisms, providing insight into how those differences contribute to differences in phenotype, life cycle, or other attributes, and helping to establish the evolutionary history of those genetic differences.

Prokaryotic and Eukaryotic Genomes Display Common Structural and Functional Features and Important Differences
Since most prokaryotes have small genomes amenable to shotgun cloning and sequencing, many early genome projects focused on prokaryotes, and more than 900 additional projects to sequence prokaryotic genomes are now under way. Many of the prokaryotic genomes already sequenced are from organisms that cause human diseases, such as cholera, tuberculosis, and leprosy. Traditionally, the bacterial genome has been thought of as relatively small (less than 5 Mb) and contained within a single circular DNA molecule. E. coli, used as the prototypical bacterial model organism in genetics, has a genome with these characteristics. However, the flood of genomic information now available has challenged the validity of this viewpoint for bacteria in general. Although most prokaryotic genomes are small, their sizes vary across a surprisingly wide range. In fact, there is some overlap in size between larger bacterial genomes (30 Mb in Bacillus megaterium) and smaller eukaryotic genomes (12.1 Mb in yeast). Gene number in bacterial genomes also spans a wide range, from fewer than 500 to more than 5000 genes, a ten-fold difference. In addition, although many bacteria have a single, circular chromosome, there is substantial variation in chromosome organization and number among bacterial species. An increasing number of genomes composed of linear DNA molecules are being identified, including the genome of Borrelia burgdorferi, the organism that causes Lyme disease. Sequencing of the Vibrio cholerae genome (the organism responsible for cholera) revealed the presence of two circular chromosomes.
Other bacteria that have genomes with two or more chromosomes include Rhizobium radiobacter (formerly Agrobacterium tumefaciens), Deinococcus radiodurans, and Rhodobacter sphaeroides. The finding that some bacterial species have multiple chromosomes raises questions both about how replication and segregation of their chromosomes are coordinated during cell division and about what undiscovered mechanisms of gene regulation may exist in bacteria. The answers may provide clues about the evolution of multichromosome eukaryotic genomes.

We can make two generalizations about the organization of protein-coding genes in bacteria. First, gene density is very high, averaging about one gene per kilobase of DNA. For example, the genome of E. coli strain K12, which was sequenced in 1997 as the second prokaryotic genome to be sequenced, is 4.6 Mb in size and contains 4289 protein-coding genes in its single, circular chromosome. This close packing of genes in prokaryotic genomes means that a very high proportion of the DNA (approximately 85 to 90 percent) serves as coding DNA. Typically, only a small amount of a bacterial genome is noncoding DNA, often in the form of regulatory sequences or of transposable elements that can move from one place to another in the genome. The second generalization we can make is that bacterial genomes contain operons. Recall from an earlier chapter (see Chapter 15) that an operon contains multiple genes functioning as a transcriptional unit whose protein products are part of a common biochemical pathway. In E. coli, 27 percent of all genes are contained in operons (almost 600 operons).

The basic features of eukaryotic genomes are similar in different species, although genome size in eukaryotes is highly variable (Table 18.2). Genome sizes range from about 10 Mb in fungi to over 100,000 Mb in some flowering plants (a ten-thousand-fold range); the number of chromosomes per genome ranges from two to the hundreds (about a hundred-fold range), but the number of
genes varies much less dramatically than either genome size or chromosome number. Eukaryotic genomes have several features not found in prokaryotes:
• Gene density. In prokaryotes, gene density is close to 1 gene per kilobase. In eukaryotic genomes, there is a wide range of gene density. In yeast, there is about 1 gene/2 kb; in Drosophila, about 1 gene/13 kb; and in humans, gene density varies greatly from chromosome to chromosome. Human chromosome 22 has about 1 gene/64 kb, while chromosome 13 has 1 gene/155 kb of DNA.
• Introns. Most eukaryotic genes contain introns. There is wide variation among genomes in the number of introns they contain and also wide variation from gene to gene. The entire yeast genome has only 239 introns, whereas just a single gene in the human genome can contain more than 100 introns. Regarding intron size, intron size in eukaryotes is generally correlated with genome size: smaller genomes have smaller average introns, and larger genomes have larger average introns. But there are exceptions. For example, the genome of the pufferfish (Fugu rubripes) has relatively few introns.
• Repetitive sequences. The presence of introns and the existence of repetitive sequences are two major reasons for the wide range of genome sizes in eukaryotes. In some plants, such as maize, repetitive sequences are the dominant feature of the genome. The maize genome has about 2500 Mb of DNA, and more than two-thirds of that genome is composed of repetitive DNA. In the human, as discussed previously, about half of the genome is repetitive DNA.

TABLE 18.2 Comparison of Selected Genomes

Organism (Scientific Name) | Approximate Genome Size (Date Completed) | Number of Genes | Approximate Percentage of Genes Shared with Humans
Bacterium (Escherichia coli) | 4.6 Mb (1997) | 4403 | not determined
Chicken (Gallus gallus) | ∼1 Gb (2004) | ∼20,000–23,000 | 60%
Dog (Canis familiaris) | 2.5 Gb (2003) | ∼18,400 | 75%
Chimpanzee (Pan troglodytes) | ∼3 Gb (2005) | ∼20,000–24,000 | 98%
Fruit fly (Drosophila melanogaster) | 165 Mb (2000) | ∼13,600 | 50%
Human (Homo sapiens) | 3.1 Gb (2004) | ∼20,000 | 100%
Mouse (Mus musculus) | ∼2.5 Gb (2002) | ∼30,000 | 80%
Pig (Sus scrofa) | ∼3 Gb (2012) | 21,640 | 84%
Rat (Rattus norvegicus) | ∼2.75 Gb (2004) | ∼22,000 | 80%
Rhesus macaque (Macaca mulatta) | 2.87 Gb (2007) | ∼20,000 | 93%
Rice (Oryza sativa) | 389 Mb (2005) | ∼41,000 | not determined
Roundworm (Caenorhabditis elegans) | 97 Mb (1998) | 19,099 | 40%
Sea urchin (Strongylocentrotus purpuratus) | 814 Mb (2006) | ∼23,500 | 60%
Thale cress (plant) (Arabidopsis thaliana) | 140 Mb (2000) | ∼27,500 | not determined
Yeast (Saccharomyces cerevisiae) | 12 Mb (1996) | ∼5700 | 30%

Genome sizes are given in megabases (Mb, million base pairs) or gigabases (Gb, billion base pairs). Adapted from Palladino, M. A. Understanding the Human Genome Project, 2nd ed. Benjamin Cummings, 2006.

ESSENTIAL POINT
Genomic analysis of prokaryotes and eukaryotes has revealed similarities and important fundamental differences in genome size, gene number, and genome organization.

Comparative Genomics Provides Novel Information about the Genomes of Model Organisms and the Human Genome
As mentioned earlier, the Human Genome Project also sequenced the genomes of a number of nonhuman model organisms, including E. coli, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, the nematode roundworm Caenorhabditis elegans, and the mouse Mus musculus. Complete genome sequences of such organisms have been invaluable for comparative genomics studies of gene function in these organisms and in humans. As shown in Table 18.2, the proportion of genes humans share with other species is very high, ranging from about 30 percent of the genes in yeast to ∼80 percent in mice and ∼98 percent in chimpanzees. The human genome even contains around 100 genes that are also present in many bacteria. Comparative genomics has shown us that many mutated genes involved in human disease are also present in model organisms. For instance, approximately 60 percent of
genes mutated in nearly 300 human diseases are also found in Drosophila. These include genes involved in prostate, colon, and pancreatic cancers; cardiovascular disease; cystic fibrosis; and several other conditions. Here we consider how comparative genomics studies of several model organisms (sea urchins, dogs, chimpanzees, and Rhesus monkeys) and the Neanderthal genome have revealed interesting elements of the human genome.

The Sea Urchin Genome
In 2006, researchers from the Sea Urchin Genome Sequencing Consortium completed the 814-million-bp genome of the sea urchin Strongylocentrotus purpuratus. Sea urchins are shallow-water marine invertebrates that have served as important model organisms, particularly for developmental biologists. One reason is that the sea urchin is a nonchordate deuterostome, and humans, with their spinal cord, are chordate deuterostomes. Fossil records indicate that sea urchins appeared during the Early Cambrian period, around 520 mya. A combination of whole-genome shotgun sequencing and map-based cloning in BACs was used to complete the genome. Sea urchins have an estimated 23,500 genes, including representative genes for just about all major vertebrate gene families. Sequence alignment and homology searches demonstrate that the sea urchin contains many genes with important functions in humans, yet, interestingly, some genes that are important in flies and worms, such as certain cytochrome P-450 genes that play a role in the breakdown of toxic compounds, are missing from sea urchins. The sea urchin genome also has an abundance (∼25 to 30 percent) of pseudogenes, nonfunctional relatives of protein-coding genes (we meet pseudogenes again in the next subsection). Sea urchins have a smaller average intron size than humans, supporting the general trend revealed by comparative genomics that intron size is correlated with overall genome size. Urchins have nearly 1000
genes for sensing light and odor, indicative of great sensory abilities. In this respect, their genome is more typical of vertebrates than invertebrates. A number of orthologs of human genes involved in hearing and balance are present in the sea urchin, as are many human-disease-associated orthologs, including genes for protein kinases, GTPases, transcription factors, innate immunity proteins, transporters, and low-density lipoprotein receptors. Sea urchins and humans share approximately 7000 orthologs.

The Dog Genome
In 2005 the genome of “man’s best friend” was completed, and it revealed that we share about 75 percent of our genes with dogs (Canis familiaris), providing a useful model with which to study our own genome. Dogs have a genome that is similar in size to the human genome: about 2.5 billion base pairs with an estimated 18,400 genes. The dog offers several advantages for studying heritable human diseases. Dogs share many genetic disorders with humans, including over 400 single-gene disorders, sex-chromosome aneuploidies, multifactorial diseases (such as epilepsy), behavioral conditions (such as obsessive-compulsive disorder), and genetic predispositions to cancer, blindness, heart disease, and deafness. The molecular causes of at least 60 percent of inherited diseases in dogs, such as point mutations and deletions, are similar or identical to those found in humans. In addition, at least 50 percent of the genetic diseases in dogs are breed-specific, so that the mutant allele segregates in relatively homogeneous genetic backgrounds. Dog breeds resemble isolated human populations in having a small number of founders and a long period of relative genetic isolation. These properties make individual dog breeds useful as models of human genetic disorders. Dog breeders are now using genetic tests to screen dogs for inherited disease conditions, for coat color in Labrador retrievers and poodles, and for fur length in Mastiffs. Undoubtedly, we can expect many more genetic tests for dogs in
the near future, including DNA analysis for size, type of tail, speed, sense of smell, and other traits deemed important by breeders and owners.
The Chimpanzee Genome
Although the chimpanzee (Pan troglodytes) genome was not part of the HGP, its nucleotide sequence was completed in 2004. Overall, the chimp and human genome sequences differ by less than percent, and 98 percent of the genes are the same. Comparisons between these genomes offer some interesting insights into what makes some primates humans and others chimpanzees. The speciation events that separated humans and chimpanzees occurred less than 6.3 million years ago (mya). Genomic analysis indicates that these species initially diverged but then exchanged genes again before separating completely. Their separate evolution after this point is reflected in differences such as those between the sequence of chimpanzee chromosome 22 and its human ortholog, chromosome 21 (chimps have 48 chromosomes and humans have 46, so the numbering differs). These chromosomes have accumulated nucleotide substitutions that total 1.44 percent of the sequence. The most surprising difference is the discovery of 68,000 nucleotide insertions or deletions, collectively called indels, between the chimp and human chromosomes, a frequency of one indel every 470 bases. Many of these are Alu insertions in human chromosome 21. Although the overall difference in nucleotide sequence is small, there are significant differences in the encoded genes. Only 17 percent of the genes analyzed encode identical proteins on both chromosomes; the other 83 percent encode proteins with one or more amino acid differences. Differences in the time and place of gene expression also play a major role in differentiating the two primates. Using DNA microarrays (discussed in Section 18.9), researchers compared expression patterns of 202 genes in human and chimp cells from brain and liver. They found more species-specific differences in expression of brain genes than liver
genes. To further examine these differences, Svante Pääbo and colleagues compared expression of 10,000 genes in human and chimpanzee brains and found that 10 percent of the genes examined differ in expression in one or more regions of the brain. More importantly, these differences are associated with genes in regions of the human genome that have been duplicated since the divergence of chimps and humans. This finding indicates that genome evolution, speciation, and gene expression are interconnected. Further work on these segmental duplications and the genes they contain may identify genes that help make us human.
The Rhesus Monkey Genome
The Rhesus macaque monkey (Macaca mulatta), another primate, has served as one of the most important model organisms in biomedical research. Macaques have played central roles in our understanding of cardiovascular disease, aging, diabetes, cancer, depression, osteoporosis, and many other aspects of human health. They have been essential for research on AIDS vaccines and for the development of polio vaccines. The macaque genome is the first monkey genome to have been sequenced. A main reason geneticists are so excited about the completion of this sequencing project is that macaques provide a more distant evolutionary window that is ideally suited for comparing and analyzing human and chimpanzee genomes. As we discussed in the preceding subsection, humans and chimpanzees shared a common ancestor approximately mya. But macaques split from the ape lineage that led to chimpanzees and humans about 25 mya. The macaque and human genomes have thus diverged farther from one another, as evidenced by the ∼93 percent sequence identity between humans and macaques compared to the ∼98 percent sequence identity shared by humans and chimpanzees. The macaque genome was published in 2007, and it was no surprise to learn that it consists of 2.87 billion bp (similar to the size of the human genome)
contained in 22 chromosomes (20 autosomes, an X, and a Y) with ∼20,000 protein-coding genes. Although comparative analyses of this genome are ongoing, a number of interesting features have been revealed so far. As in humans, about 50 percent of the genome consists of repeat elements (transposons, LINEs, SINEs). Gene duplications and gene families are abundant, including cancer gene families found in humans. A number of surprises have also been observed. For instance, recall our discussion earlier in the text (see Chapter 4) and elsewhere of the genetic disorder phenylketonuria (PKU), an autosomal recessive inherited condition in which individuals cannot metabolize the amino acid phenylalanine because of mutation of the phenylalanine hydroxylase (PAH) gene. The histidine substitution encoded by a mutation in the PAH gene of humans with PKU appears as the wild-type amino acid in the PAH protein of healthy macaques. Further analysis of the macaque genome and comparison to the human and chimpanzee genomes will be invaluable for geneticists studying genetic variations that played a role in primate evolution.
The Neanderthal Genome and Modern Humans
In early 2009, a team of scientists led by Svante Pääbo at the Max Planck Institute for Evolutionary Anthropology in Germany and 454 Life Sciences reported completion of a rough draft of the Neanderthal (Homo neanderthalensis) genome encompassing more than billion bp of Neanderthal DNA and about two-thirds of the genome. Previously, in 1997, Pääbo’s lab had sequenced portions of Neanderthal mitochondrial DNA from a fossil. In late 2006, Pääbo’s group, along with a number of scientists in the United States, reported the first sequence of ∼65,000 bp of nuclear DNA isolated from Neanderthal bone samples. Bones from three females who lived in Vindija Cave in Croatia about 38,000 to 44,000 years ago were used to produce the draft sequence of the Neanderthal nuclear genome. Because Neanderthals are members of the human family, and
closer relatives of humans than chimpanzees are, the Neanderthal genome is expected to provide an unprecedented opportunity to use comparative genomics to advance our understanding of evolutionary relationships between modern humans and our predecessors. In particular, scientists are interested in identifying areas of the genome where humans have undergone rapid evolution since splitting (diverging) from Neanderthals. Much of this analysis involves comparing the Neanderthal genome with the human and chimpanzee genomes.
The human and Neanderthal genomes are 99 percent identical. Comparative genomics has identified 78 protein-coding sequences in humans that seem to have arisen since the divergence from Neanderthals and that may have helped modern humans adapt. Some of these sequences are involved in cognitive development and sperm motility. Of the many genes shared by these species, FOXP2 is a gene that has been linked to speech and language ability. Many genes influence speech, so this finding does not mean that Neanderthals spoke as we do. But because Neanderthals had the modern human version of the FOXP2 gene, scientists have speculated that Neanderthals possessed linguistic abilities.
The realization that modern humans and Neanderthals lived in overlapping ranges as recently as 30,000 years ago has led to speculation about the interactions between them. Genome studies suggest that interbreeding took place between Neanderthals and modern humans an estimated 37,000 to 80,000 years ago in the eastern Mediterranean. In fact, the genome of non-African H. sapiens contains approximately 1–4 percent of sequence inherited from Neanderthals. Recent work by Pääbo’s lab on a 45,000-year-old leg bone from Siberia has produced the oldest genome sequence for Homo sapiens to date. About percent of this genome was derived from Neanderthals. These
exciting studies, previously thought to be impossible, are having ramifications in many areas of human evolution, and it will be interesting indeed to follow the progress of this work.
ESSENTIAL POINT
Studies in comparative genomics are revealing fascinating similarities and variations in genomes from different organisms.
18.7 Comparative Genomics Is Useful for Studying the Evolution and Function of Multigene Families
Comparative genomics has also proven to be valuable for identifying members of multigene families, groups of genes that share similar but not identical DNA sequences through duplication and descent from a single ancestral gene. Their gene products frequently have similar functions, and the genes are often, but not always, found at a single chromosomal locus. A group of related multigene families is called a superfamily. Sequence data from genome projects are providing evidence that multigene families are present in many, if not all, genomes. One of the best-studied examples of gene family evolution is the globin gene superfamily, whose members encode very similar but not identical polypeptide chains with closely related functions (Figure 18–9). Other well-characterized gene superfamilies include the histone, tubulin, actin, and immunoglobulin (antibody) gene superfamilies.
Recall that paralogs, which we defined in Section 18.3, are homologous genes present in the same single organism, believed to have evolved by gene duplication. The globin genes that encode the polypeptides in hemoglobin molecules are a paralogous multigene superfamily that arose by duplication and dispersal to occupy different chromosomal sites. In this family, an ancestral gene encoding an oxygen transport protein was duplicated about 800 mya, producing two sister genes, one of which evolved into the modern-day myoglobin gene. Myoglobin is an oxygen-carrying protein found in muscle. The other gene underwent further duplication and divergence about 500 mya and formed prototypes of the α-globin and β-globin genes. These genes encode proteins found in hemoglobin, the oxygen-carrying molecule in red blood cells. Additional duplications within these genes occurred within the last 200 million years. Events subsequent to each duplication dispersed these gene subfamilies to different chromosomes, and in the human genome, each now resides on a separate chromosome.
FIGURE 18–9 The evolutionary history of the globin gene superfamily. A duplication event in an ancestral gene gave rise to two lineages about 700 to 800 million years ago (mya). One line led to the myoglobin gene, which is located on chromosome 22 in humans; the other underwent a second duplication event about 500 mya, giving rise to the ancestors of the α-globin and β-globin gene subfamilies. Duplications beginning about 200 mya formed the β-globin gene subfamilies. In humans, the α-globin genes are located on chromosome 16, and the β-globin genes are on chromosome 11. [Figure: gene tree on a time axis marked at 700–800, 500, 200, 100, and 40 mya, ending in the β-globins (chromosome 11), the α-globins (chromosome 16), and myoglobin (chromosome 22).]
18.8 Metagenomics Applies Genomics Techniques to Environmental Samples
Metagenomics, also called environmental genomics, is the use of whole-genome shotgun approaches to sequence genomes from entire communities of microbes in environmental samples of water, air, and soil. Oceans, glaciers, deserts, and virtually every other environment on Earth are being sampled for metagenomics projects. Human genome pioneer J. Craig Venter left Celera to form the J. Craig Venter Institute, and his group has played a central role in developing metagenomics as an emerging area of genomics research.
One of the institute’s major initiatives has been a global expedition to sample marine and terrestrial microorganisms from around the world and to sequence their genomes. Through this project, called the Sorcerer II Global Ocean Sampling (GOS) Expedition, Venter and his researchers traveled the globe by yacht, in a sailing voyage described as a modern-day version of Charles Darwin’s famous voyage on the H.M.S. Beagle.
A key benefit of metagenomics is its potential for teaching us more about millions of species of bacteria, of which only a few thousand have been well characterized. Many new viruses, particularly bacteriophages, are also identified through metagenomics studies of water samples. Metagenomics is providing important new information about genetic diversity in microbes that is key to understanding complex interactions between microbial communities and their environment, as well as allowing phylogenetic classification of newly identified microbes. Metagenomics also has great potential for identifying genes with novel functions, some of which may have valuable applications in medicine and biotechnology.
The general method used in metagenomics to sequence genomes for all microbes in a given environment involves isolating DNA directly from an environmental sample without requiring cultures of the microbes or viruses. Such an approach is necessary because often it is difficult to replicate the complex array of growth conditions the microbes need to survive in culture. For the Sorcerer II GOS project, samples of water from different layers in the water column were passed through high-density filters of various sizes to capture the microbes. DNA was then isolated from the microbes and subjected to shotgun sequencing and genome assembly. High-throughput sequencers on board the yacht operated nearly around the clock.
By early 2007, the GOS database contained approximately billion bp of DNA from more than 400 uncharacterized microbial species! These sequences included 7.7 million previously uncharacterized sequences, encoding more than million different potential proteins. This is almost twice the total number of previously characterized proteins in all other known databases worldwide. Figure 18–10(a) shows the kingdom assignments for predicted protein sequences in publicly available databases worldwide, such as the NCBI nonredundant protein database (NCBInr), which accesses GenBank, Ensembl, and other well-known databases. Eukaryotic sequences comprise the majority (63 percent) of predicted proteins in these databases. In contrast, reviewing the kingdom assignments of approximately million predicted proteins in the Global Ocean Sampling (GOS) dataset shows that the great majority (90.8 percent) of sequences in this database are from the bacterial kingdom [Figure 18–10(b)]. The GOS Expedition also examined protein families corresponding to the predicted proteins encoded by the genome sequences in the GOS database: 17,067 families were medium-sized (between 20 and 200 proteins) and large-sized (>200 proteins) clusters. These data demonstrate the value of the GOS Expedition and of metagenomics for identifying novel microbial genes and potential proteins.
FIGURE 18–10 (a) Kingdom identifications for predicted proteins in NCBInr, NCBI Prokaryotic Genomes, The Institute for Genomic Research Gene Indices, and Ensembl databases (Eukaryota 63%, Bacteria 28%, Viruses 7%, Archaea 2%). Notice that the publicly available databases of sequenced genomes and the predicted proteins they encode are dominated by eukaryotic sequences. (b) Kingdom identifications for novel predicted proteins in the Global Ocean Sampling (GOS) database (Bacteria 90.8%, Viruses 3.7%, Eukaryota 2.8%, Archaea 2.7%). Bacterial sequences dominate this database, demonstrating the value of metagenomics for revealing new information about microbial genomes and microbial communities.
In Section 18.5 you learned about the Human Microbiome Project. This project is an example of a metagenomics project in that it is intended to sequence the genomes of microbes and viruses present in and on humans as the “environment” being sampled. Many other high-profile metagenomics applications have emerged recently, including the use of metagenomics to identify viruses and fungi thought to be involved in colony collapse disorder. One
such malady has resulted in the loss of 50–90 percent of the honey bee population in beekeeping operations throughout the United States.
ESSENTIAL POINT
Metagenomics, or environmental genomics, sequences genomes of organisms in environmental samples, often identifying new sequences that encode proteins with novel functions.
18.9 Transcriptome Analysis Reveals Profiles of Expressed Genes in Cells and Tissues
Once any genome has been sequenced and annotated, a formidable challenge still remains: that of understanding genome function by analyzing the genes it contains and the ways in which their expression is regulated. Transcriptome analysis, also called transcriptomics or global analysis of gene expression, studies the expression of genes by a genome both qualitatively, by identifying which genes are expressed and which are not, and quantitatively, by measuring the varying levels of expression of different genes.
Even though in theory all cells of an organism possess the same genes, in any given cell or tissue type certain genes will be highly expressed, others expressed at low levels, and some not expressed at all. Transcriptome analysis reveals gene-expression profiles that, for the same genome, may vary from cell to cell or from tissue type to tissue type. Identifying the genes expressed by a genome is essential for understanding how the genome functions. Transcriptome analysis provides insights into (1) normal patterns of gene expression that are important for understanding how a cell or tissue type differentiates during development, (2) how gene expression dictates and controls the physiology of differentiated cells, and (3) mechanisms of disease development that result from or cause gene-expression changes in cells. Later in the text (see Chapter 19), we will consider why
gene-expression analysis is gradually becoming an important diagnostic tool in certain areas of medicine. For example, examining gene-expression profiles in a cancerous tumor can help diagnose tumor type, determine the likelihood of tumor metastasis (spreading), and develop the most effective treatment strategy.
Microarray Analysis
A number of different techniques can be used for transcriptome analysis. PCR-based methods are useful because of their ability to detect genes that are expressed at low levels. For many years DNA microarray analysis was widely used because it enables researchers to analyze all of a sample’s expressed genes simultaneously. However, as DNA sequencing technologies have advanced, and most recently with the development of RNA sequencing (RNA-seq) techniques, including in situ RNA sequencing, microarrays are expected to become antiquated relatively soon.
Most microarrays, also known as gene chips, consist of a glass microscope slide onto which single-stranded DNA molecules are attached, or “spotted,” using a computer-controlled high-speed robotic arm called an arrayer. Arrayers are fitted with a number of tiny pins. Each pin is immersed in a small amount of solution containing millions of copies of a different single-stranded DNA molecule. For example, many microarrays are made with single-stranded sequences of complementary DNA (cDNA) or expressed sequence tags (ESTs), short fragments of cloned DNA from expressed genes. The arrayer fixes the DNA onto the slide at specific locations (points, or spots) that are recorded by a computer. A single microarray can have over 20,000 different spots of DNA (and over million for exon-specific microarrays), each containing a unique sequence that serves as a probe for a different gene. Probes for entire genomes, including the human genome, are available on microarrays. As you will learn later in the text (see Chapter 19), researchers are also using microarrays to compare patterns of gene expression in tissues
in response to different conditions, to compare gene-expression patterns in normal and diseased tissues, and to identify pathogens. One approach to using a microarray for transcriptome analysis is shown in Figure 18–11.
Microarrays have dramatically changed the way gene-expression patterns are analyzed. As discussed earlier in the text (see Chapter 17), Northern blot analysis was one of the earliest methods used for analyzing gene expression. PCR techniques then proved to be faster and more sensitive approaches. The biggest advantage of microarrays is that they enable thousands of genes to be studied simultaneously. As a result, however, they can generate an overwhelming amount of gene-expression data. Over million gene-expression datasets are now available in publicly accessible databases, most of them generated in the past decade, largely through microarray analysis.
In addition, even when properly controlled, microarrays often yield variable results. For example, an experiment run under certain conditions may not yield the same pattern of gene expression as an identical repeat of that experiment. Some of these differences can be due to real differences in gene expression, but others can result from variability in chip preparation, cDNA synthesis, probe hybridization, or washing conditions, all of which must be carefully controlled to limit such variability. Commercially available microarrays can reduce the variability that can result when individual researchers make their own arrays. As mentioned previously, you should also be aware that new methods for directly sequencing RNA (RNA-seq, also called whole-transcriptome shotgun sequencing) will soon render microarrays obsolete.
FIGURE 18–11 Microarray analysis for analyzing gene-expression patterns in a tissue. [Figure steps: isolate mRNA from a tissue sample; make cDNA by reverse transcription using fluorescently labeled nucleotides; apply the labeled single-stranded cDNA mixture to a microarray, in which each spot carries millions of copies of a short single-stranded DNA molecule for a different gene, and allow hybridization; rinse off excess cDNA and put the microarray in a scanner to measure the fluorescence of each spot. Fluorescence intensity indicates the amount of mRNA expressed in the tissue sample: no fluorescence, gene not expressed; moderate fluorescence, low gene expression; bright fluorescence, highly expressed gene.]
Now that we have considered genomes and transcriptomes, we turn our attention to the ultimate end products of most genes, the proteins encoded by a genome.
ESSENTIAL POINT
DNA microarrays, or gene chips, have been valuable for transcriptome analysis because they allow expression patterns of thousands of genes to be studied simultaneously.
18.10 Proteomics Identifies and Analyzes the Protein Composition of Cells
As more genomes have been sequenced and studied, biologists have focused increasingly on understanding the complex structures and functions of the proteins the genomes encode. This interest is not surprising given that in most of the genomes sequenced to date, many newly discovered genes and their putative proteins have no known function. Keep in mind, in the ensuing discussion, that although every cell in the body contains an equivalent set of genes, not all cells express the same genes and proteins.
Proteome is a term that represents the complete set of proteins encoded by a genome, but it is also often used to mean the entire complement of proteins in a cell. This definition would then include proteins that a cell acquired from another cell type. Proteomics, the complete identification, characterization, and quantitative analysis of the proteome of a cell, tissue, or organism, can be used to reconcile differences between the number of genes in a genome and the number of different proteins produced. But equally important, proteomics also provides information about a protein’s structure and function; posttranslational modifications; protein–protein, protein–nucleic acid, and protein–metabolite interactions; cellular localization of proteins; protein stability and aspects of translational and posttranslational levels of gene-expression regulation; and relationships (shared domains, evolutionary history) to other proteins.
Proteomics is also of clinical interest because it allows comparison of proteins in normal and diseased tissues, which can lead to the identification of proteins as biomarkers for disease conditions. Proteomic analysis of mitochondrial proteins during aging, proteomic maps of atherosclerotic plaques from human coronary arteries, and protein profiles in saliva as a way to detect and diagnose diseases are examples of such work.
Reconciling the Number of Genes and the Number of Proteins Expressed by a Cell or Tissue
Recall the one-gene:one-polypeptide hypothesis of George Beadle and Edward Tatum (see Chapter 13). Genomics has revealed that the link between gene and gene product is often much more complex. Genes can have multiple transcription start sites that produce several different types of transcripts. Alternative splicing and editing of pre-mRNA molecules can generate dozens of different proteins from a single gene. Remember the current estimate that over 50 percent of human genes produce more than one protein by alternative splicing. As a result, proteomes are substantially larger than genomes. For instance, the ∼20,000 genes in the human genome encode ∼100,000 proteins, although some estimates suggest that the human proteome may be as large as 150,000–200,000 proteins.
Proteomes undergo dynamic changes that are coordinated in part by regulation of gene-expression patterns (the transcriptome). However, a number of other factors affect the proteome profile of a cell, further complicating the analysis of protein function. For instance, many proteins are modified by co-translational or posttranslational events, such as cleavage of signal sequences that target a protein for an organelle pathway, propeptides, or initiator methionine residues; by linkage to carbohydrates and lipids; or by the addition of chemical groups through methylation, acetylation, phosphorylation, and other modifications. Over a hundred different mechanisms of posttranslational modification are known. In addition, many proteins work via elaborate protein–protein interactions or as part of a large molecular complex.
Well before a draft sequence of the human genome was available, scientists were already discussing the possibility of a “Human Proteome Project.” One reason such a project never came to pass is that there is no single human proteome: different tissues produce different sets of proteins. But the idea of such a project led to the Protein Structure Initiative (PSI) by the National Institute of General Medical Sciences (NIGMS), a division of the National Institutes of Health, involving over a dozen research centers. PSI is a multiphase project designed to analyze the three-dimensional structures of more than 4000 protein families. Proteins with interesting potential therapeutic properties are a top priority for the PSI, and to date the structures of over 6000 proteins have been determined. Developing computational methods for predicting protein structure, solving unique protein structures, disseminating PSI information, and focusing on the biological relevance of the work are major goals. There also are a number of other ongoing projects dedicated to identifying proteome profiles that correlate with diseases such as cancer and diabetes.
Proteomics Technologies: Two-Dimensional Gel Electrophoresis for Separating Proteins
With proteomics technologies,
scientists have the ability to study thousands of proteins simultaneously, generating enormous amounts of data quickly and dramatically changing the ways of analyzing the protein content of a cell. The early history of proteomics dates back to 1975 and the development of two-dimensional gel electrophoresis (2DGE) as a technique for separating hundreds to thousands of proteins with high resolution (Figure 18–12). In this technique, proteins isolated from cells or tissues of interest are loaded onto a polyacrylamide tube gel and first separated by isoelectric focusing, which causes proteins to migrate according to their electrical charge in a pH gradient. Then, in a second migration perpendicular to the first, the proteins are separated by their molecular mass using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). Proteins in the 2D gel are visualized by staining with Coomassie blue, silver stain, or other dyes that reveal the separated proteins as a series of spots in the gel. It is not uncommon for a 2D gel loaded with a complex mixture of proteins to show several thousand spots, as in Figure 18–12, which displays the complex mixture of proteins in human platelets (thrombocytes). Particularly abundant protein spots in this gel have been labeled with the names of identified proteins.
FIGURE 18–12 Two-dimensional gel electrophoresis (2DGE) is a useful method for separating proteins in a protein extract from cells or tissues that contains a complex mixture of proteins with different biochemical properties. [1st dimension: the protein sample is loaded onto an isoelectric focusing tube gel (pH 4.0 to pH 10.0), and electrophoresis separates proteins according to their isoelectric point, the pH at which their net charge is zero. 2nd dimension: the tube gel is rotated 90° and placed onto an SDS-polyacrylamide gel (SDS-PAGE), and electrophoresis separates proteins according to mass (molecular weight in kilodaltons, kDa). The stained gel shows proteins as a series of spots separated by isoelectric point (x-axis, pH 4.0–8.0) and molecular mass (y-axis, 10–200 kDa).] The two-dimensional gel photo shows separations of human platelet proteins. Each spot represents a different polypeptide separated by molecular weight (y-axis) and isoelectric point, pH (x-axis). Known protein spots are labeled by name based on identification by comparison to a reference gel or by determination of protein sequence using mass spectrometry techniques. Notice that many spots on the gel are unlabeled, indicating proteins of unknown identity.
With thousands of different spots on the gel, how are the identities of the proteins ascertained? In some cases, 2D gel patterns from experimental samples can be compared to gels run with reference standards containing known proteins with well-characterized migration patterns. Many reference gels for different biological samples such as human plasma are available, and computer software programs can be used to align and compare the spots from different gels.
In the early days of 2DGE, proteins were often identified by cutting spots out of a gel and sequencing the amino acids the spots contained. Only relatively short sequences of amino acids can typically be generated this way; rarely can an entire polypeptide be sequenced using this technique. BLAST and similar programs can then be used to search protein databases containing amino acid sequences of known proteins. However, because of alternative splicing or posttranslational modifications, peptide sequences may not always match easily with the final product, and the identity of the protein may have to be confirmed by another approach. As you will learn next, proteomics has incorporated other techniques to aid in protein identification, and one of these techniques is mass spectrometry.
Proteomics Technologies: Mass Spectrometry for Protein Identification
As important as 2DGE has been for protein analysis, mass spectrometry (MS) has been instrumental to the development of proteomics. Mass spectrometry techniques analyze ionized samples in gaseous form and measure the mass-to-charge (m/z) ratio of the different ions in a sample. Proteins analyzed by MS generate m/z spectra that can be compared with an m/z database of known protein sequences to discover the protein’s identity. Certain MS applications can provide peptide sequences directly from spectra. Some of the most valuable proteomics applications of this technology are to identify an unknown protein or proteins in a complex mix of proteins, to sequence peptides, to identify posttranslational modifications of proteins, and to characterize multiprotein complexes.
18–2 Annotation of a proteome attempts to relate each protein to a function in time and space. Traditionally, protein annotation depended on an amino acid sequence comparison between a query protein and a protein with known function. If the two proteins shared a considerable portion of their sequence, the query would be assumed to share the function of the annotated protein. Following is a representation of this method of protein annotation involving a query sequence and three different human proteins. Note that the query sequence aligns to common domains within the three other proteins. What argument might you present to suggest that the function of the query is not related to the function of the other three proteins?
Query amino acid sequence Region of amino acid sequence match to query HINT: This problem asks you to think about sequence similarities between four proteins and predict functional relationships The key to its solution is to remember that although protein domains may have related functions, proteins can contain several different interacting domains that determine protein function For MS analysis, proteins are first extracted from cells or tissues of interest and separated by 2DGE, after which MS is used to identify the proteins in the different spots Figure 18–13 shows an example in which two different sets of cells grown in culture are analyzed for protein differences Just about any source providing a sufficient number of cells can be used: blood, whole tissues, and organs; tumor samples; microbes; and many other substances Many proteins involved in cancer have been identified by the use of MS to compare protein profiles in normal tissue and tumor samples Protein spots are cut out of the 2D gel, and proteins are purified out of each gel spot Computer-automated highthroughput instruments are available that can pick all of the spots out of a 2D gel Isolated proteins are then enzymatically digested with a protease (a protein-digesting enzyme) such as trypsin to create a series of peptides This proteolysis produces a complex mixture of peptides determined by the cleavage sites for the protease in the original protein Each type of protein produces a characteristic set of peptide fragments, and these are identified by MS (Figure 18–14 on p 389) Databases of m/z spectra for different peptides can be analyzed to look for matches between m/z spectra of unknown samples and those of known proteins One limitation of this approach is database quality An unknown 387 protein from a 2D gel can only be identified by MS if proteomics databases have a MS spectrum for that protein But as is occurring with genomics databases, proteomics databases with thousands of well-characterized 
proteins from different organisms are rapidly developing. As we mentioned when discussing genomics, high-throughput 2DGE instruments and mass spectrometers can process thousands of samples in a single day. Instruments with faster sample-processing times and increased sensitivity are under development. These instruments may soon make “shotgun proteomics” a viable approach for characterizing entire proteomes. In late 2014, two different research teams reported results from mass spectrometry analysis of the human proteome that accounted for approximately 84 percent and 92 percent of the proteins encoded by the human genome. These studies have created proteome catalogs that will be available to researchers around the world.

Protein microarrays are also becoming valuable tools for proteomics research. These are designed around the same basic concept as microarrays (gene chips) and are often constructed with antibodies that specifically recognize and bind to different proteins. These microarrays are used, among other applications, to examine protein–protein interactions, to detect protein markers for disease diagnosis, and in biosensors designed to detect pathogenic microbes and potentially infectious bioweapons.

ESSENTIAL POINT
Proteomics methods such as mass spectrometry are valuable for analyzing proteomes, the protein content of a cell.

18–3 Because of its accessibility and biological significance, the proteome of human plasma has been intensively studied and used to provide biomarkers for such conditions as myocardial infarction (troponin) and congestive heart failure (B-type natriuretic peptide). Polanski and Anderson (Polanski, M., and Anderson, N. L., Biomarker Insights, 2: 1–48, 2006) have compiled a list of 1261 proteins, some occurring in plasma, that appear to be differentially expressed in human cancers. Of these 1261 proteins, only have been recognized by the FDA as tumor-associated proteins. First, what advantage should there be in using plasma as a diagnostic screen for cancer? Second, what criteria should be used to validate that a cancerous state can be assessed through the plasma proteome?

HINT: This problem asks you to consider criteria that are valuable for using plasma proteomics as a diagnostic screen for cancer. The key to its solution is to consider proteomics data that you would want to evaluate to determine whether a particular protein is involved in cancer.

[Figure 18–13 image: cells grown in vitro under Condition A and Condition B; extract proteins; 2DGE (mass 10–200 kDa versus pH 4.0–10.0); compare patterns (expression versus no expression); cut spot out; mass spectrometry and comparison to protein databases to identify the protein in the spot.]

FIGURE 18–13 In a typical proteomic analysis, cells are exposed to different conditions (such as different growth conditions, drugs, or hormones). Proteins are then extracted from these cells and separated by 2DGE, and the resulting patterns of spots are compared for evidence of differential protein expression. Spots of interest are cut out from the gel, digested into peptide fragments, and analyzed by mass spectrometry to identify the protein they contain.

18.11 Systems Biology Is an Integrated Approach to Studying Interactions of All Components of an Organism’s Cells

We conclude this chapter by discussing systems biology, an emerging discipline that incorporates data from genomics, transcriptomics, proteomics, and other areas of biology, as well as engineering applications and problem-solving approaches. In many ways, systems biology interprets genomic information in the context of the structure, function, and regulation of biological pathways. Biological systems are very complex. By studying relationships between all components in an organism, biologists are trying to build a “systems”-level understanding of how organisms function. Systems biologists typically combine recently
acquired genomics and proteomics data with years of more traditional studies of gene and protein structure and function. Much of these data are retrieved from databases such as PubMed, GenBank, and other newly emerging genomics, transcriptomics, and proteomics resources. Systems models are used to diagram interactions within a cell or an entire organism, such as protein–protein interactions, protein–nucleic acid interactions, and protein–metabolite interactions (e.g., enzyme–substrate binding). These models help systems biologists understand the components of interacting pathways and the interrelationships of molecules in an interacting pathway. In recent years, the term interactome has arisen to describe the interacting components of a cell.

Systems biologists use several different types of models to diagram protein interaction pathways. One of the most common model types is a network map, a sketch showing interacting proteins, genes, and other molecules. These diagrams are essentially the equivalent of an electrical wiring diagram. One disadvantage of network maps is that they are static diagrams that typically lack information about when and where each interaction occurs. Even so, they are a useful foundation for generating computational models that allow the running of simulations to determine how signaling events occur. For example, major groups of kinases, enzymes that phosphorylate other proteins to affect their activity, have been network mapped to show their interactions with each other. Because kinases play such important roles in the regulation of most critical cellular processes, such information about the “kinome” has been very valuable for companies developing drug treatments targeted to certain metabolic pathways. Network maps are helping scientists model intricate potential interactions of molecules involved in normal and disease processes.

Figure 18–15 shows an example of a network map. This map depicts a human disease network model illustrating the complexity of interactions between genes involved in human diseases belonging to 22 disorder classes. Look at the cluster of turquoise-colored nodes corresponding to genes involved in several different cancers. One aspect of the map that should be immediately obvious is that a number of cancers share interacting genes even though the cancers affect different organs. Knowing the genes involved and the protein interaction networks for different cancers is a major breakthrough for informing scientists about target genes and proteins to consider for therapeutic purposes. Systems biology is becoming increasingly important in the drug discovery and development process, where its approaches can help scientists and physicians develop a conceptual framework of gene and protein interactions in human disease that can then serve as the rationale for effective drug design.

ESSENTIAL POINT
Systems biology approaches are designed to provide an integrated understanding of interactions between genes, proteins, and other molecules that govern complex biological processes.

[Figure 18–14 image: (1) An unknown protein cut out from a spot on a 2D gel is first digested into small peptide fragments using a protease such as trypsin. (2) The peptide fragments are subjected to mass spectrometry to produce mass-to-charge (m/z) spectra (relative abundance versus m/z, with fragment-ion peaks labeled a2, b2, and y3 through y9). (3) The m/z spectra for the unknown protein are compared to a proteomics database of m/z spectra for known peptides; a spectrum match identifies the peptide sequence (S-Q-A-A-E-L-L) of the unknown protein.]

FIGURE 18–14 Mass spectrometry for identifying an unknown protein isolated from a 2D gel. The mass-to-charge (m/z) spectrum for trypsin-digested peptides from the unknown protein can be compared to a proteomics database for a spectrum match to identify the unknown protein. The peptide in this example was revealed to have the amino acid sequence serine (S)-glutamine (Q)-alanine (A)-alanine (A)-glutamic acid (E)-leucine (L)-leucine (L), shown in single-letter amino acid code.

[Figure 18–15 image: network of disease nodes such as breast cancer, colon cancer, leukemia, Alzheimer disease, Parkinson disease, diabetes mellitus, obesity, hypertension, and asthma, colored by 22 disorder classes (bone, cancer, cardiovascular, connective tissue, dermatological, developmental, ear-nose-throat, endocrine, gastrointestinal, hematological, immunological, metabolic, muscular, neurological, nutritional, ophthalmological, psychiatric, renal, respiratory, skeletal, multiple, unclassified).]

FIGURE 18–15 A systems biology model of human disease gene interactions. The model shows nodes corresponding to specific disorders, colored by disorder class (22 classes). Node size is proportional to the number of genes contributing to the disorder.

EXPLORING GENOMICS
Contigs, Shotgun Sequencing, and Comparative Genomics

In this chapter, we discussed how whole-genome shotgun sequencing methods can be used to assemble chromosome maps. Recall that in the technique of shotgun cloning, chromosomal DNA is digested with different restriction enzymes to create a series of overlapping DNA fragments called contiguous sequences, or “contigs.” The contigs are then subjected to DNA sequencing, after which bioinformatics-based programs are used to arrange the contigs in their correct order on the basis of short overlapping sequences of nucleotides.

Visit the Study Area:
Exploring Genomics

In this Exploring Genomics exercise you will carry out a simulation of contig alignment to help you understand the underlying logic of this approach to creating sequence maps of a chromosome. For this purpose, you will use the National Center for Biotechnology Information BLAST site and apply a DNA alignment program called bl2seq.

Exercise I – Arranging Contigs to Create a Chromosome Map

Access BLAST from the NCBI Web site at http://blast.ncbi.nlm.nih.gov/Blast.cgi

Locate and select the “Align two sequences using BLAST (bl2seq)” category at the bottom of the BLAST homepage. The bl2seq feature allows you to compare two DNA sequences at a time to check for sequence similarity alignments.

Go to the Companion Web site for Essentials of Genetics and open the Exploring Genomics exercise for this chapter. Listed are eight contig sequences, called Sequences A through H, taken from an actual human chromosome sequence deposited in GenBank. For this exercise we have used short fragments; in reality, contigs are usually several thousand base pairs long.

To complete this exercise, copy and paste two sequences into the Align feature of BLAST and then run an alignment (by clicking on “Align”). Repeat these steps with other combinations of two sequences to determine which sequences overlap, and then use your findings to create a sequence map that places overlapping contigs in their proper order. Here are a few tips to consider:
• Develop a strategy to be sure that you analyze alignments for all pairs of contigs.
• Only consider alignment overlaps that show 100 percent sequence similarity.

On the basis of your alignment results, answer the following questions, referring to the sequences by their letter codes (A through H):
a. What is the correct order of overlapping contigs?
b. What is the length, measured in number of nucleotides, of each sequence overlap between contigs?
c. What is the total size of the chromosome segment that you assembled?
d. Did you find any contigs that do not overlap with any of the others? Explain.

Run a nucleotide-nucleotide BLAST search (BLASTn) on any of the overlapping contigs to determine which chromosome these contigs were taken from, and report your answer.

CASE STUDY
Your microbiome may be a risk factor for disease

A number of genes involved in susceptibility to inflammatory bowel disorders (IBDs), including Crohn disease and ulcerative colitis, have been identified. However, it is clear that other risk factors, both genetic and nongenetic, are important in triggering the onset of these diseases. Recent research has centered on understanding the role of the gut microbiome and its interactions with the host genome in IBD. It is known that the microbiome of people with IBD differs from that of people whose IBD is in remission, and it also differs from that of people who do not have IBD. These observations suggest that transferring microbiota from unaffected individuals via fecal microbial transplantation (FMT) might be a successful treatment for IBD. This idea is supported by the use of FMT as an effective treatment in individuals with IBD for a potentially life-threatening infection caused by the bacterium Clostridium difficile. Currently, four clinical trials are underway to evaluate the use of FMT as a treatment for IBD.

If you had IBD, how would you react if your physician recommended that you enroll in one of these clinical studies to evaluate fecal transplants as a treatment?

Current treatment of IBD involves the use of anti-inflammatory drugs, but these drugs achieve remission only in some cases. If genetic analysis reveals that you carry susceptibility alleles for IBD that respond to periodic FMT as a therapy, would you agree to try this method?

Before agreeing to FMT, what would you want to know about the microbiomes of individuals who do not have IBD?
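The Insights and Solutions problem that follows turns on how long a putative ORF must be before it is accepted as a gene. As a minimal sketch, assuming an ORF runs from an ATG start codon to the first in-frame TAA, TAG, or TGA stop codon, the Python below scans both the sequenced strand and its complementary strand in all three reading frames and keeps only ORFs at or above a length cutoff (counted in nucleotides, stop codon included). It is not the software that produced the lacY scans; the function names `find_orfs` and `reverse_complement`, the `min_length` parameter, and the demo sequence are hypothetical.

```python
"""Minimal ORF-scan sketch: count ATG...stop ORFs above a length cutoff."""

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_length: int) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for each ORF of at least min_length nucleotides.

    '+' coordinates refer to `seq`; '-' coordinates refer to its reverse complement.
    Each ORF begins at the first ATG after the previous in-frame stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon in the ORF length
                    if end - start >= min_length:
                        orfs.append((strand, start, end))
                    start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical sequence: a 66-nt ORF, a 2-nt spacer, then a 21-nt ORF.
    demo = "ATG" + "GCT" * 20 + "TAA" + "CC" + "ATG" + "GCT" * 5 + "TGA"
    for cutoff in (50, 100, 300):  # the three thresholds used in the lacY scans
        print(f"{cutoff}-nt cutoff: {len(find_orfs(demo, cutoff))} ORF(s)")
```

Because the set of candidate ORFs does not depend on the cutoff, lowering `min_length` can only add ORFs, never remove them; this is why a 50-nucleotide scan reports more putative genes than a 300-nucleotide scan.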
INSIGHTS AND SOLUTIONS

1. One of the main problems in annotation is deciding how long a putative ORF must be before it is accepted as a gene. The accompanying figure shows three different ORF scans of the same E. coli genome region, the region containing the lacY gene. Regions shaded in brown indicate ORFs. The top scan was set to accept ORFs of 50 nucleotides as genes. The middle and bottom scans accepted ORFs of 100 and 300 nucleotides as genes, respectively. How many putative genes are detected in each scan? The longest ORF covers 1254 bp; the next longest, 234 bp; and the shortest, 54 bp. How can we decide the actual number of genes in this region? In this type of ORF scan, is it more likely that the number of genes in the genome will be overestimated or underestimated? Why?

[Figure: ORF scans of the sequenced strand and the complementary strand at 50-, 100-, and 300-nucleotide thresholds.]

Solution: Generally, one can examine conserved sequences in other organisms to indicate that an ORF is likely a coding region. One can also match a sequence to previously described sequences that are known to code for proteins. The underlying problem, deciding which ORF is actually a gene, is not easily solved. The shorter the minimum ORF length accepted in a scan, the more likely it is that the number of genes will be overestimated, because ORFs longer than 200 nucleotides are less likely to occur by chance. For these scans, notice that the 50-bp scan produces the highest number of possible genes, whereas the 300-bp scan produces the lowest number (1) of possible genes.

2. Sequencing of the heterochromatic regions (repeat-rich sequences concentrated in centromeric and telomeric areas) of the Drosophila genome indicates that within 20.7 Mb, there are 297 protein-coding genes (Bergman et al. 2002, http://genomebiology.com/2002/3/12/research/0086). Given that the euchromatic regions of the genome contain 13,379 protein-coding genes in 116.8 Mb, what general conclusion is apparent?

Solution: Gene density in euchromatic regions of the Drosophila genome is about one gene per 8730 base pairs (116.8 Mb/13,379), while gene density in heterochromatic regions is about one gene per 70,000 base pairs (20.7 Mb/297). Clearly, a given region of heterochromatin is much less likely to contain a gene than the same-sized region in euchromatin.

PROBLEMS AND DISCUSSION QUESTIONS

1. HOW DO WE KNOW? In this chapter, we focused on the analysis of genomes, transcriptomes, and proteomes and considered important applications and findings from these endeavors. At the same time, we found many opportunities to consider the methods and reasoning by which much of this information was acquired. From the explanations given in the chapter, what answers would you propose to the following fundamental questions?
(a) How do we know which contigs are part of the same chromosome?
(b) How do we know if a genomic DNA sequence contains a protein-coding gene?
(c) What evidence supports the concept that humans share substantial sequence similarities and gene functional similarities with model organisms?
(d) How can proteomics identify differences between the number of protein-coding genes predicted for a genome and the number of proteins expressed by a genome?
(e) How have microarrays demonstrated that, although all cells of an organism have the same genome, some genes are expressed in almost all cells, whereas other genes show cell- and tissue-specific expression?
2. CONCEPT QUESTION Review the Chapter Concepts list on page 361. All of these pertain to how genomics, bioinformatics, and proteomics approaches have changed how scientists study genes and proteins. Write a short essay that explains how recombinant DNA techniques were used to identify and study genes compared to how modern genomic techniques have revolutionized the cloning and analysis of genes.
3. What is functional genomics? How does it differ from comparative genomics?
4. Compare and contrast whole-genome shotgun sequencing and a map-based cloning approach.
5. What is bioinformatics, and why is this discipline essential for studying genomes? Provide two examples of bioinformatics applications.
6. List and describe three major goals of the Human Genome Project.
7. Intron frequency varies considerably among eukaryotes. Provide a general comparison of intron frequencies in yeast and humans. What about intron size?
8. BLAST searches and related applications are essential for analyzing gene and protein sequences. Define BLAST, describe basic features of this bioinformatics tool, and provide an example of information provided by a BLAST search.
9. Describe the human genome in terms of genome size, the percentage of the genome that codes for proteins, how much is composed of repetitive sequences, and how many genes it contains. Describe two other features of the human genome.
10. The Human Genome Project has demonstrated that in humans of all races and nationalities approximately 99.9 percent of the sequence is the same, yet different individuals can be identified by DNA fingerprinting techniques. What is one primary variation in the human genome that can be used to distinguish different individuals? Briefly explain your answer.
11. Archaea (formerly known as archaebacteria) is one of the three major divisions of living organisms; the other two are eubacteria and eukaryotes. Nanoarchaeum equitans is in the Archaea domain and has one of the smallest genomes known, about 0.5 Mb. How can an organism complete its life cycle with so little genetic material?
12. Through the Human Genome Project (HGP), a relatively accurate human genome sequence was published in 2003 from combined samples from different individuals. It serves as a reference for a haploid genome. Recently, the genomes of a number of individuals have been sequenced under the auspices of the Personal Genome Project (PGP). How do results from the PGP differ from those of the HGP?
13. The term paralog is often used in discussions of hemoglobin genes. What does this term mean, and how does it apply to hemoglobin genes?
14. It can be said that modern biology is experiencing an “omics” revolution. What does this mean? Explain your answer.
15. In what way will the discipline called metagenomics contribute to human health and welfare?
16. What are gene microarrays? How are microarrays used?
17. Annotations of the human genome have shown that genes are not randomly distributed, but form clusters with gene “deserts” in between. These “deserts” correspond to the dark bands on G-banded chromosomes. Comparisons between the human transcriptome map and the genome sequence show that highly expressed genes are also clustered together. In terms of genome organization, how is this an advantage?
18. Genomic sequencing has opened the door to numerous studies that help us understand the evolutionary forces shaping the genetic makeup of organisms. Using databases containing the sequences of 25 genomes, scientists (Kreil, D. P., and Ouzounis, C. A., Nucl. Acids Res. 29: 1608–1615, 2001) examined the relationship between GC content and global amino acid composition. They found that it is possible to identify thermophilic species on the basis of their amino acid composition alone, which suggests that evolution in a hot environment selects for a certain whole-organism amino acid composition. In what way might evolution in extreme environments influence genome and amino acid composition? How might evolution in extreme environments influence the interpretation of genome sequence data?
19. Systems biology models the complex networks of interacting genes, proteins, and other molecules that contribute to human genetic diseases, such as cancer, diabetes, and hypertension. These interactomes show the contribution of each piece toward the whole and where diseases overlap, and they provide models for drug discovery and development. Describe some of the differences that might be seen in the interactomes of normal and cancerous cells taken from the same tissue, and explain how these differences could lead to drugs specifically targeted against cancer cells.
20. Exome sequencing is a procedure to help physicians identify the cause of a genetic condition that has defied diagnosis by traditional means. The implication here is that exons in the nuclear genome are sequenced in the hope that, by comparison with the genomes of nonaffected individuals, a diagnosis might be revealed.
(a) What are the strengths and weaknesses of this approach?
(b) If you were ordering exome sequencing for a patient, would you also include an analysis of the patient’s mitochondrial genome?
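The contig-ordering logic of the Exploring Genomics exercise earlier in this chapter can be sketched in Python. This is a minimal illustration under stated assumptions, not the BLAST bl2seq alignment the exercise uses: it substitutes exact suffix-to-prefix matching (mirroring the tip to accept only overlaps with 100 percent sequence similarity), assumes all contigs are read on the same strand and form a single linear chain, and uses short made-up fragments labeled W through Z rather than the companion-site Sequences A through H. The names `overlap_length` and `order_contigs` and the `min_overlap` cutoff are hypothetical.

```python
"""Minimal sketch: order contigs by exact suffix-prefix overlaps (made-up data)."""


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that exactly matches a prefix of
    `right` (100 percent identity), or 0 if no such overlap is >= min_overlap."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def order_contigs(contigs: dict[str, str], min_overlap: int = 5):
    """Chain contigs into one linear sequence map.

    Returns (ordered names, overlap lengths, merged sequence, unplaced names).
    """
    # For each contig, keep the partner with the longest overlap on its right end.
    successor: dict[str, tuple[str, int]] = {}
    for a, seq_a in contigs.items():
        for b, seq_b in contigs.items():
            if a == b:
                continue
            n = overlap_length(seq_a, seq_b, min_overlap)
            if n and (a not in successor or n > successor[a][1]):
                successor[a] = (b, n)

    # The chain starts at a contig with a right neighbor but no left neighbor.
    has_left_neighbor = {b for b, _ in successor.values()}
    starts = [c for c in contigs if c in successor and c not in has_left_neighbor]
    if len(starts) != 1:
        raise ValueError(f"expected one chain start, found {starts}")

    order, overlaps = [starts[0]], []
    merged = contigs[starts[0]]
    current = starts[0]
    while current in successor:
        nxt, n = successor[current]
        if nxt in order:
            raise ValueError("overlaps form a cycle")
        order.append(nxt)
        overlaps.append(n)
        merged += contigs[nxt][n:]  # append only the non-overlapping part
        current = nxt

    unplaced = [c for c in contigs if c not in order]
    return order, overlaps, merged, unplaced


if __name__ == "__main__":
    # Made-up fragments listed out of order; Z shares no sequence with the others.
    demo = {
        "Y": "TCCAAGGCTTT",
        "W": "ATGCGTACCTTAG",
        "Z": "GGGGGGGG",
        "X": "CCTTAGGATCCAA",
    }
    order, overlaps, merged, unplaced = order_contigs(demo)
    print("order:", order, "overlaps:", overlaps)
    print("assembled length:", len(merged), "unplaced:", unplaced)
```

The returned overlap lengths, assembled sequence, and unplaced list correspond to questions b, c, and d of the exercise. The `min_overlap` cutoff keeps chance matches of only a few bases from being treated as true overlaps.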
19 Applications and Ethics of Genetic Engineering and Biotechnology

CHAPTER CONCEPTS
■ Recombinant DNA technology, genetic engineering, and biotechnology have revolutionized medicine and agriculture.
■ Genetically modified plants and animals can serve as bioreactors to produce therapeutic proteins and other valuable protein products.
■ Genetic modifications of plants have resulted in herbicide- and pest-resistant crops and in crops with improved nutritional value; similarly, transgenic animals are being created to produce therapeutic proteins and to protect animals from disease.
■ A synthetic genome has been assembled and transplanted into a recipient bacterial strain, elevating interest in potential applications of synthetic biology.
■ Applications of recombinant DNA technology and genomics have become essential for diagnosing genetic disorders, determining genotypes, and scanning the human genome to detect diseases.
■ Genome-wide association studies (GWAS) scan for hundreds or thousands of genetic differences in an attempt to link genome variations to particular traits and diseases.
■ Medical clinics are adopting whole-genome sequencing of an individual’s DNA for disease diagnosis and treatment.
■ Computational services for predicting offspring based on a couple’s genetics are being advertised to consumers.
■ Almost all applications of genetic engineering and biotechnology present unresolved ethical dilemmas that involve important moral, social, and legal issues.

[Chapter-opening photo: GloFish, marketed as the world’s first GM pet, are a controversial product of genetic engineering.]

Since the dawn of recombinant DNA technology in the 1970s, scientists have harnessed genetic engineering not only for biological research, but also for applications in medicine, agriculture, and biotechnology. Genetic engineering refers to the alteration of an organism’s genome and typically involves the use of recombinant DNA technologies to add a gene or genes to a genome, but it can also involve gene removal.
The ability to manipulate DNA in vitro and to introduce genes into living cells has allowed scientists to generate new varieties of plants, animals, and other organisms with specific traits. These organisms are called genetically modified organisms (GMOs).

Biotechnology is the use of living organisms to create a product or a process that helps improve the quality of life for humans or other organisms. Biotechnology as a modern industry began in earnest shortly after recombinant DNA technology was developed. But biotechnology actually dates back to ancient civilizations and their use of microbes to make many important products, including beverages such as wine and beer, vinegar, breads, and cheeses. Modern biotechnology relies heavily on recombinant DNA technology, genetic engineering, and genomics applications, and these areas will be highlighted in this chapter. Existing products and new developments that occur seemingly every day make the biotechnology industry one of the most rapidly developing branches of the workforce worldwide, encompassing nearly 5000 companies in 54 countries.

The development of the biotechnology industry and the rapid growth in the number of applications for DNA technologies have raised serious concerns about using our power to manipulate genes and to apply gene technologies. Genetic engineering and biotechnology have the potential to provide solutions to major problems globally and to significantly alter how humans deal with the natural world; hence, they raise ethical, social, and economic questions that are unprecedented in human experience. These complex issues cannot be fully explored in the context of an introductory genetics textbook. This chapter will therefore present only a selection of applications that illustrate the power of genetic engineering and biotechnology and the complexity of the dilemmas they engender.

We will begin by explaining how genetic engineering occurs in animals. We briefly describe how genetic engineering has affected the production of pharmaceutical products, and we examine the impact of genetic technologies on the diagnosis and treatment of human diseases, including gene therapy approaches. Finally, we explore some of the social, ethical, and legal implications of genetic engineering and biotechnology. Please note that many of the topics discussed in this chapter are covered in more detail later in the text (see Special Topic Chapter 3, DNA Forensics; Special Topic Chapter 4, Genomics and Personalized Medicine; Special Topic Chapter 5, Genetically Modified Foods; and Special Topic Chapter 6, Gene Therapy).

19.1 Genetically Engineered Organisms Synthesize a Wide Range of Biological and Pharmaceutical Products

The most successful and widespread application of recombinant DNA technology has been the biotechnology industry’s production of recombinant proteins as biopharmaceutical products, particularly therapeutic proteins to treat diseases. Prior to the recombinant DNA era, therapeutic proteins such as insulin, clotting factors, or growth hormones were purified from tissues such as the pancreas, blood, or pituitary glands. These tissues were in limited supply, and the purification processes were expensive. In addition, products derived from these natural sources could be contaminated by disease agents such as viruses. Because human genes encoding important therapeutic proteins can be cloned and expressed in a number of host-cell types, we now have more abundant, safer, and less expensive sources of biopharmaceuticals. Biopharming is a commonly used term to describe the production of valuable proteins in genetically modified (GM) animals and plants.

In this section, we outline several examples of therapeutic products that are produced by expression of cloned genes in transgenic host cells and organisms. It should not surprise you that cancers, arthritis, diabetes, heart disease, and
infectious diseases such as AIDS are among the major diseases that biotechnology companies are targeting for treatment by recombinant therapeutic products. Table 19.1 provides a short list of important recombinant products currently synthesized in transgenic bacteria, plants, yeast, and animals.

Insulin Production in Bacteria

Many therapeutic proteins have been produced by introducing human genes into bacteria. In most cases, the human gene is cloned into a plasmid, and the recombinant vector is introduced into the bacterial host. Large quantities of the transformed bacteria are grown, and the recombinant human protein is recovered and purified from bacterial extracts.

The first human gene product manufactured by recombinant DNA technology was human insulin, called Humulin, which was licensed for therapeutic use in 1982 by the U.S. Food and Drug Administration (FDA), the government agency responsible for regulating the safety of food and drug products and medical devices. In 1977, scientists at Genentech, the San Francisco biotechnology company cofounded in 1976 by Herbert Boyer (one of the pioneers of using plasmids for recombinant DNA technology) and Robert Swanson, isolated and cloned the gene for insulin and expressed it in bacterial cells. Genentech, short for “genetic engineering technology,” is also generally regarded as the world’s first biotechnology company. Previously, insulin was chemically extracted from the pancreas of cows and pigs obtained from slaughterhouses.

Insulin is a protein hormone that regulates glucose metabolism. Individuals who cannot produce insulin have diabetes, a disease that, in its more severe form (type I), affects more than million individuals in the United States. Although synthetic human insulin can now be produced by another process, a look at the original genetic engineering method is instructive, as it shows both the promise and the difficulty of applying recombinant DNA technology. Clusters of cells embedded in the pancreas synthesize a
precursor polypeptide known as preproinsulin. As this polypeptide is secreted from the cell, amino acids are cleaved from the end and the middle of the chain. These cleavages produce the mature insulin molecule, which contains two polypeptide chains (the A and B chains) joined by disulfide bonds. The A subunit contains 21 amino acids, and the B subunit contains 30.

19 Applications and Ethics of Genetic Engineering and Biotechnology

TABLE 19.1 Examples of Genetically Engineered Biopharmaceutical Products Available or Under Development
Gene Product | Condition Treated | Host Type
Erythropoietin | Anemia | E. coli; cultured mammalian cells
Interferons | Multiple sclerosis, cancer | E. coli; cultured mammalian cells
Tissue plasminogen activator (tPA) | Heart attack, stroke | Cultured mammalian cells
Human growth hormone | Dwarfism | Cultured mammalian cells
Monoclonal antibodies against vascular endothelial growth factor (VEGF) | Cancers | Cultured mammalian cells
Human clotting factor VIII | Hemophilia A | Transgenic sheep, pigs
C1 inhibitor | Hereditary angioedema | Transgenic rabbits
Recombinant human antithrombin | Hereditary antithrombin deficiency | Transgenic goats
Hepatitis B surface protein vaccine | Hepatitis B infections | Cultured yeast cells, bananas
Immunoglobulin IgG1 to HSV-2 glycoprotein B | Herpesvirus infections | Transgenic soybeans
Recombinant monoclonal antibodies | Passive immunization against rabies (also used in diagnosing rabies), cancer, rheumatoid arthritis | Transgenic tobacco, soybeans, cultured mammalian cells
Norwalk virus capsid protein | Norwalk virus infections | Potato (edible vaccine)
E. coli heat-labile enterotoxin | E. coli infections | Potato (edible vaccine)

In the original bioengineering process, genes that encode the A and B subunits were constructed by oligonucleotide synthesis (63 nucleotides for the A polypeptide and 90 nucleotides for the B polypeptide, that is, three nucleotides per amino acid: 21 × 3 = 63 and 30 × 3 = 90). Each synthetic oligonucleotide was inserted into a separate vector, adjacent to the lacZ gene encoding the bacterial form of the
enzyme β-galactosidase. When transferred to a bacterial host, the lacZ gene and the adjacent synthetic oligonucleotide were transcribed and translated as a unit. The product is a fusion protein, that is, a hybrid protein consisting of the amino acid sequence for β-galactosidase attached to the amino acid sequence for one of the insulin subunits (Figure 19–1). The fusion proteins were purified from bacterial extracts and treated with cyanogen bromide, a chemical that cleaves the insulin chain from the β-galactosidase (it cuts at a methionine residue placed at the junction). When the released A and B chains were mixed, the two insulin subunits spontaneously united, forming an intact, active insulin molecule. The purified injectable insulin was then packaged for use by diabetics.

Shortly after insulin became available, growth hormone, used to treat children who suffer from a form of dwarfism, was cloned. Soon, recombinant DNA technology made that product readily available too, as well as a wide variety of other medically important proteins that were once difficult to obtain in adequate amounts. Since recombinant insulin ushered in the biotechnology era, well over 200 recombinant products have entered the market worldwide. In recent years, the development of nonbiopharmaceutical products has also been a very active area of research. One example is the production in E. coli of lycopene, the antioxidant found in tomatoes. Lycopene produced by E. coli is not yet available for human consumption.

Transgenic Animal Hosts and Pharmaceutical Products

Although bacteria have been widely used to produce therapeutic proteins, there are some disadvantages in using prokaryotic hosts to synthesize eukaryotic proteins. One problem is that bacterial cells often cannot carry out the processing and modifications that eukaryotic proteins need for full biological activity, leaving those proteins inactive. In addition, eukaryotic proteins produced in prokaryotic cells often do not fold into the proper three-dimensional conformation and are therefore inactive. To overcome these difficulties and increase
yields, many biopharmaceuticals are produced in eukaryotic hosts. As seen in Table 19.1, eukaryotic hosts may include cultured eukaryotic cells (plant or animal) or transgenic farm animals. For example, a herd of goats or cows can serve as very effective bioreactors or biofactories (living factories) that could continuously make milk containing the desired therapeutic protein, which can then be isolated in a noninvasive way. Yeast are also valuable hosts for expressing recombinant proteins. Even insect cells are valuable for this purpose, through the use of a virus-based gene delivery system called baculovirus. Recombinant baculovirus containing a gene of interest is used to infect insect cell lines, which then express the protein at high levels. Baculovirus-insect cell expression is particularly useful for producing human recombinant proteins that are heavily glycosylated.

Regardless of the host, therapeutic proteins may then be purified from the host cells or, when transgenic animals are used, isolated from animal products such as milk. Refer to our earlier discussion (see Chapter 17) of how transgenic animals can be created to express a transgene of interest. Biotechnology companies are working on expressing many different genes in transgenic animals in order to isolate and purify commercially valuable proteins. In 2006, recombinant human antithrombin, an anticlotting protein, became the world’s first drug extracted from the milk of farm animals
to be approved for use in humans. Scientists at GTC Biotherapeutics introduced the human antithrombin gene into goats. By placing the gene adjacent to a promoter for beta casein, a common protein in milk, GTC scientists were able to target antithrombin expression to the mammary gland. As a result, antithrombin protein is highly expressed in the milk. In one year, a single goat will produce the equivalent amount of antithrombin that in the past would have been isolated from 90,000 blood collections.

FIGURE 19–1 (a) Humulin, a recombinant form of human insulin, was the first therapeutic protein produced by recombinant DNA technology to be approved for use in humans. (b) To synthesize recombinant human insulin, synthetic oligonucleotides encoding the insulin A and B chains were inserted (in separate vectors) at the tail end of a cloned E. coli lacZ gene. The recombinant plasmids were transformed into E. coli host cells, where the β-gal/insulin fusion protein was synthesized and accumulated in the cells. Fusion proteins were then extracted from the host cells and purified. Insulin chains were released from β-galactosidase by treatment with cyanogen bromide. The insulin subunits were purified and mixed to produce a functional insulin molecule.

Recombinant DNA Approaches for Vaccine and Antibody Production

Another successful application of recombinant DNA technology for therapeutic purposes is the production of vaccines. Vaccines stimulate the immune system to produce antibodies against disease-causing organisms and thereby confer immunity against specific diseases. Traditionally, two types of vaccines have been used: inactivated vaccines, which are prepared from killed samples of the infectious virus or bacteria; and attenuated vaccines, which are live viruses or bacteria that have been weakened so that they reproduce poorly and cause at most a mild form of the disease. Inactivated vaccines include the vaccines for rabies and influenza;
vaccines for tuberculosis, cholera, and chickenpox are examples of attenuated vaccines.

Genetic engineering is being used to produce subunit vaccines, which consist of one or more surface proteins from the virus or bacterium but not the entire virus or bacterium. Often the surface protein is produced through recombinant DNA technology by cloning and expressing the genes encoding the protein to be used for the vaccine. This surface protein acts as an antigen that stimulates the immune system to make antibodies that act against the organism from which it was derived. One of the first subunit vaccines was made against the hepatitis B virus, which causes liver damage and cancer. The gene that encodes the hepatitis B surface protein was cloned into a yeast expression vector, and the cloned gene was expressed in yeast host cells. The protein was then extracted and purified from the host cells and packaged for use as a vaccine.

In 2006, the FDA approved Gardasil, a subunit vaccine produced by the pharmaceutical company Merck and the first cancer vaccine to receive FDA approval. Gardasil targets four strains of human papillomavirus (HPV), including the two strains responsible for about 70 percent of cervical cancers. Approximately 70 percent of sexually active women will be infected by an HPV strain during their lifetime. Gardasil is designed to provide immune protection against HPV prior to infection but is not effective against existing infections. You may have heard of Gardasil through media coverage of the legislation proposed in several states that would require all adolescent schoolgirls to receive a Gardasil vaccination regardless of whether or not they are sexually active.

Vaccine Proteins and Antibodies Can Be Produced by Plants

Plants offer several advantages for expressing recombinant proteins. For instance, once a transgenic plant is made, it can easily be grown and replicated in a greenhouse or field, and it will provide a
constant source of recombinant protein. In addition, the cost of expressing a recombinant protein in a transgenic plant is typically much lower than making the same protein in bacteria, yeast, or mammalian cells. No recombinant proteins expressed in transgenic plants have yet been approved by the FDA as therapeutic proteins for humans, although about a dozen products are close to making it through final clinical trials. Some edible vaccines are now in clinical trials. For example, a vaccine against the bacterium that causes cholera has been produced in genetically engineered potatoes and used to successfully vaccinate human volunteers.

In 2014, an outbreak of Ebola virus in West Africa killed over 1500 people, with many more cases unreported. Ebola causes hemorrhagic fever, with fatality rates that have reached approximately 90 percent in some outbreaks. There is no effective treatment for curing or preventing Ebola virus infection, but antibodies against Ebola expressed in tobacco leaves are showing promise in ongoing clinical trials. Mice were used to create monoclonal antibodies against the virus, and the antibody genes were then introduced into tobacco plants. The transgenic tobacco plants express high quantities of the antibody proteins, which can then be isolated and purified for use in humans. Transgenic tobacco plants are commonly used for expressing recombinant proteins because of the large size of their leaves and their relatively high yield of recombinant proteins compared to other plants.

DNA-Based Vaccines

DNA-based vaccines have been attempted for many years, and recently there has been renewed interest in using these vaccines to protect against viral pathogens. In this approach, DNA encoding proteins from a particular pathogen is inserted into plasmid vectors, which are then injected directly into an individual or delivered via a viral vector, similar to the way certain viruses are used for gene therapy. The idea is that pathogen proteins encoded by the delivered DNA would be produced and
trigger an immune response that could provide protection should an immunized person be exposed to the pathogen in the future. For example, trials are underway using plasmid DNA encoding protein antigens from HIV in an attempt to vaccinate individuals against HIV. Thus far, a major limitation of DNA-based vaccines has been that they typically result in very low production of the protein encoded by the delivered genes, so the immune response in vaccinated persons is insufficient to provide the desired protection. Work on DNA-based vaccines continues to be an active area of exploration, but whether they will ever have significant roles in the vaccine market remains to be seen.

ESSENTIAL POINT Recombinant DNA technology can be used to produce valuable biopharmaceutical protein products such as therapeutic proteins for treating disease.

19–1 In order to vaccinate people against diseases by having them eat antigens (such as the cholera toxin) or antibodies expressed in an edible vaccine, the antigen must reach the cells of the small intestine. What are some potential problems of this method?
HINT: This problem asks you to consider why edible vaccines may not be effective. The key to its solution is to consider the molecular structure of the antigen or antibody and its recognition by the immune system.

19.2 Genetic Engineering of Plants Has Revolutionized Agriculture

For millennia, farmers have manipulated the genetic makeup of plants and animals to enhance food production. Until the advent of genetic engineering 30 years ago, these genetic manipulations were primarily restricted to selective breeding, the selection and breeding of naturally occurring or mutagen-induced variants. In the last 50 to 100 years, genetic improvement of crop plants through the traditional methods of artificial selection and genetic crosses has resulted in dramatic increases in productivity and nutritional enhancement. For example, maize yields have increased fourfold over the last 60 years, and more than half of this increase is due to genetic improvement by artificial selection and selective breeding (Figure 19–2). Modern maize has substantially larger ears and kernels than its predecessor crops, including the hybrids from which it was bred.

FIGURE 19–2 Selective breeding is one of the oldest methods of genetic alteration of plants. Shown here is teosinte (Zea canina, left), a selectively bred hybrid (center), and modern corn (Zea mays, right).

Recombinant DNA technology provides powerful new tools for altering the genetic constitution of agriculturally important organisms. Scientists can now identify, isolate, and clone genes that confer desired traits, then specifically and efficiently introduce these genes into organisms. As a result, it is possible to quickly introduce insect resistance, herbicide resistance, or nutritional characteristics into farm plants and animals, a primary purpose of agricultural biotechnology. There are many examples of applications of recombinant DNA and genomics involving plants; we defer detailed discussion of most of them, and genetic modifications of plant crops are discussed in greater detail later in the text (see Special Topic Chapter 5: Genetically Modified Foods). In this section we primarily consider genetic manipulations to produce transgenic crop plants of agricultural value. In Section 19.3, we will discuss examples of genetic manipulations of agriculturally important animals.

Worldwide, over billion acres of genetically engineered crops have been planted, particularly herbicide- and pest-resistant soybeans, corn, cotton, and canola; over 50 different transgenic crop varieties are available, including alfalfa, corn, rice, potatoes, tomatoes, tobacco, wheat, and cranberries. The main reasons for generating transgenic crops include:
• Improving the growth characteristics and yield of agriculturally valuable crops
• Increasing the nutritional value of crops
• Providing crop resistance against insect and viral pests, drought, and herbicides

In addition, many new GM crops and microalgae are being designed for ethanol production and for making biodiesel fuel, that is, for providing sustainable sources of energy. Insights from plant genome sequencing projects will undoubtedly be the catalyst for analysis of genetic diversity in crop plants, identification of genes involved in crop domestication and breeding traits, and subsequent enhancement of a variety of desirable traits through genetic engineering. In the past several years, genome projects have been completed for many major food and industrial crops, including the three crops that account for most of the world’s caloric intake: maize, rice, and wheat. The genome of a popular crop species of coffee plant was recently sequenced. Plant scientists expect to use genome data to improve coffee crop growth and eventually to improve crop phenotypes to produce the most desirable attributes for coffee seeds.

ESSENTIAL POINT Genetically modified (GM) plants, designed to improve crop yield
and nutritional value and to increase resistance to herbicides, pests, and severe weather, are becoming more prevalent worldwide.

19.3 Transgenic Animals Serve Important Roles in Biotechnology

Although genetically engineered plants are major players in modern agriculture, commercial applications of transgenic animals are less widespread. Most transgenic animals are created for research purposes, to study gene function. Nonetheless, some high-profile examples of genetically engineered farm animals have aroused public interest and controversy. It is expected that transgenic animals will increasingly be available for commercial purposes in the future.

Examples of Transgenic Animals

Oversize mice containing a human growth hormone transgene were some of the first transgenic animals created. Attempts to create farm animals containing transgenic growth hormone genes have not been particularly successful, probably because growth is a complex, multigene trait. One notable exception is the transgenic Atlantic salmon bearing copies of a Chinook salmon growth hormone gene adjacent to a constitutive promoter (see Special Topic Chapter 5: Genetically Modified Foods).

As discussed in Section 19.1, the major use of transgenic farm animals is currently as bioreactors that produce useful pharmaceutical products, but a number of other interesting transgenic applications are under development. Several of these applications are designed to increase milk production or the nutritional value of milk. Significant research efforts are also being made to protect farm animals against common pathogens that cause disease and animal loss and put the food supply at risk (including potential bioweapons that could be used in a terrorist attack on food animals). For instance, creating transgenic cows has shown promise for controlling mastitis in cattle (Figure 19–3). Mastitis is an infection of the mammary glands. It is the most
costly disease affecting the dairy industry, leading to over $2 billion in losses in the United States. Mastitis can block milk ducts, reducing milk output, and can also contaminate the milk with pathogenic microbes. Infection by the bacterium Staphylococcus aureus is the most common cause of mastitis, and most cattle with mastitis do not respond well to conventional treatment with antibiotics. As a result, mastitis is a significant cause of herd reduction. In an attempt to create cattle resistant to mastitis, transgenic cows were generated that possessed the lysostaphin gene from Staphylococcus simulans. Lysostaphin is an enzyme that specifically cleaves components of the S. aureus cell wall. Transgenic cows expressing this protein in their milk produce a natural antibiotic that wards off S. aureus infections. These transgenic cows do not completely solve the mastitis problem, because lysostaphin is not effective against other microbes, such as E. coli and Streptococcus uberis, that occasionally cause mastitis; moreover, S. aureus may develop resistance to lysostaphin. Nonetheless, scientists are cautiously optimistic that transgenic approaches have a strong future for providing farm animals with a level of protection against major pathogens.

Researchers in New Zealand have engineered a cow to produce hypoallergenic milk. This research effort was spurred by the fact that an estimated 2–3 percent of babies are allergic to milk from dairy cows and develop a reaction to a protein called β-lactoglobulin (BLG). These researchers designed miRNAs to inhibit BLG and then used a transgenic approach to introduce the genes encoding these miRNAs into cow embryos. Of 100 GM cow embryos, only one produced a calf, Daisy. When Daisy began lactating, researchers found that her milk had no detectable BLG. Currently, studies are underway to determine whether Daisy’s milk is less allergenic to mice, with future plans to test whether humans are allergic to
Daisy’s milk.

FIGURE 19–3 Transgenic cows for battling mastitis. The mammary glands of nontransgenic cows are highly susceptible to infection by the skin microbe Staphylococcus aureus. Transgenic cows express the lysostaphin transgene in milk, where the lysostaphin protein can kill S. aureus before the bacteria multiply in sufficient numbers to cause inflammation and damage mammary tissue.

Scientists at Yorktown Technologies of Austin, Texas, created the GloFish, a transgenic strain of zebrafish (Danio rerio) containing a red fluorescent protein gene from sea anemones. Marketed as the first GM pet in the United States, GloFish fluoresce bright pink when illuminated by ultraviolet light (see the opening photograph at the beginning of this chapter). GM critics describe these fish as an abuse of genetic technology. However, GloFish may not be as frivolous a use of genetic engineering as some believe. A variation of this transgenic model, incorporating a heavy-metal-inducible promoter adjacent to the red fluorescent protein gene, has shown promise in a bioassay for heavy metal contamination of water. When these transgenic zebrafish are in water contaminated by mercury and other heavy metals, the promoter becomes activated, inducing transcription of the red fluorescent protein gene. In this way, zebrafish fluorescence can be used as a bioassay to measure heavy metal contamination and uptake by living organisms.

ESSENTIAL POINT Transgenic animals with improved growth characteristics or desirable phenotypes are being genetically engineered for a number of different applications.

19.4 Synthetic Genomes and the Emergence of Synthetic Biology

Studying genomes has led to a fundamental question: “What is the minimum number of genes necessary to support life?” Determining the answer to this
question is the first step toward the ultimate creation of synthetic genomes that can, in turn, lead to the production of artificial cells or organisms. To help advance synthetic genome work, we can use the small genomes of obligate parasites. For example, the bacterium Mycoplasma genitalium, a human parasitic pathogen, is among the simplest self-replicating prokaryotes known and has served as a model for understanding the minimal elements of a genome necessary for a self-replicating cell. M. genitalium has a genome of 580 kb.

In 2010, scientists from the J. Craig Venter Institute (JCVI) published the first report of a functional synthetic genome, using the genome of another Mycoplasma species, M. mycoides. In this approach they designed, and had chemically synthesized, more than one thousand 1080-bp segments called cassettes covering the entire 1.08-Mb M. mycoides genome (Figure 19–4). So that these segments would be assembled in the correct order, each cassette carried 80-bp sequences at each end that overlapped with its neighboring cassettes. These cassettes were cloned in E. coli. Then, using the yeast Saccharomyces cerevisiae, a homologous recombination approach was used to organize the sequences into 11 separate assemblies of about 100 kb each, which were eventually combined to span the entire 1.08-Mb M. mycoides genome. The entire assembled genome, called JCVI-syn1.0, was then subjected to the ultimate test of functionality: transplantation into another cell, in this case another bacterium. The researchers transplanted it into a close relative, M. capricolum. This resulted in cells with the JCVI-syn1.0 genotype and phenotype. Conversion of M. capricolum into JCVI-syn1.0 M. mycoides was verified, in part, because these cells were shown to express the lacZ gene, which was present only in the synthetic genome. The recipient cells also made proteins characteristic of M. mycoides and not M. capricolum, verifying the strain conversion.

FIGURE 19–4 Design of the M. mycoides genome: chemical synthesis of 1078 oligonucleotide cassettes, each 1080 bp long, spanning the entire 1.08-Mb M. mycoides genome.

The
synthetic genome effectively rebooted the M. capricolum recipient cells, changing them from one form to another. When this work was announced, J. Craig Venter claimed: “This is equivalent to changing a Macintosh computer into a PC by inserting a new piece of PC software.” This is tedious work. Ninety-nine percent of the experiments involved failed! A single base error among a million bp could derail the project for several months. Venter’s work with M. mycoides JCVI-syn1.0, a decade-long project that cost about $40 million, is being hailed as a defining moment in the emerging field of synthetic biology. Synthetic biology applies engineering principles and designs to biological systems.

Many fundamental questions about synthetic genomes and genome transplantation remain to be answered. But clearly these studies provided key “proof of concept” that synthetic genomes could be produced, assembled, and successfully transplanted to create a microbial strain encoded by a synthetic genome, and they bring scientists closer to producing novel synthetic genomes incorporating genes for specific traits of interest. An international effort is now underway to produce synthetic chromosomes comprising an entire yeast genome, about 12.5 million bases. In 2014, a synthetic version of yeast (S. cerevisiae) chromosome III was created, the first synthetic eukaryotic chromosome. What are other potential applications of synthetic genomes and synthetic biology?
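The JCVI cassette design described earlier in this section comes down to simple tiling arithmetic: because each 1080-bp cassette shares 80 bp with its neighbor, it contributes only about 1000 bp of new sequence, so a circular genome of about 1.08 Mb needs on the order of a thousand cassettes. The Python sketch below is a hypothetical teaching example, not JCVI’s design software. It computes that cassette count and mimics overlap-guided joining with strings. The function names (`cassettes_needed`, `split_with_overlap`, `join_by_overlap`) are invented here, the test sequence is random, and exact string matching is only an analogy for the homologous recombination that actually joined the cassettes in yeast.

```python
from __future__ import annotations

import math
import random

# Hypothetical teaching sketch of overlap-based tiling; not JCVI's pipeline.


def cassettes_needed(genome_bp: int, cassette_bp: int, overlap_bp: int,
                     circular: bool = True) -> int:
    """Estimate how many overlapping cassettes are needed to tile a genome.

    Each cassette shares overlap_bp bases with its neighbor, so it adds only
    (cassette_bp - overlap_bp) bases of new sequence. In a circular genome the
    last cassette also overlaps the first, so every cassette adds the same
    amount; in a linear molecule the first cassette counts in full.
    """
    step = cassette_bp - overlap_bp
    if overlap_bp <= 0 or step <= 0:
        raise ValueError("need 0 < overlap_bp < cassette_bp")
    if circular:
        return math.ceil(genome_bp / step)
    if genome_bp <= cassette_bp:
        return 1
    return 1 + math.ceil((genome_bp - cassette_bp) / step)


def split_with_overlap(seq: str, cassette_bp: int, overlap_bp: int) -> list[str]:
    """Cut a linear sequence into cassettes whose ends overlap by overlap_bp."""
    step = cassette_bp - overlap_bp
    if overlap_bp <= 0 or step <= 0:
        raise ValueError("need 0 < overlap_bp < cassette_bp")
    cassettes = []
    start = 0
    while True:
        cassettes.append(seq[start:start + cassette_bp])
        if start + cassette_bp >= len(seq):
            return cassettes
        start += step


def join_by_overlap(cassettes: list[str], overlap_bp: int) -> str:
    """Rejoin cassettes, checking that each one begins with the last
    overlap_bp bases of the sequence assembled so far (a string analogy
    for the shared ends that recombination relies on)."""
    if not cassettes or overlap_bp <= 0:
        raise ValueError("need at least one cassette and a positive overlap")
    merged = cassettes[0]
    for cassette in cassettes[1:]:
        if merged[-overlap_bp:] != cassette[:overlap_bp]:
            raise ValueError("adjacent cassettes do not share the designed overlap")
        merged += cassette[overlap_bp:]
    return merged


if __name__ == "__main__":
    # Rounded values from the text: 1.08-Mb genome, 1080-bp cassettes, 80-bp overlaps.
    print(cassettes_needed(1_080_000, 1080, 80))  # 1080

    # Round trip on a made-up 10-kb random sequence (illustrative data only).
    rng = random.Random(0)
    toy = "".join(rng.choice("ACGT") for _ in range(10_000))
    pieces = split_with_overlap(toy, 1080, 80)
    print(len(pieces), join_by_overlap(pieces, 80) == toy)  # 10 True
```

Plugging in the rounded 1.08-Mb figure gives 1080 cassettes; the 1078 cassettes reported in Figure 19–4 presumably reflect a genome length slightly under 1.08 Mb. The sketch deliberately skips the hierarchical step in which cassettes were first combined into the 11 assemblies of about 100 kb.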
One of JCVI’s goals is to create microorganisms that can be used to synthesize biofuels. Other possibilities include synthetic microbes engineered to degrade pollutants (bioremediation), the synthesis of new biopharmaceutical products, the synthesis of chemicals and fuels from sunlight and carbon dioxide, genetically programmed bacteria to help us heal, and “semisynthetic” crops that contain synthetic chromosomes encoding genes for beneficial traits such as drought resistance or improved photosynthetic efficiency.

ESSENTIAL POINT Synthetic genomes and synthetic biology offer the potential for geneticists to create genetically engineered cells with novel characteristics that may have commercial value.

(Figure 19–4 panels: building a synthetic version of the 1.08-Mb Mycoplasma mycoides genome, JCVI-syn1.0, from cloning of cassettes in E. coli, to complete genome assembly in S. cerevisiae, to genome transplantation into M. capricolum.)

19.5 Genetic Engineering and Genomics Are Transforming Medical Diagnosis

Gene-based technologies have had a major impact on the diagnosis of disease and are revolutionizing medical treatments and the development of specific and effective pharmaceuticals. In large part as a result of the Human Genome Project, researchers are identifying genes involved in both single-gene diseases and complex genetic traits. In this section, we provide an overview of representative examples that demonstrate how gene-based technologies are being used to diagnose genetic diseases.

Using DNA-based tests, scientists can directly examine a patient’s DNA for mutations associated with disease. Gene testing was one of the first successful applications of recombinant DNA technology, and currently more than 900 gene tests are in use. These tests usually detect DNA mutations associated with single-gene disorders that are inherited in a Mendelian fashion. Examples of such genetic tests are those that detect sickle-cell
anemia, cystic fibrosis, Huntington disease, hemophilias, and muscular dystrophies. Other genetic tests have been developed for complex disorders such as breast and colon cancers. Gene tests are used to perform prenatal diagnosis of genetic diseases, to identify carriers, to predict the future development of disease in adults, to confirm the diagnosis of a disease detected by other methods, and to identify genetic diseases in embryos created by in vitro fertilization. For genetic testing of adults, DNA from white blood cells is commonly used. Alternatively, many genetic tests can be carried out on cheek cells, collected by swabbing the inside of the mouth, or on hair cells. Some genetic testing can be carried out on gametes.

Prenatal Genetic Testing

The genetic testing of adults is increasing, as is the screening of newborns for genetic disorders (an ethically controversial issue that we will discuss in Section 19.8). In newborns, a simple prick of the heel produces a few drops of blood that are used to check the baby for genetic disorders. Over the past two decades, more genetic tests have been used to detect genetic conditions in babies than in adults. All states now require newborn screening for certain medical conditions, and about 60 conditions can be screened for. In the United States, newborn screening identifies about 12,500 children with medical disorders out of approximately million babies born each year.

For prenatal diagnosis, fetal cells are obtained by amniocentesis or chorionic villus sampling. Figure 19–5 shows the procedure for amniocentesis, in which a small volume of the amniotic fluid surrounding the fetus is removed. Amniotic fluid contains fetal cells that can be used for karyotyping, genetic testing, and other procedures. For chorionic villus sampling, cells from the fetal portion of the placental wall (the chorionic villi) are sampled through a vacuum tube, and analyses can be carried out on this tissue. Captured fetal cells can then be subjected to genetic analysis, usually with PCR-based techniques (such as allele-specific oligonucleotide testing, described later in this section).

FIGURE 19–5 For amniocentesis, the position of the fetus is first determined by ultrasound, and then a needle is inserted through the abdominal and uterine walls to recover amniotic fluid and fetal cells for genetic or biochemical analysis. (Figure labels: placenta; uterine wall; amniotic fluid in amniotic cavity; centrifuge; fluid: composition analysis; cells: karyotype, sex determination, biochemical analysis, genetic testing; cell culture: biochemical analysis, karyotype and chromosome analysis, genetic testing; analysis using recombinant DNA methods.)

Noninvasive procedures are being developed for prenatal genetic testing of fetal DNA. These procedures are making prenatal testing easier, with little to no risk to the fetus. Circulating in each person’s bloodstream is cell-free DNA that is released from dead and dying cells. This DNA is cut up into small fragments by enzymes in the blood. The blood of a pregnant woman contains snippets of DNA from cells of the fetus. It is estimated that to percent of the DNA in a pregnant mother’s blood belongs to her baby. It is now possible to analyze these traces of fetal DNA to determine if the baby has certain types of genetic conditions, such as Down syndrome. Such tests require about a tablespoon of blood. DNA in the blood is sequenced to analyze haplotypes (contiguous segments of DNA that do not undergo recombination during gamete formation), which reveal which DNA segments are maternal and which are from the fetus (see Figure 19–6). If a fetal haplotype contained a specific mutation, this would also be revealed by sequence analysis.

In addition, nearly complete fetal genome sequences have been assembled from maternal blood. These are developed by sequencing DNA fragments from maternal blood and comparing those fragments to sequenced genomes from the mother and father. Bioinformatics software is then used to organize the genetic sequences from the fetus in an effort to assemble the fetal genome. Currently, this technology results in an assembled genome sequence with segments missing, so it does not capture the entire fetal genome. It has been shown, however, that whole-genome shotgun sequencing of maternal plasma DNA can be used to accurately sequence the entire exome of a fetus.

FIGURE 19–6 Deducing fetal genome sequences from maternal blood. For any given chromosome, a fetus inherits one copy of a haplotype from the mother (maternal copies, M1 or M2) and another from the father (paternal copies, P1 or P2). For simplicity, a single-stranded sequence of DNA from each haplotype is shown. Here the fetus inherited haplotypes M2 and P2 from the mother and father, respectively. DNA from the blood of a pregnant woman would contain paternal haplotypes inherited by the fetus (P2, blue), maternal haplotypes that are not passed to the fetus (M1, orange), and maternal haplotypes that are inherited by the fetus (M2, yellow). The maternal haplotype inherited by the fetus (M2) would be present in excess amounts relative to the haplotype that is not inherited (M1). These haplotype sequences can be detected by whole-genome shotgun sequencing. (Sequences shown: Mother M1 GGCATTCCAT, M2 GACAATCGAT; Father P1 ATACAGGCTC, P2 ATTCACGCTC; Fetus M2 GACAATCGAT, P2 ATTCACGCTC.)

Tests for fetal genetic analysis based on maternal blood samples started to arrive on the market in 2011. Sequenom of San Diego, California, was one of the first companies to launch such a test: MaterniT21®Plus, a Down syndrome test that can help prevent some women from having amniocentesis after a false positive report from ultrasound or protein marker tests. In Section 19.9 we will discuss preconception testing and recent patents for computing technologies designed to predict the genetic
potential of offspring (destiny tests) can also be used to test for trisomy 13 (Patau syndrome) and trisomy 18 (Edward syndrome) The MaterniT21®Plus Genetic Tests Based on Restriction Enzyme test analyzes 36-bp fragments of DNA to identify chromoAnalysis some 21 from the fetus Sequenom claims that this test is A classic method of genetic testing is restriction fragment highly accurate with a false positive rate of just 0.2 percent length polymorphism (RFLP) analysis As we will discuss The MaterniT21®Plus test can be done as early as week 10 in the next section, PCR-based methods have largely replaced (about the same time at which CVS sampling can be done, RFLP analysis; however, applications of this approach are which is about to weeks earlier than amniocentesis can be still used occasionally, and for historical purposes it is also performed) While not intended to replace amniocentesis, it 404 19 Applications and E thics of G enetic E ngineering and B iotechnology helpful to compare RFLP analysis to new approaches, which were largely not widespread prior to completion of the HGP To illustrate this method, we examine the prenatal diagnosis of sickle-cell anemia As we have discussed before, this disease is an autosomal recessive condition common in people with family origins in areas of West Africa, the Mediterranean basin, and parts of the Middle East and India It is caused by a single amino acid substitution in the -globin protein, as a consequence of a single-nucleotide substitution in the -globin gene The single-nucleotide substitution also eliminates a cutting site in the -globin gene for the restriction enzymes MstII and CvnI As a result, the mutation alters the pattern of restriction fragments seen on Southern blots These differences in restriction cutting sites are used to prenatally diagnose sickle-cell anemia and to establish the parental genotypes and the genotypes of other family members who may be heterozygous carriers of this condition DNA is extracted 
from tissue samples and digested with MstII. This enzyme cuts three times within a region of the normal β-globin gene, producing two small DNA fragments. In the mutant sickle-cell allele, the middle MstII site is destroyed by the mutation, and one large restriction fragment is produced by MstII digestion (Figure 19–7). The restriction-enzyme-digested DNA fragments are separated by gel electrophoresis, transferred to a nylon membrane, and visualized by Southern blot hybridization, using a probe from this region. Figure 19–7 shows the results of RFLP analysis for sickle-cell anemia in one family. At most about 10 percent of all point mutations can be detected by restriction enzyme analysis, because most mutations occur in regions of the genome that do not contain restriction enzyme cutting sites. However, now that many disease-associated mutations are known, geneticists can employ synthetic oligonucleotides to detect these mutations, as described next.

Genetic Testing Using Allele-Specific Oligonucleotides

A common method of genetic testing involves the use of synthetic DNA probes known as allele-specific oligonucleotides (ASOs). Scientists use these short, single-stranded fragments of DNA to identify alleles that differ by as little as a single nucleotide. In contrast to restriction enzyme analysis, which is limited to cases in which a mutation changes a restriction site, ASOs detect single-nucleotide changes (single-nucleotide polymorphisms, or SNPs), including those that do not affect restriction enzyme cutting sites. As a result, this method offers increased resolution and wider application. The ASO is tagged with a molecule that is either radioactive or fluorescent, to allow for visualization of the ASO hybridized to DNA on the membrane. Under proper conditions, an ASO will hybridize only with its complementary DNA sequence and not with other sequences, even those that vary by as little as a single nucleotide. Genetic testing using ASOs and PCR analysis is now available to screen
for many disorders, such as sickle-cell anemia. Figure 19–8 shows an example of ASO testing for sickle-cell anemia. This rapid, inexpensive, and accurate technique is used to diagnose a wide range of genetic disorders caused by point mutations. Although the technique is highly effective, SNPs can affect probe binding, leading to false positive or false negative results that may not reflect a genetic disorder, particularly if precise hybridization conditions are not used. Sometimes DNA sequencing is carried out on amplified gene segments to confirm identification of a mutation. Because ASO testing often involves PCR, small amounts of DNA can be analyzed. As a result, ASO testing is ideal for preimplantation genetic diagnosis (PGD). PGD is the genetic analysis of single cells from embryos created by in vitro fertilization (IVF). When sperm and eggs are mixed to create zygotes, the early-stage embryos are grown in culture. A single cell can be removed from an early-stage embryo using a vacuum pipette to gently aspirate one cell away from the embryo. This could possibly kill the embryo, but if it is done correctly, the embryo will often continue to divide normally. DNA from the removed cell is then typically analyzed by FISH (for chromosome analysis) or by ASO testing. The genotypes for each cell can then be used to decide which embryos will be implanted into the uterus. Any alleles that can be detected by ASO testing can be used for PGD. Sickle-cell anemia, cystic fibrosis, and dwarfism are often tested for by PGD, but alleles for many other conditions are also analyzed. As you will learn in Section 19.6, it is now becoming possible to carry out whole-genome sequencing on individual cells. This method is now being applied for PGD of single cells from an embryo created by IVF.

FIGURE 19–7 RFLP diagnosis of sickle-cell anemia. In the mutant β-globin allele (βS), a point mutation (GAG → GTG) has destroyed a cutting site for the restriction enzyme MstII, resulting in a single large fragment on a Southern blot. In the pedigree, both parents are βA/βS carriers, and the family has one unaffected homozygous normal daughter (II-1), an affected son (II-2), and an unaffected carrier fetus (II-3). The genotype of each family member can be read directly from the blot and is shown below each lane. (Figure panels: maps of the normal βA-globin gene, codon GAG, with three MstII sites, and the βS-globin gene, codon GTG, with two MstII sites, showing the region recognized by the probe; the pedigree; and a Southern blot with lanes for βA/βA, βS/βS, and βA/βS genotypes.)

FIGURE 19–8 Allele-specific oligonucleotide (ASO) testing for the β-globin gene and sickle-cell anemia. The β-globin gene is amplified by PCR, using DNA extracted from white blood cells or cells obtained by amniocentesis. The amplified DNA is then denatured and spotted onto strips of DNA-binding membranes. Each strip is hybridized to a specific ASO. (a) Results observed when the three possible genotypes are hybridized to an ASO from the normal β-globin allele: AA-homozygous individuals have normal hemoglobin and two copies of the normal β-globin gene and show heavy hybridization; AS-heterozygous individuals carry one normal β-globin allele and one mutant allele and show weaker hybridization; SS-homozygous sickle-cell individuals carry no normal copy of the β-globin gene and show no hybridization to the ASO probe for the normal β-globin allele. (b) Results observed when DNA from the three genotypes is hybridized to the probe for the sickle-cell β-globin allele: no hybridization by the AA genotype, weak hybridization by the heterozygote (AS), and strong hybridization by the homozygous sickle-cell genotype (SS). (Figure panels: the PCR-amplified region of the β-globin gene around the mutated codon, with the region covered by the ASO probes; normal (βA) ASO 5′-CTCCTGAGGAGAAGTCTGC-3′; mutant (βS) ASO 5′-CTCCTGTGGAGAAGTCTGC-3′; dot-blot results for genotypes AA, AS, and SS.)

NOW SOLVE THIS 19–2
The DNA sequence surrounding the site of the sickle-cell mutation in the β-globin gene, for normal and mutant genes, is as follows:
Normal DNA
5′-GACTCCTGAGGAGAAGT-3′
3′-CTGAGGACTCCTCTTCA-5′
Sickle-cell DNA
5′-GACTCCTGTGGAGAAGT-3′
3′-CTGAGGACACCTCTTCA-5′
Each type of DNA is denatured into single strands and applied to a DNA-binding membrane. The membrane containing the two spots is hybridized to an ASO of the sequence 5′-GACTCCTGAGGAGAAGT-3′. Which spot, if either, will hybridize to this probe?
HINT: This problem asks you to analyze results of an ASO test. The key to its solution is to understand that ASO analysis is done under conditions that allow only perfectly matched (fully complementary) sequences to hybridize to the ASO on the membrane. For more practice, see Problem 21 and related end-of-chapter problems.

Genetic Analysis Using Gene-Expression Microarrays

Both RFLP and ASO analyses are efficient methods of screening for gene mutations; however, they can detect only the presence of one or a few specific mutations whose identity and locations in the gene are known. There is also a need for genetic tests that detect complex, previously unknown mutations in genes associated with genetic diseases and cancers; mutations that may be associated with, or predispose a patient to, a particular disease; and mRNA expression patterns for genes associated with specific diseases. Analyzing multiple genes or mRNA transcripts in a genetic test often requires comprehensive, high-throughput methods. Recall from Chapter 18 that one high-throughput screening technique is based on the use of DNA microarrays (also called DNA chips or gene chips; Figures 19–9 and 19–10). The numbers and types of single-stranded DNA sequences on a microarray are dictated by the type of analysis that is required. For example, each field on a microarray might contain a DNA sequence derived from each member of a gene family, or sequence variants from one or several genes of interest, or a sequence derived from each gene in an organism's
genome.

FIGURE 19–9 A commercially available DNA microarray, called a GeneChip, marketed by Affymetrix, Inc. This microarray can be used to analyze expression for approximately 50,000 RNA transcripts. It contains 22 different probes for each transcript and allows scientists to simultaneously assess the expression levels of most of the genes in the human genome.

What makes DNA microarrays so remarkable is the immense amount of information that can be generated simultaneously from a single array. DNA microarrays the size of postage stamps can contain up to 500,000 different fields, each representing a different DNA sequence. Earlier in the text (see Chapter 18), you learned about the use of microarrays for transcriptome analysis. In the recent past, DNA microarrays have had a wide range of applications, including the detection of mutations in genomic DNA and the detection of gene-expression patterns in diseased tissues. However, whole-genome sequencing, exome sequencing, and RNA sequencing are expected to replace most applications involving microarrays in the near future and to render this technology obsolete. Human genome microarrays containing probes for most human genes are available. DNA microarrays have been designed to scan for mutations in many disease-related genes, including the p53 gene, which is mutated in a majority of human cancers, and the BRCA1 gene, which, when mutated, predisposes women to breast cancer.

In addition to testing for mutations in single genes, DNA microarrays can contain probes that detect SNPs. SNPs occur randomly about every 100 to 300 nucleotides throughout the human genome, both inside and outside of genes. SNPs crop up at an estimated 15 million positions in the genome, where these single-base changes reveal differences from one person to the next. Certain SNP sequences at a specific locus are shared by certain segments of the population. In addition, certain SNPs cosegregate with genes associated with some disease conditions. By correlating the presence or absence of a particular SNP with a genetic disease, scientists are able to use the SNP as a genetic testing marker.

FIGURE 19–10 Microarray procedure for analyzing gene expression in normal and cancer cells. The method shown here is based on a two-channel microarray in which cDNA samples from the two different tissues compete for binding to the same probe sets. Colors of dots on an expression microarray represent levels of gene expression. In this example, green dots represent genes expressed only in one cell type (e.g., normal cells), and red dots represent genes expressed only in another cell type (e.g., cancer cells). Intermediate colors represent different levels of expression of the same gene in the two cell types. Only a small portion of the DNA microarray is shown. (Figure steps: isolate RNA from normal and tumor cells; use RT-PCR to produce cDNA labeled with fluorescent dyes; combine equal amounts; hybridize denatured cDNA to the microarray; scan the microarray and analyze the hybridization pattern, which distinguishes mRNA present only in normal cells, mRNA present in relatively equal amounts in normal and cancer cells, and mRNA present only in cancer cells.)

The presence of SNPs as probes on a DNA microarray allows scientists to simultaneously screen thousands of genes that might be involved in single-gene diseases, as well as those involved in disorders exhibiting multifactorial inheritance. This technique, known as genome scanning, makes it possible to analyze a person's DNA for dozens or hundreds of disease alleles, including those that might predispose the person to heart attacks, asthma, diabetes, Alzheimer disease, and other genetically defined disease subtypes. Genome scans are occasionally used when physicians encounter patients with chronic illnesses whose underlying cause cannot be diagnosed. Gene-expression microarrays have
been widely used in both basic research and genetic testing to detect gene-expression patterns for specific genes. Gene-expression microarrays are effective for analyzing gene-expression patterns in genetic diseases because the progression of a tissue from a healthy to a diseased state is almost always accompanied by changes in expression of hundreds to thousands of genes. Because these arrays detect mRNA expression, they provide a powerful tool for diagnosing genetic disorders and gene-expression changes. Expression microarrays may contain probes for only a few specific genes thought to be expressed differently in different cell types, or they may contain probes representing each gene in the genome. Although microarray techniques provide novel information about gene expression, keep in mind that DNA microarrays do not directly provide information about protein levels in a cell or tissue. We often infer protein levels from mRNA expression patterns, but these inferences are not always accurate.

In one type of expression microarray analysis, mRNA is isolated from two different cell or tissue types, for example, normal cells and cancer cells arising from the same cell type (Figure 19–10). The mRNA samples contain transcripts from each gene that is expressed in that cell type. Some genes are expressed more efficiently than others; therefore, each type of mRNA is present at a different level. The level of each mRNA can be used to develop a gene-expression profile that is characteristic of the cell type. Isolated mRNA molecules are converted into cDNA molecules using reverse transcriptase. The cDNAs from the normal cells are tagged with fluorescent dye-labeled nucleotides (for example, green), and the cDNAs from the cancer cells are tagged with a different fluorescent dye-labeled nucleotide (for example, red). The labeled cDNAs are mixed together and applied to a DNA microarray. The cDNA molecules bind to complementary single-stranded
probes on the microarray but not to other probes. Keep in mind that each field, or feature, does not consist of just one probe; rather, it contains thousands of copies of the probe. After washing off the nonbinding cDNAs, scientists scan the microarray with a laser, and a computer captures the fluorescent image pattern for analysis. The pattern of hybridization appears as a series of colored dots (often called a heat map), with each dot corresponding to one field of the microarray (Figure 19–10). The color patterns revealed on the microarray provide a sensitive measure of the relative levels of each cDNA in the mixture.

Expression microarray profiling has revealed that certain cancers have distinct patterns of gene expression and that these patterns correlate with factors such as the cancer's stage, clinical course, or response to treatment. As this type of analysis has been introduced into clinical use, it has been possible to adjust therapies for each group of cancer patients and to identify new specific treatments based on gene-expression profiles. Similar gene-expression profiles have been generated for many other cancers, including breast, prostate, ovarian, and colon cancer. Gene-expression microarrays are providing tremendous insight into both substantial and subtle variations in genetic diseases.

Several companies are now promoting nutrigenomics services in which they claim to use genotyping and gene-expression microarrays to identify allele polymorphisms and gene-expression patterns for genes involved in metabolism. For example, polymorphisms in genes such as apolipoprotein A1 (APOA1), involved in lipid metabolism, and MTHFR (methylenetetrahydrofolate reductase), involved in metabolism of folic acid, have been implicated in cardiovascular disease. Nutrigenomics companies claim that microarray analysis of a patient's DNA sample for these and other genes enables them to judge whether a patient's allele variations or gene-expression profiles warrant dietary
changes to potentially improve health and reduce the risk of diet-related diseases.

Application of Microarrays for Gene Expression and Genotype Analysis of Pathogens

Among their many applications, microarrays have provided infectious disease researchers with powerful new tools for studying pathogens. Genotyping microarrays are being used to identify strains of emergent viruses, such as the Ebola virus; the virus that causes the highly contagious condition called Severe Acute Respiratory Syndrome (SARS); and the H5N1 avian influenza virus, the cause of bird flu, which has killed people in Asia, led to the slaughter of over 80 million chickens, and raised concern about possible pandemic outbreaks.

Whole-genome transcriptome analysis of pathogens is being used to inform researchers about genes that are important for pathogen infection and replication. In this approach, bacteria, yeast, protists, or viral pathogens are used to infect host cells in vitro, and then expression microarrays are used to analyze pathogen gene-expression profiles. Patterns of gene activity during pathogen infection of host cells and replication are useful for identifying pathogens and understanding mechanisms of infection. But of course a primary goal of infectious disease research is to prevent infection. Gene-expression profiling is also a valuable approach for identifying important pathogen genes, and the proteins they encode, that may prove to be useful targets for subunit vaccine development or for drug treatment strategies to prevent or control infectious disease.

FIGURE 19–11 Gene-expression microarrays can reveal host-response signatures for pathogen identification. In this example, mice were infected with different pathogens: Neisseria meningitidis, the virus that causes Severe Acute Respiratory Syndrome (SARS), and E. coli. Mouse tissues were then used as the source of mRNA for gene-expression microarray analysis. Increased expression compared to uninfected control mice is shown in shades of yellow; decreased expression compared to uninfected controls is indicated in shades of blue. Notice that each pathogen elicits a somewhat different response in terms of which major clusters of host genes are activated by pathogen infection (circles). (Figure panels: host-response signatures across host genes for N. meningitidis, the SARS virus, and E. coli.)

Similarly, researchers are evaluating host responses to pathogens. This type of detection has been accelerated in part by the need to develop pathogen-detection strategies for military and civilian use, both for detecting outbreaks of naturally emerging pathogens such as SARS and avian influenza and for detecting potential outbreaks of diseases such as anthrax (caused by the bacterium Bacillus anthracis) that could be the result of a bioterrorism event. Host-response gene-expression profiles are developed by exposing a host to a pathogen and then using expression microarrays to analyze host gene-expression patterns. Figure 19–11 shows the different gene-expression profiles for mice following exposure to Neisseria meningitidis, the SARS virus, or E. coli. Comparing such host gene-expression profiles following exposure to different pathogens provides researchers with a way to quickly diagnose and classify infectious diseases. Scientists are developing databases of both pathogen and host-response expression profile data that can be used to identify pathogens efficiently. This type of approach was used to identify host genome-wide responses to Ebola virus infection of nonhuman primates. Results from these analyses helped researchers develop therapeutic drugs for combating the Ebola virus outbreak in western Africa during 2014.

ESSENTIAL POINT
A variety of different molecular techniques, including restriction fragment length polymorphism analysis, allele-specific oligonucleotide tests, and DNA microarrays, can be used to
identify genotypes associated with normal and diseased phenotypes.

19.6 Genetic Analysis by Individual Genome Sequencing

The ability to sequence and analyze individual genomes is rapidly changing the ways that scientists and physicians evaluate a person's genetic information. Genome sequencing is being utilized in medical clinics at an accelerated rate. Many major hospitals around the world are setting up clinical sequencing facilities for use in identifying the causes of rare diseases. Recently, whole-genome sequencing has provided new insights into the genetics of anorexia, Alzheimer disease, autism, Proteus syndrome, and other diseases. Proteus syndrome, a rare congenital disorder that causes atypical bone development, tumors, and other conditions, is thought to have affected Joseph Merrick, the subject of the acclaimed movie The Elephant Man.

Already there have been some very exciting success stories in which whole-genome sequencing of an individual's DNA has led to improved treatment of diseases in children and adults. For example, through individual genome sequencing, researchers in Newfoundland identified a mutation in a gene called ARVD5. Newfoundlanders have one of the highest incidences in the world of a rare condition called arrhythmogenic right ventricular cardiomyopathy (ARVC), in which affected individuals often have no symptoms but then die suddenly from irregular electrical impulses within the heart. Mutations of the ARVD5 gene lead to such cases of premature death through ARVC. Approximately 50 percent of males and a smaller percentage of females die by age 40, and 80 percent of males and 20 percent of females die by age 50. Individuals carrying this mutation are now being implanted with internal cardiac defibrillators that can restart their hearts if electrical impulses stop or become irregular.

Recall our introduction of the concept of exome sequencing (see Chapter 18). Exome sequencing in
clinical settings is now producing some promising results. For example, from the time he was born, Nicholas Volker had to live with unimaginable discomfort from an undiagnosed condition that was causing intestinal fistulas (holes from his gut to the outside of his body) that leaked body fluids and feces and required constant surgery. While still a young child, Nicholas had been to the operating room more than 100 times. A team at the Medical College of Wisconsin decided to have Nicholas's exome sequenced. Applying bioinformatics to compare his sequence to that of the general population, they identified a mutation in a gene on the X chromosome called X-linked inhibitor of apoptosis (XIAP). XIAP mutations are known to be linked to another condition that can often be corrected by a bone marrow transplant. In 2010, a bone marrow transplant saved Nicholas's life and largely restored his health. Shortly thereafter, the popular press was describing Nicholas as the first child saved by DNA sequencing.

We now have the ability to sequence the genome from a single cell!
Single-cell sequencing typically involves isolating DNA from a single cell and then carrying out whole-genome amplification (WGA) to produce sufficient DNA to be sequenced. Reliable amplification of the genome to produce enough DNA for sequencing without introducing errors remains a major challenge. Single-cell sequencing allows scientists to explore how the genome varies from cell to cell. These studies are revealing that the mutations present can vary greatly between individual cells. Cancer cells from a tumor in particular often show genetic diversity that is only recently being appreciated by researchers and clinicians. Understanding variations in gene mutations and expression by individual cells within a tumor could lead to better treatment choices.

Genome sequencing is quickly becoming standard practice in medical clinics, and there is every reason to expect genome-based diagnosis to become an essential part of mainstream medicine in the future.

19.7 Genome-Wide Association Studies Identify Genome Variations That Contribute to Disease

Microarray-based genomic analysis has led geneticists to employ powerful strategies called genome-wide association studies (GWAS) in their quest to identify genes that may influence disease risk. In recent years there has been a dramatic expansion in the number of GWAS being reported. For example, GWAS for height differences, autism, obesity, diabetes, macular degeneration, myocardial infarction, arthritis, hypertension, several cancers, bipolar disease, autoimmune diseases, Crohn disease, schizophrenia, amyotrophic lateral sclerosis, multiple sclerosis, and behavioral traits (such as intelligence) are among the many GWAS that have been widely publicized in the scientific literature and popular press.

In a GWAS, the genomes of thousands of unrelated individuals with a particular disease are analyzed, typically by microarray analysis, and the results are compared with genomes of individuals without the disease in an attempt to identify
genetic variations that may confer risk of developing the disease. Many GWAS involve large-scale use of SNP microarrays that can probe on the order of 500,000 SNPs to evaluate results from different individuals. Other GWAS approaches can look for specific gene differences or evaluate copy number variations (CNVs) or changes in the epigenome, such as methylation patterns in particular regions of a chromosome. By determining which SNPs, CNVs, or epigenome changes co-occur in individuals with the disease, scientists can calculate the disease risk associated with each variation. Analysis of GWAS results requires statistical analysis to predict the relative potential impact (association or risk) of a particular genetic variation on development of a disease phenotype.

FIGURE 19–12 A GWAS for diabetes revealed 386,371 genetic markers, clustered here by chromosome number. Markers above the black line appeared to be significantly associated with the disease. (Figure: Manhattan plot with chromosomes Chr1 through Chr22 and ChrX on the x-axis and −log₁₀(p value) on the y-axis, from 0 to about 7.25; sequences above the 10⁻⁵ p value are likely disease-related marker sequences.)

Figure 19–12 shows one way that results from GWAS are commonly reported. Called a Manhattan plot, this representation is a scatterplot used to display data with a large number of data points. The x-axis typically plots a particular position in the genome; in this case, loci on each chromosome are plotted in a different color code. The y-axis plots results of a genotypic association test. There are several ways that association can be calculated. Shown here is the negative log of the p value, −log₁₀(p), for each locus; the higher a point lies, the more strongly that locus is associated with a particular condition.
The top line of this plot establishes a threshold value for significance. Marker sequences whose p values are smaller than 10⁻⁵, corresponding to values above 5.0 on the y-axis, are likely disease-related sequences (Figure 19–12).

There are many questions and ethical concerns about patients involved in GWAS and their emotional responses to knowing about genetic risk data. For example:
• What does it mean if an individual has 3, 5, 9, or 30 risk alleles for a particular condition?
• How do we categorize rare, common, and low-frequency risk alleles to determine the overall risk for developing a disease?
• GWAS often reveal dozens of DNA variations, but many variations have only a modest effect on risk. How does one explain to a person that he or she has a gene variation that changes the lifetime risk of a particular disease from 12 to 16 percent? What does this information mean?
• If the sum total of GWAS for a particular condition reveals about 50 percent of the risk alleles, what are the other missing elements of heritability that may contribute to developing a complex disease?
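The significance line on a Manhattan plot can be made concrete with a small calculation. The Python sketch below runs one common allelic association test, a Pearson chi-square test with 1 degree of freedom on a 2 × 2 table of risk-allele versus other-allele counts in cases and controls, converts the resulting p value into the −log₁₀(p) score plotted on the y-axis, and compares that score with the 5.0 line (p = 10⁻⁵) used in Figure 19–12. As the text notes, association can be calculated in several ways; this particular test is only an illustrative choice, and the function names, marker names, and allele counts are hypothetical, not data from the study in the figure. Published GWAS pipelines also correct for factors such as population structure, which this sketch ignores.

```python
import math

# Significance line from Figure 19-12: p = 1e-5, i.e., 5.0 on the -log10(p) axis.
THRESHOLD_SCORE = 5.0


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square statistic (1 df, no continuity correction) for a 2 x 2
    table of allele counts: rows = cases/controls, columns = risk/other allele."""
    table = [[case_risk, case_other], [control_risk, control_other]]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were the same in cases and controls.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.
    For 1 df, P(X > x) equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


def manhattan_score(p):
    """Value plotted on the y-axis of a Manhattan plot: -log10(p)."""
    return -math.log10(p)


def is_likely_disease_related(p):
    """True if the marker lies above the significance line (p < 1e-5)."""
    return manhattan_score(p) > THRESHOLD_SCORE


if __name__ == "__main__":
    # Hypothetical counts: 2,000 cases and 2,000 controls give 4,000 alleles per group.
    markers = {
        "marker_A": (1400, 2600, 1200, 2800),  # risk allele 35% in cases vs 30% in controls
        "marker_B": (1220, 2780, 1200, 2800),  # risk allele 30.5% vs 30%
    }
    for name, counts in markers.items():
        p = p_value_1df(allelic_chi_square(*counts))
        print(f"{name}: p = {p:.2e}, -log10(p) = {manhattan_score(p):.2f}, "
              f"above line: {is_likely_disease_related(p)}")
```

With these made-up counts, marker_A (a 5-percentage-point difference in risk-allele frequency) lands above the 5.0 line, while marker_B (a 0.5-point difference) falls far below it, which mirrors how most of the hundreds of thousands of markers in a GWAS cluster near the bottom of the plot.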
In some cases, risk data revealed by GWAS may help patients and physicians develop diet and exercise plans designed to minimize the potential for developing a particular disease. But the number of risk genes identified by most GWAS shows that complex genetic diseases differ from single-gene disorders: many genetic factors contribute to the total risk of developing the condition. We need such information to make meaningful progress in disease diagnosis and treatment, which is ultimately a major purpose of GWAS. GWAS is another technique that is likely to be replaced by genome sequencing and RNA-seq in the future.

ESSENTIAL POINT
Genome-wide association studies can reveal genetic variations linked with disease conditions within populations.

19.8 Genomics Leads to New, More Targeted Medical Treatment Including Personalized Medicine
Genomic technologies are changing medical diagnosis and allowing scientists to manufacture abundant and effective therapeutic proteins. The examples already available today strongly indicate that in the near future we will see even more transformative medical treatments based on genomics and advanced DNA-based technologies. In this section, we briefly introduce pharmacogenomics, rational drug design, and gene therapy, topics that are considered in greater detail later in the text (see Special Topic Chapter 4: Genomics and Personalized Medicine, and Special Topic Chapter 6: Gene Therapy).

Pharmacogenomics and Rational Drug Design
Every year, millions of Americans experience serious side effects of medications, and more than 100,000 die from adverse drug reactions. Until recently, selecting effective medications for each individual has been a trial-and-error process. The new field of pharmacogenomics promises more specific, effective, and personally customized
drugs that are designed to complement each person’s individual genetic makeup. In the 1950s, scientists discovered that individual reactions to drugs had a hereditary component. We now know that many genes affect how different individuals react to drugs. Some of these genes encode cell-surface receptors that bind a drug and allow it to enter a cell. Others encode enzymes that metabolize drugs. For example, liver enzymes encoded by the cytochrome P450 gene family affect the metabolism of many modern drugs, including those used to treat cardiovascular and neurological conditions. DNA sequence variations in these genes produce enzymes with different abilities to metabolize and use these drugs. Thus, gene variants that encode inactive forms of the cytochrome P450 enzymes are associated with a patient’s inability to break down drugs in the body, which can lead to drug overdoses. A genetic test that recognizes some of these variants is currently being used to screen patients who are recruited into clinical trials for new drugs.

Knowledge from genetics and molecular biology is also contributing to the development of new drugs targeted at specific disease-associated molecules. Most drug development is currently based on trial-and-error testing of chemicals in lab animals, in the hope of finding a chemical that has a useful effect. In contrast, rational drug design involves synthesizing specific chemical substances that affect specific gene products. One product of rational drug design is the drug imatinib (trade name Gleevec), which is used to treat chronic myelogenous leukemia (CML). Geneticists had discovered that CML cells contain the Philadelphia chromosome, which results from a reciprocal translocation between chromosomes 9 and 22. Gene cloning revealed that the t(9;22) translocation fuses the C-ABL proto-oncogene with the BCR gene. This BCR-ABL fusion gene encodes a powerful fusion protein that causes cells to escape cell-cycle control. The fusion
protein, which acts as a tyrosine kinase, is not present in noncancer cells from CML patients. To develop Gleevec, chemists used high-throughput screens of chemical libraries to find a molecule that bound to the BCR-ABL enzyme. They then modified the inhibitory molecule chemically so that it bound more tightly, and tests showed that it specifically inhibited BCR-ABL activity. Clinical trials showed that Gleevec was effective against CML. It had minimal side effects and a higher remission rate than conventional therapies. Gleevec is now used to treat CML and several other cancers. As scientists discover more genes and gene products associated with diseases, rational drug design promises to become a powerful technology within the next decade.

Gene Therapy
Drug treatments are often effective in controlling the symptoms of genetic disorders, but the ideal outcome of medical treatment is to cure these diseases. In an effort to cure genetic diseases, scientists are actively investigating gene therapy, a therapeutic technique that aims to transfer normal genes into a patient’s cells. In theory, the normal genes will be transcribed and translated into functional gene products, which in turn will bring about a normal phenotype. In many ways, gene therapy is the ultimate application of recombinant technology and genomics. There have been some successful applications of gene therapy, but it has proven technically challenging for many reasons.

ESSENTIAL POINT
Pharmacogenomics and gene therapy apply an understanding of the role of genes to develop customized treatments for genetic disorders.

19.9 Genetic Engineering, Genomics, and Biotechnology Create Ethical, Social, and Legal Questions
Geneticists use recombinant DNA and genomic technologies to identify genes, diagnose and treat genetic disorders, produce commercial and pharmaceutical products, and solve crimes. However, the applications of these technologies raise important ethical, social, and legal issues that must be identified, debated, and resolved. Here we present a brief overview of some current ethical debates concerning the uses of genetic technologies.

Genetic Testing and Ethical Dilemmas
When the Human Genome Project was first discussed, scientists and the general public raised concerns about how genome information would be used. They also asked how the interests of both individuals and society could be protected. To address these concerns, the Ethical, Legal, and Social Implications (ELSI) Program was established. The ELSI Program considers a range of issues. These include the impact of genetic information on individuals, the privacy and confidentiality of genetic information, and the implications for medical practice, genetic counseling, and reproductive decision making. Through research grants, workshops, and public forums, ELSI formulates policy options to address these issues. When the Human Genome Project started, ELSI focused its deliberations on four areas: (1) privacy and fairness in the use and interpretation of genetic information; (2) ways to transfer genetic knowledge from the research laboratory to clinical practice; (3) ways to ensure that participants in genetic research know and understand the potential risks and benefits of their participation and give informed consent; and (4) public and professional education. The ELSI Program continues to work on a broad range of ethical, societal, and policy issues.

Most of the widely applied genetic tests used to date have given patients and physicians information that improves quality of life. One example involves prenatal testing for phenylketonuria (PKU) followed by dietary restrictions that diminish the effects of the disease. But many of the potential benefits and consequences of genetic testing are not clear. For example:
• We have the technologies to test for genetic diseases for which there are no effective treatments. Should we test people for these disorders?
• With current genetic tests, a negative result does not necessarily rule out future development of a disease, and a positive result does not always mean that an individual will get the disease. How can we effectively communicate the results of testing and the actual risks to those being tested?
• What information should people have before deciding to have a genome scan, a genetic test for a single disorder, or their whole genome sequenced?
• Sequencing fetal genomes from the maternal bloodstream has revealed mutations in the fetal genome (for example, in a gene involved in Parkinson disease). How might parents and physicians use this information?
• How can we define and prevent genetic discrimination?
• How can we protect the information revealed by such tests?
• Sharing patient data through electronic medical records is a significant concern. What issues of consent need to be considered?
Let’s explore a specific example. In 2011, a case in Boston revealed the dangers of misleading genetic test results. A prenatal ultrasound of a pregnant woman revealed a potentially debilitating problem (Noonan syndrome) involving the spinal cord of the woman’s developing fetus. Physicians ordered a DNA test, which came back positive for a gene variant listed in a database as implicated in Noonan syndrome. The parents chose to terminate the pregnancy. Months later it was learned that the locus linked to Noonan syndrome was not involved in the disease, yet there was no effective way to inform the research community. To minimize these kinds of problems in the future, the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) has developed a database called ClinVar (see www.ncbi.nlm.nih.gov/clinvar/). ClinVar integrates data from clinical genetic testing labs and the research literature to provide an updated resource for researchers and physicians.

Disclosure of incidental results is another ethically challenging issue. Someone may have his or her genome sequenced, or have a test done for a particular locus thought to be involved in a disease. In either case, the analysis sometimes reveals other mutations that could be significant to the patient. Researchers and clinicians are divided on whether such information should be disclosed to the patient. What do you think?

Earlier in this chapter we discussed preimplantation genetic diagnosis (PGD). PGD lets couples screen embryos created by in vitro fertilization for genetic disorders. As we learn more about genes involved in human traits, will PGD be used to screen for other genes that are not related to disease? Will couples be able to select embryos carrying genes for desirable traits affecting height, weight, intellect, and other physical or mental characteristics? What do you think of using genetic testing to purposely select an embryo with a genetic disorder?
There have been several well-publicized cases of couples seeking to use prenatal diagnosis or PGD to select embryos with dwarfism or deafness.

As identification of genetic traits becomes more routine in clinical settings, physicians will need to ensure genetic privacy for their patients. There are significant concerns that employers, insurance companies, governmental agencies, or the general public could use genetic information in harmful ways. Genetic privacy and the prevention of genetic discrimination will be increasingly important in the coming years. In 2008, the Genetic Information Nondiscrimination Act was signed into law in the United States. This legislation is designed to prohibit the improper use of genetic information in health insurance and employment.

Direct-to-Consumer Genetic Testing and Regulating the Genetic Test Providers
The past decade has seen dramatic developments in direct-to-consumer (DTC) genetic tests. A simple Web search will reveal many companies offering DTC genetic tests. As of 2015, such tests were available for over 2000 diseases (in 1993 there were about 100 such tests). Most DTC tests require a person to mail a saliva sample, hair sample, or cheek cell swab to the company. DTC companies offer a range of pricing options and largely use SNP-based tests, such as ASO tests, to screen for different mutations. For example, in 2007 Myriad Genetics, Inc. began a major DTC marketing campaign for its tests for BRCA1 and BRCA2, which had been available since 1996. Mutations in these genes increase the risk of developing breast and ovarian cancer. DTC testing companies report absolute risk, the probability that an individual will develop a disease. However, the methods used to calculate these risks vary widely and depend on certain assumptions. Such tests are controversial for many reasons. For example, the test is purchased online
by individual consumers and requires no involvement of a physician or other health-care professional, such as a nurse or genetic counselor, to administer the test or interpret its results. There are significant questions about the quality, effectiveness, and accuracy of such products because the DTC industry is currently largely self-regulated. The FDA does not comprehensively regulate DTC genetic tests. At present, patients have no comprehensive way to compare and evaluate the range of tests available and their relative quality. Most companies make it clear that they are not trying to diagnose or prevent disease, or to offer health advice. So what purpose does the information in the test results serve for the consumer? Web sites and online programs from DTC companies suggest what steps a person should take if results are positive. But is this enough? If results are not understood, might negative results give a false sense of security? A woman who tests negative for BRCA1 and BRCA2 mutations can still develop breast or ovarian cancer. Refer to Problem 15 for an example of a personal decision that actress Angelina Jolie made based on the results of a genetic test.

In 2010, the FDA announced that five genetic test manufacturers would need FDA approval before their tests could be sold to consumers. This action was prompted when Pathway Genomics announced plans to sell a DTC kit for “comprehensive genotyping” through the pharmacy chain Walgreens. Pathway Genomics and the other companies had been selling their DTC tests through company Web sites for several years. Pathway and others claimed that no further regulation was required because their DTC kits were approved under the Clinical Laboratory Improvement Amendments (CLIA). CLIA regulates certain laboratory tests but is not part of the FDA. Pathway and deCODE dropped out of the DTC market. It is unclear whether the FDA will oversee DTC genetic tests in the future. At
the time of publication of this edition, the FDA continues to work on plans for regulating DTC genetic tests. However, some DTC genetic testing companies, such as 23andMe, offer health-related analyses or health reports, so they fall under FDA regulations. The FDA continues to issue warnings requiring DTC companies to provide health-related interpretations of genetic tests that the FDA considers appropriate. Opinions on the regulatory issue vary. Some believe that the FDA has no business regulating DTC tests and that consumers should be free to buy products according to their own needs or interests. Others insist that the FDA must regulate DTC tests to protect consumers.

The National Institutes of Health created the Genetic Testing Registry (GTR) to increase transparency. The GTR lets companies publicly share information about the utility of their genetic testing products, and the research behind them, with the general public, patients, health-care workers, genetic counselors, insurance companies, and others. The GTR is intended to give individuals and families access to key resources so that they can make better-informed decisions about their health and about genetic tests. But participation in the GTR by DTC companies is not yet mandatory. Will companies involved in genetic testing participate?

DNA and Gene Patents
Intellectual property (IP) rights are also debated as part of the ethical implications of genetic engineering, genomics, and biotechnology. Patents on intellectual property such as isolated genes, new gene constructs, recombinant cell types, and GMOs can be lucrative for the patent holders. However, they may also pose ethical and scientific problems. Why is protecting IP important for companies?
Consider this issue. Suppose a company is willing to spend millions or billions of dollars and several years on research and development (R&D) to produce a valuable product. Shouldn’t it be given a period of time to protect its discovery so that it can recover its R&D costs and make a profit on the product? Genes in their natural state, as products of nature, cannot be patented. But consider a human gene that has been cloned and then patented by the scientists who did the cloning. The person or company holding the patent could require anyone who wants to do research with the gene to pay a licensing fee for its use. If the research leads to a diagnostic test or therapy, further fees and royalties may be demanded, and the resulting genetic test may cost more than many patients can afford. But limiting or preventing gene patents could reduce the incentive to pursue the research that produces such genes and tools. This is especially true for companies that need to profit from their research.

Should scientists and companies be allowed to patent DNA sequences from naturally living organisms? And should there be a lower or an upper limit to the size of those sequences? For example, should patents be awarded for small pieces of genes, such as expressed sequence tags (ESTs), simply because an individual or company wants to claim that it cloned a piece of DNA first? Should this be allowed even if no one knows whether the DNA sequence has a use? Can or should investigators be allowed to patent the entire genome of any organism they have sequenced? As of 2014, the U.S. Patent and Trademark Office had granted patents for more than 35,000 genes or gene sequences, including an estimated 20 percent of human genes. Incidentally, the patenting of human genes has led some to use the term patentome!
Some scientists are concerned that granting a patent simply for cloning a piece of DNA rewards too little work. Given that computers do most of the routine work of genome sequencing, who should get the patent? What about individuals who figure out what to do with the gene? What if a gene sequence has a role in a disease for which a genetic therapy may be developed? Many scientists believe it is more appropriate to patent novel technology and applications that make use of gene sequences than to patent the gene sequences themselves. In recent years, Congress has considered legislation that would ban the patenting of human genes and of any sequences, functions, or correlations to naturally occurring products from a gene.

The patenting of genetic tests is also under increased scrutiny. One reason is concern that a patented test can create a monopoly: if only one company holds the rights to conduct a particular genetic test, patients cannot get a second opinion. One recent analysis estimated that as many as 64 percent of patented tests for disease genes make it very difficult or impossible for other groups to propose a different way to test for the same disease.

In 2010, the American Civil Liberties Union brought a landmark case against Myriad Genetics. The case contended that Myriad could not patent the BRCA1 and BRCA2 sequences used to diagnose breast cancer. During its period of patent exclusivity, Myriad’s BRACAnalysis product was used to screen over a million women for BRCA1 and BRCA2 mutations. A U.S. District Court judge ruled Myriad’s patents invalid on the grounds that DNA in an isolated form is not fundamentally different from DNA as it exists in the body. In essence, Myriad was accused of holding a monopoly on its tests, which had existed for a little over a decade, through its exclusive licenses in the United States. The case went to the Supreme Court in 2013, which ruled 9–0 against Myriad, stripping it of five of its patent claims for the BRCA1 and
BRCA2 genes. The ruling rested largely on the view that natural genes are products of nature, and isolating them does not make them patentable. The Court ruled that cDNA sequences produced in a lab can continue to be patented. Myriad still holds about 500 valid claims related to BRCA gene testing.

Whole-Genome Sequence Analysis Presents Many Questions of Ethics
To date, most genetic testing applications have involved two kinds of approach. The first uses methods such as amniocentesis and chorionic villus sampling to identify chromosomal defects. The second tests individual genes through methods such as ASO analysis. Even microarray analysis has not been widely used for genetic testing. But in the next decade and beyond, whole-genome sequencing of adults and babies is expected to become increasingly common in clinical settings. A Genomic Sequencing and Newborn Screening Disorders Program is underway to sequence the exomes of more than 1500 babies. Both infants with illnesses and healthy babies will be part of this screening program. This will allow scientists to carry out comparative genomic analyses of specific sequences to help identify genes involved in disease conditions.

Screening newborns is important to help prevent or minimize the impacts of certain disorders. Earlier in this section we mentioned the positive impact of early screening for PKU. Each year, routine blood tests from a heel prick of newborn babies reveal rare genetic conditions in several thousand infants in the United States alone. A small number of states allow parents to opt out of newborn testing. In the future, should DNA sequencing at the time of birth be required? Do we really know enough about which human genes are involved in disease to help prevent disease in children? Estimates suggest that sequencing can identify approximately 15 to 50 percent of children with diseases that currently cannot be diagnosed by other methods. What is the value of having sequencing data for healthy children?
This is an exciting period for human genetics and medicine. However, many whole-genome sequencing studies of individuals are taking place in a largely unregulated environment. This is especially true of DNA collection, the variability and quality control of DNA handling protocols, sequence analysis, storage, and the confidentiality of genetic information (see the Genetics, Technology, and Society box below). This lack of regulation raises significant ethical concerns.

Preconception Testing, Destiny Predictions, and Baby-Predicting Patents
Companies are now promoting preconception testing. They use computational methods to analyze sequence data from parental DNA samples and make “destiny predictions” about the potential phenotypes of hypothetical offspring. The company 23andMe has been awarded a U.S. patent for a computational method called the Family Traits Inheritance Calculator. This method uses parental DNA samples to predict a baby’s traits, including eye color and the risk of certain diseases. The patent includes applications of technologies to screen sperm and ova for in vitro fertilization (IVF). Gender selection of embryos generated by IVF is currently very common. But could preconception testing lead to the selection of “designer babies”? Fear of eugenics surrounds these conversations, particularly as genetic analysis moves away from disease conditions toward nonmedical traits. These include hair color, eye color, other physical traits, and potentially behavioral traits. The patent covers a process that compares the genotypic data of an egg provider and a sperm provider. The process then suggests gamete donors who might produce a baby (hypothetical offspring) with particular phenotypes of interest to a prospective parent. What do you think about this?
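In its simplest textbook form, predicting a hypothetical child’s genotype from two parental genotypes is a Punnett-square calculation. The sketch below covers only that single-locus Mendelian case. It is not the Family Traits Inheritance Calculator or any company’s method, because the text does not describe how those work. The function names are hypothetical. The model assumes simple segregation at one autosomal locus and ignores linkage and new mutations. It also ignores the fact that traits such as eye color are influenced by more than one gene.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Punnett-square genotype probabilities for one autosomal locus.

    Each parent is a two-letter genotype such as "Bb". Each of a parent's two
    alleles is assumed to be passed on with probability 1/2 (segregation),
    so the four allele pairings below are equally likely.
    """
    counts = Counter()
    for a1, a2 in product(parent1, parent2):
        # Sorting makes "bB" and "Bb" the same genotype; uppercase letters
        # sort before lowercase, so the dominant allele is written first.
        counts["".join(sorted(a1 + a2))] += 1
    total = sum(counts.values())  # 4 pairings for two diploid parents
    return {genotype: Fraction(n, total) for genotype, n in sorted(counts.items())}


def prob_recessive_phenotype(parent1, parent2, recessive_allele):
    """Probability that a child is homozygous for the recessive allele."""
    probs = offspring_genotypes(parent1, parent2)
    return probs.get(recessive_allele * 2, Fraction(0))


# Two carriers of a recessive disease allele "a":
print(offspring_genotypes("Aa", "Aa"))
print(prob_recessive_phenotype("Aa", "Aa", "a"))  # 1/4
```

For two carriers, the sketch reproduces the familiar 1/4 chance of an affected child at that locus. Using `Fraction` keeps the probabilities exact, as on a hand-drawn Punnett square. A realistic trait predictor would have to combine many loci and account for environmental effects, which is where much of the uncertainty in “destiny predictions” lies.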
A company called GenePeeks claims to have a patent-pending technology for reducing the risk of inherited disorders by “digitally weaving” together the DNA of prospective parents. GenePeeks plans to sequence the DNA of sperm donors and of women who want to become pregnant. It would then tell each woman which donors are most genetically compatible for the traits she seeks in offspring. The company’s proprietary computing technology is intended to use sequence data to examine virtual progeny from donor-client pairings and estimate the likelihood that they would have particular diseases. The company says it initially plans to focus on around 100,000 loci involved in rare diseases. Will technologies such as this become widespread and attract consumer demand in the future? What do you think? Would you want this analysis done before deciding whether to have a child with a particular person?

Patents and Synthetic Biology
The J. Craig Venter Institute (JCVI) has filed two patent applications for what is being called “the world’s first-ever human-made life form.” The patents are intended to cover the minimal genome of M. genitalium, which JCVI believes comprises the genes essential for self-replication. One of these patent applications is designed to claim the rights to synthetically constructed organisms. Another U.S. patent, issued to a different group of researchers, covers the application of a minimal genome for E. coli. This patent has generated even more concern because E. coli is more important than M. genitalium. What do you think? Should it be possible to patent a minimal genome or a synthetic organism?

Consider these other ethical issues about synthetic biology:
Synthetic biology has the potential to be used for harmful purposes (such as bioterrorism). What regulatory policies and restrictions should be placed on applications of synthetic biology and on patents for these applications?
The ability to modify life forms offends some people. How will the synthetic biology research community address this issue?

ESSENTIAL POINT
Applications of genetic engineering and biotechnology involve a wide range of ethical, legal, and social dilemmas with important scientific and societal implications.

GENETICS, TECHNOLOGY, AND SOCIETY
Privacy and Anonymity in the Era of Genomic Big Data
Our lives are surrounded by Big Data. Enormous quantities of personal information are stored in private and public databases, revealing our purchasing preferences, search engine histories, social contacts, and even GPS locations. We allow this information to be collected in exchange for services that we perceive as valuable, or at least innocuous. But often we do not know how this information may be used now or in the future. We also do not consider how its distribution may affect us, our families, and our relationships.

Perhaps the most personal of all Big Data entries are those obtained from personal genome sequences and genomic analyses. Tens of thousands of individuals are now donating DNA for whole-genome sequencing, carried out both by private gene-sequencing companies and by public research projects. Most people who donate their DNA for sequence analysis do so with the assumption, or promise, that these data will remain anonymous and private. Even when told that the data will be available to others, most donors express little concern. After all, what consequences could possibly come from access to gigabytes of anonymous A’s, C’s, T’s, and G’s?
Surprisingly, the answer is quite a lot. One of the first inklings of genetic privacy problems came in 2005. That year, a 15-year-old boy named Ryan Kramer used his own Y chromosome sequence data and the Internet to track down his anonymous sperm-donor father (Motluk, A., New Scientist 2524: November 3, 2005). Ryan submitted a DNA sample to a genealogy company that generates Y chromosome profiles and matches them against entries in its database. The company then puts people in contact with others who share similar genetic profiles, which indicates relatedness. Two men contacted Ryan, and both had the same last name, spelled differently. Ryan combined the last names with the only information he had about his sperm-donor father: date of birth, birthplace, and college degree. Using an Internet people-search service, he obtained the names of everyone born on that date in that place. On the list, there was one man with the same last name as his two Y chromosome relatives. Through another Internet search, Ryan confirmed that the man also had the appropriate college degree. He then contacted his sperm-donor father.

Since this report, other children of sperm donors have used DNA genealogy searching to find their fathers. In these cases, the sperm donors had not submitted their DNA to genealogy companies, but their identities could be determined indirectly. The implications for sperm donors have been unsettling, as most are promised anonymity. In some cases, donors are troubled to learn that they have fathered dozens of offspring.

More recently, several published reports have shown how easily anyone’s identity can be traced using whole-genome DNA sequences. For example, a research team led by Dr. Yaniv Erlich of the Whitehead Institute showed how anonymous entries in the 1000 Genomes Project database could be traced and identified (Gymrek, M. et al. 2013. Science 339:321–324). To begin, they converted a number of randomly chosen Y chromosome sequences from the database into STR
profiles. They then used the profiles to search two free public genealogy databases. These searches yielded family surnames and other information such as geographical location and pedigrees. In all, the identities of 50 people were revealed, starting from the genome sequences of only five individuals. Studies similar to Erlich’s show that other personal information can be revealed from anonymous RNA expression-level data or from SNP genotyping microarray data. This information includes a person’s identity, age, sex, body mass index, glucose, insulin, and lipid levels, and disease susceptibilities.

To many people, the implications of “genomic de-identification” are disturbing. Genomic information leaks could reveal personal medical information, physical appearance, and racial origins. Leaked data could also be used to synthesize DNA to plant at a crime scene. As we learn more about the information in our genomes, such data could be used in ways we cannot yet foresee. The consequences of genomic information leaks are not limited to the person whose genome was sequenced. They extend to family members across many generations who share that person’s genetic heritage. It may be time to address questions about genome data privacy and anonymity before it is too late to put the genome genie back into the bottle.

Your Turn
Would you donate your DNA to a research project designed to discover the genetic links to certain cancers? If not, why not? If so, what privacy assurances would you need to feel comfortable about your donation? Some of the consequences of genome data leaks for research projects are outlined in Brenner, S.E. 2013. Be prepared for the big genome leak. Nature 498:139.
Scientists and regulatory agencies are debating genome data privacy issues. What are some of their concerns about privacy, and what ideas are being proposed to deal with these concerns?
To begin a discussion about privacy, informed consent, and regulations, see Hayden, E. C. 2012. Informed consent: A broken contract. Nature 486:312–314.

One way to deal with issues of genome data leaks is to make all personal genome data open and accessible. What are the advantages and disadvantages of this approach? This approach is being taken by the Personal Genome Project. Read about their nonanonymous approach to DNA data at http://www.personalgenomes.org/non-anonymous.

Three-parent babies: the ethical debate

A couple wants to have a baby, but the woman has MERRF (Myoclonic Epilepsy with Ragged Red Fibers) syndrome, caused by a mutation at position 8344 in the mitochondrial genome (mtDNA). Since all mitochondria are inherited solely from the mother, her baby would have her disease. They hear of a new technique called mitochondrial manipulation technology, in which the mother’s nucleus is transferred to a donor egg whose own nucleus has been removed, and the egg is then fertilized with the father’s sperm. There are some concerns about the safety and efficiency of the technique, the impact on the mental well-being of the child, and the identity of the child, who would have three parents. There are also profound ethical implications, because the technique introduces permanent changes into the germline.

What would you advise the couple to do? Describe the pros and cons of mitochondrial manipulation technology.

The British government legalized mitochondrial manipulation in February 2015, the first government to do so. Should more countries do the same?
Justify your answer.

INSIGHTS AND SOLUTIONS

Research by Petukhova et al. (Nature 466:113–117, 2010) involved a GWAS to analyze 1054 cases of patients with alopecia areata (AA) and 3278 controls. Alopecia areata is a condition that leads to major hair loss and affects approximately 5.3 million people in the United States alone.

(a) A Manhattan plot from this work is shown below:

[Figure: Manhattan plot of -log P (y-axis, 0 to 40) versus position in genome (x-axis, chromosomes 1–22); a red line marks genome-wide significance.]

Based on your interpretation of this plot, which chromosomes were associated with loci that may contribute to AA?

(b) Of the 139 SNPs significantly associated with AA, several are in genes involved in controlling the activation and proliferation of regulatory T lymphocytes (Treg cells) and cytotoxic T lymphocytes, genes involved in antigen presentation to immune cells, genes for immune regulatory molecules such as the interleukins, and genes expressed in the hair follicle itself. Speculate how these candidate genes may help scientists understand how AA progresses as a disease.

Solution: (a) Investigators identified eight genomic regions with SNPs that exceed the genome-wide significance value (red line). These regions were clustered on chromosomes 2, 4, 6, 9, 10, 11, and 12.

(b) AA is an autoimmune disease in which the immune system attacks hair follicles, resulting in hair loss that can spread across the entire scalp and even the whole body. AA hair follicles are attacked by T cells. The identification of candidate genes involved in T-cell proliferation, immune system regulation, and follicular development may help investigators develop cures for AA.

Problems and Discussion Questions

1. HOW DO WE KNOW? In this chapter, we focused on a number of interesting applications of genetic engineering, genomics, and biotechnology. At the same time, we found many opportunities to consider the methods and reasoning by which much of this information was acquired. From the explanations given in the chapter, what answers would you propose to the following fundamental questions:
(a) What experimental evidence confirms that we have introduced a useful gene into a transgenic organism and that it performs as we anticipate?
(b) How can we use DNA analysis to determine that a human fetus has sickle-cell anemia?
(c) How can DNA microarray analysis be used to identify specific genes that are being expressed in a specific tissue?
(d) How are GWAS carried out, and what information do they provide?
(e) What are some of the technical reasons why gene therapy is difficult to carry out effectively?
2. CONCEPT QUESTION Review the Chapter Concepts list on page 394. Most of these center on applications of genetic technology that are becoming widespread. Write a short essay that summarizes the impacts that genomic applications are having on society and the ethical issues presented by these applications.
3. An unapproved form of gene therapy, known as enhancement gene therapy, can create considerable ethical dilemmas. Why?
4. There are more than 1000 cloned farm animals in the United States. In the near future, milk from cloned cows and their offspring (born naturally) may be available in supermarkets. These cloned animals have not been transgenically modified, and they are no different from identical twins. Should milk from such animals and their natural-born offspring be labeled as coming from cloned cows or their descendants? Why?
5. One of the major causes of sickness, death, and economic loss in the cattle industry is Mannheimia haemolytica, which causes bovine pasteurellosis, or shipping fever. Noninvasive delivery of a vaccine using transgenic plants expressing immunogens would reduce labor costs and trauma to livestock. An early step toward developing an edible vaccine is to determine whether an injected version of an antigen (usually a derivative of the pathogen) is capable of stimulating the development of antibodies in a test organism. The following table assesses the ability of a transgenic portion of a toxin (Lkt) of M. haemolytica to stimulate development of specific antibodies in rabbits.

Immunogen Injected           Antibody Production in Serum
Lkt50* (saline extract)      +
Lkt50 (column extract)       +
Mock injection               −
Pre-injection                −

*Lkt50 is a smaller derivative of Lkt that lacks all hydrophobic regions.
+ indicates at least 50 percent neutralization of toxicity of Lkt; − indicates no neutralization activity.
Source: Modified from Lee et al. 2001. Infect. and Immunity 69:5786–5793.

(a) What general conclusion can you draw from the data?
(b) With regard to development of a usable edible vaccine, what work remains to be done?
6. Describe how the team from the J. Craig Venter Institute created a synthetic genome. How did they demonstrate that the genome converted the recipient strain of bacteria into a different strain?
7. Sequencing the human genome and the development of microarray technology promise to improve our understanding of normal and abnormal cell behavior. How are microarrays dramatically changing our understanding of complex diseases such as cancer?
8. A couple with European ancestry seeks genetic counseling before having children because of a history of cystic fibrosis (CF) in the husband’s family. ASO testing for CF reveals that the husband is heterozygous for the ΔF508 mutation and that the wife is heterozygous for the R117H mutation. You are the couple’s genetic counselor. When consulting with you, they express their conviction that they are not at risk for having an affected child because they each carry different mutations and cannot have a child who is homozygous for either mutation. What would you say to them?
9. As genetic testing becomes widespread, medical records will contain the results of such testing. Who should have access to this information? Should employers, potential employers, or insurance companies be allowed to have this information? Would you favor or oppose having the government establish and maintain a central database containing the results of individuals’ genome scans?
10. What limits the use of differences in restriction enzyme sites as a way of detecting point mutations in human genes?
11. Genes in their natural state cannot be patented. This policy allows research and use of natural products for the common good. What argument might be presented in favor of patenting genes or gene products?
12. What are the different genetic markers that genome-wide association studies (GWAS) employ? How can scientists use these data to calculate the disease risk associated with each variation?
13. The family of a sixth-grade boy in Palo Alto, California, was informed by school administrators that he would have to transfer out of his middle school because they believed his mutation in the CFTR gene, which does not produce any symptoms associated with cystic fibrosis, posed a risk to other students at the school who have cystic fibrosis. After the boy had missed 11 days of school, a settlement was reached to have him return. Based on what you know about GINA, the Genetic Information Nondiscrimination Act, what ethical problems might you associate with this example?
14. Dominant mutations can be categorized according to whether they increase or decrease the overall activity of a gene or gene product. Although a loss-of-function mutation (a mutation that inactivates the gene product) is usually recessive, for some genes, one dose of the normal gene product, encoded by the normal allele, is not sufficient to produce a normal phenotype. In this case, a loss-of-function mutation in the gene will be dominant, and the gene is said to be haploinsufficient. A second category of dominant mutation is the gain-of-function mutation, which results in a new activity or increased activity or expression of a gene or gene product. The gene therapy technique currently used in clinical trials involves the “addition” to somatic cells of a normal copy of a gene. In other words, a normal copy of the gene is inserted into the genome of the mutant somatic cell, but the mutated copy of the gene is not removed or replaced. Will this strategy work for either of the two aforementioned types of dominant mutations?
15. The first attempts at gene therapy began in 1990 with the treatment of a young girl with a genetic disorder abbreviated SCID. What does SCID stand for? In the context of SCID, what does ADA stand for?
16. The Genetic Testing Registry is intended to provide better information to patients, but companies involved in genetic testing are not required to participate. Should company participation be mandatory? Why or why not? Explain your answers.
17. Once DNA is separated on a gel, it is often desirable to gain some idea of its informational content. How might this be done?
18. Would you have your genome sequenced if the price were affordable? Why or why not? If you answered yes, would you make your genome sequence publicly available? How might such information be misused?
19. Following the tragic shooting of 20 children at a school in Newtown, Connecticut, in 2012, Connecticut’s state medical examiner requested a full genetic analysis of the killer’s genome. What do you think investigators might be looking for? What might they expect to find? Might this analysis lead to an oversimplified explanation of the cause of the tragedy?
20. Private companies are now offering personal DNA sequencing along with interpretation. What services do they offer? Do you think that these services should be regulated, and if so, in what way? Investigate one such company, 23andMe, at http://www.23andMe.com.
21. Yeager, M., et al. (Nature Genetics 39:645–649, 2007) and Sladek, R., et al. (Nature 445:881–885, 2007) have used single-nucleotide polymorphisms (SNPs) in genome-wide association studies (GWAS) to identify novel risk loci for prostate cancer and type 2 diabetes mellitus, respectively. Each study suggests that disease-risk genes can be identified that significantly contribute to the disease state. Given your understanding of such complex diseases, what factors would you consider reasonable to take into account when interpreting the results of GWAS?
22. In March 2010, Judge R. Sweet ruled to invalidate Myriad Genetics’ patents on the BRCA1 and BRCA2 genes. Sweet wrote that since the genes are part of the natural world, they are not patentable. Myriad Genetics also holds patents on the development of a direct-to-consumer test for the BRCA1 and BRCA2 genes.
(a) Would you agree with Judge Sweet’s ruling to invalidate the patenting of the BRCA1 and BRCA2 genes? If you were asked to judge the patenting of the direct-to-consumer test for the BRCA1 and BRCA2 genes, how would you rule?
(b) J. Craig Venter has filed a patent application for his “first-ever human made life form.” This patent is designed to cover the genome of M. genitalium. Would your ruling for Venter’s “organism” be different from Judge Sweet’s ruling on patenting of the BRCA1 and BRCA2 genes?

20 Developmental Genetics

CHAPTER CONCEPTS
■ Gene expression during development is based on the differential transcription of selected genes.
■ Animals use a small number of shared signaling systems and regulatory networks to construct a wide range of adult body forms from the zygote. These shared properties make it possible to use animal models to study human development.
■ Differentiation is controlled by cascades of gene expression that are a consequence of events that specify and determine the developmental fate of cells.
■ Plants independently evolved developmental regulatory mechanisms that parallel those of animals.
■ In many organisms, cell–cell signaling systems program the developmental fate of adjacent and distant cells.

This unusual four-winged Drosophila has developed an extra set of wings as a result of a mutation in a homeotic selector gene.

Over the last two decades, the use of genetic analysis, molecular biology, and genomics has shown that, in spite of wide diversity in the size and shape of adult animals and plants, multicellular organisms share many genetic pathways and molecular signaling mechanisms that control developmental events leading from the
zygote to the adult. At the cellular level, development is marked by three important events: specification, when the first cues confer spatial identity; determination, when a specific developmental fate for a cell becomes fixed; and differentiation, the process by which a cell achieves its final adult form and function. Now, thanks to newly developed methods of analysis, including microarrays, high-throughput sequencing, epigenetics, proteomics, and systems biology, we are beginning to understand how the action and interaction of genes and environmental factors control developmental processes in multicellular organisms. In this chapter, the primary emphasis will be on how genetic analysis has been used to study development. This field, called developmental genetics, laid the foundation for our understanding of developmental events at the molecular and cellular levels, which contribute to the continually changing phenotype of the newly formed organism.

20.1 Differentiated States Develop from Coordinated Programs of Gene Expression

Animal genomes contain tens of thousands of genes, but only a small subset of these control the events that shape the adult body (Figure 18–1). Developmental geneticists study mutant alleles of these genes to ask important questions about development:
• What genes are expressed?
• When are they expressed?
• In what parts of the developing embryo are they expressed?
• How is the expression of these genes regulated?
• What happens when these genes are defective?
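The first, second, third, and fifth of these questions can be thought of as fields in a catalog of expression data, one record per gene. As a minimal, hypothetical sketch (the record structure, the field names, and the deliberately coarse stage label "early embryo" are our assumptions, not a standard data format), the entries below summarize the gap-gene observations given later in this chapter: where hunchback, Krüppel, and knirps are expressed, that maternal gene products control their transcription, and what their mutants lose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionRecord:
    gene: str
    stage: str             # when the gene is expressed (coarse, assumed label)
    region: str            # where along the anterior-posterior axis it is expressed
    regulated_by: str      # what controls its transcription
    mutant_phenotype: str  # what happens when the gene is defective


# Summaries of the gap-gene descriptions in this chapter (hypothetical format).
CATALOG = [
    ExpressionRecord("hunchback", "early embryo", "anterior",
                     "maternal gene products", "loses head and thorax structures"),
    ExpressionRecord("Krüppel", "early embryo", "middle",
                     "maternal gene products", "loses thoracic and abdominal structures"),
    ExpressionRecord("knirps", "early embryo", "posterior",
                     "maternal gene products", "loses most abdominal structures"),
]


def genes_expressed(catalog, stage, region):
    """Answer 'what genes are expressed?' for a given time and place."""
    return [r.gene for r in catalog if r.stage == stage and r.region == region]


def phenotype_of(catalog, gene):
    """Answer 'what happens when this gene is defective?'."""
    for r in catalog:
        if r.gene == gene:
            return r.mutant_phenotype
    return None


print(genes_expressed(CATALOG, "early embryo", "anterior"))
print(phenotype_of(CATALOG, "knirps"))
```

A real expression atlas would record quantitative levels, many more stages, and finer positional coordinates; the point here is only that "what, when, where" become simple queries once expression is cataloged gene by gene.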
These questions provide a foundation for exploring the molecular basis of developmental processes such as determination, induction, cell–cell communication, and cellular differentiation. Genetic analysis of mutant alleles is used to establish a causal relationship between the presence or absence of inducers, receptors, transcriptional events, and cell and tissue interactions on the one hand, and the observable structural changes that accompany development on the other.

A useful way to define development is to say that it is the attainment of a differentiated state by all the cells of an organism (except for stem cells). For example, a cell in a blastula-stage embryo (when the embryo is just a ball of uniform-looking cells) is undifferentiated, whereas a red blood cell synthesizing hemoglobin in the adult body is differentiated. How do cells get from the undifferentiated to the differentiated state? The process involves progressive activation of different sets of genes in different cells of the embryo. From a genetics perspective, one way of defining the different cell types that form during development in multicellular organisms is to identify and catalog the genes that are active in each cell type. In other words, development depends on patterns of differential gene expression.

The idea that differentiation is accomplished by activating and inactivating genes at different times and in different cell types is called the variable gene activity hypothesis. Its underlying assumptions are, first, that each cell contains an entire genome and, second, that differential transcription of selected genes controls the development and differentiation of each cell. In multicellular organisms, the genes involved with development have been conserved throughout evolution, along with the patterns of differential transcription and the ensuing developmental mechanisms. As a result, scientists are able to learn about development in complex multicellular organisms, including humans, by dissecting these mechanisms in a small
number of genetically well-characterized model organisms.

20.2 Evolutionary Conservation of Developmental Mechanisms Can Be Studied Using Model Organisms

Genetic analysis of development across a wide range of organisms has shown that the size and shape of all animal bodies are controlled by a common set of genes and developmental mechanisms. For example, most of the differences in anatomical structures between organisms as diverse as zebras and zebrafish result from different patterns of expression of a single gene set, called the homeotic (abbreviated as Hox) genes, and not from expression of a host of species-specific genes. Genome-sequencing projects have confirmed that homeotic genes from a wide range of organisms have a common ancestry. This homology means that many aspects of normal human embryonic development and associated genetic disorders can be studied in model organisms such as Drosophila, where genetic methods including mutagenesis, genetic crosses, and large-scale experiments involving hundreds of offspring can be conducted (see Chapter for a discussion of model organisms in genetics).

Studies comparing developmental processes in different organisms (a field called evolutionary developmental biology) have revealed that although many developmental mechanisms are common to all animals, several new and unique ways of transforming a zygote into an adult have appeared over evolutionary time. These evolutionary changes result from mutation, gene duplication and divergence, assignment of new functions to old genes, and the recruitment of genes to new developmental pathways. However, the emphasis in this chapter will be on the similarities in genes and developmental mechanisms among species.

Analysis of Developmental Mechanisms

In the space of this chapter, we cannot survey all aspects of development, nor can we explore in detail how the array of developmental mechanisms triggered by the fusion of sperm and egg was identified. Instead, we will focus on a number of
general processes in development:
• how the adult body plan of animals is laid down in the embryo
• the program of gene expression that turns undifferentiated cells into differentiated cells
• the role of cell–cell communication in development

To examine these developmental processes, we will use three model systems: the fruit fly Drosophila melanogaster, the flowering plant Arabidopsis thaliana, and the nematode Caenorhabditis elegans. We will examine how patterns of differential gene expression lead to the progressive restriction of developmental options, resulting in the formation of the adult body plan in Drosophila and Arabidopsis. We will then expand the discussion to consider how our knowledge of events in these organisms has contributed to our understanding of developmental defects in humans. Finally, we will consider the role of cell–cell communication in the development of adult structures in C. elegans.

20.3 Genetic Analysis of Embryonic Development in Drosophila Reveals How the Body Axis of Animals Is Specified

How does a given cell at a precise position in the embryo switch on or switch off specific genes at timed stages of development?
This is a central question in developmental biology. To answer it, we will examine the sequence of gene expression in the embryo of Drosophila. Although development in a fruit fly appears to have little in common with development in humans, recall that shared genes drive these steps in both species.

Overview of Drosophila Development

The life cycle of Drosophila is about 10 days long, with a number of distinct phases: the embryo, three larval stages, the pupal stage, and the adult stage (Figure 20–1).

FIGURE 20–1 Drosophila life cycle (embryo; 1st, 2nd, and 3rd instar larvae; pupa).

Internally, the cytoplasm of the unfertilized egg is organized into a series of maternally constructed molecular gradients that play a key role in determining the developmental fates of nuclei located in specific regions of the embryo. Immediately after fertilization, the zygote nucleus undergoes a series of divisions without cytokinesis [Figure 20–2(a) and (b)]. The resulting cell, which contains multiple nuclei, is called a syncytium. At about the tenth nuclear division, the nuclei move to the periphery of the egg, where the cytoplasm contains localized gradients of maternally derived mRNA transcripts and proteins [Figure 20–2(c)]. After several more divisions, the nuclei become enclosed in plasma membranes [Figure 20–2(d)], forming a cellular layer on the outside edge of the embryo. Interactions between the nuclei and the cytoplasmic components of these cells initiate and direct the pattern of embryonic gene expression.

Germ cells, which in the adult are destined to undergo meiosis and produce gametes, form at the posterior pole of the embryo [Figure 20–2(c) and (d)]. Nuclei in other regions of the embryo normally form somatic cells. If nuclei from these regions are transplanted into the posterior cytoplasm, they will form germ cells and not somatic cells, confirming that the cytoplasm at the posterior pole of the embryo contains maternal components that direct nuclei to form germ cells. Transcriptional programs
activated by cytoplasmic components in somatic (non–germ-cell) nuclei form the embryo’s anterior–posterior (front to back) and dorsal–ventral (upper to lower) axes of symmetry, leading to the formation of a segmented embryo [Figure 20–2(e)]. Under control of the Hox gene set (discussed in a later section), these segments will give rise to the differentiated structures of the adult fly [Figure 20–2(f)].

ESSENTIAL POINT In Drosophila, both genetic and molecular studies have confirmed that the egg contains gradients of molecular information, which initiate a transcriptional cascade that specifies the body plan of the larva, pupa, and adult.

FIGURE 20–2 Early stages of embryonic development in Drosophila. (a) Fertilized egg with zygotic nucleus (2n), shortly after fertilization; the diploid zygote nucleus is produced by fusion of the parental gamete nuclei. (b) Nuclear divisions occur about every 10 minutes. Nine rounds of division produce a multinucleate cell, the syncytial blastoderm. (c) At the tenth division, the nuclei migrate to the periphery or cortex of the egg, and approximately four additional rounds of nuclear division occur there. A small cluster of cells, the pole cells, forms at the posterior pole about 2.5 hours after fertilization. These cells will form the germ cells of the adult. (d) About hours after fertilization, the nuclei become enclosed in membranes, forming a single layer of cells over the embryo surface and creating the cellular blastoderm. (e) The embryo at about 10 hours after fertilization. At this stage, the segmentation pattern of the body is clearly established. Behind the segments that will form the head, T1–T3 are thoracic segments, and A1–A8 are abdominal segments. (f) The adult fly, showing the structures formed from each segment of the embryo.

Genetic Analysis of Embryogenesis

Two different gene sets control embryonic development in Drosophila: maternal-effect genes and zygotic genes (Figure 20–3). Products of maternal gene transcription (mRNA and/or proteins) are placed in the cytoplasm of the developing egg. Many of these products are distributed in a gradient or concentrated in specific regions of the egg. Female flies homozygous for deleterious recessive mutations of maternal-effect genes are sterile. None of their embryos receive wild-type maternal gene products, so all of the embryos develop abnormally and die. Maternal-effect genes encode transcription factors, receptors, and proteins that regulate gene expression. During development, these gene products activate or repress time- and location-specific programs of gene expression in the embryo.

Zygotic genes are those transcribed in the embryonic nuclei formed after fertilization. These products of the embryonic genome are differentially transcribed in specific regions of the embryo in response to the distribution of maternal-effect proteins. Recessive mutations in these genes can lead to embryonic lethality in homozygotes. In a cross between flies heterozygous for a recessive zygotic mutation, one-fourth of the embryos (the recessive homozygotes) fail to develop normally and die.

Much of our knowledge about the genes that regulate Drosophila development is based on the work of Ed Lewis, Christiane Nüsslein-Volhard, and Eric Wieschaus, who were awarded the 1995 Nobel Prize for Physiology or Medicine. Ed Lewis initially identified and studied one of these regulatory selector genes. Later, Nüsslein-Volhard and Wieschaus devised a strategy to identify all the genes that control the segmental pattern in Drosophila larvae. Their scheme required examining thousands of
offspring of mutagenized flies, looking for recessive embryonic lethal mutations that alter external structures. The genes identified by these mutations, called segmentation genes, were grouped into three classes: gap, pair-rule, and segment polarity genes. In 1980, on the basis of their observations, Nüsslein-Volhard and Wieschaus proposed a model in which embryonic development is initiated by gradients of maternal-effect gene products. The positional information laid down by these gradients is then interpreted by two sets of zygotic (embryonic) genes: (1) the segmentation genes (gap, pair-rule, and segment polarity genes) identified in their search for mutants, and (2) the homeotic selector (Hox) genes. Segmentation genes divide the embryo into a series of stripes or segments and define the number, size, and polarity of each segment. The homeotic genes specify the developmental fate of cells within each segment as well as the adult structures that will be formed from each segment (Figure 20–3).

FIGURE 20–3 The hierarchy of genes involved in establishing the segmented body plan in Drosophila. The maternal-effect genes fall into anterior, posterior, and terminal groups. Gene products from the maternal genes regulate the expression of the first three groups of zygotic genes (gap, pair-rule, and segment polarity, collectively called the segmentation genes), which in turn control expression of the homeotic genes.

Their model is shown in Figure 20–4. Most maternal-effect gene products placed in the egg during oogenesis are activated immediately after
fertilization and help establish the anterior–posterior axis of the embryo by activating position-specific patterns of gene expression in the embryo’s nuclei [Figure 20–4(a)]. Many maternal gene products encode transcription factors that activate gap genes, whose expression divides the embryo into regions corresponding to the head, thorax, and abdomen of the adult [Figure 20–4(b)]. The activated gap genes encode other transcription factors that activate pair-rule genes, whose products divide the embryo into smaller regions about two segments wide [Figure 20–4(c)]. In turn, expression of the pair-rule genes activates the segment polarity genes, which divide each segment into anterior and posterior regions [Figure 20–4(d)]. The collective action of the maternal genes and the segmentation genes defines the anterior–posterior axis, which is the field of action for the homeotic (Hox) genes [Figure 20–4(e)].

FIGURE 20–4 Progressive restriction of cell fate during development in Drosophila. (a) Gradients of maternal proteins are established along the anterior–posterior axis of the embryo. (b), (c), and (d) Three groups of segmentation genes progressively define the body segments: zygotic gap genes divide the embryo into broad regions; zygotic pair-rule genes divide the embryo into stripes about two segments wide, and their combined action defines segment borders; zygotic segment polarity genes divide segments into anterior and posterior halves. (e) Individual segments are given identity by the homeotic selector genes.

ESSENTIAL POINT Maternal-effect gene products activate genes that lay down the anterior–posterior axis of the embryo and specify the location and number of segments, which in turn have their identity determined by homeotic selector genes.

20–1 Suppose you initiate a screen for maternal-effect mutations in Drosophila affecting external structures of the embryo, and you identify more than 100 mutations that affect these structures. From their screenings, other researchers concluded that there are only about 40 maternal-effect genes. How do you reconcile these different results?
HINT: This problem involves an understanding of how mutants are identified when adult Drosophila have been exposed to mutagens. The key to its solution is an understanding of the differences between genes and alleles.

20.4 Zygotic Genes Program Segment Formation in Drosophila

To summarize, certain genes in the zygote’s genome are activated or repressed according to a positional gradient of maternal gene products. Expression of three sets of segmentation genes divides the embryo into a series of segments along its anterior–posterior axis. These segmentation genes are normally transcribed in the developing embryo, and mutations of these genes have embryo-lethal phenotypes. More than 20 segmentation genes (Table 20.1) have been identified, and they are classified on the basis of their mutant phenotypes: (1) mutations in gap genes delete a group of adjacent segments, causing gaps in the normal body plan of the embryo; (2) mutations in pair-rule genes affect every other segment and eliminate a specific part of each affected segment; and (3) mutations in segment polarity genes cause defects in portions of each segment. In addition to these three sets of genes that determine the anterior–posterior axis of the developing embryo, another set of genes determines the dorsal–ventral axis of the embryo. Our discussion will be limited to the gene sets involved in the anterior–posterior axis. Let us now examine members of each group in greater detail.

Gap Genes

Transcription of gap genes in the embryo is controlled by maternal gene products laid down in gradients in the egg. Gap genes also cross-regulate each other to define the early stage of the body plan. Mutant alleles of these genes produce large gaps in the embryo’s segmentation pattern. Hunchback mutants lose head and thorax structures, Krüppel mutants lose thoracic and abdominal structures, and knirps mutants lose most
abdominal structures. Transcription of wild-type gap genes (which encode transcription factors) divides the embryo into a series of broad regions that become the head, thorax, and abdomen. Within these regions, different combinations of gene activity eventually specify both the type of segment that forms and the proper order of segments in the body of the larva, pupa, and adult. Expression domains of the gap genes in different parts of the embryo correlate roughly with the location of their mutant phenotypes: hunchback at the anterior, Krüppel in the middle (Figure 20–5), and knirps at the posterior. As mentioned earlier, gap genes encode transcription factors that bind to enhancer regions that control the expression of pair-rule genes.

TABLE 20.1 Segmentation Genes in Drosophila
Gap genes: Krüppel, knirps, hunchback, giant, tailless, buckebein, caudal
Pair-rule genes: hairy, even-skipped, runt, fushi-tarazu, paired, odd-paired, odd-skipped, sloppy-paired
Segment polarity genes: engrailed, wingless, cubitus interruptus, hedgehog, fused, armadillo, patched, gooseberry, paired, naked, disheveled

FIGURE 20–5 Expression of gap genes in a Drosophila embryo. The hunchback protein is shown in orange, and Krüppel is indicated in green. The yellow stripe is created when cells contain both hunchback and Krüppel proteins. Each dot in the embryo is a nucleus.

Pair-Rule Genes

Pair-rule genes are expressed in a series of seven narrow bands or stripes of nuclei extending around the circumference of the embryo. The expression of this gene set does two things: first, it establishes the boundaries of segments, and then it programs the developmental fate of the cells within each segment by controlling expression of the segment polarity genes. Mutations in pair-rule genes eliminate segment-size sections of the embryo at every other segment. At least eight pair-rule genes act to divide the embryo into a series of stripes. The transcription of the pair-rule genes is mediated by the action of maternal
gene products and gap gene products. Initially, the boundaries of these stripes overlap, so that in each area of overlap, cells express a different combination of pair-rule genes (Figure 20–6). The resolution of boundaries in this segmentation pattern results from the interaction among the gene products of the pair-rule genes themselves (Figure 20–7).

FIGURE 20–6 New patterns of gene expression can be generated by overlapping regions containing two different gene products. (a) Transcription factors 1 and 2 are present in an overlapping region of expression. If both transcription factors must bind to the promoter of a target gene to trigger expression, the gene will be active only in cells containing both factors (most likely in the zone of overlap). (b) The target gene is expressed only in this restricted region of the embryo.

Segment Polarity Genes

Expression of segment polarity genes is controlled by transcription factors encoded by pair-rule genes. Within each of the segments created by pair-rule genes, segment polarity genes become active in a single band of cells that extends around the embryo’s circumference (Figure 20–8). This divides the embryo into 14 segments. The products of the segment polarity genes control the cellular identity within each of them and establish the anterior–posterior pattern (the polarity) within each segment.

Segmentation Genes in Mice and Humans

We have seen that segment formation in Drosophila depends on the action of three sets of segmentation genes. Are these genes found in humans and other mammals, and do they control aspects of embryonic development in these organisms?
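Before turning to that question, the combinatorial logic summarized in Figure 20–6 can be made concrete. The Python sketch below is illustrative only and is not taken from the text: it assumes a hypothetical embryo of 100 nuclei in a row along the anterior–posterior axis, gives transcription factors 1 and 2 arbitrary, sharply bounded domains (real domains are graded), and assumes the target gene is transcribed only in nuclei that contain both factors.

```python
# Illustrative sketch of the "both factors required" logic of Figure 20-6.
# Assumptions (hypothetical, not from the text): a 1-D row of nuclei along
# the anterior-posterior axis, and sharp, arbitrary domain boundaries.

def expression_domain(start: int, end: int, n_nuclei: int) -> list[bool]:
    """Mark nuclei in the half-open interval [start, end) as containing the factor."""
    return [start <= i < end for i in range(n_nuclei)]


def target_gene_on(factor1: list[bool], factor2: list[bool]) -> list[bool]:
    """The target gene is transcribed only where BOTH factors are present."""
    return [a and b for a, b in zip(factor1, factor2)]


N = 100                             # hypothetical number of nuclei
tf1 = expression_domain(20, 60, N)  # hypothetical domain of transcription factor 1
tf2 = expression_domain(45, 85, N)  # hypothetical domain of transcription factor 2
target = target_gene_on(tf1, tf2)

# Indices of nuclei in which the target gene is transcribed
on = [i for i, active in enumerate(target) if active]
print(f"target gene transcribed in nuclei {on[0]}-{on[-1]}")  # prints: target gene transcribed in nuclei 45-59
```

Shifting either hypothetical boundary moves or narrows the band of target-gene expression, which shows how two broad, overlapping domains can define a much narrower stripe.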
To answer this question, let’s examine runt, one of the pair-rule genes in Drosophila. Later in development, it controls aspects of sex determination and formation of the nervous system. The gene encodes a protein that regulates transcription of its target genes. Runt contains a 128-amino-acid DNA-binding region (called the runt domain) that is highly conserved in Drosophila, mouse, and human proteins. In fact, in vitro experiments show that the Drosophila and mouse runt proteins are functionally interchangeable.

In mice, runt is expressed early in development and controls formation of blood cells, bone, and the genital system. Although the target gene sets controlled by runt are different in Drosophila and the mouse, in both organisms expression of runt specifies the fate of uncommitted cells in the embryo by regulating transcription of target genes.

In humans, mutations in RUNX2, a human homolog of runt, cause cleidocranial dysplasia (CCD), an autosomal dominant trait. Those affected with CCD have a hole in the top of their skull because bone does not form in the membranous gap known as the fontanel. Their collar bones (clavicles) do not develop, enabling affected individuals to fold their shoulders across their chest. Mice with one mutant copy of the runt homolog have a phenotype similar to that seen in humans; mice with two mutant copies of the gene have no bones at all. Their skeletons contain only cartilage (Figure 20–9), much like sharks, emphasizing the role of runt as an important gene controlling the initiation of bone formation in both mice and humans.

The sequence similarity of the runt domain in Drosophila, mice, and humans and the ability of the mouse runt gene to replace the Drosophila version in fly development all indicate that the same segmentation genes are found in organisms separated from a common ancestor by millions of years.

FIGURE 20–7 Stripe pattern of pair-rule gene expression in a Drosophila embryo. This embryo is stained to show patterns of expression of the genes even-skipped and fushi-tarazu: (a) low-power view and (b) high-power view of the same embryo.

FIGURE 20–8 The 14 stripes of expression of the segment polarity gene engrailed in a Drosophila embryo.

FIGURE 20–9 (a) The skeletal system of a normal mouse. Bone is stained red and cartilage is stained blue. (b) The skeletal system of a mouse carrying a mutation of the Cbfa1 gene. Most of the skeleton contains only cartilage and not bone. Expression of the normal allele of Cbfa1 is required for bone formation.

20.5 Homeotic Selector Genes Specify Body Parts of the Adult

As segment boundaries are established by expression of segmentation genes, the homeotic (from the Greek word for “same”) genes are activated. Expression of homeotic selector genes determines which adult structures will be formed by each body segment. In Drosophila, this includes the antennae, mouth parts, legs, wings, thorax, and abdomen. Mutant alleles of these genes are called homeotic mutations because one segment is transformed so that it forms the same structure as another segment. For example, the wild-type allele of the homeotic selector gene Antennapedia (Antp) specifies formation of a leg on the second segment of the thorax. Dominant gain-of-function Antp mutations cause this gene to be expressed in the head as well as the thorax. The result is that mutant flies have a leg on their head instead of an antenna (Figure 20–10).

FIGURE 20–10 Antennapedia (Antp) mutation in Drosophila. (a) Head from wild-type Drosophila, showing the antenna and other head parts. (b) Head from an Antp mutant, showing the replacement of normal antenna structures with legs. This is caused by activation of the Antp gene in the head region.

Hox Genes in Drosophila

The Drosophila genome contains two clusters of homeotic selector genes (called Hox genes) on chromosome 3 that encode transcription factors (Table 20.2). The Antennapedia complex (ANT-C) contains five genes that specify structures in the head and first two segments of the thorax [Figure 20–11(a)]. The second cluster, the bithorax complex (BX-C), contains three genes that specify structures in the second and third segments of the thorax and the abdominal segments [Figure 20–11(b)].

Hox genes (listed in Table 20.2) from a wide range of species have two properties in common. First, each contains a highly conserved 180-bp nucleotide sequence known as a homeobox. (Hox is a contraction of homeobox.) The homeobox encodes a DNA-binding region of 60 amino acids known as a homeodomain. Second, in most species, expression of Hox genes is colinear with the anterior-to-posterior organization of the body. In other words, genes at the beginning of the cluster (the 3′-end) are expressed at the anterior end of the embryo, those in the middle are expressed in the middle of the embryo, and genes at the end of a cluster (the 5′-end) are expressed at the embryo’s posterior region (Figure 20–12). Although first identified in Drosophila, Hox genes are found in the genomes of a wide range of animals, including nematodes, sea urchins, zebrafish, frogs, mice, and humans (Figure 20–13).

TABLE 20.2 Hox Genes of Drosophila
Antennapedia Complex: labial, Antennapedia, Sex combs reduced, Deformed, proboscipedia
Bithorax Complex: Ultrabithorax, abdominal A, Abdominal B

FIGURE 20–11 Genes of the Antennapedia complex and the adult structures they specify. (a) In the ANT-C complex, the labial (lab) and Deformed (Dfd) genes control the formation of head segments. The Sex combs reduced (Scr) and Antennapedia (Antp) genes specify the identity of the first two thoracic segments, T1 and T2. The remaining gene in the complex, proboscipedia (pb), may not act during embryogenesis but may be required to maintain the differentiated state in adults. In pb mutants, the labial palps are transformed into legs. (b) In the BX-C complex, Ultrabithorax (Ubx) controls formation of structures in the posterior compartment of T2 and structures in T3. The two other genes, abdominal A (abd-A) and Abdominal B (Abd-B), specify the segmental identities of the eight abdominal segments (A1–A8).

FIGURE 20–12 The colinear relationship between the spatial pattern of expression and chromosomal locations of homeotic genes in Drosophila. (a) Drosophila embryo and the domains of homeotic gene expression in the embryonic epidermis and central nervous system. (b) Chromosomal location of homeotic selector genes, in the order 3′ lab, pb, Dfd, Scr, Antp (ANT-C), then Ubx, abd-A, Abd-B (BX-C) 5′. Note that the order of genes on the chromosome correlates with the sequential anterior borders of their expression domains.

FIGURE 20–13 Conservation of organization and patterns of expression in Hox genes. (Top) The structures formed in adult Drosophila are shown, with the colors corresponding to members of the Hox cluster that control their formation. (Bottom) The arrangement of the Hox genes in an early human embryo, with the four human clusters (HoxA, HoxB, HoxC, and HoxD) aligned by homology group against the Drosophila cluster and an ancestral urbilaterian cluster. As in Drosophila, genes at the 3′-end of the cluster form anterior structures, and genes at the 5′-end of the cluster form posterior structures.

To summarize, genes that control development in Drosophila act in a temporally and spatially ordered cascade, beginning with the genes that establish the anterior–posterior (and dorsal–ventral) axis of the early embryo. Gradients of maternal mRNAs and proteins along the anterior–posterior axis activate gap genes, which subdivide the embryo into broad bands. Gap genes in turn activate pair-rule genes, which divide the embryo into segments. The final group of segmentation genes, the segment polarity genes, divides each segment into anterior and posterior regions arranged linearly along the anterior–posterior axis. The segments are then given identity by the action of the Hox genes. Therefore, this progressive restriction of the developmental potential of the Drosophila embryo’s cells (all of which occurs during the first third of embryogenesis) involves a cascade of gene action, with regulatory proteins acting on transcription, translation, and signal transduction.

Hox Genes and Human Genetic Disorders

Hox genes are found in the genomes of all animals, where they play a fundamental role in shaping the body and its appendages. In vertebrates, the conservation of sequence, the order of genes in the Hox clusters, and their pattern of expression suggest that, as in Drosophila, these genes control development along the anterior–posterior axis and the formation of appendages. However, vertebrates have four clusters of Hox genes (HOXA, HOXB, HOXC, and HOXD) instead of the single Hox complex (split into ANT-C and BX-C) found in Drosophila. This means that in vertebrates, not just one but a combination of Hox genes is involved in forming specific structures. As a result, mutations in individual vertebrate Hox genes do not produce complete transformations as in Drosophila, where mutation of a single Hox gene can transform a haltere into a wing (see the photo at the beginning of this chapter). The role of HOXD genes in human development was confirmed by the discovery that several inherited limb malformations are caused by mutations in HOXD genes. For example, mutations in HOXD13 cause synpolydactyly (SPD), a malformation characterized by extra fingers and toes and abnormalities in bones of the hands and feet (Figure 20–14).

FIGURE 20–14 Mutations in posterior Hox genes (HOXD13 in this case) in humans result in malformations of the limbs, shown here as extra toes. This condition is known as synpolydactyly. Mutations in HOXD13 are also associated with abnormalities of the bones in the hands and feet.

ESSENTIAL POINT Once the boundaries of body segments have been established by segmentation genes, the homeotic selector genes act to specify which body structures will be formed by each segment.

20–2 In Drosophila, both fushi tarazu (ftz) and engrailed (eng) genes encode homeobox transcription factors and are capable of eliciting the expression of other genes. Both genes work at about the same time during development and in the same region to specify cell fate in body segments. To discover whether ftz regulates the expression of engrailed, whether engrailed regulates ftz, or whether both are regulated by another gene, you perform a mutant analysis. In ftz embryos (ftz/ftz), engrailed protein is absent; in engrailed embryos (eng/eng), ftz expression is normal. What does this tell you about the regulation of these two genes: does the engrailed gene regulate ftz, or does the ftz gene regulate engrailed?
HINT: This problem involves an understanding of how genes are regulated at different stages of preadult development in Drosophila. The key to its solution lies in using the results of the mutant analysis to determine the timing of expression of the two genes being examined.

20.6 Binary Switch Genes and Regulatory Pathways Program Organ Formation

The Hox genes that determine which adult structures will be formed by each body segment in Drosophila act as switches, selecting alternative developmental pathways for cells to follow. Each pathway decision point is usually binary; that is, there are two alternative developmental fates for a cell at a given time, and these binary switch genes program a cell to follow only one of these pathways. These genes, which encode transcription factors, are defined by their ability to initiate complete development of an organ or a tissue type and are part of gene-regulatory networks (GRNs) and subnetworks that program transcription of gene sets at specific times and specific stages of tissue and organ formation. Here we will describe how a binary switch gene controls the formation of the eye in Drosophila and how its regulatory network is used in all organisms with eyes.

The Control of Eye Formation

Drosophila adults have compound eyes [Figure 20–15(a)]. Action of the wild-type allele of the binary switch gene eyeless programs cells to follow the developmental pathway for eye formation instead of the pathway for antenna formation. In flies homozygous for recessive mutant alleles of the eyeless gene, the eyes are reduced in size and have irregular facets [Figure 20–15(b)]. In developmental pathways that normally specify the formation of other organs such as legs, wings, and antennae, abnormal expression of eyeless may result in eye formation in other parts of the body [Figure 20–15(c)]. This indicates that switching on the eyeless gene at the wrong time or in the wrong cells can override normal programs of determination and differentiation, causing cells to follow the developmental program for eye formation instead of their normal pathway.

Gene Networks in Eye Formation

The wild-type allele of the eyeless gene is part of a GRN that is the master regulator of eye formation. Five cross-regulating genes that encode transcription factors are the core of this network. Much less is known about events in downstream regulation and the number of target genes that control eye development. However, recent work combining genomics and classic reverse genetics has revealed that the eyeless GRN is large and complex: it contains 241 genes encoding transcription factors and has more than 5,600 target genes. Eye formation in Drosophila (and in other animals) is obviously an extremely complex event. In spite of its size and complexity, this GRN has been highly conserved during evolution, and its core components are used by all animals, including humans, to make eyes.

FIGURE 20–15 (a) The normal compound eye of adult Drosophila. (b) In flies homozygous for the eyeless (ey) mutation, eye development is abnormal, and adults have reduced or absent eyes. The ey gene is a binary switch gene in a gene-regulatory network that controls eye formation in all animals. (c) An eye formed on the leg of a fly. This abnormal location is the result of eyeless expression in cells normally destined to form leg structures.

The discovery that the eyeless gene and its vertebrate equivalent, Pax6, have a high degree of DNA sequence homology and are expressed during eye development forced reevaluation of the long-held belief that the compound eyes of insects and the single-lens eyes of vertebrates evolved independently. In addition, copies of the mouse Pax6 gene, when transferred to Drosophila as transgenes, control eye formation in these flies, demonstrating that the eyes of Drosophila and the mouse are homologous structures. The downstream targets of these binary switch genes are also conserved, indicating that steps in the genetic control of eye development are shared between species that diverged from a common ancestor over half a billion years ago. This evolutionary conservation makes it possible to use genetic analysis in Drosophila to study the development of eyes and to explore the molecular basis for inherited eye defects in humans.

20.7 Plants Have Evolved Developmental Regulatory Systems That Parallel Those of Animals

Plants and animals diverged from a common ancestor about 1.6 billion
years ago, after the origin of eukaryotes and probably before the rise of multicellular organisms. Genetic analysis of mutants and genome sequencing in plants and animals indicate that basic mechanisms of pattern formation evolved independently in animals and plants.

We have already examined the genetic systems that control development and pattern formation in animals, using Drosophila as a model organism. In plants, pattern formation has been extensively studied using flower development in Arabidopsis thaliana (Figure 20–16), a small plant in the mustard family, as a model organism. A cluster of undifferentiated cells, called the floral meristem, gives rise to flowers (Figure 20–17). Each flower consists of four organs (sepals, petals, stamens, and carpels) that develop from concentric rings of cells within the meristem [Figure 20–18(a)]. Each organ develops from a different concentric ring, or whorl, of cells.

FIGURE 20–16 The flowering plant Arabidopsis thaliana, used as a model organism in plant genetics.

Three classes of floral homeotic genes control the development of these organs (Table 20.3). Acting alone, class A genes specify sepals; class A and class B genes acting together specify petals. Expression of both class B and class C genes controls stamen formation. Class C genes acting alone specify carpels.

TABLE 20.3 Homeotic Selector Genes in Arabidopsis*
Class A: APETALA1 (AP1), APETALA2 (AP2)
Class B: APETALA3 (AP3), PISTILLATA (PI)
Class C: AGAMOUS (AG)
*By convention, wild-type genes in Arabidopsis use capital letters.

During flower development [Figure 20–18(b)], class A genes are active in whorls 1 and 2 (sepals and petals), class B genes are expressed in whorls 2 and 3 (petals and stamens), and class C genes are expressed in whorls 3 and 4 (stamens and carpels). Which organs are formed depends on the expression pattern of the three gene classes. In whorl 1, expression of class A genes alone causes sepals to form. Expression of class A and class B genes in whorl 2 leads to petal formation. Expression of class B and class C genes in whorl 3 leads to stamen formation. In whorl 4, expression of class C genes alone causes carpel formation.

As in Drosophila, mutations in homeotic genes cause organs to form in abnormal locations. For example, in APETALA2 mutants (ap2), the order of organs is carpel, stamen, stamen, and carpel instead of the normal order of sepal, petal, stamen, and carpel [Figure 20–19(a) and (b)]. In class B loss-of-function mutants (ap3, pi), petals become sepals and stamens are transformed into carpels [Figure 20–19(c)], so the order of organs becomes sepal, sepal, carpel, carpel. Plants carrying a mutation in the class C gene AGAMOUS will have petals in whorl 3 (instead of stamens) and sepals in whorl 4 (instead of carpels), and the order of organs will be sepal, petal, petal, and sepal [Figure 20–19(d)].

FIGURE 20–17 (a) Parts of the Arabidopsis flower. The floral organs are arranged concentrically. The sepals form the outermost ring, followed by petals and stamens, with carpels on the inside. (b) View of the flower from above.

Evolutionary Divergence in Homeotic Genes

Drosophila and Arabidopsis use different sets of master regulatory genes to establish the body axis and specify the identity of structures along the axis. In Drosophila, this task is accomplished in part by the Hox genes, which encode a set of transcription factors sharing a homeobox domain. In Arabidopsis, the floral homeotic genes belong to a different family of transcription factors, called the MADS-box proteins, characterized by a shared domain of 58 amino acids with no similarity in amino acid sequence or protein structure to the Hox proteins. Both gene sets encode transcription factors, both sets are master regulators of development expressed in a pattern of overlapping domains, and both specify the identity of structures. Reflecting their evolutionary origin from a common ancestor, the genomes of Drosophila and Arabidopsis both contain members of the homeobox and MADS-box gene families, but these genes have been adapted for different uses in the plant and animal kingdoms. This indicates that developmental mechanisms evolved independently in each group.

In both plants and animals, the action of transcription factors depends on changes in chromatin structure that make genes available for expression. Mechanisms of transcription initiation are conserved in plants and animals, as is reflected in the homology of genes in Drosophila and Arabidopsis that maintain patterns of expression initiated by regulatory gene sets. Action of the floral homeotic genes is controlled by a gene called CURLY LEAF. This gene shares significant homology with members of the Polycomb gene family in Drosophila, a family of regulatory genes that controls expression of homeobox genes during development. Both CURLY LEAF and Polycomb encode proteins that alter chromatin conformation and shut off gene expression. Thus, although different genes are used to control development, both plants and animals use an evolutionarily conserved mechanism to regulate expression of these gene sets.

FIGURE 20–18 Cell arrangement in the floral meristem. (a) The four concentric rings, or whorls, labeled 1–4, give rise to (b) the sepals, petals, stamens, and carpels, respectively, in the mature flower.

FIGURE 20–19 (a) Wild-type flowers of Arabidopsis have (from outside to inside) sepals, petals, stamens, and carpels. (b) A homeotic APETALA2 mutant flower has carpels, stamens, stamens, and carpels. (c) PISTILLATA mutants have sepals, sepals, carpels, and carpels. (d) AGAMOUS mutants have petals and sepals at places where stamens and carpels should form.

ESSENTIAL POINT Flower formation in Arabidopsis is controlled by homeotic genes, but these gene sets
are from a different gene family than the homeotic selector genes of Drosophila and other animals.

20.8 C. elegans Serves as a Model for Cell–Cell Interactions in Development

During development in multicellular organisms, cell–cell interactions influence the transcriptional programs and developmental fate of the interacting cells and surrounding cells. Cell–cell interaction is an important process in the embryonic development of most eukaryotic organisms, including Drosophila, mice, and humans.

Signaling Pathways in Development

In early development, animals use a number of signaling pathways to regulate development; after organ formation begins, additional pathways are added to those already in use. These newly activated pathways act both independently and in coordinated networks to generate specific transcriptional patterns. Signal networks establish anterior–posterior polarity and body axes, coordinate pattern formation, and direct the differentiation of tissues and organs. The signaling pathways used in early development and some of the developmental processes they control are listed in Table 20.4. After an introduction to the components and interactions of one of these systems, the Notch signaling pathway, we will briefly examine its role in the development of the vulva in the nematode Caenorhabditis elegans.

The Notch Signaling Pathway

The genes in the Notch pathway are named after the Drosophila mutants that were used to identify components of this signal transduction system (Notch mutants have an indentation, or notch, in their wings). Notch signaling works through direct cell–cell contact to control the developmental fate of interacting cells. The Notch gene (and the equivalent gene in other organisms) encodes a receptor protein embedded in the plasma membrane (Figure 20–20). The signal is another membrane protein, encoded by the Delta gene (and its equivalents). Because both the signal and receptor are membrane proteins, the Notch signal system works only when adjacent
cells come into physical contact. When the Delta protein from one cell binds to the Notch receptor protein on a neighboring cell, the cytoplasmic tail of the Notch protein is cleaved off and binds to a cytoplasmic protein encoded by the Su(H) (Suppressor of Hairless) gene. This protein complex moves into the nucleus and binds to transcriptional cofactors, activating transcription of a gene set that controls a specific developmental pathway (Figure 20–20).

TABLE 20.4 Signaling Pathways Used in Early Embryonic Development*
Wnt Pathway: Dorsalization of body; Female reproductive development; Dorsal–ventral differences
TGF-β Pathway: Mesoderm induction; Left–right asymmetry; Bone development
Hedgehog Pathway: Notochord induction; Somitogenesis; Gut/visceral mesoderm
Receptor Tyrosine Kinase Pathway: Mesoderm maintenance
Notch Signaling Pathway: Blood cell development; Neurogenesis; Retina development
*Source: Taken from Gerhart, J. 1999. 1998 Warkany lecture: Signaling pathways in development. Teratology 60: 226–239.

FIGURE 20–20 Components of the Notch signaling pathway in Drosophila. The cell carrying the Delta transmembrane protein is the sending cell; the cell carrying the transmembrane Notch protein receives the signal. Binding of Delta to Notch triggers a proteolytic-mediated activation of transcription. The fragment cleaved from the cytoplasmic side of the Notch protein, called the Notch intracellular domain (NICD), combines with the Su(H) protein and moves to the nucleus, where it activates a program of gene transcription.

One of the main roles of the Notch signal system is to specify different developmental fates for equivalent cells in a population. In its simplest form, this interaction involves two neighboring cells that are developmentally equivalent
but become specified to form different adult structures. We will explore the role of the Notch signaling system in development of the vulva in C. elegans, after a brief introduction to nematode embryogenesis.

Overview of C. elegans Development

The nematode C. elegans is widely used as a model organism to study the genetic control of development. There are several advantages in using this organism: (1) its genetics are well known, (2) its genome has been sequenced, and (3) adults contain a small number of cells that follow a highly deterministic developmental program. Adult nematodes are about 1 mm long and develop from a fertilized egg in about two days (Figure 20–21). The life cycle includes an embryonic stage (about 16 hours), four larval stages (L1 through L4), and the adult stage. Adults are of two sexes: XX self-fertilizing hermaphrodites that can make both eggs and sperm, and XO males. Self-fertilization of mutagen-treated hermaphrodites is used to develop homozygous stocks of mutant strains, and hundreds of such mutants have been generated, cataloged, and mapped.

FIGURE 20–21 (a) A truncated cell lineage chart for C. elegans, showing early divisions and the tissues and organs formed from these lineages. Each vertical line represents a cell division, and horizontal lines connect the two cells produced. For example, the first division of the zygote creates two new cells, AB and P1. During embryogenesis, cell divisions will produce the 959 somatic cells of the adult hermaphrodite worm. (b) An adult C. elegans hermaphrodite. This nematode, about 1 mm in length, has 959 somatic cells and is widely used as a model organism to study the genetic control of development.

Adult
hermaphrodites have 959 somatic cells (and about 2,000 germ cells). The lineage of each cell, from fertilized egg to adult, has been mapped (Figure 20–21) and is invariant from individual to individual. Knowing the lineage of each cell, we can easily follow altered cell fates generated by mutations or by killing specific cells with laser microbeams or ultraviolet irradiation. In hermaphrodites, the developmental fate of cells in the reproductive system is determined by cell–cell interaction, illustrating how gene expression and cell–cell interaction work together to specify developmental outcomes.

20–3 The identification and characterization of genes that control sex determination have been another focus of investigators working with C. elegans. As with Drosophila, sex in this organism is determined by the ratio of X chromosomes to sets of autosomes. A diploid wild-type male has one X chromosome, and a diploid wild-type hermaphrodite has two X chromosomes. Many different mutations have been identified that affect sex determination. Loss-of-function mutations in a gene called her-1 cause an XO nematode to develop into a hermaphrodite and have no effect on XX development. (That is, XX nematodes are normal hermaphrodites.)
In contrast, loss-of-function mutations in a gene called tra-1 cause an XX nematode to develop into a male. Deduce the roles of these genes in wild-type sex determination from this information.
HINT: This problem involves an understanding of the mechanism of sex determination by the ratio of X chromosomes to sets of autosomes. The key to its solution is an understanding of the effect of loss-of-function mutations on expression of other genes or the action of other proteins.

Genetic Analysis of Vulva Formation

Adult C. elegans hermaphrodites lay eggs through the vulva, an opening near the middle of the body (Figure 20–21). The vulva is formed in stages during larval development and involves several rounds of cell–cell interactions. In C. elegans, interaction between two neighboring cells, Z1.ppp and Z4.aaa, determines which will become the gonadal anchor cell (which induces formation of the vulva) and which will become a precursor to the uterus (Figure 20–22). The determination of which cell becomes which occurs during the second larval stage (L2) and is controlled by the Notch receptor gene, lin-12.

FIGURE 20–22 Cell–cell interaction in anchor cell determination. (a) During L2, two neighboring cells begin the secretion of chemical signals for the induction of uterine differentiation. (b) By chance, cell Z1.ppp produces more of these signals, causing cell Z4.aaa to increase production of the receptor for signals. The action of increased signals causes Z4.aaa to become the ventral uterine precursor cell and allows Z1.ppp to become the anchor cell.

In recessive lin-12(0) mutants (loss-of-function mutants), no functional receptor protein is present, and both cells become anchor cells. The dominant mutation lin-12(d) (a gain-of-function mutation) causes
both to become uterine precursors. Based on the phenotypes of these two mutant alleles, we can conclude that, normally, expression of lin-12 directs selection of the uterine pathway, because in the absence of the LIN-12 (Notch) receptor, both cells become anchor cells.
However, the situation is more complex than it first appears. Initially, the two neighboring cells are developmentally equivalent. Each synthesizes low levels of the Notch signal protein (encoded by the lag-2 gene) and the Notch receptor protein (encoded by the lin-12 gene). By chance, one cell ends up secreting more of the signal (LAG-2, or Delta, protein) than the other cell. This causes the neighboring cell to increase production of the receptor (LIN-12 protein). The cell producing more of the receptor protein becomes the uterine precursor, and the cell producing more signal protein becomes the anchor cell. The critical factor in this first round of cell–cell interaction is the balance between the LAG-2 (Delta) signal gene product and the LIN-12 (Notch) receptor gene product.
Once the gonadal anchor cell has been determined, a second round of cell–cell interaction leads to formation of the vulva. This interaction involves the anchor cell (located in the gonad) and six neighboring cells (called precursor cells) located in the skin. The precursor cells, named P3.p to P8.p, are called Pn.p cells. The developmental fate of each Pn.p cell is specified by its position relative to the anchor cell. During vulval development, the LIN-3 signal protein is synthesized by the anchor cell; this signal is received and processed by three adjacent Pn.p precursor cells (P5.p to P7.p). The cell closest to the anchor cell (usually P6.p) becomes the primary vulval precursor cell, and the adjacent cells (P5.p and P7.p) become secondary precursor cells. A signal protein from the primary vulval precursor cell activates the LIN-12 receptor in the secondary cells, preventing them from
becoming primary precursor cells. The other precursor cells (P3.p, P4.p, and P8.p) receive no signal from the anchor cell and become skin cells.

ESSENTIAL POINT
In C. elegans, the well-studied pathway of cell lineage during embryonic development allows developmental biologists to study the cell–cell signaling required for organogenesis.

GENETICS, TECHNOLOGY, AND SOCIETY
Stem Cell Wars

Stem cell research may be the most controversial research area since the beginning of recombinant DNA technology in the 1970s. Although stem cell research is the focus of presidential proclamations, media campaigns, and ethical debates, few people understand it sufficiently to evaluate its pros and cons.
Stem cells are primitive cells that replicate indefinitely and have the capacity to differentiate into cells with specialized functions, such as the cells of heart, brain, liver, and muscle tissue. Some types of stem cells are defined as totipotent, meaning that they have the ability to differentiate into any mature cell type in the body, as well as tissues associated with the developing embryo, such as placenta. Other types of stem cells are pluripotent, meaning that they are able to differentiate into any of a smaller number of mature cell types. In contrast, mature, fully differentiated cells do not replicate or undergo transformations into different cell types.
In the last few years, several research teams have isolated and cultured human pluripotent stem cells. When treated with growth factors or hormones, these pluripotent stem cells differentiate into cells that have the characteristics of neural, bone, kidney, liver, heart, or pancreatic cells.
The fact that pluripotent stem cells grow prolifically in culture and differentiate into more specialized cells has created great excitement. Some foresee a day when stem cells may be a cornucopia from which to harvest unlimited numbers of specialized cells to replace cells in damaged and diseased tissues. Hence, stem cells
could be used to treat Parkinson disease, type 1 diabetes, chronic heart disease, Alzheimer disease, genetic defects, and even cancers.
The excitement about stem cell therapies has been fueled by reports of dramatically successful experiments in animals. For example, mice with spinal cord injuries regained their mobility and bowel and bladder control after they were injected with human stem cells.
Given the potential for such beneficial treatments, why should stem cell research be so contentious? The answer to that question lies in the source of the pluripotent stem cells.
Until recently, all pluripotent stem cell lines were derived from five-day-old embryonic blastocysts. Blastocysts at this stage consist of 50–150 cells, most of which will develop into placental and supporting tissues for the early embryo. The inner cell mass of the blastocyst consists of about 30 to 40 pluripotent stem cells that can develop into all the embryo’s tissues. In vitro fertilization clinics grow fertilized eggs to the five-day blastocyst stage prior to uterine transfer. Pluripotent embryonic stem cell (ESC) lines are created by taking the inner cell mass out of five-day blastocysts and growing the cells in culture dishes.
The fact that early embryos are destroyed in the process of establishing human ESC lines disturbs people who believe that preimplantation embryos are persons with rights; however, it does not disturb people who believe that these embryos are too primitive to have the status of a human being. Both sides in the debate invoke fundamental questions of what constitutes a human being.
Recently, scientists have developed several types of pluripotent stem cells without using embryos. One of the most promising types, known as induced pluripotent stem (iPS) cells, uses adult somatic cells as the source of pluripotent stem cell lines. To prepare iPS cells, scientists isolate somatic cells (such as cells from skin) and infect them with engineered retroviruses that integrate into the cells’ DNA.
These retroviruses contain several cloned human genes that encode products responsible for converting the somatic cells into immortal, pluripotent stem cells. The development of iPS cell lines has the potential to bypass the ethical problems associated with the use of human embryos. In addition, they may become sources of patient-specific pluripotent stem cell lines that can be used for transplantation, without immune system rejection.

Your Turn
Take time, individually or in groups, to answer the following questions. Investigate the references and links to help you understand the technologies and controversies surrounding stem cell research.
1. In 2013, a research group reported that they had created a human blastocyst after transferring a somatic cell nucleus into an enucleated human egg and activating the egg. They used the blastocyst to isolate ESCs that were a genetic match to the nucleus donor. How did these researchers do this, and why did they use this method rather than derive iPS cells from the donor?
Read about this study in Tachibana, M., et al. 2013. Human embryonic stem cells derived by somatic cell nuclear transfer. Cell 153: 1–11.
2. The technologies that give rise to iPS cells and ESCs using nuclear transfer are based on the work of Drs. John Gurdon and Shinya Yamanaka. In 2012, these scientists were awarded the Nobel Prize in Physiology or Medicine for their work in reprogramming adult cells to stem cells. Describe their key discoveries on which stem cell therapies are based.
A summary of their work can be found on the Nobel Prize Web site: http://www.nobelprize.org/nobel_prizes/medicine/laureates/2012/advanced.html
3. Although several stem cell therapies are in clinical trials, some unregulated clinics currently offer these therapies to patients. Which clinical trials are presently being offered, and what is the status of unregulated stem cell therapy?
Begin your search with information from the International Society for Stem Cell Research: http://www.closerlookatstemcells.org

A case of short thumbs and toes

A doctor saw a female patient with unusually short thumbs and great toes, small feet, short fifth fingers with clinodactyly (bending towards the fourth fingers), and a duplication of the genital tract that led to urinary problems. He diagnosed her with hand-foot-genital syndrome (HFGS). The sequencing of her DNA revealed a polyalanine expansion in an exon of HOXA13, the most 5′ gene in the HOXA cluster on chromosome 7p15.2. The same mutation was found in her father, who had short thumbs and great toes but no genital abnormalities. HFGS is inherited in an autosomal dominant fashion and is caused by haploinsufficiency of the HOXA13 transcription factor.
1. How do you think a haploinsufficiency of HOXA13 transcription factor during embryonic development would lead to the phenotypes observed in the patient with HFGS?
2. What is the equivalent gene of HOXA13 in Drosophila? What does it do?
3. In humans, what other homeobox genes have spatial expression patterns similar to those of HOXA13?
INSIGHTS AND SOLUTIONS

1. In the slime mold Dictyostelium, experimental evidence suggests that cyclic AMP (cAMP) plays a central role in the developmental program leading to spore formation. The genes encoding the cAMP cell-surface receptor have been cloned, and the amino acid sequence of the protein components is known. To form reproductive structures, free-living individual cells aggregate together and then differentiate into one of two cell types, prespore cells or prestalk cells. Aggregating cells secrete waves or oscillations of cAMP to foster the aggregation of cells and then continuously secrete cAMP to activate genes in the aggregated cells at later stages of development. It has been proposed that cAMP controls cell–cell interaction and gene expression. It is important to test this hypothesis by using several experimental techniques. What different approaches can you devise to test this hypothesis, and what specific experimental systems would you employ to test them?

Solution: Two of the most powerful forms of analysis in biology involve the use of biochemical analogs (or inhibitors) to block gene transcription or the action of gene products in a predictable way, and the use of mutations to alter genes and their products. These two approaches can be used to study the role of cAMP in the developmental program of Dictyostelium. First, compounds chemically related to cAMP, such as GTP and GDP, can be used to test whether they have any effect on the processes controlled by cAMP. In fact, both GTP and GDP lower the affinity of cell-surface receptors for cAMP, effectively blocking the action of cAMP. Mutational analysis can be used to dissect components of the cAMP receptor system. One approach is to use transformation with wild-type genes to restore mutant function. Similarly, because the genes for the receptor proteins have been cloned, it is possible to construct mutants with known alterations in the component proteins and transform them into cells to assess their effects.

2. In the sea urchin, early development may occur even in the presence of actinomycin D, which inhibits RNA synthesis. However, if actinomycin D is present early in development but is removed a few hours later, all development stops. In fact, if actinomycin D is present only between the sixth and eleventh hours of development, events that normally occur at the fifteenth hour are arrested. What conclusions can be drawn concerning the role of gene transcription between hours 6 and 15?

Solution: Maternal mRNAs are present in the fertilized sea urchin egg. Thus, a considerable amount of development can take place without transcription of the embryo’s genome. Because development past 15 hours is inhibited by prior treatment with actinomycin D, it appears that transcripts from the embryo’s genome are required to initiate or maintain these events. This transcription must take place between the sixth and fifteenth hours of development.

3. If it were possible to introduce one of the homeotic genes from Drosophila into an Arabidopsis embryo homozygous for a homeotic flowering gene, would you expect any of the Drosophila genes to negate (rescue) the Arabidopsis mutant phenotype? Why or why not?

Solution: The Drosophila homeotic genes belong to the Hox gene family, whereas Arabidopsis homeotic genes belong to the MADS-box protein family. Both gene families are present in Drosophila and Arabidopsis, but they have evolved different functions in the animal and the plant kingdoms. As a result, it is unlikely that a transferred Drosophila Hox gene would rescue the phenotype of a MADS-box mutant, but only an actual experiment would confirm this.

Problems and Discussion Questions

1. HOW DO WE KNOW? In this chapter, we have focused on large-scale events as well as the inter- and intracellular events that take place during embryogenesis and the formation of adult structures. In particular, we discussed how the adult body plan is laid down by a cascade of gene expression, and the role of cell–cell communication in development. Based on your knowledge of these topics, answer several fundamental questions:
(a) How do we know how many genes control development in an organism like Drosophila?
(b) What experimental evidence demonstrates that molecular gradients in the egg control development?
(c) How did we discover that selector genes specify which adult structures will be formed by body segments?
(d) How did we learn about the levels of gene regulation involved in vulval development in C. elegans?
2. CONCEPT QUESTION Review the Chapter Concepts list on page 419. Most of these are concerned with the cascade of gene transcription that converts a zygote into an adult organism. Write a short essay outlining the differences and similarities in the gene families used by plants and animals to establish the body axis and to regulate gene expression of these gene sets.
3. Nuclei from almost any source may be injected into Xenopus oocytes. Studies have shown that these nuclei remain active in transcription and translation. How can such an experimental system be useful in developmental genetic studies?
4. Distinguish between the syncytial blastoderm stage and the cellular blastoderm stage in Drosophila embryogenesis.
5. (a) What are maternal-effect genes? (b) When are gene products from these genes made, and where are they located? (c) What aspects of development do maternal-effect genes control? (d) What is the phenotype of maternal-effect mutations?
6. How are the zygotic genes influenced by the maternal genes? What would happen if you mutate the zygotic genes?
7. List the main classes of zygotic genes. What is the function of each class of these genes?
8. Experiments have shown that any nuclei placed in the polar cytoplasm at the posterior pole of the Drosophila egg will differentiate into germ cells. If polar cytoplasm is transplanted into the anterior end of the egg just after fertilization, what will happen to nuclei that migrate into this cytoplasm at the anterior pole?
9. How can you determine whether a particular gene is being transcribed in different cell types?
10. You observe that a particular gene is being transcribed in different cell types during development. You now want to assess whether protein is being made in these cells or not. Suggest an experiment for this.
11. What is the primary function of Hox genes?
12. The homeotic mutation Antennapedia causes mutant Drosophila to have legs in place of antennae and is a dominant gain-of-function mutation. What are the properties of such mutations? How does the Antennapedia gene change antennae into legs?
13. The Drosophila homeotic mutation spineless aristapedia (ss^a) results in the formation of a miniature tarsal structure (normally part of the leg) on the end of the antenna. What insight is provided by ss^a concerning the role of genes during determination?
14. A number of genes that control expression of Hox genes in Drosophila have been identified. One of these homozygous mutants is extra sex combs, where some of the head and all of the thorax and abdominal segments develop as the last abdominal segment. In other words, all affected segments develop as posterior segments. What does this phenotype tell you about which set of Hox genes is controlled by the extra sex combs gene?
15. In Arabidopsis, flower development is controlled by sets of homeotic genes. How many classes of these genes are there, and what structures are formed by their individual and combined expression?
16. The floral homeotic genes of Arabidopsis belong to the MADS-box gene family, while in Drosophila, homeotic genes belong to the homeobox gene family. In both Arabidopsis and Drosophila, members of the Polycomb gene family control expression of these divergent homeotic genes. How do Polycomb genes control expression of two very different sets of homeotic genes?
17. Dominguez et al. (2004) suggest that by studying genes that determine growth and tissue specification in the eye of Drosophila, much can be learned about human eye development.
(a) What evidence suggests that genetic eye determinants in Drosophila are also found in humans? Include a discussion of orthologous genes in your answer.
(b) What evidence indicates that the eyeless gene is part of a developmental network?
(c) Are genetic networks likely to specify developmental processes in general? Explain fully and provide an example.

21 Quantitative Genetics and Multifactorial Traits

CHAPTER CONCEPTS
■ Quantitative inheritance results in a range of measurable phenotypes for a polygenic trait.
■ Polygenic traits most often demonstrate continuous variation.
■ Quantitative inheritance can be explained in Mendelian terms whereby certain alleles have an additive effect on the traits under study.
■ The study of polygenic traits relies on statistical analysis.
■ Heritability values estimate the genetic contribution to phenotypic variability under specific environmental conditions.
■ Twin studies allow an estimation of heritability in humans.
■ Quantitative trait loci (QTLs) can be mapped and identified.

A field of pumpkins, where size is under the influence of quantitative inheritance.

Up to this point in the text, most of our examples of phenotypic variation have been those that have been assigned to distinct and separate categories; for example, human blood type was A, B, AB, or O; squash fruit shape was spherical, disc-shaped, or elongated; and fruit fly eye color was red or white (see Chapter 4). Typically in these
traits, a genotype will produce a single identifiable phenotype, although phenomena such as variable penetrance and expressivity, pleiotropy, and epistasis can obscure the relationship between genotype and phenotype.
However, many traits are not as distinct and clear-cut, including many that are of medical or agricultural importance. They show much more variation, often falling into a continuous range of multiple phenotypes. Most show what we call continuous variation, including, for example, height in humans, milk and meat production in cattle, and yield and seed protein content in various crops. Continuous variation across a range of phenotypes can be measured and described in quantitative terms, so this genetic phenomenon is known as quantitative inheritance. And because the varying phenotypes result from the input of genes at more than one, and often many, loci, the traits are said to be polygenic (literally “of many genes”). The genes involved are often referred to as polygenes.
To further complicate the link between the genotype and phenotype, the genotype generated at fertilization establishes a quantitative range within which a particular individual can fall. However, the final phenotype is often also influenced by environmental factors to which that individual is exposed. Human height, for example, is genetically influenced but is also affected by environmental factors such as nutrition. Quantitative (polygenic) traits whose phenotypes result from both gene action and environmental influences are termed multifactorial, or complex, traits. Often these terms are used interchangeably. For consistency throughout the chapter, we will use the term multifactorial in our discussions.
In this chapter, we will examine examples of quantitative inheritance, multifactorial traits, and some of the statistical techniques used to study them. We will also consider how geneticists assess the relative importance of genetic versus environmental factors contributing to continuous
phenotypic variation, and we will discuss approaches to identifying and mapping genes that influence quantitative traits.

21.1 Quantitative Traits Can Be Explained in Mendelian Terms

The Multiple-Gene Hypothesis for Quantitative Inheritance

The question of whether continuous phenotypic variation could be explained in Mendelian terms caused considerable controversy in the early 1900s. Some scientists argued that, although Mendel’s unit factors, or genes, explained patterns of discontinuous variation with discrete phenotypic classes, they could not account for the range of phenotypes seen in quantitative patterns of inheritance. However, geneticists William Bateson and G. Udny Yule, adhering to a Mendelian explanation, proposed the multiple-factor or multiple-gene hypothesis, in which many genes, each individually behaving in a Mendelian fashion, contribute to the phenotype in a cumulative or quantitative way.

FIGURE 21–1 How the multiple-factor hypothesis accounts for the 1:4:6:4:1 phenotypic ratio of grain color when all alleles designated by an uppercase letter are additive and contribute an equal amount of pigment to the phenotype. [P1: AABB (red) × aabb (white). F1: AaBb (intermediate color). F2 genotypes: 1/16 AABB, 2/16 AABb, 1/16 AAbb, 2/16 AaBB, 4/16 AaBb, 2/16 Aabb, 1/16 aaBB, 2/16 aaBb, 1/16 aabb; grouped by number of additive alleles, these give 1/16 red : 4/16 : 6/16 : 4/16 intermediate colors : 1/16 white.]

The multiple-gene hypothesis was initially based on a key set of experimental results published by Hermann Nilsson-Ehle in 1909. Nilsson-Ehle used grain color in wheat to test the concept that the cumulative effects of alleles at multiple loci produce the range of phenotypes seen in quantitative traits. In one set of experiments, wheat with red grain was crossed to wheat with white grain (Figure 21–1). The F1 generation demonstrated
an intermediate pink color, which at first sight suggested incomplete dominance of two alleles at a single locus. However, in the F2 generation, Nilsson-Ehle did not observe the typical segregation of a monohybrid cross. Instead, approximately 15/16 of the plants showed some degree of red grain color, while 1/16 of the plants showed white grain color. Careful examination of the F2 revealed that grain with color could be classified into four different shades of red. Because the F2 ratio occurred in sixteenths, it appears that two genes, each with two alleles, control the phenotype and that they segregate independently from one another in a Mendelian fashion.

Additive Alleles: The Basis of Continuous Variation

The multiple-gene hypothesis consists of the following major points:
1. Phenotypic traits showing continuous variation can be quantified by measuring, weighing, counting, and so on.
2. Two or more gene loci, often scattered throughout the genome, account for the hereditary influence on the phenotype in an additive way. Because many genes may be involved, inheritance of this type is called polygenic.
3. Each gene locus may be occupied by either an additive allele, which contributes a constant amount to the phenotype, or a nonadditive allele, which does not contribute quantitatively to the phenotype.
4. The contribution to the phenotype of each additive allele, though often small, is approximately equal. While we now know this is not always true, we have made this assumption in the above discussion.
5. Together, the additive alleles contributing to a single quantitative character produce substantial phenotypic variation.

If each gene has one potential additive allele that contributes approximately equally to the red grain color and one potential nonadditive allele that fails to produce any red pigment, we can see how the multiple-factor hypothesis could account for the various grain color phenotypes. In the P1 generation, both parents are homozygous; the red parent contains only additive alleles (AABB in Figure 21–1), while the white parent contains only nonadditive alleles (aabb). The F1 plants are heterozygous (AaBb), contain two additive (A and B) and two nonadditive (a and b) alleles, and express the intermediate pink phenotype.
Each of the F2 plants has 4, 3, 2, 1, or 0 additive alleles. F2 plants with no additive alleles are white (aabb) like one of the P1 parents, while F2 plants with 4 additive alleles are red (AABB) like the other P1 parent. Plants with 3, 2, or 1 additive alleles constitute the other three categories of red color observed in the F2 generation. The greater the number of additive alleles in the genotype, the more intense the red color expressed in the phenotype, as each additive allele present contributes equally to the cumulative amount of pigment produced in the grain.
Nilsson-Ehle’s results showed how continuous variation could still be explained in a Mendelian fashion, with additive alleles at multiple loci influencing the phenotype in a quantitative manner, but each individual allele segregating according to Mendelian rules. As we saw in Nilsson-Ehle’s initial cross, if two loci, each with two alleles, were involved, then five F2 phenotypic categories in a 1:4:6:4:1 ratio would be expected. However, there is no reason why three, four, or more loci cannot function in a similar fashion in controlling various quantitative phenotypes. As more quantitative loci become involved, greater and greater numbers of classes appear in the F2 generation in more complex ratios. The number of phenotypes and the expected F2 ratios for crosses involving up to five gene pairs are illustrated in Figure 21–2.

FIGURE 21–2 The genetic ratios (on the X-axis) resulting from crossing two heterozygotes when polygenic inheritance is in operation with one to five gene pairs. The histogram bars indicate the distinct F2 phenotypic classes, ranging from one extreme (left end) to the other extreme (right end). Each phenotype results from a different number of additive alleles. [Panels, y-axis Frequency (%): 1 gene pair, Aa × Aa (1:2:1); 2 gene pairs, AaBb × AaBb (1:4:6:4:1); 3 gene pairs, AaBbCc × AaBbCc (1:6:15:20:15:6:1); 4 gene pairs, AaBbCcDd × AaBbCcDd (1:8:28:56:70:56:28:8:1); 5 gene pairs, AaBbCcDdEe × AaBbCcDdEe (1:10:45:120:210:252:210:120:45:10:1).]

ESSENTIAL POINT
Quantitative inheritance results in a range of phenotypes due to the action of additive alleles from two or more genes, as influenced by environmental factors.

Calculating the Number of Polygenes

Various formulas have been developed for estimating the number of polygenes contributing to a quantitative trait. For example, if the ratio of F2 individuals resembling either of the two extreme P1 phenotypes can be determined, the number of polygenes (loci) involved (n) may be calculated as

$$\frac{1}{4^n} = \text{ratio of F2 individuals expressing either extreme phenotype}$$

In the example of the red and white wheat grain color summarized in Figure 21–1, 1/16 of the progeny are either red or white like the P1 phenotypes. This ratio can be substituted on the right side of the equation to solve for n:

$$\frac{1}{4^n} = \frac{1}{16} = \frac{1}{4^2}, \qquad n = 2$$

21–1 A homozygous plant with 20-cm diameter flowers is crossed with a homozygous plant of the same species that has 40-cm diameter flowers. The F1 plants all have flowers 30 cm in diameter. In the F2 generation of 512 plants, 2 plants have flowers 20 cm in diameter, 2 plants have flowers 40 cm in diameter, and the remaining 508 plants have flowers of a range of sizes in between. (a) Assuming that all alleles involved act additively, how many genes control flower size in this plant? (b) What frequency distribution of flower diameter would you expect to see in the progeny of a backcross between an F1 plant and the large-flowered parent?
HINT: This problem provides F1 and F2 data for a cross involving a quantitative trait and asks you to calculate the number of genes controlling the trait. The key to its solution is to remember that unless you know the total number of distinct F2 phenotypes involved, the ratio (not the number) of parental phenotypes reappearing in the F2 must be used in your determination of the number of genes involved.

Table 21.1 lists the ratio and the number of F2 phenotypic classes produced in crosses involving up to five gene pairs. For low numbers of polygenes (n), it is sometimes easier to use the equation (2n + 1) = the number of distinct phenotypic categories observed. For example, when there are two polygenes involved (n = 2), then (2n + 1) = 5, and each phenotype is the result of 4, 3, 2, 1, or 0 additive alleles. If n = 3, 2n + 1 = 7, and each phenotype is the result of 6, 5, 4, 3, 2, 1, or 0 additive alleles. Thus, working backwards with this rule and knowing the number of phenotypes, we can calculate the number of polygenes controlling them.
It should be noted, however, that both of these simple methods for estimating the number of polygenes involved in a quantitative trait assume not only that all the relevant alleles contribute equally and additively, but also that phenotypic expression in the F2 is not affected significantly by environmental factors. As we will see later, for many quantitative traits, these assumptions may not be true.

TABLE 21.1 Determination of the Number of Polygenes (n) Involved in a Quantitative Trait

n    Individuals Expressing Either Extreme Phenotype    Distinct Phenotypic Classes
1    1/4                                                3
2    1/16                                               5
3    1/64                                               7
4    1/256                                              9
5    1/1024                                             11

21.2 The Study of Polygenic Traits Relies on Statistical Analysis

Before considering the approaches that geneticists use to dissect how much of the phenotypic variation observed in a population is due to genotypic differences among individuals and how much is due to environmental factors, we need to consider the basic statistical tools they use
for the task. It is not usually feasible to measure expression of a polygenic trait in every individual in a population, so a random subset of individuals is usually selected for measurement to provide a sample. It is important to remember that the accuracy of the final results of the measurements depends on whether the sample is truly random and representative of the population from which it was drawn. Suppose, for example, that a student wants to determine the average height of the 100 students in his genetics class, and for his sample he measures the two students sitting next to him, both of whom happen to be centers on the college basketball team. It is unlikely that this sample will provide a good estimate of the average height of the class, for two reasons: First, it is too small; second, it is not a representative subset of the class (unless all 100 students are centers on the basketball team).

21 QUANTITATIVE GENETICS AND MULTIFACTORIAL TRAITS

If the sample measured for expression of a quantitative trait is sufficiently large and also representative of the population from which it is drawn, we often find that the data form a normal distribution; that is, they produce a characteristic bell-shaped curve when plotted as a frequency histogram (Figure 21–3). Several statistical concepts are useful in the analysis of traits that exhibit a normal distribution, including the mean, variance, standard deviation, standard error of the mean, covariance, and correlation coefficient.

FIGURE 21–3 Normal frequency distribution, characterized by a bell-shaped curve (frequency plotted against interval).

The Mean
The mean provides information about where the central point lies along a range of measurements for a quantitative trait. Figure 21–4 shows the distribution curves for two different sets of phenotypic measurements. Each of these sets of measurements clusters around a central value (as it happens, they both cluster around the same value). This clustering is called a central tendency, and the central point is the mean. Specifically, the mean ($\bar{X}$) is the arithmetic average of a set of measurements and is calculated as

$$\bar{X} = \frac{\sum X_i}{n}$$

where $\bar{X}$ is the mean, $\sum X_i$ represents the sum of all individual values in the sample, and n is the number of individual values. The mean provides a useful descriptive summary of the sample, but it tells us nothing about the range or spread of the data. As illustrated in Figure 21–4, a symmetrical distribution of values in the sample may, in one case, be clustered near the mean. Or a set of measurements may have the same mean but be distributed more widely around it. A second statistic, the variance, provides information about the spread of data around the mean.

Variance
The variance ($s^2$) for a sample is the average squared distance of all measurements from the mean. It is calculated as

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

where the sum (Σ) of the squared differences between each measured value ($X_i$) and the mean ($\bar{X}$) is divided by one less than the total sample size (n − 1). As Figure 21–4 shows, it is possible for two sets of sample measurements for a quantitative trait to have the same mean but a different distribution of values around it. This range will be reflected in different variances. Estimation of variance can be useful in determining the degree of genetic control of traits when the immediate environment also influences the phenotype.

Standard Deviation
Because the variance is a squared value, its unit of measurement is also squared (m², g², etc.).
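To make the mean and variance formulas concrete, the minimal Python sketch below applies them to the F1 fruit weights of the tomato cross analyzed in this section (Table 21.3: 4, 14, 16, 12, and 6 plants weighing 10, 11, 12, 13, and 14 oz, which reproduce the 52 plants and 626 oz total used in the text). The helper names `mean` and `sample_variance` are illustrative; Python's standard `statistics` module offers equivalent `mean` and `variance` functions (the latter also divides by n − 1). Note that the result comes out in squared units (oz²).

```python
def mean(values):
    """Arithmetic mean: X-bar = (sum of X_i) / n."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance: s^2 = sum((X_i - X-bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two measurements")
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# F1 fruit weights from Table 21.3, stored as {weight in oz: number of plants}
f1_counts = {10: 4, 11: 14, 12: 16, 13: 12, 14: 6}
# Expand the frequency table into one measurement per plant
f1_weights = [weight for weight, count in f1_counts.items() for _ in range(count)]

x_bar = mean(f1_weights)
s2 = sample_variance(f1_weights)
print(f"n = {len(f1_weights)}, mean = {x_bar:.2f} oz, variance = {s2:.2f} oz^2")
# n = 52, mean = 12.04 oz, variance = 1.29 oz^2
```

Taking the square root of `s2` returns the spread to the original units (oz).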
To express variation around the mean in the original units of measurement, we can use the square root of the variance, a term called the standard deviation (s):

$$s = \sqrt{s^2}$$

Table 21.2 shows the percentage of individual values within a normal distribution that fall within different multiples of the standard deviation. The values that fall within one standard deviation to either side of the mean represent 68 percent of all values in the sample. More than 95 percent of all values are found within two standard deviations to either side of the mean. This indicates that the standard deviation (s) can also be interpreted in the form of a probability. For example, a sample measurement picked at random has a 68 percent probability of falling within the range of one standard deviation.

TABLE 21.2 Sample Inclusion for Various s Values
Multiples of s          Sample Included (%)
$\bar{X} \pm 1s$        68.3
$\bar{X} \pm 1.96s$     95.0
$\bar{X} \pm 2s$        95.5
$\bar{X} \pm 3s$        99.7

FIGURE 21–4 Two normal frequency distributions with the same mean but different amounts of variation (number of individuals plotted against class; both curves are centered on $\bar{X}$).

Standard Error of the Mean
If multiple samples are taken from a population and measured for the same quantitative trait, we might find that their means vary. Theoretically, larger, truly random samples will represent the population more accurately, and their means will be closer to each other. To measure the accuracy of the sample mean, we use the standard error of the mean ($S_{\bar{X}}$), calculated as

$$S_{\bar{X}} = \frac{s}{\sqrt{n}}$$

where s is the standard deviation and $\sqrt{n}$ is the square root of the sample size. Because the standard error of the mean is computed by dividing s by $\sqrt{n}$, it is always smaller than the standard deviation for any sample of more than one measurement.

Covariance and Correlation Coefficient
Often geneticists working with quantitative traits find they have to consider two phenotypic characters simultaneously. For example, a poultry breeder might investigate the correlation between body weight and egg production in hens: Do heavier birds tend to lay more eggs?

The covariance statistic measures how much variation is common to both quantitative traits. It is calculated by taking the deviations from the mean for each trait (just as we did for estimating variance) for each individual in the sample. This gives a pair of values for each individual. The two values are multiplied together, and the sum of all these individual products is then divided by one fewer than the number in the sample. Thus, the covariance ($\text{cov}_{XY}$) of two sets of trait measurements, X and Y, is calculated as

$$\text{cov}_{XY} = \frac{\sum \left[(X_i - \bar{X})(Y_i - \bar{Y})\right]}{n - 1}$$

The covariance can then be standardized as yet another statistic, the correlation coefficient (r). The calculation is

$$r = \frac{\text{cov}_{XY}}{s_X s_Y}$$

where $s_X$ is the standard deviation of the first set of quantitative measurements X, and $s_Y$ is the standard deviation of the second set of quantitative measurements Y. Values for the correlation coefficient r can range from −1 to +1. Positive r values mean that an increase in measurement for one trait tends to be associated with an increase in measurement for the other, while negative r values mean that increases in one trait are associated with decreases in the other. Therefore, if heavier hens tend to lay more eggs, a positive r value can be expected. A negative r value, on the other hand, suggests that greater egg production is more likely from less heavy birds. One important point to note about correlation coefficients is that even significant r values (close to +1 or −1) do not prove that a cause-and-effect relationship exists between two traits. Correlation analysis simply tells us the extent to which variation in one quantitative trait is associated with variation in another, not what causes that variation.

Analysis of a Quantitative Character
To apply these statistical concepts, let's consider a genetic experiment that crossed two different homozygous varieties of tomato. One of the tomato varieties produces fruit averaging 18 oz in weight, whereas fruit from the other averages 6 oz. The F1 obtained by crossing these two varieties has fruit weights ranging from 10 to 14 oz. The F2 population contains individuals that produce fruit ranging from 6 to 18 oz. The results characterizing both generations are shown in Table 21.3.

TABLE 21.3 Distribution of F1 and F2 Progeny Derived from a Theoretical Cross Involving Tomatoes
Weight (oz)    Number of F1 Individuals    Number of F2 Individuals
6              –                           1
7              –                           1
8              –                           2
9              –                           0
10             4                           9
11             14                          13
12             16                          17
13             12                          14
14             6                           7
15             –                           4
16             –                           3
17             –                           0
18             –                           1

The mean value for the fruit weight in the F1 generation can be calculated as

$$\bar{X} = \frac{\sum X_i}{n} = \frac{626}{52} = 12.04$$

The mean value for fruit weight in the F2 generation is calculated as

$$\bar{X} = \frac{\sum X_i}{n} = \frac{872}{72} = 12.11$$

Although these mean values are similar, the frequency distributions in Table 21.3 show more variation in the F2 generation. The range of variation can be quantified as the sample variance $s^2$, calculated, as we saw on page 442, as the sum of the squared differences between each value and the mean, divided by one less than the total number of observations:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

When the above calculation is made, the variance is found to be 1.29 for the F1 generation and 4.27 for the F2 generation. When converted to the standard deviation ($s = \sqrt{s^2}$), the values become 1.13 and 2.06, respectively. Therefore, the distribution of tomato weight in the F1 generation can be described as 12.04 ± 1.13, and in the F2 generation it can be described as 12.11 ± 2.06.

Assuming that both parental varieties are homozygous at the loci of interest and that the alleles controlling fruit weight act additively, we can estimate the number of loci involved in this trait. Since 1/72 of the F2 offspring have a phenotype that overlaps one of the parental strains (72 total F2 offspring; one
weighs 6 oz, one weighs 18 oz; see Table 21.3), the use of the formula $1/4^n = 1/72$ indicates that n is between 3 and 4, providing evidence of the number of genes that control fruit weight in these tomato strains.

21–2 The following table shows measurements for fiber lengths and fleece weight in a small flock of eight sheep.

Sheep    Fiber Length (cm)    Fleece Weight (kg)
1        9.7                  7.9
2        5.6                  4.5
3        10.7                 8.3
4        6.8                  5.4
5        11.0                 9.1
6        4.5                  4.9
7        7.4                  6.0
8        5.9                  5.1

(a) What are the mean, variance, and standard deviation for each trait in this flock?
(b) What is the covariance of the two traits?
(c) What is the correlation coefficient for fiber length and fleece weight?
(d) Do you think greater fleece weight is correlated with an increase in fiber length? Why or why not?
HINT: This problem provides data for two quantitative traits and asks you to make numerous statistical calculations, ultimately determining if the traits are correlated. The key to its solution is that once the calculation of the correlation coefficient (r) is completed, you must interpret that value: whether it is positive or negative, and how close to zero it is.

ESSENTIAL POINT
Numerous statistical methods are essential during the analysis of quantitative traits, including the mean, variance, standard deviation, standard error, covariance, and the correlation coefficient.

21.3 Heritability Values Estimate the Genetic Contribution to Phenotypic Variability
The question most often asked by geneticists working with multifactorial traits and diseases is how much of the observed phenotypic variation in a population is due to genotypic differences among individuals and how much is due to environment. The term heritability is used to describe the proportion of total phenotypic variation in a population that is due to genetic factors. For a multifactorial trait in a given population, a high heritability estimate indicates that much of the variation can be attributed to genetic factors, with the environment having less impact on expression of the
trait. With a low heritability estimate, environmental factors are likely to have a greater impact on phenotypic variation within the population.

The concept of heritability is frequently misunderstood and misused. It should be emphasized that heritability indicates neither how much of a trait is genetically determined nor the extent to which an individual's phenotype is due to genotype. In recent years, such misinterpretations of heritability for human quantitative traits have led to controversy, notably in relation to measurements such as intelligence quotients, or IQs. Variation in heritability estimates for IQ among different racial groups led to incorrect suggestions that unalterable genetic factors control differences in intelligence levels among humans of different ancestries. Such suggestions misrepresented the meaning of heritability and ignored the contribution of genotype-by-environment interaction variance (see p. 445) to phenotypic variation in a population.

Moreover, heritability is not fixed for a trait. For example, a heritability estimate for egg production in a flock of chickens kept in individual cages might be high, indicating that differences in egg output among individual birds are largely due to genetic differences, as they all have very similar environments. For a different flock kept outdoors, heritability for egg production might be much lower, as variation among different birds may also reflect differences in their individual environments. Such differences could include how much food each bird manages to find and whether it competes successfully for a good roosting spot at night. Thus, a heritability estimate tells us the proportion of phenotypic variation that can be attributed to genetic variation within a certain population in a particular environment. If we measure heritability for the same trait among different populations in a range of
environments, we frequently find that the calculated heritability values have large standard errors. This is an important point to remember when considering heritability estimates. Parallel studies using different population bases are likely to yield different heritability estimates. For example, a mean heritability estimate of 0.65 for human height does not mean that your height is 65 percent due to your genes, but rather that in the populations sampled, on average, 65 percent of the overall variation in height could be explained by genotypic differences among individuals.

With this subtle but important distinction in mind, we will now consider how geneticists divide the phenotypic variation observed in a population into genetic and environmental components. As we saw in the previous section, variation can be quantified as a sample variance: taking measurements of the trait in question from a representative sample of the population and determining the extent of the spread of those measurements around the sample mean. This gives us an estimate of the total phenotypic variance in the population ($V_P$). Heritability estimates are obtained by using different experimental and statistical techniques to partition $V_P$ into genotypic variance ($V_G$) and environmental variance ($V_E$) components.

An important factor contributing to overall levels of phenotypic variation is the extent to which individual genotypes affect the phenotype differently depending on the environment. For example, wheat variety A may yield an average of 20 bushels an acre on poor soil, while variety B yields an average of 17 bushels. On good soil, variety A yields 22 bushels, while variety B averages 25 bushels an acre. There are differences in yield between the two genotypically distinct varieties, so variation in wheat yield has a genetic component. Both varieties yield more on good soil, so yield is also affected by environment. However, we also see that the two varieties do not respond to better soil conditions equally:
The genotype of wheat variety B achieves a greater increase in yield on good soil than does variety A. Thus, differences in the interaction of genotype with environment also contribute to variation for yield in populations of wheat plants. This third component of phenotypic variation is genotype-by-environment interaction variance ($V_{G \times E}$) (Figure 21–5).

FIGURE 21–5 Differences in yield between two wheat varieties at different soil fertility levels (each panel plots yield for varieties A and B against low and high soil fertility). (a) No genotype-by-environment, or G × E, interaction: The varieties show genetic differences in yield but respond equally to increasing soil fertility. (b) G × E interaction present: Variety A outyields B at low soil fertility, but B yields more than A at high-fertility levels.

We can now summarize all the components of total phenotypic variance $V_P$ using the following equation:

$$V_P = V_G + V_E + V_{G \times E}$$

In other words, total phenotypic variance can be subdivided into genotypic variance, environmental variance, and genotype-by-environment interaction variance. When obtaining heritability estimates for a multifactorial trait, researchers often assume that the genotype-by-environment interaction variance is small enough that it can be ignored or combined with the environmental variance. However, it is worth remembering that this kind of approximation is another reason heritability values are estimates for a given population in a particular context, not a fixed attribute for a trait.

Animal and plant breeders use a range of experimental techniques to estimate heritabilities by partitioning measurements of phenotypic variance into genotypic and environmental components. One approach uses inbred strains containing genetically homogeneous individuals with highly homozygous genotypes. Experiments are then designed to test the effects of a range of environmental conditions
on phenotypic variability. Variation between different inbred strains reared in a constant environment is due predominantly to genetic factors. Variation among members of the same inbred strain reared under different conditions is more likely to be due to environmental factors. Other approaches involve analysis of variance for a quantitative trait among offspring from different crosses, or comparing expression of a trait among offspring and parents reared in the same environment.

Broad-Sense Heritability
Broad-sense heritability (represented by the term $H^2$) measures the contribution of the genotypic variance to the total phenotypic variance. It is estimated as a proportion:

$$H^2 = \frac{V_G}{V_P}$$

Heritability values for a trait in a population range from 0.0 to 1.0. A value approaching 1.0 indicates that the environmental conditions have little impact on phenotypic variance, which is therefore largely due to genotypic differences among individuals in the population. Low values close to 0.0 indicate that environmental factors, not genotypic differences, are largely responsible for the observed phenotypic variation within the population studied. Few quantitative traits have very high or very low heritability estimates, suggesting that both genetics and environment play a part in the expression of most phenotypes for the trait.

The genotypic variance component $V_G$ used in broad-sense heritability estimates includes all types of genetic variation in the population. It does not distinguish between quantitative trait loci with alleles acting additively as opposed to those with epistatic or dominance effects. Broad-sense heritability estimates also assume that the genotype-by-environment variance component is negligible. While broad-sense heritability estimates for a trait are of general genetic interest, these limitations mean this kind of heritability is not very useful in breeding programs. Animal or plant breeders wishing to develop
improved strains of livestock or higher-yielding crop varieties need more precise heritability estimates for the traits they wish to manipulate in a population. Therefore, another type of estimate, narrow-sense heritability, has been devised that is of more practical use.

Narrow-Sense Heritability
Narrow-sense heritability ($h^2$) is the proportion of phenotypic variance due to additive genotypic variance alone. Genotypic variance can be divided into subcomponents representing the different modes of action of alleles at quantitative trait loci. As not all the genes involved in a quantitative trait affect the phenotype in the same way, this partitioning distinguishes among three different kinds of gene action contributing to genotypic variance. Additive variance, $V_A$, is the genotypic variance due to the additive action of alleles at quantitative trait loci. Dominance variance, $V_D$, is the deviation from the additive components that results when phenotypic expression in heterozygotes is not precisely intermediate between the two homozygotes. Interactive variance, $V_I$, is the deviation from the additive components that occurs when two or more loci behave epistatically. The amount of interactive variance is often negligible, and so this component is often excluded from calculations of total genotypic variance.

The partitioning of the total genotypic variance $V_G$ is summarized in the equation

$$V_G = V_A + V_D + V_I$$

and a narrow-sense heritability estimate based only on that portion of the genotypic variance due to additive gene action becomes

$$h^2 = \frac{V_A}{V_P}$$

Omitting $V_I$ and separating $V_P$ into genotypic and environmental variance components, we obtain

$$h^2 = \frac{V_A}{V_E + V_A + V_D}$$

Heritability estimates are used in animal and plant breeding to indicate the potential response of a population to artificial selection for a quantitative trait. Narrow-sense heritability, $h^2$, provides a more accurate prediction of selection response than broad-sense heritability, $H^2$, and therefore $h^2$ is more widely used by
breeders.

ESSENTIAL POINT
Heritability is an estimate of the relative contribution of genetic versus environmental factors to the range of phenotypic variation seen in a quantitative trait in a particular population and environment.

Artificial Selection
Artificial selection is the process of choosing specific individuals with preferred phenotypes from an initially heterogeneous population for future breeding purposes. Theoretically, if artificial selection based on the same trait preferences is repeated over multiple generations, a population can be developed containing a high frequency of individuals with the desired characteristics. If selection is for a simple trait controlled by just one or two genes subject to little environmental influence, generating the desired population of plants or animals is relatively fast and easy. However, many traits of economic importance in crops and livestock, such as grain yield in plants, weight gain or milk yield in cattle, and speed or stamina in horses, are polygenic and frequently multifactorial. Artificial selection for such traits is slower and more complex.

Narrow-sense heritability estimates are valuable to the plant or animal breeder because, as we have just seen, they estimate the proportion of total phenotypic variance for the trait that is due to additive genetic variance. Quantitative trait alleles with additive impact are those most easily manipulated by the breeder. Alleles at quantitative trait loci that generate dominance effects or interact epistatically (and therefore contribute to $V_D$ or $V_I$) are less responsive to artificial selection. Thus, narrow-sense heritability, $h^2$, can be used to predict the impact of selection. The higher the estimated value for $h^2$ in a population, the more likely the breeder will observe a change in phenotypic range for the trait in the next generation after artificial selection.
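The broad-sense and narrow-sense heritability ratios can be sketched in Python as follows. The variance components are made-up numbers chosen only for illustration ($V_A$ = 30, $V_D$ = 10, $V_E$ = 60, in arbitrary squared units), and, as in the text, interactive variance $V_I$ and genotype-by-environment variance are assumed to be negligible, so that $V_G = V_A + V_D$ and $V_P = V_G + V_E$. The function names are hypothetical.

```python
def broad_sense_heritability(v_g: float, v_p: float) -> float:
    """H^2 = V_G / V_P: share of phenotypic variance due to all genotypic variance."""
    return v_g / v_p


def narrow_sense_heritability(v_a: float, v_d: float, v_e: float) -> float:
    """h^2 = V_A / (V_E + V_A + V_D), with V_I and V_GxE omitted."""
    return v_a / (v_e + v_a + v_d)


# Hypothetical variance components (arbitrary squared units)
v_a, v_d, v_e = 30.0, 10.0, 60.0
v_g = v_a + v_d  # V_G = V_A + V_D (V_I taken as 0)
v_p = v_g + v_e  # V_P = V_G + V_E (V_GxE taken as 0)

H2 = broad_sense_heritability(v_g, v_p)
h2 = narrow_sense_heritability(v_a, v_d, v_e)
print(f"H^2 = {H2:.2f}, h^2 = {h2:.2f}")  # H^2 = 0.40, h^2 = 0.30
```

Because $h^2$ counts only the additive part of $V_G$, it cannot exceed $H^2$ for the same population and environment; in this made-up case the 0.10 gap is the dominance share, which the text notes is less responsive to artificial selection.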
Partitioning the genetic variance components to calculate $h^2$ and predict response to selection is a complex task requiring careful experimental design and analysis. The simplest approach is to select individuals with superior phenotypes for the desired quantitative trait from a heterogeneous population and breed offspring from those individuals. The mean score for the trait of those offspring ($M_2$) can then be compared to that of (1) the original population's mean score ($M$) and (2) the selected individuals used as parents ($M_1$). The relationship between these means and $h^2$ is

$$h^2 = \frac{M_2 - M}{M_1 - M}$$

This equation can be further simplified by defining $M_2 - M$ as the selection response (R), the degree of response to mating the selected parents, and $M_1 - M$ as the selection differential (S), the difference between the mean for the whole population and the mean for the selected population, so $h^2$ reflects the ratio of the response observed to the total response possible. Thus,

$$h^2 = \frac{R}{S}$$

A narrow-sense heritability value obtained in this way by selective breeding and measuring the response in the offspring is referred to as an estimate of realized heritability.

As an example of a realized heritability estimate, suppose that we measure the diameter of corn kernels in a population where the mean diameter $M$ is 20 mm. From this population, we select a group with the smallest diameters, for which the mean $M_1$ equals 10 mm. The selected plants are interbred, and the mean diameter $M_2$ of the progeny kernels is 13 mm. We can calculate the realized heritability $h^2$ to estimate the potential for artificial selection on kernel size:

$$h^2 = \frac{M_2 - M}{M_1 - M} = \frac{13 - 20}{10 - 20} = \frac{-7}{-10} = 0.70$$

This value for narrow-sense heritability indicates that the selection potential for kernel size is relatively high.

FIGURE 21–6 Response of corn selected for high and low oil content over 76 generations (percentage oil plotted against generation, with separate lines for selection for high oil content and selection for low oil content). The numbers in parentheses at generations 9, 25, 52, and 76 for the "high oil" line indicate the calculation of heritability at these points in the continuing experiment.

The longest-running artificial selection experiment known is still being conducted at the State Agricultural Laboratory in Illinois. Corn has been selected for both high and low oil content. After 76 generations, selection continues to result in increased oil content (Figure 21–6). With each cycle of successful selection, more of the corn plants accumulate a higher percentage of additive alleles involved in oil production. Consequently, the narrow-sense heritability $h^2$ of increased oil content in succeeding generations has declined (see parenthetical values at generations 9, 25, 52, and 76 in Figure 21–6) as artificial selection comes closer and closer to optimizing the genetic potential for oil production. Theoretically, the process will continue until all individuals in the population possess a uniform genotype that includes all the additive alleles responsible for high oil content. At that point, $h^2$ will be reduced to zero, and response to artificial selection will cease. The decrease in response to selection for low oil content shows that heritability for low oil content is approaching this point.

Table 21.4 lists narrow-sense heritability estimates expressed as percentage values for a variety of quantitative traits in different organisms.

TABLE 21.4 Estimates of Heritability for Traits in Different Organisms
Trait                   Heritability ($h^2$)
Mice
  Tail length           60%
  Body weight           37
  Litter size           15
Chickens
  Body weight           50
  Egg production        20
  Egg hatchability      15
Cattle
  Birth weight          45
  Milk yield            44
  Conception rate       3

As you can see, these $h^2$ values vary, but heritability tends to be low for quantitative traits that are essential to an organism's survival. Remember, this does not indicate the absence of a genetic
contribution to the observed phenotypes for such traits. Instead, the low $h^2$ values show that natural selection has already largely optimized the genetic component of these traits during evolution. Egg production, litter size, and conception rate are examples of how such physiological limitations on selection have already been reached. Traits that are less critical to survival, such as body weight, tail length, and wing length, have higher heritabilities because more genotypic variation for such traits is still present in the population.

Remember also that any single heritability estimate can only provide information about one population in a specific environment. Studies involving the same trait in differing environments most often yield different results. Therefore, narrow-sense heritability is a more valuable predictor of response to selection when estimates are calculated for many populations and environments and show the presence of a clear trend.

Limitations of Heritability Studies
While the above discussion makes clear that heritability studies are valuable in estimating the genetic contribution to phenotypic variance, the knowledge gained about heritability of traits must be balanced by awareness of some of the constraints inherent in such estimates:
• Heritability values provide no information about what genes are involved in traits.
• Heritability is measured in populations and has only limited application to individuals.
• Measured heritability depends on the environmental variation present in the population being studied, and cannot be used to evaluate differences between populations.
• Future changes in environmental factors can affect heritability.

21.4 Twin Studies Allow an Estimation of Heritability in Humans
Human twins are useful subjects for examining how much phenotypic variance for a multifactorial trait is due to the genotype as opposed to the environment. In these studies, the underlying principle has been that monozygotic (MZ), or identical, twins are
derived from a single zygote that divides mitotically and then spontaneously splits into two separate cells, each of which gives rise to a genotypically identical embryo. Dizygotic (DZ), or fraternal, twins, on the other hand, originate from two separate fertilization events and are only as genetically similar as any two siblings, with an average of 50 percent of their alleles in common. For a given trait, therefore, phenotypic differences between pairs of MZ twins will be equivalent to the environmental variance ($V_E$) (because the genotypic variance is zero). Phenotypic differences between DZ twins, however, display both environmental variance ($V_E$) and approximately half the genotypic variance ($V_G$). Comparing the extent of phenotypic variance for the same trait in MZ and DZ sets of twins provides an estimate of broad-sense heritability for the trait.

Twins are said to be concordant for a given trait if both express it or neither expresses it. If one expresses the trait and the other does not, the pair is said to be discordant. Comparison of concordance values of MZ versus DZ twins reared together illustrates the potential value for heritability assessment. (See the Now Solve This feature on page 450, for example.)
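Following the definition of concordance given above (a pair is concordant if both twins express the trait or neither does), MZ and DZ concordance can be tallied as in the minimal Python sketch below. The twin-pair data are invented for illustration and are not taken from any study, and the function name `concordance_percent` is hypothetical.

```python
from typing import Sequence, Tuple

# (twin 1 expresses the trait, twin 2 expresses the trait)
TwinPair = Tuple[bool, bool]


def concordance_percent(pairs: Sequence[TwinPair]) -> float:
    """Percentage of twin pairs that are concordant.

    A pair counts as concordant when both twins express the trait or
    neither does; it is discordant when only one twin expresses it.
    """
    if not pairs:
        raise ValueError("no twin pairs supplied")
    concordant = sum(1 for first, second in pairs if first == second)
    return 100 * concordant / len(pairs)


# Hypothetical samples of 20 MZ and 20 DZ twin pairs
mz_pairs = [(True, True)] * 8 + [(False, False)] * 9 + [(True, False)] * 3
dz_pairs = [(True, True)] * 4 + [(False, False)] * 8 + [(False, True)] * 8

mz = concordance_percent(mz_pairs)
dz = concordance_percent(dz_pairs)
print(f"MZ {mz:.0f}%, DZ {dz:.0f}%, difference {mz - dz:.0f} points")
# MZ 85%, DZ 60%, difference 25 points
```

The comparison of interest is the gap between the two percentages, not the MZ value alone; a trait with high concordance in both kinds of twins points toward an environmental cause.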
Before any conclusions can be drawn from twin studies, the data must be examined carefully. For example, if concordance values approach 90 to 100 percent in MZ twins, we might be inclined to interpret that as a large genetic contribution to the phenotype of the trait. In some cases, for example blood types and eye color, we know that this is indeed true. In the case of contracting measles, however, a high concordance value merely indicates that the trait is almost always induced by a factor in the environment, in this case a virus. It is more meaningful to compare the difference between the concordance values of MZ and DZ twins. If concordance values are significantly higher in MZ twins, we suspect a strong genetic component in the determination of the trait. In the case of measles, where concordance is high in both types of twins, the environment is assumed to be the major contributing factor. Such an analysis is useful because phenotypic characteristics that remain similar in different environments are likely to have a strong genetic component.

Large-Scale Analysis of Twin Studies
For decades, researchers have used twin studies to examine the relative contributions of genotype and environment to the phenotypic variation observed in complex traits in humans. These traits involve the interplay of multiple genes with a network of environmental factors, and the genetic components of the resulting phenotypic variance can be difficult to study. The simplest way to assess the genetic contribution is to assume that the effect of each gene on a trait is independent of the effects of other genes. Because the effects of all genes are added together, this is called the additive model. However, in recent years, some geneticists have proposed that non-additive factors such as dominance and epistasis are more important than additive genetic effects. As a result, the relative roles of additive and non-additive
factors are a subject of active debate. In an attempt to resolve this issue, an international project recently examined the results of all twin studies performed in the last 50 years. This study, published in 2015, involved the compilation and analysis of the data for over 17,000 traits studied in more than 14 million twin pairs drawn from more than 2,700 published papers. Several important general conclusions can be drawn from this landmark study. First, based on correlations between MZ and DZ twin pairs, which indicate whether genetic influences on a trait are likely to be mostly additive or non-additive, researchers concluded that the vast majority of traits follow a simple additive model, providing strong support for one of the foundations of heritability studies. This does not exclude a role for non-additive factors such as dominance and epistasis, but these factors most likely play a secondary role in heritability. Second, the results are consistent with the findings from genome-wide association studies (GWAS) that many complex traits are controlled by many genes, each with a small effect. Third, genetic variance is an important component of the individual variation observed in populations. In addition, the relative effects of genotypes and environmental factors are non-randomly distributed, making their contributions somewhat trait-specific. The data from this study are available in a web-based application, Meta-analysis of Twin Correlations and Heritability (MaTCH), which can be used as a resource for the study of complex traits and the genetic and environmental components of heritability.
Twin Studies Have Several Limitations
Interesting as they are, human twin studies contain some unavoidable sources of error. For example, MZ twins are often treated more similarly by parents and teachers than are DZ twins, especially when the DZ siblings are of different sex. This circumstance may inflate the environmental variance for DZ twins.
Another possible error source is interactions between the genotype and the environment that produce variability in the phenotype. These interactions can increase the total phenotypic variance for DZ twins compared with MZ twins raised in the same environment, influencing heritability calculations. Overall, heritability estimates for human traits based on twin studies should therefore be considered approximations and examined very carefully before any conclusions are drawn.
Although they must often be viewed with caution, classical twin studies, based on the assumption that MZ twins share the same genome, have been valuable for estimating heritability over a wide range of traits, including multifactorial disorders such as cardiovascular disease, diabetes, and mental illness. These disorders clearly have genetic components, and twin studies provide a foundation for studying interactions between genes and environmental factors. However, results from genomics research have challenged the view that MZ twins are truly identical and have forced a reevaluation of both the methodology and the results of twin studies. Such research has also opened the way to new approaches to the study of interactions between the genotype and environmental factors. The most relevant genomic discoveries about twins include the following:
• By the time they are born, MZ twins do not necessarily have identical genomes.
• Gene-expression patterns in MZ twins change with age, leading to phenotypic differences.
We will address these points in order. First, MZ twins develop from a single fertilized egg; sometime early in development, the resulting cell mass separates into two distinct populations, creating two independent embryos. Until that time, MZ twins have identical genotypes. Subsequently, however, the genotypes can diverge slightly. For example, differences in copy number variation (CNV; variation in the number of copies of numerous large DNA sequences, usually 1000 bp or more) may arise,
differentially producing genetically distinct populations of cells in each embryo (see Chapter for a discussion of CNV). This creates a condition called somatic mosaicism, which may result in a milder disease phenotype in some disorders and may play a similar role in the phenotypic discordance observed in some pairs of MZ twins. At this point, it is difficult to know for certain how often CNV differences arise after MZ twinning, but one estimate suggests that such differences occur in 10 percent of all twin pairs. In those pairs where it does occur, one estimate is that such divergence takes place in 15 to 70 percent of the somatic cells. In one case, a CNV difference between MZ twins has been associated with chronic lymphocytic leukemia in one twin but not the other.
The second genomic difference between MZ twins involves epigenetics, the chemical modification of their DNA and associated histones. An international study of epigenetic modifications in adult European MZ twins showed that MZ twin pairs are epigenetically identical at birth, but adult MZ twins show significant differences in the methylation patterns of both DNA and histones. Such epigenetic changes in turn affect patterns of gene expression. The accumulation of epigenetic changes and gene-expression profiles may explain some of the observed phenotypic discordance and susceptibility to diseases in adult MZ twins. For example, a clear difference in DNA methylation patterns is observed in MZ twins discordant for Beckwith–Wiedemann syndrome, a genetic disorder associated with variable developmental overgrowth of certain tissues and organs and an increased risk of developing cancerous and noncancerous tumors. Affected infants are often larger than normal, and one in five dies early in life. Other complex disorders displaying a genetic component are similarly being investigated using epigenetic analysis in twin studies. These include susceptibility to several
neurobiological disorders, including schizophrenia and autism, as well as to the development of Type diabetes, breast cancer, and autoimmune disease. Progressive, age-related genomic modifications may be the result of MZ twins being exposed to different environmental factors, or of failure of epigenetic marking following DNA replication. These findings also indicate that concordance studies in DZ twins must take into account genetic as well as epigenetic differences that contribute to discordance in these twin pairs.
The realization that epigenetics may play an important role in the development of phenotypes promises to make twin studies an especially valuable tool in dissecting the interactions among genes and the role of environmental factors in the production of phenotypes. Once the degree of epigenetic differences between MZ and DZ twin pairs has been defined, molecular studies on DNA and histone modification can link changes in gene expression with differences in the concordance rates between MZ and DZ twins. We will discuss the most recent findings involving epigenetics and summarize its many forms and functions later in the text (see Special Topic Chapter 1: Epigenetics).
ESSENTIAL POINT
Twin studies, while having some limitations, are useful in assessing heritabilities for polygenic traits in humans.
NOW SOLVE THIS
21–3 The following table gives the percentage of twin pairs studied in which both twins expressed the same phenotype for a trait (concordance). Percentages listed are for concordance for each trait in monozygotic (MZ) and dizygotic (DZ) twins. Assuming that both twins in each pair were raised together in the same environment, what do you conclude about the relative importance of genetic versus environmental factors for each trait?
Trait                  MZ (%)   DZ (%)
Blood types             100       66
Eye color                99       28
Mental retardation       97       37
Measles                  95       87
Hair color               89       22
Handedness               79       77
Idiopathic epilepsy      72       15
Schizophrenia            69       10
Diabetes                 65       18
Identical allergy        59        5
Cleft lip                42        5
Club foot                32        3
Mammary cancer            6        3
HINT: This problem asks you to evaluate the relative importance of genetic versus environmental contributions to specific traits by examining concordance values in MZ versus DZ twins. The key to its solution is to examine the difference in concordance values and to factor in what you have learned about the genetic differences between MZ and DZ twins.
21.5 Quantitative Trait Loci Are Useful in Studying Multifactorial Phenotypes
Environmental effects, interaction among segregating alleles, and the large number of genes that may contribute to the phenotype of polygenically controlled complex traits make it difficult to (1) identify all genes that are involved and (2) determine the effect of each gene on the phenotype. However, because many quantitative traits are of economic or medical relevance, it is often desirable to obtain this information. In such studies, a chromosome region identified as containing one or more genes contributing to a quantitative trait is known as a quantitative trait locus (QTL).* When possible, the relevant gene or genes contained within a QTL are isolated and studied.
The modern approach used to find and map QTLs involves looking for associations between DNA markers and phenotypes. One way to do this is to begin with individuals from two lines created by artificial selection that are highly divergent for a phenotype (fruit weight, oil content, bristle number, etc.).
*We use QTLs to designate the plural form, quantitative trait loci.
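The marker–phenotype association idea introduced above can be illustrated with the simplest form of analysis, a single-marker test: group individuals in a mapping population by their genotype at one marker and ask whether mean phenotype differs among the groups more than the scatter within groups would predict (a one-way ANOVA F statistic). This is a minimal sketch with made-up data and hypothetical function names; it is one common way to test a single marker, not necessarily the statistical method behind likelihood plots such as Figure 21–7(c).

```python
from collections import defaultdict
from statistics import mean


def group_by_genotype(genotypes, phenotypes):
    """Collect phenotype values for each genotype class at one marker."""
    groups = defaultdict(list)
    for g, p in zip(genotypes, phenotypes):
        groups[g].append(p)
    return dict(groups)


def single_marker_f(genotypes, phenotypes):
    """One-way ANOVA F statistic for phenotype differences among marker genotypes.

    A large F means the genotype classes differ in mean phenotype more than
    expected from within-class scatter, suggesting the marker cosegregates
    with a QTL. A value near or below 1 suggests no linkage.
    """
    groups = group_by_genotype(genotypes, phenotypes)
    k = len(groups)      # number of genotype classes
    n = len(phenotypes)  # number of individuals
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype classes and n > k")
    grand_mean = mean(phenotypes)
    means = {g: mean(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical F2 individuals: the same phenotypes scored at two different markers.
phenotypes = [10, 12, 7, 9, 4, 6]
linked_marker = ["AA", "AA", "Aa", "Aa", "aa", "aa"]    # genotype tracks phenotype
unlinked_marker = ["AA", "Aa", "aa", "AA", "Aa", "aa"]  # genotype unrelated to phenotype

print(single_marker_f(linked_marker, phenotypes))    # 9.0
print(single_marker_f(unlinked_marker, phenotypes))  # about 0.41
```

In a real mapping population the test would be repeated at every marker, with a significance threshold adjusted for the many tests performed; that correction is outside this sketch.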
For example, Figure 21–7 illustrates a generic case of QTL mapping. Over many generations of artificial selection, two divergent lines become highly homozygous, which facilitates their use in QTL mapping. Individuals from each of the lines with divergent phenotypes [generation 25 in Figure 21–7(a)] are used as parents to create an F1 generation whose members will be heterozygous at most of the loci contributing to the trait. Additional crosses, either among F1 individuals or between the F1 and the inbred parent lines, result in F2 generations that carry different portions of the parental genomes [Figure 21–7(b)] with different QTL genotypes and associated phenotypes. This segregating F2 is known as the QTL mapping population.
FIGURE 21–7 (a) Individuals from highly divergent lines created by artificial selection are chosen from generation 25 as parents. (b) The thick bars represent the genomes of individuals selected from the divergent lines as parents. These individuals are crossed to produce an F1 generation (not shown). An F2 generation is produced by crossing members of the F1. As a result of crossing over, individual members of the F2 generation carry different portions of the P1 genome, as shown by the colored segments of the thick bars. DNA markers and phenotypes in individuals of the F2 generation are analyzed. (c) Statistical methods are used to determine the probability that a DNA marker is associated with a QTL that affects the phenotype. The results are plotted as the likelihood of association against chromosomal location. Units on genetic maps are measured in centimorgans (cM), determined by crossover frequencies. Peaks above the horizontal line represent significant results. The data show five possible QTLs, with the most significant findings at about 10 cM and 60 cM.
Researchers then measure phenotypic expression of the trait among individuals in the mapping population and identify genomic differences among individuals by using chromosome-specific DNA markers such
as restriction fragment length polymorphisms (RFLPs), microsatellites, and single-nucleotide polymorphisms (SNPs) (see Chapter 18). Computer-based statistical analysis is used to search for linkage between the markers and a component of phenotypic variation associated with the trait. If a DNA marker (such as those described above) is not linked to a QTL, then the phenotypic mean score for the trait will not vary among individuals with different genotypes at that marker locus. However, if a DNA marker is linked to a QTL, then different genotypes at that marker locus will also differ in their phenotypic expression of the trait. When this occurs, the marker locus and the QTL are said to cosegregate. Consistent cosegregation establishes the presence of a QTL at or near the DNA marker along the chromosome; in other words, the marker and QTL are linked. When numerous QTLs for a given trait have been located, a genetic map is created, showing the probability that specific chromosomal regions are associated with the phenotype of interest [Figure 21–7(c)]. Further research using genomic techniques identifies genes in these regions that contribute to the phenotype.
QTL mapping has been used extensively in agriculture, including plants such as corn, rice, wheat, and tomatoes (Table 21.5), and livestock such as cattle, pigs, sheep, and chickens. Tomatoes are one of the world's major vegetable crops, and hundreds of varieties are grown and harvested each year. To aid in the creation of new varieties, hundreds of QTLs for traits including fruit size, shape, soluble solid content, and acidity have been identified and mapped to all 12 chromosomes in the tomato haploid genome. In addition, the genomes of several tomato varieties have been sequenced.
TABLE 21.5 QTLs for Quantitative Phenotypes
Organism   Quantitative Phenotype   QTLs Identified
Tomato     Soluble solids            7
           Fruit mass               13
           Fruit pH                  9
           Growth                    5
           Leaflet shape             9
           Height                    9
Maize      Height                   11
           Leaf length               7
           Grain yield              18
           Number of ears           10
Source: Used with permission of Annual Reviews of Genetics, from "Mapping Polygenes" by S.D. Tanksley, Annual Review of Genetics, Vol. 27:205–233, Table 1, December 1993. © Annual Reviews, Inc.
We will describe studies focused on quantitative traits controlling fruit shape and weight as an example of QTL research. While the cultivated tomato can weigh up to 1000 grams, fruit from the related wild species thought to be the ancestor of the modern tomato weighs only a few grams [Figure 21–8(a)]. In a study by Steven Tanksley, QTL mapping identified more than 28 QTLs related to this thousand-fold variation in fruit weight. More than ten years of work was required to localize, identify, and clone one of these QTLs, called fw2.2 (on chromosome 2). Within this QTL, a specific gene, ORFX, has been identified, and alleles at this locus are responsible for about 30 percent of the variation in fruit weight.
FIGURE 21–8 A theoretical wild species of tomato, similar in size to the tomato on the left, is regarded as the ancestor of all modern tomatoes, including the beefsteak tomato shown at the right.
The ORFX gene has been isolated, cloned, and transferred between plants, with interesting results. One allele of ORFX is present in all wild small-fruited varieties of tomatoes investigated, while another allele is present in all domesticated large-fruited varieties. When a cloned ORFX gene from small-fruited varieties is transferred to a plant that normally produces large tomatoes, the transformed plant produces fruits that are greatly reduced in weight. In the varieties studied by Tanksley's group, the reduction averaged 17 grams, a statistically significant phenotypic change caused by the action of a gene found within a single QTL. Further analysis of ORFX revealed that this gene encodes a protein that negatively regulates cell division during fruit development. Differences in the time of gene expression and differences in the
amount of transcript produced lead to small or large fruit. Higher levels of expression mediated by transferred ORFX alleles exert a negative control over cell division, resulting in smaller tomatoes.
Yet ORFX and other related genes cannot account for all the observed variation in tomato size. Analysis of another QTL, fas (located on chromosome 11), indicates that the development of extreme differences in fruit size resulting from artificial selection also involves an increase in the number of compartments in the mature fruit. The small, ancestral stocks produce fruit with two to four seed compartments, but the large-fruited present-day strains have eight or more compartments. Thus, the QTLs that affect fruit size in tomatoes work by controlling at least two developmental processes: cell division early in development and the determination of the number of ovarian compartments.
The discovery that QTLs can control levels of gene expression has led to new, molecular definitions of phenotypes associated with quantitative traits. For example, the phenotype investigated may be the amount of an RNA transcript produced by a gene (expression QTLs, or eQTLs) or the amount of protein produced (protein QTLs, or pQTLs). These molecular phenotypes are polygenically controlled in the same way as more conventional phenotypes, such as fruit weight. Gene expression, for example, is controlled by cis factors, including promoters, and by trans-acting transcription factors (see Chapter 15 for a discussion of gene regulation in eukaryotes). Studies indicate that intense artificial selection over the last 300 years has resulted in fixation of large portions of the tomato genome and the accompanying loss of 95 percent of its genetic diversity. For this reason, eQTLs and other genomic techniques such as genome editing may be necessary tools to develop new varieties.
Expression QTLs (eQTLs) and Genetic Disorders
We conclude this chapter by discussing the role that variation in the levels of gene expression plays in the
phenotypic variation observed in complex disorders. The ability to study gene expression (eQTLs) and gene variability in the same individual helps identify gene/disease associations and the network of genes controlling those disorders. This approach has identified genes responsible for complex diseases such as asthma, cleft lip, Type diabetes, and coronary artery disease.
Asthma cases have risen dramatically over the last three decades, and asthma is now a major public health concern. Genome-wide association studies (GWAS) have identified loci that confer susceptibility to asthma; however, the functions of many of these genes are unknown, and GWAS alone are unable to establish which alleles of these loci are responsible for susceptibility or the mechanism of their action. To identify genes directly involved in asthma susceptibility, researchers collected lung specimens from over 1000 individuals and used lung-specific gene expression as a phenotype to study how genetic variants (DNA polymorphisms) are linked to both gene expression (eQTLs) and the asthma phenotype. Integration of the GWAS and eQTL data identified a network of 34 genes that constitute the most likely set of genes causing asthma. In addition, six other genes were identified as drivers that control the other genes in the network. These driver genes are candidates for drug discovery studies to develop therapies for this chronic and sometimes fatal disease.
Similar approaches have identified candidate susceptibility genes and the outline of gene networks in schizophrenia and in Parkinson disease. The expanded use of eQTL analysis is expected to rapidly advance our knowledge of the genes responsible for other complex genetic disorders.
ESSENTIAL POINT
Quantitative trait loci, or QTLs, may be identified and mapped using DNA markers.
GENETICS, TECHNOLOGY, AND SOCIETY
The Green Revolution Revisited: Genetic Research with Rice
Of the more than
billion people now living on Earth, over 800 million do not have enough to eat. That number is expected to grow by an additional million people each year for the next several decades. How will we be able to feed the estimated billion people on Earth by 2025?
The past gives us some reasons to be optimistic. In the 1950s and 1960s, plant scientists set about increasing the production of crop plants, including the three most important grains: rice, wheat, and maize. These efforts became known as the Green Revolution. The approach was three-fold: (1) to increase the use of fertilizers, pesticides, and irrigation; (2) to bring more land under cultivation; and (3) to develop improved varieties of crop plants by intensive plant breeding. The results were dramatic. Developing nations more than doubled their production of rice, wheat, and maize between 1961 and 1985.
The Green Revolution saved millions of people from starvation and improved the quality of life for millions more; however, its effects may be diminishing. The rate of increase in grain yields has slowed since the 1980s. If food production is to keep pace with the projected increase in the world's population, we will have to depend more on the genetic improvement of crop plants to provide higher yields.
About half of the Earth's population depends on rice for basic nourishment. The Green Revolution for rice began in 1960, aided by the establishment of the International Rice Research Institute (IRRI). One of its major developments was the breeding of a rice variety with improved disease resistance and higher yield. The IRRI research team crossed a Chinese rice variety (Dee-geo-woo-gen) and an Indonesian variety (Peta) to create a new cultivar known as IR8. IR8 produced a greater number of rice kernels per plant. However, IR8 plants were so top-heavy with grain that they tended to fall over, a trait called "lodging." To reduce lodging, IRRI breeders crossed IR8 with a dwarf native variety to create semi-dwarf lines. Due in part to
the adoption of the semi-dwarf IR8 lines, world production of rice doubled in 25 years.
Predictions suggest that a 40 percent increase in the annual rice harvest may be necessary to keep pace with anticipated population growth during the next 30 years. Greater emphasis will be placed on creating new rice varieties that have even higher yields and greater disease resistance. In addition to conventional hybridization and selection techniques, dozens of quantitative trait loci (QTLs) from wild rice appear to contribute to increased yields, and scientists are attempting to introduce these traits into current dwarf varieties of domestic rice.
Genomics and genetic engineering are also contributing to the new Green Revolution for rice. In 2002, rice became the first cereal crop to have its genome sequenced. As useful genes are identified, they may be transferred into rice plants with the goal of improving disease resistance, tolerance to drought and salinity, and nutritional content.
Your Turn
Take time, individually or in groups, to answer the following questions. Investigate the references and links to help you understand some of the technologies and issues surrounding the new Green Revolution.
1. Almost half of the world's rice is grown in soils that are poor in nutrients, subject to flooding or drought, or farmed by people unable to afford fertilizers. Can genetic research address these limitations? One recent study indicates that a QTL in a traditional rice variety may confer tolerance to several of these factors. Read about the promising research on the PSTOL1 gene in Gamuyao, R. et al. 2012. The protein kinase Pstol1 from traditional rice confers tolerance of phosphorus deficiency. Nature 488: 535–539.
2. Despite its benefits, the Green Revolution has been the subject of controversy. What are the main criticisms of the Green Revolution, and how can we mitigate some of its negative aspects?
A discussion of this topic can be found in Pingali, P.L. 2012. Green Revolution: Impacts, limits, and the path ahead. Proc. Natl. Acad. Sci. USA 109(31): 12302–12308.
CASE STUDY Tissue-specific eQTLs
An eQTL study was carried out using colon and rectal biopsies, collected endoscopically from 65 controls and patients with inflammatory bowel disease (IBD) and colorectal cancer. RNA was extracted from each biopsy, and gene expression was measured using microarray chipsets. Genomic DNA was genotyped at 730,525 SNPs to measure genetic variation throughout the genome. The linkage between gene expression and genetic variation was calculated in the patients and controls. The study identified 1312 independent eQTLs associated with the differential expression of 1222 genes in rectal tissues. Of these, 26 percent were novel and unique compared with previous GWAS and eQTL studies for IBD carried out on general tissues, either lymphoblastoid cell lines or blood. An examination of 163 IBD risk loci identified 11 SNPs that were rectal eQTLs. A colorectal cancer locus at 11q23 contained an eQTL associated with COLCA2, a protein implicated in colon cancer.
1. What is an eQTL, and what does it mean in the context of this study?
2. Why do you think it is important to measure gene expression in the tissues relevant to the disease, not just in general tissues like blood, in eQTL studies?
3. How can knowing which gene networks show differential levels of expression in IBD and colorectal cancer patients help us to treat these complex diseases, both now and in the future?
INSIGHTS AND SOLUTIONS
1. In a certain plant, height varies from 6 to 36 cm. When 6-cm and 36-cm plants were crossed, all F1 plants were 21 cm. In the F2 generation, a continuous range of heights was observed. Most were around 21 cm, and 3 of 200 were as short as the 6-cm P1 parent.
(a) What mode of inheritance does this illustrate, and how many gene pairs are involved?
(b) How much does each additive allele contribute to height?
(c) List all genotypes that give rise to plants that are 31 cm.
Solution:
(a) Polygenic inheritance is illustrated when a trait is continuous and when alleles contribute additively to the phenotype. The 3/200 ratio of F2 plants is the key to determining the number of gene pairs. This reduces to a ratio of 1/66.7, very close to 1/64. Using the formula (1/4)^n = 1/64 (where 1/64 is the proportion of F2 phenotypes as extreme as either P1 parent), n = 3. Therefore, three gene pairs are involved.
(b) The variation between the two extreme phenotypes is 36 - 6 = 30 cm. Because there are six potential additive alleles (AABBCC), each contributes 30/6 = 5 cm to the base height of 6 cm, which results when no additive alleles (aabbcc) are part of the genotype.
(c) All genotypes that include five additive alleles will be 31 cm (5 alleles × 5 cm/allele + 6 cm base height = 31 cm). Therefore, AABBCc, AABbCC, and AaBBCC are the genotypes that will result in plants that are 31 cm.
2. A plant of unknown phenotype and genotype from the population described above (1.) was testcrossed, with the following results:
1/4 11 cm
2/4 16 cm
1/4 21 cm
An astute genetics student realized that the unknown plant could be only one phenotype but could be any of three genotypes. What were they?
Solution: When testcrossed (with aabbcc), the unknown plant must be able to contribute either one, two, or three additive alleles in its gametes in order to yield the three phenotypes in the offspring. Since no 6-cm offspring are observed, the unknown plant never contributes all nonadditive alleles (abc). Only plants that are homozygous for the additive allele at one locus and heterozygous at the other two loci meet these criteria. Therefore, the unknown parent can be any of three genotypes, all of which have a phenotype of 26 cm:
AABbCc   AaBbCC   AaBBCc
For example, for the first genotype (AABbCc):
AABbCc × aabbcc →
1/4 AaBbCc   21 cm
1/4 AaBbcc   16 cm
1/4 AabbCc   16 cm
1/4 Aabbcc   11 cm
which is the ratio of phenotypes observed.
3. The mean and variance of corolla length in two highly inbred strains of Nicotiana and their progeny are shown in the following table. One parent (P1) has a short corolla, and the other parent (P2) has a long corolla. Calculate the broad-sense heritability (H²) of corolla length in this plant.
Strain            Mean (mm)   Variance (mm²)
P1 short            40.47         3.12
P2 long             93.75         3.87
F1 (P1 × P2)        63.90         4.74
F2 (F1 × F1)        68.72        47.70
Solution: The formula for estimating heritability is H² = VG/VP, where VG and VP are the genetic and phenotypic components of variation, respectively. The main issue in this problem is obtaining some estimate of the two components of phenotypic variation: genetic and environmental factors. VP is the combination of genetic and environmental variance. Because the two parental strains are true breeding, they are assumed to be homozygous, and their variances of 3.12 and 3.87 are considered to be the result of environmental influences. The average of these two values is 3.50. The F1 is also genetically homogeneous and gives us an additional estimate of the impact of environmental factors. By averaging this value with that of the parents, we obtain a relatively good idea of the environmental impact on the phenotype:
VE = (4.74 + 3.50)/2 = 4.12
The phenotypic variance in the F2 is the sum of the genetic (VG) and environmental (VE) components. We have estimated the environmental input as 4.12, so 47.70 minus 4.12 gives us an estimate of VG of 43.58. Heritability then becomes 43.58/47.70, or 0.91. This value, when interpreted as a percentage, indicates that about 91 percent of the variation in corolla length is due to genetic influences.
Problems and Discussion Questions
1. HOW DO WE KNOW?
In this chapter, we focused on a mode of inheritance referred to as quantitative genetics, as well as many of the statistical parameters used to study quantitative traits. Along the way, we found opportunities to consider the methods and reasoning by which geneticists acquired much of their understanding of quantitative genetics. From the explanations given in the chapter, what answers would you propose to the following fundamental questions:
(a) How can we ascertain the number of polygenes involved in the inheritance of a quantitative trait?
(b) What findings led geneticists to postulate the multiple-factor hypothesis that invoked the idea of additive alleles to explain inheritance patterns?
(c) How do we assess environmental factors to determine if they affect the phenotype of a quantitatively inherited trait?
(d) How do we know that monozygotic twins are not genotypically identical as adults?
2. CONCEPT QUESTION Review the Chapter Concepts list on page 438. These all center on quantitative inheritance and the study and analysis of polygenic traits. Write a short essay that discusses the difference between the more traditional Mendelian and neo-Mendelian modes of inheritance (qualitative inheritance) and quantitative inheritance.
3. Define the following: (a) polygenic, (b) additive alleles, (c) monozygotic and dizygotic twins, (d) heritability, and (e) QTL.
4. A dark-red strain and a white strain of wheat are crossed and produce an intermediate, medium-red F1. When the F1 plants are interbred, an F2 generation is produced in a ratio of dark-red: medium-dark-red: medium-red: light-red: white. Further crosses reveal that the dark-red and white F2 plants are true breeding.
(a) Based on the ratios in the F2 population, how many genes are involved in the production of color?
(b) How many additive alleles are needed to produce each possible phenotype?
(c) Assign symbols to these alleles, and list possible genotypes that give rise to the medium-red and light-red phenotypes.
(d) Predict the outcome of the F1 and F2 generations in a cross between a true-breeding medium-red plant and a white plant.
5. Height in humans depends on the additive action of genes. Assume that this trait is controlled by the four loci R, S, T, and U and that environmental effects are negligible. Instead of additive versus nonadditive alleles, assume that additive and partially additive alleles exist. Additive alleles contribute two units, and partially additive alleles contribute one unit to height.
(a) Can two individuals of moderate height produce offspring that are much taller or shorter than either parent? If so, how?
(b) If an individual with the minimum height specified by these genes marries an individual of intermediate or moderate height, will any of their children be taller than the tall parent? Why or why not?
6. An inbred strain of plants has a mean height of 24 cm. A second strain of the same species from a different geographical region also has a mean height of 24 cm. When plants from the two strains are crossed, the F1 plants are the same height as the parent plants. However, the F2 generation shows a wide range of heights; the majority are like the P1 and F1 plants, but approximately of 1000 are only 12 cm high, and about of 1000 are 36 cm high.
(a) What mode of inheritance is occurring here?
(b) How many gene pairs are involved?
(c) How much does each gene contribute to plant height?
(d) Indicate one possible set of genotypes for the original P1 parents and the F1 plants that could account for these results.
(e) Indicate three possible genotypes that could account for F2 plants that are 18 cm high and three that account for F2 plants that are 33 cm high.

7. Erma and Harvey were a compatible barnyard pair, but a curious sight. Harvey's tail was only 6 cm long, while Erma's was 30 cm. Their F1 piglet offspring all grew tails that were 18 cm. When inbred, an F2 generation resulted in many piglets (Erma and Harvey's grandpigs), whose tails ranged in 4-cm intervals from 6 to 30 cm (6, 10, 14, 18, 22, 26, and 30). Most had 18-cm tails, while 1/64 had 6-cm tails and 1/64 had 30-cm tails.
(a) Explain how these tail lengths were inherited by describing the mode of inheritance, indicating how many gene pairs were at work, and designating the genotypes of Harvey, Erma, and their 18-cm-tail offspring.
(b) If one of the 18-cm F1 pigs is mated with one of the 6-cm F2 pigs, what phenotypic ratio would be predicted if many offspring resulted? Diagram the cross.

8. In the following table, average differences of height, weight, and fingerprint ridge count between monozygotic twins (reared together and apart), dizygotic twins, and nontwin siblings are compared:

Trait | MZ Reared Together | MZ Reared Apart | DZ Reared Together | Sibs Reared Together
Height (cm) | 1.7 | 1.8 | 4.4 | 4.5
Weight (kg) | 1.9 | 4.5 | 4.5 | 4.7
Ridge count | 0.7 | 0.6 | 2.4 | 2.7

Based on the data in this table, which of these quantitative traits has the highest heritability values?

9. Define the term broad-sense heritability (H²). What is implied by a relatively high value of H²?
Express aspects of broad-sense heritability in equation form.

10. Describe the value of using twins in the study of questions relating to the relative impact of heredity versus environment.

11. Corn plants from a test plot are measured, and the distribution of heights at 10-cm intervals is recorded in the following table:

Height (cm) | 100 | 110 | 120 | 130 | 140 | 150 | 160 | 170 | 180
Plants (no.) | 20 | 60 | 90 | 130 | 180 | 120 | 70 | 50 | 40

Calculate (a) the mean height, (b) the variance, (c) the standard deviation, and (d) the standard error of the mean. Plot a rough graph of plant height against frequency. Do the values represent a normal distribution? Based on your calculations, how would you assess the variation within this population?

12. The following variances were calculated for two traits in a herd of hogs:

Trait | VP | VG | VA
Back fat | 30.6 | 12.2 | 8.44
Body length | 52.4 | 26.4 | 11.70

(a) Calculate broad-sense (H²) and narrow-sense (h²) heritabilities for each trait in this herd.
(b) Which of the two traits will respond best to selection by a breeder? Why?

13. The mean and variance of plant height of two highly inbred strains (P1 and P2) and their progeny (F1 and F2) are shown here:

Strain | Mean (cm) | Variance
P1 | 34.2 | 4.2
P2 | 55.3 | 3.8
F1 | 44.2 | 5.6
F2 | 46.3 | 10.3

Calculate the broad-sense heritability (H²) of plant height in this species.

14. A hypothetical study investigated the vitamin A content and the cholesterol content of eggs from a large population of chickens. The variances (V) were calculated, as shown below:

Variance | Vitamin A | Cholesterol
VP | 123.5 | 862.0
VE | 96.2 | 484.6
VA | 12.0 | 192.1
VD | 15.3 | 185.3

(a) Calculate the narrow-sense heritability (h²) for both traits.
(b) Which trait, if either, is likely to respond to selection?
15. If one is attempting to determine the influence of genes or the environment on phenotypic variation, inbred strains with individuals of a relatively homogeneous or constant genetic background are often used. Variation observed between different inbred strains reared in a constant or homogeneous environment would likely be caused by genetic factors. What would be the source of variation observed among members of the same inbred strain reared under varying environmental conditions?

16. A population of laboratory mice was weighed at the age of six weeks (full adult weight) and found to have a mean weight of 20 g. The narrow-sense heritability of weight gain (h²) is known to be 0.25 in this laboratory strain. If mice weighing 24 g are selected and mated at random, what is the expected mean weight of the next generation?

17. If the experiment was repeated by mating mice 4 g lighter than the mean (16 g), what would be the result? If repeated experiments were carried out always selecting for mice that were 4 g lighter than the mean in the current generation, what would eventually happen?

18. In a herd of Texas Longhorn cattle, the mean horn length from tip to tip is 52" and h² is 0.2. Predict the mean horn length if cattle with horns 61" long are interbred.

19. In a population of 100 inbred, genotypically identical rice plants, variance for grain yield is 4.67. What is the heritability for yield? Would you advise a rice breeder to improve yield in this strain of rice plants by selection?

20. A 3-inch plant was crossed with a 15-inch plant, and all F1 plants were 9 inches. The F2 plants exhibited a "normal distribution," with heights of 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15 inches.
(a) What ratio will constitute the "normal distribution" in the F2?
(b) What will be the outcome if the F1 plants are testcrossed with plants that are homozygous for all nonadditive alleles?
21. Two different crosses were set up between carrots (Daucus carota) of different colors and carotenoid content (Santos, Carlos A. F., and Simon, Philipp W. 2002. Horticultura Brasileira 20). Analyses of the F2 generations showed that four loci are associated with the α-carotene content of carrots, with a broad-sense heritability of 90%. How many distinct phenotypic categories and genotypes would be seen in each F2 generation, and what does a broad-sense heritability of 90% mean for carrot horticulture?

22. While most quantitative traits display continuous variation, there are others, referred to as "threshold traits," that are distinguished by having a small number of discrete phenotypic classes. For example, Type 2 diabetes (adult-onset diabetes) is considered to be a polygenic trait, but demonstrates only two phenotypic classes: individuals who develop the disease and those who do not. Theorize how a threshold trait such as Type 2 diabetes may be under the control of many polygenes, but express a limited number of phenotypes.

22 Population and Evolutionary Genetics

CHAPTER CONCEPTS
■ Most populations and species harbor considerable genetic variation.
■ This variation is reflected in the alleles distributed among populations of a species.
■ The relationship between allele frequencies and genotype frequencies in an ideal population is described by the Hardy–Weinberg law.
■ Selection, migration, and genetic drift can cause changes in allele frequency.
■ Mutation creates new alleles in a population gene pool.
■ Nonrandom mating changes population genotype frequency but not allele frequency.
■ A reduction in gene flow between populations, accompanied by selection or genetic drift, can lead to reproductive isolation and speciation.
■ Genetic differences between populations or species are used to reconstruct evolutionary history.

These ladybird beetles, from the Chiricahua Mountains in Arizona, show considerable phenotypic variation.

In the mid-nineteenth century, Alfred Russel Wallace
and Charles Darwin identified natural selection as the mechanism of evolution. In his book On the Origin of Species, published in 1859, Darwin provided evidence that populations and species are not fixed, but change, or evolve, over time as a result of natural selection. However, Wallace and Darwin could explain neither the origin of the variations that provide the raw material for evolution nor the mechanisms by which such variations are passed from parents to offspring. Gregor Mendel published his work on the inheritance of traits in 1866, but it received little notice at the time. The rediscovery of Mendel's work in 1900 began a 30-year effort to reconcile Mendel's concept of genes and alleles with the theory of evolution by natural selection. As twentieth-century biologists applied the principles of Mendelian genetics to populations, both the source of variation (mutation and recombination) and the mechanism of inheritance (segregation of alleles) were explained. We now view evolution as a consequence of changes in genetic material through mutation and changes in allele frequencies in populations over time. This union of population genetics with the theory of natural selection generated a new view of the evolutionary process, called neo-Darwinism.

In addition to natural selection, other forces, including mutation, migration, and drift, individually and collectively alter allele frequencies and bring about evolutionary divergence that eventually may result in speciation, the formation of new species. Speciation is facilitated by environmental diversity. If a population is spread over a geographic range encompassing a number of ecologically distinct subenvironments with different selection pressures, the populations occupying these areas may gradually adapt and become genetically distinct from one another. Genetically differentiated populations may remain in existence, become extinct, reunite with each other, or
continue to diverge until they become reproductively isolated. Populations that are reproductively isolated are regarded as separate species. Genetic changes within populations can modify a species over time, transform it into another species, or cause it to split into two or more species.

Population geneticists investigate patterns of genetic variation within and among groups of interbreeding individuals. Because changes in genetic structure form the basis for evolution of a population, population genetics has become an important subdiscipline of evolutionary biology. In this chapter, we examine the population genetics processes of microevolution, defined as evolutionary change within populations of a species, and then consider how molecular aspects of these processes can be extended to macroevolution, defined as evolutionary events leading to the emergence of new species and other taxonomic groups.

22.1 Genetic Variation Is Present in Most Populations and Species

A population is a group of individuals belonging to the same species that live in a defined geographic area and actually or potentially interbreed. In thinking about the human population, we can define it as everyone who lives in the United States or in Sri Lanka, or we can specify a population as all the residents of a particular small town or village. The genetic information carried by members of a population constitutes that population's gene pool.

At first glance, it might seem that a population that is well adapted to its environment must have a gene pool that is highly homozygous, because it would seem likely that the most favorable allele at each locus is present at a high frequency. In addition, a look at most populations of plants and animals reveals many phenotypic similarities among individuals. However, a large body of evidence indicates that, in reality, most populations contain a high degree of heterozygosity. This built-in genetic variation is not necessarily apparent in the phenotype; hence, detecting it is
not a simple task. Nevertheless, the amount of variation within a population can be revealed by several methods.

Detecting Genetic Variation

The detection and use of genetic variation in individuals and populations began long before genetics emerged as a science. Millennia ago, plant and animal breeders began using artificial selection to domesticate plants and animals. However, as genetic technology developed in the last century, the ability to detect and quantify genetic variation in genes, in individual genomes, and in the genomes of populations has grown exponentially. One of the more spectacular examples of how much variation exists in the gene pool of a species was the use of selective breeding to create hundreds of dog breeds in nineteenth-century England over a period of less than 75 years. Many people, seeing a chihuahua (about 10 inches high) and a Great Dane (about 42 inches high) for the first time, might find it difficult to believe they are both members of the same species (Figure 22–1).

FIGURE 22–1 The size difference between a chihuahua and a Great Dane illustrates the high degree of genetic variation present in the dog genome.

Recombinant DNA Technology and Genetic Variation

After the discovery that DNA carries genetic information and the development of recombinant DNA technology, efforts centered on detecting genetic variation in the sequence of individual genes carried by individuals in a population. In one such study, Martin Kreitman isolated, cloned, and sequenced copies of the alcohol dehydrogenase (Adh) gene from individuals representing five different populations of Drosophila melanogaster. The 11 cloned genes from these five populations contained a total of 43 nucleotide differences in the Adh sequence of 2721 base pairs (Figure 22–2). These variations are distributed throughout the gene: 14 in exon coding regions, 18 within introns, and 11 in untranslated flanking regions. Of the 14 variations in exons, only one leads to an amino acid replacement, the
one in codon 192, resulting in the two known alleles of this gene. The other 13 nucleotide changes do not lead to amino acid replacements and are silent variations of this gene.

FIGURE 22–2 DNA sequence variation in parts of the Drosophila Adh gene in a sample of the 11 laboratory strains derived from the five natural populations. [Alignment of intron and exon segments of the consensus Adh sequence against strains Wa-S, Fl1-S, Ja-S, Fl-F, and Ja-F.] The dots represent nucleotides that are the same as the consensus sequence; letters represent nucleotide polymorphisms. An A/C polymorphism (A*) in codon 192 creates the two Adh alleles (F and S). All other polymorphisms are silent or noncoding.

Genetic Variation in Genomes

The development of next-generation sequencing technology has extended the detection of genomic variation from individuals to populations. The 1000 Genomes Project, started in 2008, is a global effort with the goal of identifying and cataloging 95 percent of the common genetic variations carried by the 7 billion people now inhabiting the planet. The Project's three pilot studies combined low-coverage whole-genome sequencing, exome sequencing of selected protein-coding regions, and deep sequencing of selected parents and one of their children. Studying 1000 genomes is just a start; eventually, the Project intends to sequence the genomes of 2500 individuals from 27 population groups worldwide. To date, the pilot studies have identified 15 million single-nucleotide polymorphisms (SNPs), 1 million short insertions and/or deletions (indels), and 20,000 large structural variants in the human genome. In addition, many individuals were heterozygous for 250 to 300 genes with loss-of-function mutations, 20 to 40 percent of which are known
to be associated with genetic disorders. The Project's overall goal is to explore and understand the relationship between genotype and phenotype. In humans, this translates into identifying variants associated with disease. For example, in studies to date, no single variant has been associated with diabetes; this implies that a combination of multiple heritable rare variants is related to this common disorder. Eventually, researchers hope to associate specific genetic variants with cellular pathways and networks associated with complex disorders such as hypertension, cardiovascular disease, and neurological disorders associated with protein accumulation, such as Alzheimer disease and Huntington disease.

ESSENTIAL POINT Genetic variation is widespread in most populations and provides a reservoir of alleles that serve as the basis for evolutionary changes within the population.

22.2 The Hardy–Weinberg Law Describes Allele Frequencies and Genotype Frequencies in Population Gene Pools

Often when we examine a single gene in a population, we find that combinations of the alleles of this gene result in individuals with different genotypes. For example, two alleles (A and a) of the A gene can be combined to produce three genotypes: AA, Aa, and aa. Key elements of population genetics depend on the calculation of allele frequencies and genotype frequencies in a gene pool, and the determination of how these frequencies change from one generation to the next. Population geneticists use these calculations to answer questions such as: How much genetic variation is present in a population? Are genotypes randomly distributed in time and space, or do discernible patterns exist? What processes affect the composition of a population's gene pool? Do these processes produce genetic divergence among populations that may lead to the formation of new species?
Changes in allele frequencies in a population that do not directly result in species formation are examples of microevolution. In the following sections, we will discuss microevolutionary changes in population gene pools, and in later sections, we will consider macroevolution and the process of speciation.

The relationship between the relative proportions of alleles in the gene pool and the frequencies of different genotypes in the population was elegantly described in a mathematical model. This model, called the Hardy–Weinberg law, describes what happens to allele and genotype frequencies in an "ideal" population that is infinitely large and randomly mating, and that is not subject to any evolutionary forces such as mutation, migration, or selection.

Calculating Genotype Frequencies

The Hardy–Weinberg model uses the principle of Mendelian segregation and simple probability to explain the relationship between allele and genotype frequencies in a population. We can demonstrate how this works by considering a single autosomal gene with two alleles, A and a, in a population where the frequency of A is 0.7 and the frequency of a is 0.3. Note that 0.7 + 0.3 = 1, indicating that all the alleles of gene A present in the population are accounted for. These allele frequencies mean that the probability that any female gamete will contain A is 0.7, and the probability that a male gamete will contain A is also 0.7. The probability that both gametes will contain A is 0.7 × 0.7 = 0.49. Thus we predict that in the offspring, the genotype AA will occur 49 percent of the time. The probability that a zygote will be formed from a female gamete carrying A and a male gamete carrying a is 0.7 × 0.3 = 0.21, and the probability of a female gamete carrying a being fertilized by a male gamete carrying A is 0.3 × 0.7 = 0.21, so the frequency of genotype Aa in the offspring is 0.21 + 0.21 = 0.42 = 42 percent. Finally, the probability that a zygote will be formed from two gametes carrying a is 0.3 × 0.3 = 0.09, so the frequency of genotype aa is 9 percent. As a check on our calculations, note that 0.49 + 0.42 + 0.09 = 1.0, confirming that we have accounted for all possible genotypic combinations in the zygotes. These calculations are summarized in Figure 22–3.

FIGURE 22–3 Calculating genotype frequencies from allele frequencies. [Punnett square: eggs and sperm each carry A with frequency 0.7 and a with frequency 0.3; fr(AA) = 0.7 × 0.7 = 0.49, fr(Aa) = 0.7 × 0.3 = 0.21, fr(aA) = 0.3 × 0.7 = 0.21, fr(aa) = 0.3 × 0.3 = 0.09.] Gametes represent samples drawn from the gene pool to form the genotypes of the next generation. In this population, the frequency of the A allele is 0.7, and the frequency of the a allele is 0.3. The frequencies of the genotypes in the next generation are calculated as 0.49 for AA, 0.42 for Aa, and 0.09 for aa. Under the Hardy–Weinberg law, the frequencies of A and a remain constant from generation to generation.

Calculating Allele Frequencies

Now that we know the frequencies of genotypes in the next generation, what will be the allele frequencies in this new generation?
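The one-generation calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes the Hardy–Weinberg conditions stated in the text: random union of gametes, and equal survival and reproduction of all genotypes. The function names genotype_freqs and allele_freq_A are hypothetical and were chosen for this example. The sketch first builds the Figure 22–3 genotype frequencies from p. It then computes the frequency of A that those genotypes pass on, which anticipates the answer worked out next.

```python
def genotype_freqs(p):
    """Zygote genotype frequencies from the frequency p of allele A (q = 1 - p),
    assuming male and female gametes unite at random."""
    q = 1 - p
    return {
        "AA": p * p,          # egg A, sperm A
        "Aa": p * q + q * p,  # egg A/sperm a plus egg a/sperm A
        "aa": q * q,          # egg a, sperm a
    }


def allele_freq_A(geno):
    """Frequency of A in the next gene pool: all gametes from AA individuals
    plus half of those from Aa individuals (equal reproduction assumed)."""
    return geno["AA"] + 0.5 * geno["Aa"]


geno = genotype_freqs(0.7)
print({g: round(f, 2) for g, f in geno.items()})  # {'AA': 0.49, 'Aa': 0.42, 'aa': 0.09}
print(round(allele_freq_A(geno), 2))              # 0.7
```

Algebraically, p² + ½(2pq) = p(p + q) = p. Feeding the result back into genotype_freqs therefore reproduces the same genotype frequencies in every generation.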
Under the Hardy–Weinberg law, we assume that all genotypes have equal rates of survival and reproduction. This means that in the next generation, all genotypes contribute equally to the new gene pool. The AA individuals constitute 49 percent of the population, and we can predict that the gametes they produce will constitute 49 percent of the gene pool. These gametes all carry allele A. Similarly, Aa individuals constitute 42 percent of the population, so we predict that their gametes will constitute 42 percent of the new gene pool. Half (0.5) of these gametes will carry allele A. Thus, the frequency of allele A in the gene pool is 0.49 + (0.5)0.42 = 0.7. The other half of the gametes produced by Aa individuals will carry allele a. The aa individuals constitute 9 percent of the population, so their gametes will constitute 9 percent of the new gene pool. All these gametes carry allele a. Thus, we can predict that the frequency of allele a in the new gene pool is (0.5)0.42 + 0.09 = 0.3. As a check on our calculation, note that 0.7 + 0.3 = 1.0, accounting for all of the gametes in the gene pool of the new generation.

The Hardy–Weinberg Law and Its Assumptions

Because the Hardy–Weinberg law is a mathematical model, we use variables instead of numerical values for the allele frequencies in the general case. Imagine a gene pool in which the frequency of allele A is represented by p and the frequency of allele a is represented by q, such that p + q = 1. If we randomly draw male and female gametes from the gene pool and pair them to make a zygote, the probability that both will carry allele A is p × p. Thus, the frequency of genotype AA among the zygotes is p². The probability that the female gamete carries A and the male gamete carries a is p × q, and the probability that the female gamete carries a and the male gamete carries A is q × p. Thus, the frequency of genotype Aa among the zygotes is 2pq. Finally, the probability that both gametes carry a is q × q, making the frequency of genotype aa among the
zygotes q². Therefore, the distribution of genotypes among the zygotes is p² + 2pq + q² = 1. These calculations are summarized in Figure 22–4. They demonstrate the two main predictions of the Hardy–Weinberg model:
1. Allele frequencies in our population do not change from one generation to the next.
2. After one generation of random mating, genotype frequencies can be predicted from the allele frequencies.
In other words, there is no change in allele frequency, and for this locus, the population does not undergo any microevolutionary change.

FIGURE 22–4 The general description of allele and genotype frequencies under Hardy–Weinberg assumptions. [Punnett square: eggs and sperm each carry A with frequency p and a with frequency q; fr(AA) = p², fr(Aa) = pq, fr(aA) = qp, fr(aa) = q².] The frequency of allele A is p, and the frequency of allele a is q. After mating, the three genotypes AA, Aa, and aa have the frequencies p², 2pq, and q², respectively.

The theoretical population described by the Hardy–Weinberg model is based on the following assumptions:
1. Individuals of all genotypes have equal rates of survival and equal reproductive success; that is, there is no selection.
2. No new alleles are created or converted from one allele into another by mutation.
3. Individuals do not migrate into or out of the population.
4. The population is infinitely large, which in practical terms means that the population is large enough that sampling errors and other random effects are negligible.
5. Individuals in the population mate randomly.

These assumptions are what make the Hardy–Weinberg model so useful in population genetics research. By specifying the conditions under which the population does not evolve, the Hardy–Weinberg model can be used to identify the real-world forces that cause allele frequencies to change. Application of this model can also reveal "neutral genes" in a population gene pool, that is, genes not being operated on by the forces of evolution. The Hardy–Weinberg model has
three additional important consequences:
1. Dominant traits do not necessarily increase from one generation to the next.
2. Genetic variability can be maintained in a population, since, once established in an ideal population, allele frequencies remain unchanged.
3. Under Hardy–Weinberg assumptions, knowing the frequency of just one genotype enables us to calculate the frequencies of all other genotypes at that locus. This is particularly useful in human genetics because we can calculate the frequency of heterozygous carriers for recessive genetic disorders even when all we know is the frequency of affected individuals.

22–1 The ability to taste the compound PTC is controlled by a dominant allele T, while individuals homozygous for the recessive allele t are unable to taste PTC. In a genetics class of 125 students, 88 can taste PTC and 37 cannot. Calculate the frequency of the T and t alleles in this population and the frequency of the genotypes.
HINT: This problem involves an understanding of how to use the Hardy–Weinberg law. The key to its solution is to determine which allele frequency (p or q) you must estimate first when homozygous dominant and heterozygous genotypes have the same phenotype.

22.3 The Hardy–Weinberg Law Can Be Applied to Human Populations

To show how allele frequencies are measured in a real population, let's consider a gene that influences an individual's susceptibility to infection by HIV-1, the virus responsible for AIDS (acquired immunodeficiency syndrome). A small number of individuals who make high-risk choices (such as unprotected sex with HIV-positive partners) never become infected. Some of these individuals are homozygous for a mutant allele of a gene called CCR5. The CCR5 gene (Figure 22–5) encodes a protein called the C-C chemokine receptor-5, often abbreviated CCR5. Chemokines are signaling molecules associated with the immune system. The CCR5 protein is also used by strains of HIV-1 to gain entry into cells. The mutant allele of the CCR5 gene contains a 32-bp deletion that makes the encoded protein shorter and nonfunctional, blocking the entry of HIV-1 into cells. The normal allele is called CCR5-1 (also called 1), and the mutant allele is called CCR5-∆32 (also called ∆32). Individuals homozygous for the mutant allele (∆32/∆32) are resistant to HIV-1 infection. Heterozygous (1/∆32) individuals are susceptible to HIV-1 infection but progress more slowly to AIDS. Table 22.1 summarizes the genotypes possible at the CCR5 locus and the phenotypes associated with each.

FIGURE 22–5 Organization of the CCR5 gene in region 3p21.3 of human chromosome 3. The gene contains exons and introns (there is no intron between exons and 3). The arrow shows the location of the 32-bp deletion in exon that confers resistance to HIV-1 infection.

TABLE 22.1 CCR5 Genotypes and Phenotypes
Genotype | Phenotype
1/1 | Susceptible to sexually transmitted strains of HIV-1
1/∆32 | Susceptible but may progress to AIDS slowly
∆32/∆32 | Resistant to most sexually transmitted strains of HIV-1

The discovery of the CCR5-∆32 allele generates two important questions: Which human populations carry this allele, and how common is it?

FIGURE 22–6 Allelic variation in the CCR5 gene. [Gel lanes labeled 1/1, 1/∆32, and ∆32/∆32; bands at 403, 371, and 332 bp.] Michel Samson and colleagues used PCR to amplify a part of the CCR5 gene containing the site of the 32-bp deletion, cut the resulting DNA fragments with a restriction enzyme, and ran the fragments on an electrophoresis gel. Each lane reveals the genotype of a single individual. The 1 allele produces a 332-bp fragment and a 403-bp fragment; the ∆32 allele produces a 332-bp fragment and a 371-bp fragment. Heterozygotes produce three bands.
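In Figure 22–6, heterozygotes show a distinct three-band pattern. Genotypes can therefore be read directly from the gel rather than inferred from phenotype. The sketch below encodes that reading rule in Python. The band sizes come from the figure caption. Representing a lane as a set of fragment sizes and the name call_ccr5_genotype are illustrative assumptions. A real gel would also need some tolerance for sizing error.

```python
# Fragment sizes (bp) for each allele, taken from the Figure 22-6 caption.
ALLELE_BANDS = {
    "1": frozenset({332, 403}),    # normal allele
    "∆32": frozenset({332, 371}),  # 32-bp deletion allele (403 - 32 = 371)
}


def call_ccr5_genotype(bands):
    """Return the genotype whose expected band set equals the observed lane.

    Hypothetical helper: 'bands' is the set of fragment sizes seen in one lane.
    A genotype's bands are the union of the bands of its two alleles.
    """
    observed = frozenset(bands)
    alleles = list(ALLELE_BANDS)
    for i, a in enumerate(alleles):
        for b in alleles[i:]:  # unordered pairs: 1/1, 1/∆32, ∆32/∆32
            if ALLELE_BANDS[a] | ALLELE_BANDS[b] == observed:
                return f"{a}/{b}"
    raise ValueError(f"band pattern {sorted(observed)} matches no genotype")


for lane in ({332, 403}, {332, 371, 403}, {332, 371}):
    print(sorted(lane), call_ccr5_genotype(lane))
```

Because the two alleles share the 332-bp band, each homozygote shows two bands and the heterozygote shows all three. This is what allows heterozygotes to be counted directly.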
To address these questions, teams of researchers surveyed members of several populations. Genotypes were determined by direct analysis of DNA (Figure 22–6). In one population, 79 individuals had genotype 1/1, 20 were 1/∆32, and 1 individual was ∆32/∆32. This population therefore has 158 CCR5-1 alleles carried by the 1/1 individuals plus 20 carried by the 1/∆32 individuals, for a total of 178. The frequency of the CCR5-1 allele in the sample population is thus 178/200 = 0.89 = 89 percent. Copies of the CCR5-∆32 allele were carried by the 20 1/∆32 individuals, plus 2 carried by the ∆32/∆32 individual, for a total of 22. The frequency of the CCR5-∆32 allele is thus 22/200 = 0.11 = 11 percent. Notice that p + q = 1, confirming that we have accounted for all the alleles of the CCR5 gene in the population. Table 22.2 shows two methods for computing the frequencies of the alleles in the population surveyed.

Can we expect the CCR5-∆32 allele to increase in human populations because it offers resistance to infection by HIV?
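The two methods in Table 22.2 always give the same answer. Counting alleles directly and combining genotype frequencies both credit each homozygote with two copies and each heterozygote with one. The minimal Python sketch below uses the survey counts from the text (79, 20, and 1). The function names are hypothetical and were chosen for this illustration.

```python
def allele_freqs_by_counting(n_11, n_1d, n_dd):
    """Method (a): count alleles directly; each person carries two copies."""
    total = 2 * (n_11 + n_1d + n_dd)
    n_1 = 2 * n_11 + n_1d   # 1/1 carry two copies of allele 1, heterozygotes one
    n_d = n_1d + 2 * n_dd   # heterozygotes carry one ∆32, ∆32/∆32 carry two
    return n_1 / total, n_d / total


def allele_freqs_from_genotype_freqs(n_11, n_1d, n_dd):
    """Method (b): homozygote frequency plus half the heterozygote frequency."""
    n = n_11 + n_1d + n_dd
    f_11, f_1d, f_dd = n_11 / n, n_1d / n, n_dd / n
    return f_11 + 0.5 * f_1d, 0.5 * f_1d + f_dd


p, q = allele_freqs_by_counting(79, 20, 1)
print(round(p, 2), round(q, 2))  # 0.89 0.11
```

Method (b) needs only genotype frequencies, not raw counts. That makes it convenient when a survey reports percentages.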
This specific question is difficult to answer directly. However, as we will see later in this chapter, when factors such as natural selection, mutation, migration, or genetic drift are present, the allele frequencies in a population may change from one generation to the next. By determining allele frequencies over more than one generation, it is possible to determine whether the frequencies remain in equilibrium, as expected when the Hardy–Weinberg assumptions hold. Populations that meet the Hardy–Weinberg assumptions are not evolving because allele frequencies (for the generations tested) are not changing. However, a population may be in Hardy–Weinberg equilibrium for the alleles being tested while other genes are not in equilibrium.

TABLE 22.2 Methods of Determining Allele Frequencies from Data on Genotypes

(a) Counting Alleles
Genotype | 1/1 | 1/∆32 | ∆32/∆32 | Total
Number of individuals | 79 | 20 | 1 | 100
Number of 1 alleles | 158 | 20 | 0 | 178
Number of ∆32 alleles | 0 | 20 | 2 | 22
Total number of alleles | 158 | 40 | 2 | 200
Frequency of CCR5-1 in sample: 178/200 = 0.89 = 89%
Frequency of CCR5-∆32 in sample: 22/200 = 0.11 = 11%

(b) From Genotype Frequencies
Genotype | 1/1 | 1/∆32 | ∆32/∆32 | Total
Number of individuals | 79 | 20 | 1 | 100
Genotype frequency | 79/100 = 0.79 | 20/100 = 0.20 | 1/100 = 0.01 | 1.00
Frequency of CCR5-1 in sample: 0.79 + (0.5)0.20 = 0.89 = 89%
Frequency of CCR5-∆32 in sample: (0.5)0.20 + 0.01 = 0.11 = 11%

Testing for Hardy–Weinberg Equilibrium in a Population

One way to see whether any of the Hardy–Weinberg assumptions do not hold in a given population is to determine whether the population's genotypes are in equilibrium. To do this, we first determine the genotype frequencies. This can be done in three ways: directly from the phenotypes (if heterozygotes are recognizable), by analyzing proteins or DNA sequences, or indirectly by using the frequency of the HIV-1-resistant phenotype in the population to calculate genotype frequencies with the Hardy–Weinberg law. We can then calculate the allele frequencies from the genotype frequencies. Finally, the allele frequencies in the parental generation are used to predict the genotype frequencies in the next generation. According to the Hardy–Weinberg law, genotype frequencies are predicted to fit the p² + 2pq + q² = 1 relationship. If they do not, then one or more of the assumptions are invalid for the population in question.

To demonstrate, let's examine CCR5 genotypes in a hypothetical population. Our population is composed of 283 individuals; of these, 223 have genotype 1/1, 57 have genotype 1/∆32, and 3 have genotype ∆32/∆32. These numbers represent the following genotype frequencies: 1/1 = 223/283 = 0.788, 1/∆32 = 57/283 = 0.201, and ∆32/∆32 = 3/283 = 0.011. From the genotype frequencies, we can compute the CCR5-1 allele frequency as 0.89 and the frequency of the CCR5-∆32 allele as 0.11. Once we know the allele frequencies, we can use the Hardy–Weinberg law to determine whether this population is in equilibrium. The allele frequencies predict the genotype frequencies in the next generation as follows:

Expected frequency of genotype 1/1 = p² = (0.89)² = 0.792
Expected frequency of genotype 1/∆32 = 2pq = 2(0.89)(0.11) = 0.196
Expected frequency of genotype ∆32/∆32 = q² = (0.11)² = 0.012

These expected frequencies are nearly identical to the frequencies observed in the parental generation. Our test of this population has therefore failed to provide evidence that Hardy–Weinberg assumptions are being violated. This conclusion can be confirmed with a χ² analysis (see Chapter 3) that uses the whole-number genotype counts from which the frequencies were calculated. In this case, neither the genotype frequencies nor the allele frequencies are changing in this population, meaning that the population is in equilibrium. As we will see in later sections of this chapter, forces such as natural selection, mutation, migration, and chance operate to bring about changes in allele frequency. These forces drive
both microevolution and the formation of new species (macroevolution).

ESSENTIAL POINT Populations that are not in Hardy–Weinberg equilibrium may be undergoing changes in allele frequency owing to forces such as selection, drift, migration, or nonrandom mating.

22–2 Determine whether the following two sets of data represent populations that are in Hardy–Weinberg equilibrium. (a) CCR5 genotypes: 1/1, 60 percent; 1/∆32, 35.1 percent; ∆32/∆32, 4.9 percent. (b) Sickle-cell hemoglobin: SS, 75.6 percent; Ss, 24.2 percent; ss, 0.2 percent (S = normal hemoglobin allele; s = mutant hemoglobin allele).
HINT: This problem involves an understanding of how to use the Hardy–Weinberg law to determine whether populations are in genetic equilibrium. The key to its solution is to first determine the allele frequencies based on the genotype frequencies provided.

Calculating Frequencies for Multiple Alleles in Populations

Although we have used one-gene, two-allele systems as examples, many genes have several alleles, all of which can be found in a single population. The ABO blood group in humans (discussed in Chapter 4) is such an example. The locus I (isoagglutinin) has three alleles, $I^A$, $I^B$, and $i$, yielding six possible genotypic combinations ($I^AI^A$, $I^BI^B$, $ii$, $I^AI^B$, $I^Ai$, $I^Bi$). Remember that in this case $I^A$ and $I^B$ are codominant alleles and that both of these are dominant to $i$. The result is that homozygous $I^AI^A$ and heterozygous $I^Ai$ individuals are phenotypically identical, as are $I^BI^B$ and $I^Bi$ individuals, so we can distinguish only four phenotypic blood-type combinations: Type A, Type B, Type AB, and Type O.

By adding another variable to the Hardy–Weinberg equation, we can calculate both the genotype and allele frequencies for the situation involving three alleles. Let p, q, and r represent the frequencies of alleles $I^A$, $I^B$, and $i$, respectively. Note that because there are three alleles, $p + q + r = 1$. Under Hardy–Weinberg assumptions, the frequencies of the genotypes are
given by $(p + q + r)^2 = p^2 + q^2 + r^2 + 2pq + 2pr + 2qr = 1$.

If we know the frequencies of blood types for a population, we can then estimate the frequencies for the three alleles of the ABO system. For example, in one population sampled, the following blood-type frequencies are observed: A = 0.53, B = 0.133, O = 0.26. Because the i allele is recessive, the population's frequency of type O blood equals the proportion of the recessive genotype, $r^2$. Thus,

$$r^2 = 0.26, \qquad r = \sqrt{0.26} = 0.51$$

Using r, we can calculate the allele frequencies for the $I^A$ and $I^B$ alleles. The $I^A$ allele is present in two genotypes, $I^AI^A$ and $I^Ai$. The frequency of the $I^AI^A$ genotype is represented by $p^2$ and that of the $I^Ai$ genotype by $2pr$. Therefore, the combined frequency of type A blood and type O blood is given by

$$p^2 + 2pr + r^2 = 0.53 + 0.26$$

If we factor the left side of the equation and take the sum of the terms on the right,

$$(p + r)^2 = 0.79, \qquad p + r = \sqrt{0.79} = 0.89, \qquad p = 0.89 - r = 0.89 - 0.51 = 0.38$$

Having calculated p and r, the frequencies of allele $I^A$ and allele $i$, we can now calculate the frequency of the $I^B$ allele:

$$p + q + r = 1, \qquad q = 1 - p - r = 1 - 0.38 - 0.51 = 0.11$$

The phenotypic and genotypic frequencies for this population are summarized in Table 22.3.

TABLE 22.3 Calculating Genotype Frequencies for Multiple Alleles in a Hardy–Weinberg Population Where the Frequency of Allele $I^A$ = 0.38, Allele $I^B$ = 0.11, and Allele $i$ = 0.51

Genotype | Genotype Frequency | Phenotype | Phenotype Frequency
$I^AI^A$ | $p^2 = (0.38)^2 = 0.14$ | A | 0.53
$I^Ai$ | $2pr = 2(0.38)(0.51) = 0.39$ | A |
$I^BI^B$ | $q^2 = (0.11)^2 = 0.01$ | B | 0.12
$I^Bi$ | $2qr = 2(0.11)(0.51) = 0.11$ | B |
$I^AI^B$ | $2pq = 2(0.38)(0.11) = 0.084$ | AB | 0.08
$ii$ | $r^2 = (0.51)^2 = 0.26$ | O | 0.26

Calculating Heterozygote Frequency

A useful application of the Hardy–Weinberg law, especially in human genetics, allows us to estimate the frequency of heterozygotes in a population. The frequency of a recessive trait can usually be determined by identifying and counting individuals with the homozygous recessive phenotype in a sample of the population. With this information and the Hardy–Weinberg law, we can then calculate the allele and genotype frequencies for this gene.

Cystic fibrosis, an autosomal recessive trait, has an incidence of about 1/2500 (0.0004) in people of northern European ancestry. Individuals with cystic fibrosis are easily distinguished from the population at large by such symptoms as extra-salty sweat, excess amounts of thick mucus in the lungs, and susceptibility to bacterial infections. Because this is a recessive trait, individuals with cystic fibrosis must be homozygous. Their frequency in a population is represented by $q^2$ (provided that mating has been random in the previous generation). The frequency of the recessive allele is therefore

$$q = \sqrt{q^2} = \sqrt{0.0004} = 0.02$$

Knowing that the frequency of the recessive allele is about 2 percent, we can calculate the frequency of the normal (dominant) allele because $p + q = 1$. Using this equation, the frequency of p is

$$p = 1 - q = 1 - 0.02 = 0.98$$

Now that the allele frequencies are known, we can calculate the frequency of the heterozygous genotype. In the Hardy–Weinberg equation, the frequency of heterozygotes is 2pq. Thus,

$$2pq = 2(0.98)(0.02) = 0.04$$

or 4 percent, or 1/25. The results show that heterozygotes for cystic fibrosis are rather common (about 1 in 25 individuals, or 4 percent of the population), even though the frequency of homozygous recessives is only 1/2500, or 0.04 percent. However, keep in mind that these calculations are estimates because the population may not meet all Hardy–Weinberg assumptions.

22–3 If the albino phenotype occurs in 1/10,000 individuals in a population at equilibrium and albinism is caused by an autosomal recessive allele a, calculate the frequency of: (a) the recessive mutant allele; (b) the normal dominant allele; (c) heterozygotes in the population; (d) mating between heterozygotes.
HINT: This problem involves an understanding of the method of calculating allele and genotype frequencies. The key to its solution is to first determine the frequency of the albinism allele in this population.

22.4 Natural Selection Is a Major Force Driving Allele Frequency Change

To understand evolution, we must understand the forces that transform the gene pools of populations and can lead to the formation of new species. Chief among the mechanisms transforming populations is natural selection, discovered independently by Charles Darwin and Alfred Russel Wallace. The Wallace–Darwin concept of natural selection can be summarized as follows:

1. Individuals of a species exhibit variations in phenotype, for example, differences in size, agility, coloration, defenses against enemies, ability to obtain food, courtship behaviors, and flowering times.
2. Many of these variations, even small and seemingly insignificant ones, are heritable and are passed on to offspring.
3. Organisms tend to reproduce in an exponential fashion. More offspring are produced than can survive. This causes members of a species to engage in a struggle for survival, competing with other members of the community for scarce resources. Offspring also must avoid predators, and in sexually reproducing species, adults must compete for mates.
4. In the struggle for survival, individuals with particular phenotypes will be more successful than others, allowing the former to survive and reproduce at higher rates.

As a consequence of natural selection, populations and species change. Traits that promote survival and reproduction will become more common, and traits that confer a lowered ability for survival and reproduction will become less common. This means that over many generations, traits that confer a reproductive advantage will increase in frequency, which in turn causes the population to become better adapted to its current environment. Over time, if selection continues, it may result in the appearance of new species.
Detecting Natural Selection in Populations

Recall that measuring allele frequencies and genotype frequencies using the Hardy–Weinberg law is based on several assumptions about an ideal population: large population size, lack of migration, presence of random mating, absence of selection and mutation, and equal survival rates of offspring. However, if genotypes do not all have equal rates of survival or do not all leave equal numbers of offspring, then allele frequencies may change from one generation to the next. To see why, let's imagine a population of 100 individuals in which the frequency of allele A is 0.5 and that of allele a is 0.5. Assuming the previous generation mated randomly, we find that the genotype frequencies in the present generation are $(0.5)^2 = 0.25$ for AA, $2(0.5)(0.5) = 0.5$ for Aa, and $(0.5)^2 = 0.25$ for aa. Because our population contains 100 individuals, we have 25 AA individuals, 50 Aa individuals, and 25 aa individuals. Now let's suppose that individuals with different genotypes have different rates of survival: All 25 AA individuals survive to reproduce, 90 percent (45/50) of the Aa individuals survive to reproduce, and 80 percent (20/25) of the aa individuals survive to reproduce. When the survivors reproduce, each contributes two gametes to the new gene pool, giving us 2(25) + 2(45) + 2(20) = 180 gametes. What are the frequencies of the two alleles in the surviving population?
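The survivor-and-gamete bookkeeping just described can be expressed as a short Python sketch. It is not taken from the text and rests on stated assumptions: the generation starts in Hardy–Weinberg proportions (random mating), each survivor contributes two gametes, and the three survival proportions act as relative fitness values. The function names `next_allele_freq` and `lethal_recessive_q` are hypothetical; `Fraction` from the standard library keeps the arithmetic exact.

```python
from fractions import Fraction


def next_allele_freq(p, surv_AA, surv_Aa, surv_aa):
    """Return the frequency of allele A among the gametes of the survivors.

    Hypothetical helper. Assumes the generation starts in Hardy-Weinberg
    proportions (p^2, 2pq, q^2) and that every survivor contributes two gametes.
    """
    q = 1 - p
    # Weight each genotype frequency by its survival proportion
    AA = p * p * surv_AA
    Aa = 2 * p * q * surv_Aa
    aa = q * q * surv_aa
    survivors = AA + Aa + aa  # mean survival (mean fitness)
    # AA survivors carry two A alleles; half the alleles of Aa survivors are A
    return (AA + Aa / 2) / survivors


def lethal_recessive_q(q0, generations):
    """Hypothetical helper: iterate selection against a recessive lethal (aa never survives)."""
    q = q0
    for _ in range(generations):
        q = 1 - next_allele_freq(1 - q, 1, 1, 0)
    return q


# Example from the text: p = q = 0.5; survival 100% (AA), 90% (Aa), 80% (aa)
p1 = next_allele_freq(Fraction(1, 2), 1, Fraction(9, 10), Fraction(4, 5))
print(p1, 1 - p1)  # 19/36 17/36, i.e., about 0.53 and 0.47

# A recessive lethal starting at q = 0.5, after 2 and after 6 generations
print(lethal_recessive_q(Fraction(1, 2), 2), lethal_recessive_q(Fraction(1, 2), 6))  # 1/4 1/8
```

The exact result 19/36 is the same as counting gametes directly, 95/180. Because the survival values enter only as ratios, rescaling all three by the same factor leaves the answer unchanged. Iterating the lethal case reproduces the closed form $q_g = q_0/(1 + g q_0)$ for a recessive lethal allele.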
We have 50 A gametes from AA individuals, plus 45 A gametes from Aa individuals, so the frequency of allele A is (50 + 45)/180 = 0.53. We have 45 a gametes from Aa individuals, plus 40 a gametes from aa individuals, so the frequency of allele a is (45 + 40)/180 = 0.47. These differ from the frequencies we started with: the frequency of allele A has increased, whereas the frequency of allele a has declined. A difference among individuals in survival or reproduction rate (or both) is an example of natural selection. Natural selection is the principal force that shifts allele frequencies within large populations by promoting differential survival and reproduction. It is one of the most important factors in evolutionary change.

Fitness and Selection

Selection occurs whenever individuals with a particular genotype enjoy an advantage in survival or reproduction over other genotypes. The strength of selection, however, can vary over a wide range, from much less than 1 percent to 100 percent. In the previous hypothetical example, selection was strong. Weak selection might involve just a fraction of a percent difference in the survival rates of different genotypes. Advantages in survival and reproduction ultimately translate into increased genetic contribution to future generations. An individual organism's genetic contribution to future generations is called its fitness. Genotypes associated with high rates of reproductive success are said to have high fitness, whereas genotypes associated with low reproductive success are said to have low fitness. Hardy–Weinberg analysis also allows us to examine fitness as a measure of the degree of natural selection. By convention, population geneticists use the letter w to represent fitness. Thus, $w_{AA}$ represents the relative fitness of genotype AA, $w_{Aa}$ the relative fitness of genotype Aa, and $w_{aa}$ the relative fitness of genotype aa. Assigning the values $w_{AA} = 1$, $w_{Aa} = 0.9$, and $w_{aa} = 0.8$ would mean, for example, that all AA individuals survive, 90 percent of the Aa
individuals survive, and 80 percent of the aa individuals survive, as in the previous hypothetical case.

Let's consider selection against deleterious alleles. Fitness values $w_{AA} = 1$, $w_{Aa} = 1$, and $w_{aa} = 0$ describe a situation in which a is a homozygous lethal allele. As homozygous recessive individuals die without leaving offspring, the frequency of allele a will decline. The decline in the frequency of allele a is described by the equation

$$q_g = \frac{q_0}{1 + g\,q_0}$$

where $q_g$ is the frequency of allele a in generation g, $q_0$ is the starting frequency of a (i.e., the frequency of a in generation zero), and g is the number of generations that have passed.

Figure 22–7 shows what happens to a lethal recessive allele with an initial frequency of 0.5. At first, because of the high percentage of aa genotypes, the frequency of allele a declines rapidly. The frequency of a is halved in only two generations. By the sixth generation, the frequency is halved again. By now, however, the majority of a alleles are carried by heterozygotes. Because a is recessive, these heterozygotes are not selected against. Consequently, as more time passes, the frequency of allele a declines ever more slowly. As long as heterozygotes continue to mate, it is difficult for selection to completely eliminate a recessive allele from a population.

FIGURE 22–7 Change in the frequency of a lethal recessive allele, a (frequency of allele a plotted against generation). The frequency of a is halved in two generations and halved again by the sixth generation. Subsequent reductions occur slowly because the majority of a alleles are carried by heterozygotes.

Figure 22–8 shows the outcome of different degrees of selection against a nonlethal recessive allele, a. In this case, the intensity of selection varies from strong (red curve) to weak (blue curve), with intermediate values shown by the yellow, purple, and green curves. In each example, the frequency of the deleterious allele, a, starts at 0.99 and declines over time. However, the rate of decline depends heavily on the strength of selection. When selection is strong and only 90 percent of the heterozygotes and 80 percent of the aa homozygotes survive (red curve), the frequency of allele a drops from 0.99 to less than 0.01 in about 85 generations. However, when selection is weak, and 99.8 percent of the heterozygotes and 99.6 percent of the aa homozygotes survive (blue curve), it takes 1000 generations for the frequency of allele a to drop from 0.99 to 0.93. Two important conclusions can be drawn from this example. First, over thousands of generations, even weak selection can cause substantial changes in allele frequencies; because evolution generally occurs over a large number of generations, selection is a powerful force in evolutionary change. Second, for selection to produce rapid changes in allele frequencies, the differences in fitness among genotypes must be large.

There Are Several Types of Selection

The phenotype is the result of the combined influence of the individual's genotype at many different loci and the effects of the environment. Selection can be classified as (1) directional, (2) stabilizing, or (3) disruptive.

In directional selection, traits at one end of a spectrum of phenotypes present in the population become selected for or against, usually as a result of changes in the environment. A carefully documented example comes from research by Peter and Rosemary Grant and their colleagues, who study the medium ground finches (Geospiza fortis) of Daphne Major Island in the Galapagos Islands. These researchers discovered that the beak size of these birds varies over time (Figure 22–9). In 1977, a severe drought killed some 80 percent of the finches on the island. Big-beaked birds survived at higher rates than small-beaked birds because when food became scarce, the big-beaked birds were able to eat a greater variety of seeds, especially larger ones with hard shells. After the
drought ended, more plants were available, and beak size declined. Droughts in 1980 and 1982 again saw differential survival and reproduction, shifting the average beak size toward one phenotypic extreme.

FIGURE 22–8 The effect of selection on allele frequency (frequency of allele a plotted against generation, 0 to 1000). The rate at which a deleterious allele is removed from a population depends heavily on the strength of selection.

FIGURE 22–9 Beak size in finches during dry years increases because of strong selection (mean beak depth, in mm, plotted by year from 1977 to 1984, with dry and wet years marked). Between droughts, selection for large beak size is not as strong, and birds with smaller beak sizes survive and reproduce, increasing the number of birds with smaller beaks.

FIGURE 22–10 Relationship between birth weight and mortality in humans (percentage mortality and percent of births plotted against birth weight in pounds).

Stabilizing selection, in contrast, favors intermediate phenotypes, with those at both extremes being selected against. Over time, this will reduce the phenotypic variance in the population but without a significant shift in the mean. One of the clearest demonstrations of stabilizing selection is from a study of human birth weight and survival for 13,730 children born over an 11-year period. Figure 22–10 shows the distribution of birth weight, the percentage of mortality at five weeks, and the percent of births in the population (at right). Infant mortality increases on either side of the optimal birth weight of 7.5 pounds. Stabilizing selection acts to keep a population well adapted to its current environment.

Disruptive selection is selection against intermediate phenotypes and selection for phenotypes at both extremes. It can be viewed as the opposite of stabilizing selection because the intermediate types are selected against. This will result in a population with an increasingly bimodal
distribution for a trait, as we can see in Figure 22–11. In experiments using Drosophila, after several generations of disruptive artificial selection for bristle number, in which only flies with high or low bristle numbers were allowed to breed, most flies could be easily placed in a low- or high-bristle category. In natural populations, such a situation might exist for a population in a heterogeneous environment.

FIGURE 22–11 The effect of disruptive selection on bristle number in Drosophila (distribution of bristle number plotted over generations of selection). When individuals with the highest and lowest bristle number were selected, the population showed a nonoverlapping divergence in only 12 generations.

ESSENTIAL POINT The rate of change under natural selection depends on initial allele frequencies, selection intensity, and the relative fitness of different genotypes.

22.5 Mutation Creates New Alleles in a Gene Pool

Within a population, the gene pool is reshuffled each generation to produce new genotypes in the offspring. The enormous genetic variation present in the gene pool allows assortment and recombination to produce new combinations of genes already present in the gene pool. But assortment and recombination do not produce new alleles. Mutation alone acts to create new alleles. It is important to keep in mind that mutational events occur at random, that is, without regard for any possible benefit or disadvantage to the organism.

Mutations not only create new alleles but, in very small populations, can also change allele frequencies. Let's consider a population of 20 individuals (40 copies of the gene) and a gene with two alleles, A and a. If the frequency of A is 0.90, the frequency of a is 0.10. A mutational event changes one A allele into an a allele. This event reduces the frequency of the A allele from 0.90 (36/40) to 0.875 (35/40) and increases the frequency of the a allele from 0.10 (4/40) to 0.125 (5/40). In this section, we
consider whether mutation by itself, in larger populations, is a significant factor in changing allele frequencies.

To determine whether mutation is a significant force in changing allele frequencies, we measure the rate at which mutations are produced. Most mutations are recessive, so it is difficult to observe mutation rates directly in diploid organisms. Indirect methods use probability and statistics or large-scale screening programs to estimate mutation rates. For certain dominant mutations, however, a direct method of measurement can be used. To ensure accuracy, several conditions must be met:

1. The allele must produce a distinctive phenotype that can be distinguished from similar phenotypes produced by recessive alleles.
2. The trait must be fully expressed or completely penetrant so that mutant individuals can be identified.
3. An identical phenotype must never be produced by nongenetic agents such as drugs or chemicals.

Suppose that for a given gene that undergoes mutation to a dominant allele, 2 out of 100,000 births exhibit a mutant phenotype, but the parents are phenotypically normal. Because the zygotes that produced these births each carry two copies of the gene, we have actually surveyed 200,000 copies of the gene (or 200,000 gametes). If we assume that the affected births are each heterozygous, we have uncovered 2 mutant alleles out of 200,000. Thus, the mutation rate is 2/200,000 or 1/100,000, which in scientific notation is written as $1 \times 10^{-5}$.

In humans, a dominant form of dwarfism known as achondroplasia fulfills the requirements for measuring mutation rates. Individuals with this skeletal disorder have an enlarged skull and short arms and legs, and they can be diagnosed by X-ray examination at birth. In a survey of almost 250,000 births, the mutation rate (μ) for achondroplasia has been calculated as $\mu = 1.4 \times 10^{-5} \pm 0.5 \times 10^{-5}$. Knowing the rate of mutation, we can estimate the extent to which mutation can change
allele frequencies from one generation to the next. We represent the normal allele as d and the allele for achondroplasia as D. Instead of a population of 20 individuals, imagine a population of 500,000 individuals in which everyone has genotype dd. The initial frequency of d is 1.0, and the initial frequency of D is 0. If each individual contributes two gametes to the gene pool, the gene pool will contain 1,000,000 gametes, all carrying allele d. While the gametes are in the gene pool, 1.4 of every 100,000 d alleles mutate into a D allele, so 14 of the 1,000,000 gametes now carry D. The frequency of allele d is now (1,000,000 - 14)/1,000,000 = 0.999986, and the frequency of allele D is 14/1,000,000 = 0.000014. From these numbers, it will clearly be a long time before mutation, by itself, causes any appreciable change in the allele frequencies in this population. In other words, mutation generates new alleles but, unless the population is very small, by itself does not alter allele frequencies at an appreciable rate.

22.6 Migration and Gene Flow Can Alter Allele Frequencies

The Hardy–Weinberg law assumes that migration does not take place. In real populations, however, migration, or gene flow, occurs when individuals move between populations. Migration reduces the genetic differences between populations of a species and can increase the level of genetic variation in some populations.

Imagine a species in which a given locus has two alleles, A and a. There are two populations of this species, one on a mainland and one on an island. The frequency of A on the mainland is represented by $p_m$, and the frequency of A on the island is $p_i$. If there is migration from the mainland to the island, the frequency of A in the next generation on the island ($p_i'$) is given by

$$p_i' = (1 - m)\,p_i + m\,p_m$$

where m is the proportion of the island's breeding population made up of migrants from the mainland, and migration is assumed to be random with respect to genotype.

As an example of how migration might affect the frequency of A in the next generation on the island ($p_i'$), assume that $p_i = 0.4$ and $p_m = 0.6$ and that 10 percent of the parents of the next generation are migrants from the mainland (m = 0.1). In the next generation, the frequency of allele A on the island will therefore be

$$p_i' = [(1 - 0.1) \times 0.4] + (0.1 \times 0.6) = 0.36 + 0.06 = 0.42$$

In this case, the flow of genes from the mainland has changed the frequency of A on the island from 0.40 to 0.42 in a single generation.

Rearranging the equation gives $\Delta p_i = p_i' - p_i = m(p_m - p_i)$. These calculations therefore reveal that the change in allele frequency attributable to migration is proportional to the difference in allele frequency between the donor and recipient populations and to the rate of migration. If either m is large or $p_m$ is very different from $p_i$, then a rather large change in the frequency of A can occur in a single generation. If migration is the only force acting to change the allele frequency on the island, then equilibrium will be attained when $p_i = p_m$. These guidelines can often be used to estimate migration in cases where it is difficult to quantify. Even in large populations, over time, the effect of migration can substantially alter allele frequencies in populations, as shown for the $I^B$ allele of the ABO blood group in Figure 22–12.

FIGURE 22–12 Migration as a force in evolution. The $I^B$ allele of the ABO locus is present in a gradient from east to west (map legend: frequency classes 0–5%, 5–10%, 10–15%, 15–20%, 20–25%, 25–30%). This allele shows the highest frequency in central Asia and the lowest in northeast Spain. The gradient parallels the waves of Mongol migration into Europe following the fall of the Roman Empire and is a genetic relic of human history.

22.7 Genetic Drift Causes Random Changes in Allele Frequency in Small Populations

In small populations, significant random fluctuations in allele frequencies are possible by chance alone, a situation known as genetic drift. In addition to small population size, drift can arise through the founder effect, which occurs when a population originates
from a small number of individuals. Although the population may later increase to a large size, the genes carried by all members are derived from those of the founders (assuming no mutation, migration, or selection, and the presence of random mating). Drift can also arise via a genetic bottleneck. Bottlenecks develop when a large population undergoes a drastic but temporary reduction in numbers. Even though the population recovers, its genetic diversity has been greatly reduced. In summary, drift is a product of chance and can arise through small population size, founder effects, and bottlenecks. In the following section, we will examine how founder effects can affect allele frequencies.

Founder Effects in Human Populations

Allele frequencies in certain human populations demonstrate the role of genetic drift in natural populations. Native Americans living in the southwestern United States have a high frequency of oculocutaneous albinism (OCA). In the Navajo, who live primarily in northeast Arizona, albinism occurs with a frequency of 1 in 1500–2000, compared with 1 in 36,000 in whites and 1 in 10,000 in African-Americans. There are four different forms of OCA (OCA1–4), all with varying degrees of melanin deficiency in the skin, eyes, and hair. OCA2 is caused by mutations in the P gene, which encodes a plasma membrane protein. To investigate the genetic basis of albinism in the Navajo, researchers screened for mutations in the P gene. In their study, all Navajo with albinism were homozygous for a 122.5-kb deletion in the P gene, spanning exons 10–20. Using a set of PCR primers spanning the deletion, researchers were able to identify homozygous affected individuals, heterozygous carriers, and homozygous normal individuals (Figure 22–13). They surveyed 134 normally pigmented Navajo and 42 members of the Apache, a tribe closely related to the Navajo. Based on this sample, the heterozygote frequency in the Navajo is estimated to be 4.5 percent. No carriers were found in the Apache
population that was studied. The 122.5-kb deletion allele causing OCA2 albinism was found only in the Navajo population and not in members of other Native American tribes in the southwestern United States, suggesting that the mutant allele is specific to the Navajo and may have arisen in a single individual who was one of the small number of founders of the Navajo population. Workers originally estimated the age of the mutation to be between 400 and 11,000 years, but tribal history and Navajo oral tradition indicate that the Navajo and Apache became separate populations between 600 and 1000 years ago. Because the deletion is not found in the Apache, it probably arose in the Navajo population after the tribes split. On this basis, the deletion is estimated to be 400 to 1000 years old and probably arose as a founder mutation.

22.8 Nonrandom Mating Changes Genotype Frequency but Not Allele Frequency

We have explored how populations that do not meet the first four assumptions of the Hardy–Weinberg law, in the form of selection, mutation, migration, and genetic drift, can have changes in allele frequencies. The fifth assumption is that members of a population mate at random; in other words, any one genotype has an equal probability of mating with any other genotype in the population. Nonrandom mating can change the frequencies of genotypes in a population. Subsequent selection for or against certain genotypes has the potential to affect the overall frequencies of the alleles they contain, but it is important to note that nonrandom mating does not itself directly change allele frequencies.

Nonrandom mating can take one of several forms. In positive assortive mating, similar genotypes are more likely to mate than dissimilar ones. This often occurs in humans: A number of studies have indicated that many people are more attracted to individuals who physically resemble them (and are therefore more likely to
be genetically similar as well). Negative assortive mating occurs when dissimilar genotypes are more likely to mate; some plant species have inbuilt recognition systems that prevent fertilization between individuals with the same alleles at key loci. However, the form of nonrandom mating most commonly found to affect genotype frequencies in population genetics is inbreeding.

FIGURE 22–13 PCR screens of Navajo affected with albinism (N4 and N5) and the parents of N4 (N2 and N3). Affected individuals (N4 and N5) have a single, dense band at 606 bp; heterozygous carriers (N2 and N3) have two bands, one at 606 bp and one at 257 bp. The homozygous normal individual (C) has a single dense band at 257 bp. Each genotype produces a distinctive band pattern, allowing detection of heterozygous carriers in the population. Molecular size markers (M) are in the first lane. Courtesy of Murray Brilliant, "A 122.5 kilobase deletion of the P gene underlies the high prevalence of oculocutaneous albinism type 2 in the Navajo population." From: American Journal of Human Genetics 72: 62–72, Figure 3, p. 67. Published by University of Chicago Press.

Inbreeding

Inbreeding occurs when mating individuals are more closely related than any two individuals drawn from the population at random; loosely defined, inbreeding is mating among relatives. For a given allele, inbreeding increases the proportion of homozygotes and decreases the proportion of heterozygotes in the population. A completely inbred population will theoretically consist only of homozygous genotypes. High levels of inbreeding can be harmful because they increase the number of individuals in the population who are homozygous for deleterious and/or lethal alleles.

To describe the intensity of inbreeding in a population, Sewall Wright devised the coefficient of inbreeding (F). This coefficient quantifies the probability that the two alleles of a given gene present in an individual are identical because they are descended from
the same single copy of the allele in an ancestor. If F = 1, all individuals in the population are homozygous, and both alleles in every individual are derived from the same ancestral copy. If F = 0, no individual has two alleles derived from a common ancestral copy.

One method of estimating F for an individual is shown in Figure 22–14. The fourth-generation female (shaded pink) is the daughter of first cousins (yellow). Suppose her great-grandmother (green) was a carrier of a recessive lethal allele, a. What is the probability that the fourth-generation female will inherit two copies of her great-grandmother's lethal allele? For this to happen, (1) the great-grandmother had to pass a copy of the allele to her son, (2) her son had to pass it to his daughter, and (3) his

FIGURE 22–14 Calculating the coefficient of inbreeding (F) for the offspring of a first-cousin marriage (pedigree showing genotypes Aa and AA for the great-grandparents). The chance that this female will inherit two copies of her great-grandmother's a allele is $F = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{64}$. Because the female's two alleles could be identical by descent from any of four different alleles, $F = 4 \times \frac{1}{64} = \frac{1}{16}$.

ESSENTIAL POINT Nonrandom mating in the form of inbreeding increases the frequency of homozygotes in the population and decreases the frequency of heterozygotes.

22–4 A prospective groom, who is normal, has a sister with cystic fibrosis (CF), an autosomal recessive disease. Their parents are normal. The brother plans to marry a woman who has no history of CF in her family. What is the probability that they will produce a CF child? They are both Caucasian, and the overall frequency of CF in the Caucasian population is 1/2500, that is, 1 affected child per 2500. (Assume the population meets the Hardy–Weinberg assumptions.)
HINT: This problem involves an understanding of how recessive alleles are transmitted (see Chapter 3) and the probability of receiving a recessive allele from a heterozygous parent The key to its solution is to first work out the probability that each parent carries the mutant allele 22.9 Speciation Occurs Via Reproductive Isolation A species can be defined as a group of actually or potentially interbreeding organisms that is reproductively isolated in nature from all other such groups In sexually reproducing organisms, speciation transforms the parental species into another species, or divides a single species into two or more separate species (Figure 22–15) Populations within a species may carry considerable genetic variation, present as differences in alleles or allele frequencies at a variety of loci Genetic divergence of these populations that result in different allele frequencies and/or different alleles in their gene pools can reflect the action of forces such as natural selection, mutation, and genetic drift When gene flow between populations is reduced or absent, the populations may diverge to the point that members of one population are no longer able to interbreed successfully with members of the other When populations reach the point where they are reproductively isolated Species Species Form daughter had to pass it to her daughter (the pink female) Also, (4) the great-grandmother had to pass a copy of the allele to her daughter, (5) her daughter had to pass it to her son, and (6) her son had to pass it to his daughter (the pink female) Each of the six necessary events has an individual probability of 1>2, and they all have to happen, so the probability that the pink female will inherit two copies of her great-grandmother’s lethal allele is (1/2)6 = 1/64 However, to calculate an overall value of F for the pink female as a child of a first-cousin marriage, remember that she could also inherit two copies of any of the other three dominant alleles present 
in her great-grandparents Because any of four possibilities would give the pink female two alleles identical by descent from an ancestral copy, Anagenesis Stasis Species Cladogenesis Species Species Time FIGUR E 22–15 After a period with no change (stasis), species is transformed into species 2, a process called anagenesis Later, species splits into two new species (species and 4), a process called cladogenesis 472 22 Po pul atio n and Evolut iona ry G enetics from one another, they have become different species, according to the biological species concept The biological barriers that prevent or reduce interbreeding between populations are called reproductive isolating mechanisms These mechanisms may be ecological, behavioral, seasonal, mechanical, or physiological Prezygotic isolating mechanisms prevent individuals from mating in the first place Individuals from different populations may not find each other at the right time, may not recognize each other as suitable mates, or may try to mate but find that they are unable to so because of differences in mating behavior Postzygotic isolating mechanisms create reproductive isolation even when the members of two populations are willing and able to mate with each other For example, mating may take place, and hybrid zygotes may be formed, but all or most of them may be inviable Alternatively, the hybrids may be viable, but be sterile or suffer from reduced fertility Yet again, the hybrids themselves may be fertile, but their progeny may have lowered viability or fertility In all these situations, hybrids are genetic dead-ends These postzygotic mechanisms act at or beyond the level of the zygote and are generated by genetic divergence Postzygotic isolating mechanisms waste gametes and zygotes and lower the reproductive fitness of hybrid survivors Selection will therefore favor the spread of alleles that lead to the development of prezygotic isolating mechanisms, which in turn prevent interbreeding and the formation of 
hybrid zygotes and offspring. In animal evolution, one of the most effective prezygotic mechanisms is behavioral isolation, involving courtship behavior.

Changes Leading to Speciation
One form of speciation depends on the formation of geographic barriers between populations, which prevent gene flow between the isolated populations. Isolation allows the gene pools of these populations to diverge. If the isolated populations later come into contact, several outcomes are possible. If reproductive isolating mechanisms are not in place, the populations will interbreed and will be regarded as one species. However, if reproductive isolating mechanisms have developed, the two populations will be regarded as separate species.

Formation of the Isthmus of Panama about 3 million years ago created a land bridge connecting North and South America and separated the Caribbean Sea from the Pacific Ocean. After identifying seven Caribbean species of snapping shrimp (Figure 22–16) and seven similar Pacific species, researchers matched them in pairs. Analysis of allele frequencies and mitochondrial DNA sequences confirmed that the ancestors of each pair were members of a single species. When the isthmus closed, each of the seven ancestral species was divided into two separate populations, one in the Caribbean and the other in the Pacific. But after 3 million years of separation, were members of these populations different species?

FIGURE 22–16 A snapping shrimp (genus Alpheus).
Males and females were paired together, and the rates of successful mating for Caribbean–Pacific couples were compared with those for Caribbean–Caribbean and Pacific–Pacific pairs. In three of the seven species pairs, transoceanic couples refused to mate altogether. Of the transoceanic pairs that mated, only percent produced viable offspring, while 60 percent of same-ocean pairs produced viable offspring. We can conclude that 3 million years of separation has resulted in complete or nearly complete speciation, involving strong pre- and postzygotic isolating mechanisms for all seven species pairs.

The Rate of Macroevolution and Speciation
How much time is required for speciation? As we saw in the example above, genetic divergence and the formation of new species can take several million years. In fact, the average time for speciation ranges from 100,000 to 10,000,000 years. However, rapid speciation over much shorter time spans has been reported in a number of cases, including fishes in East African lakes, marine salmon, palm trees on isolated islands, polyploid plants, and brown algae in the Baltic Sea.

In Nicaragua, Lake Apoyo was formed within the last 23,000 years in the crater of a volcano (Figure 22–17). This small lake is home to two species of cichlid fish: the Midas cichlid, Amphilophus citrinellus, and the Arrow cichlid, A. zaliosus. The Midas is the most common cichlid in the region and is found in nearby lakes; the Arrow cichlid is found only in Lake Apoyo. To establish the evolutionary origin of the Arrow cichlid, researchers used a variety of approaches, including phylogenetic, morphological, and ecological analyses. Sequence analysis of mitochondrial DNA established that the two species form a group with a common ancestor (a monophyletic group). Further genomic analysis of both species using a PCR-based method strengthened the conclusion that these two species are monophyletic and that A. zaliosus evolved from A. citrinellus. Members of the two species have distinctive morphologies (Figure 22–18), including jaw specializations that reflect different food preferences, which were confirmed by analysis of stomach contents. In addition, the two species are reproductively isolated, a conclusion substantiated by laboratory experiments. Using a molecular clock calibrated for cichlid mtDNA, researchers have estimated that A. zaliosus evolved from A. citrinellus sometime within the last 10,000 years. This estimate, and examples from other species, provide unambiguous evidence that, depending on the strength of selection and of the other factors that disturb Hardy–Weinberg equilibrium, species formation can occur over a much shorter time scale than the usual range of 100,000–10,000,000 years.

FIGURE 22–17 Lake Apoyo in Nicaragua occupies the crater of an inactive volcano. The lake formed about 23,000 years ago. Two species of cichlid fish in the lake share a close evolutionary relationship.

FIGURE 22–18 The two species of cichlids in Lake Apoyo exhibit distinctive morphologies: (a) Amphilophus citrinellus, (b) Amphilophus zaliosus.

22.10 Phylogeny Can Be Used to Analyze Evolutionary History
Speciation is associated with genetic divergence of populations. Therefore, we should be able to use genetic differences and similarities among present-day species to reconstruct their evolutionary histories. These relationships are most often presented in the form of phylogenetic trees (Figure 22–19), which show the ancestral relationships among a group of organisms. These groups can be species, or larger groups such as phyla. In a phylogenetic tree, branches represent the relationships among lineages over time. The length of a branch can be derived from a time scale, showing the length of time between speciation events. Branch points, or nodes, show when a species split into two or more species. Each node represents a common ancestor of the species diverging at that node. The tips of the branches represent species (or larger groups) alive today or those that ended in extinction. Groups that consist of an ancestral species and all its descendants are called monophyletic groups. The root of a phylogenetic tree represents the oldest common ancestor of all the groups shown in the tree. Trees can be constructed from differences in the morphology of living organisms, from fossils, and from the molecular sequences of proteins, RNA, and DNA.

FIGURE 22–19 Elements of a phylogenetic tree showing the relationships among several species. The root represents a common ancestor of all species on the tree. Branches represent lineages through time. The points at which the branches separate are called nodes, and at the tips of the branches are the living or extinct species.

Constructing Phylogenetic Trees from DNA Sequences
Advances in DNA sequencing technology have made genetic and genomic information from many species available, and today, most phylogenetic trees are constructed using DNA sequences. Constructing a species-level phylogenetic tree using DNA sequences requires three steps. First, DNA sequences representing a gene or genome of interest from a number of different species must be acquired; with the proliferation of DNA sequencing projects, these are usually available from public databases. Second, the sequences must be aligned with each other so that the related parts of each sequence can be compared to see if they are the same or different. The sequences to be compared can be imported into software programs that maximize the number of aligned base pairs by inserting gaps as needed. Third, the differences between the aligned sequences are analyzed. As discussed earlier, more distantly related species have acquired more DNA differences because of the longer time that has elapsed since they last shared a common ancestor. More closely related species have fewer DNA differences
because there has been less time for accumulation of DNA differences since they last shared a common ancestor. These DNA differences are used to construct a phylogenetic tree, often beginning with the most closely related sequences and working backward through sequences that are less closely related.

Reconstructing Vertebrate Evolution by Phylogenetic Analysis
One of the most important steps in the evolutionary history of our species was the ancient transition of vertebrates from the ocean to the land. For more than a century, biologists have debated which group of lobe-finned fish crawled ashore as the ancestor of all terrestrial vertebrates (amphibians, reptiles, birds, and mammals). In past years, phylogenetic trees constructed from the fossil record, from living species, and from mitochondrial DNA sequences pointed to the lungfish (Figure 22–20) as the closest living relative of terrestrial vertebrates, but could not rule out the possibility that terrestrial vertebrates are equally closely related to the lungfish and to another lobe-finned fish, the coelacanth (Figure 22–20). Recently, the coelacanth genome has been sequenced, and the data from this study have been used to revisit the question of which group shares the most recent common ancestor with our species and all other land vertebrates. Using sequence data from the coelacanth, the lungfish, and selected vertebrate species, researchers aligned and analyzed information from 251 protein-coding genes to construct a phylogenetic tree (Figure 22–21). The results strongly support earlier work indicating that terrestrial vertebrates are more closely related to the lungfish than to the coelacanth. Thus, the door has been closed on this important evolutionary question.

FIGURE 22–20 Phylogenetic evidence indicates that the lungfish (a), and not the coelacanth (b), is the closest living relative of amphibians, reptiles, birds, and mammals.

FIGURE 22–21 A phylogenetic tree of selected jawed vertebrates (mammals, reptiles, birds, amphibians, lungfish, coelacanth, ray-finned fish, and sharks and rays), including the lungfish and the coelacanth, shows that the lungfish shares the most recent common ancestor with these vertebrates.

Molecular Clocks Measure the Rate of Evolutionary Change
In many cases, we would like to estimate not only which members of a set of species are most closely related, but also when their common ancestors lived. The ability to construct phylogenetic trees from protein and nucleic acid sequences led to the development of molecular clocks, which use the rate of change in amino acid or nucleotide sequences as a way to estimate the time of divergence from a common ancestor. To be useful, molecular clocks must be carefully calibrated. Molecular clocks can only measure changes in amino acids or nucleotides; they are linear over certain time scales, and times and dates must be added to the clock using independent evidence such as the fossil record. Figure 22–22 shows a molecular clock with divergence times for humans and other vertebrates based on the fossil record [Figure 22–22(a)] and on molecular data [Figure 22–22(b)]. In both cases, the numbers of amino acid substitutions and nucleotide substitutions increase linearly with time.

FIGURE 22–22 Relationship between the number of amino acid substitutions and the number of nucleotide substitutions for 4198 nuclear genes from 10 vertebrate species. Humans versus (1) chimpanzee, (2) orangutan, (3) macaque, (4) mouse, (5) cow, (6) opossum, (7) chicken, (8) western clawed frog, and (9) zebrafish. In (a), the data are calculated by divergence times based on the fossil record, and in (b), based on synonymous nucleotide substitutions, which are mutations that do not result in any changes in the amino acid sequence of a protein. (Both panels plot sequence divergence per site, for synonymous nucleotide substitutions and for amino acid substitutions, against divergence time in millions of years: (a) Fossil record, (b) Molecular data.)

The Complex Origins of Our Genome
Current fossil, molecular, and genomic evidence indicates that our species, Homo sapiens, arose in Africa about 200,000 years ago from earlier species of Homo. When populations of H. sapiens first expanded out of Africa sometime between 50,000 and 70,000 years ago, parts of Europe and Asia were already occupied by members of other human species. Advances in DNA sequencing technology and new methods of DNA extraction that allow the recovery of genomic DNA from fossil remains have created a new field, called paleogenomics, which, in turn, has revolutionized the study of human evolution. The genomes of two extinct groups who lived in the Middle East, Asia, and Europe, the Neanderthals and the Denisovans, have been sequenced and compared with the genomes of present-day humans. The results show that modern human populations outside Africa, including those of the Middle East, Europe, Asia, Australia/Oceania, and the Americas, carry sequences from these two groups. Here is what we know about the genome of the Neanderthals and the contributions they made to our genome.

The first Neanderthal genome was assembled in 2010 from three skeletons discovered in a Croatian cave. Since then, genomes from several other Neanderthals have been sequenced. Comparative genome analysis shows that the genomes of our species and the Neanderthals are the same size (about 3.2 billion base pairs) and are 99.7 percent identical. Populations of H. neanderthalensis lived in Europe and western Asia from some 300,000 years ago until they disappeared about 40,000 years ago. For at least 30,000 years, Neanderthals coexisted with anatomically modern humans (H. sapiens) in regions of the Middle East and Europe,
providing an opportunity for interbreeding between these species. In fact, gene flow from extinct Neanderthals to modern humans through interbreeding is estimated to represent percent of the genome of non-African populations. Thus, the 99.7 percent sequence identity between the two species includes the percent contributed by Neanderthals that has become fixed in the genome of our species. However, different individuals carry different portions of the Neanderthal genome; taken together, upward of 20 percent of the Neanderthal genome may be present in the genomes of modern non-African populations.

From these studies, two conclusions can be drawn. First, Neanderthals are not direct ancestors of our species. Second, Neanderthals and members of our species did interbreed, and Neanderthals contributed to our genome. Thus, although Neanderthals are extinct, some of their DNA has survived and is a fixed part of our genome.

FIGURE 22–23 The cave in Denisova, Siberia, where the Denisovan fossils were discovered.

In 2008, human fossils were discovered in a cave near Denisova, Siberia (Figure 22–23). A complete mtDNA genome sequence showed that these fossils belonged to a group separate from both Neanderthals and our species. They were named the Denisovans. A nuclear Denisovan genome sequence shows that they are more closely related to Neanderthals than to our species. In addition, the Denisovan genome contains sequences from another, as yet unknown, archaic group that made no contribution to the Neanderthal genome. Analysis of modern human populations shows that to percent of the DNA in the genomes of residents of the Melanesian Islands in the South Pacific is derived from the Denisovans, and smaller amounts of Denisovan DNA are found in the genomes of Aboriginal Australians, as well as Polynesians, Fijians, east Indonesians, and some populations of East Asia. As things stand now, we know that as a result of gene flow, some members of our species outside of Africa carry DNA from one or two
other human groups (Figure 22–23). The Neanderthal and Denisovan genomes were assembled from fossil remains that are 40,000 to 80,000 years old. The recent sequencing of a genome from a 700,000-year-old horse fossil opens the possibility that genome sequences can be recovered from fossils of much older human species. For now, using the paleogenomic techniques currently available, we can expect exciting answers to questions about the similarities and differences between our genome and those of other human species, providing revolutionary insights into the evolution of our species and of the other human species that preceded us on this planet.

GENETICS, TECHNOLOGY, AND SOCIETY
Tracking Our Genetic Footprints out of Africa
Based on the physical traits and distribution of hominid fossils, most paleoanthropologists agree that a large-brained, tool-using hominid they call Homo erectus appeared in east Africa about 2 million years ago. This species used simple stone tools and hunted, but did not fish, build houses, or follow ritual burial practices. About 1.7 million years ago, H. erectus spread into Eurasia and south Asia. Most scientists also agree that H. erectus likely developed into H. heidelbergensis, a species that became the ancestor of our species (in Africa), Neanderthals (in Europe), and Denisovans (in Asia). These hominids were anatomically robust, with large, heavy skeletons and skulls. These groups disappeared 50,000 to 30,000 years ago, around the same time that anatomically modern humans (H. sapiens) appeared all over the world.

The events that led to the appearance of H. sapiens throughout Europe and Asia are a source of controversy. At present, two main hypotheses explain the origins of modern humans: the multiregional hypothesis and the out-of-Africa hypothesis. The multiregional hypothesis is based primarily on archaeological and fossil evidence. It proposes that H. sapiens developed gradually and simultaneously all over the world from existing H.
heidelbergensis groups, including Neanderthals. Interbreeding between these groups eventually made H. sapiens a genetically homogeneous species. Natural selection then created the regional variants that we see today. In the multiregional view, our genetic makeup should include significant contributions from many H. heidelbergensis groups, including Neanderthals.

In contrast, the out-of-Africa hypothesis, based primarily on genetic analyses of modern human populations, contends that H. sapiens evolved from the descendants of H. heidelbergensis in sub-Saharan Africa about 200,000 years ago. A small band of H. sapiens (probably fewer than 1000) then left Africa around 50,000 years ago. By 40,000 years ago, they had reached Europe, Asia, and Australia. In the out-of-Africa model, H. sapiens replaced all existing hominins. In this way, H. sapiens became the only species in the genus by about 30,000 years ago.

Although the out-of-Africa hypothesis is still debated, most genetic evidence appears to support it. Humans all over the globe are remarkably similar genetically. DNA sequences from any two people chosen at random are 99.9 percent identical. More genetic identity exists between two persons chosen at random from a human population than between two chimpanzees chosen at random from a chimpanzee population. Interestingly, about 90 percent of the genetic differences that exist occur between individuals rather than between populations. This unusually high degree of genetic relatedness in all humans around the world supports the idea that our species arose recently from a small founding group of humans.

Studies of mitochondrial DNA sequences from current human populations reveal that the highest levels of genetic variation occur within African populations. Africans show twice the mitochondrial DNA sequence diversity of non-Africans. This implies that the earliest branches of H. sapiens diverged in Africa and had a longer time to accumulate mitochondrial DNA mutations, which are thought to accumulate at a constant rate over time. DNA sequences from mitochondrial, Y-chromosome, and chromosome-21 markers support the idea that human roots are in east Africa and that the migration out of Africa occurred through Ethiopia, along the coast of the Arabian Peninsula, and outward to Eurasia and Southeast Asia. Recent data based on nuclear microsatellite variants and whole-genome single-nucleotide polymorphism (SNP) analysis further support the notion that humans migrated out of Africa and dispersed throughout the world from a small founding population.

As with any explanation of human origins, the out-of-Africa hypothesis is actively debated. As methods to sequence DNA from ancient fossils improve, it may be possible to fill the gaps in the genetic pathway leading out of Africa and to resolve those age-old questions about our origins.

Your Turn
1. Recent sequencing data suggest that there was some interbreeding between H. sapiens, Neanderthals, and Denisovans. Discuss the evidence that supports the idea that these groups interbred. How might interbreeding have affected the survival of H. sapiens out of Africa?
Start your investigations by reading Kelso, J., and Prüfer, K. 2014. Ancient humans and the origin of modern humans. Curr. Opin. Genet. Dev. 29: 133–138.
2. If all people on Earth are very similar genetically, how did we come to have the range of physical differences, which some describe as racial differences? How has modern genomics contributed to the debate about the validity and definition of the term race?
For an interesting discussion of race, human variation, and genomic studies, see Lewontin, R.C. 2006. Confusion about human races. Social Science Research Council website, raceandgenomics.ssrc.org/Lewontin. A study of the genetic differences between human population groups can be found in Witherspoon, D.J., et al. 2007. Genetic similarities within and between human populations. Genetics 176(1): 351–359.
3. Geneticists study mitochondrial and Y-chromosome DNA to determine the ancestry of modern humans. Why are these two types of DNA used in lineage studies? What is meant by the terms mitochondrial Eve and Y-chromosome Adam?
To read the original paper hypothesizing a mitochondrial Eve, see Cann, R.L., et al. 1987. Mitochondrial DNA and human evolution. Nature 325: 31–36. For a discussion of Y-chromosome Adam, see Gibbons, A. 1997. Y chromosome shows that Adam was an African. Science 278: 804–805.

CASE STUDY
An unexpected outcome
A newborn screening program identified a baby with a rare autosomal recessive disorder called argininosuccinic aciduria (AGA), which causes high levels of ammonia to accumulate in the blood. Symptoms usually appear in the first week after birth and can progress to include severe liver damage, developmental delay, and mental retardation. AGA occurs with a frequency of about 1 in 70,000 births. There is no history of this disorder in either the father's or the mother's family. This case raises several questions:
1. Since it appears that the unaffected parents are heterozygotes, would it be unusual for there to be no family history of the disorder?
2. How would they be counseled about risks to future children?
3. If the disorder is so rare, what is the frequency of heterozygous carriers in the population?
4. What are the chances that two heterozygotes will meet and have an affected child?
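The Hardy–Weinberg arithmetic behind the case-study questions, and behind the worked Tay–Sachs solution that follows, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names `carrier_frequency` and `risk_affected_child` are hypothetical. The sketch assumes Hardy–Weinberg equilibrium, so that the incidence of affected individuals equals $q^2$. It also uses the population carrier frequency $2pq$ as the carrier probability for a parent with no family history, which is a close approximation for rare disorders (the exact value for a person known to be unaffected is $2pq/(1 - q^2)$).

```python
from math import sqrt


def carrier_frequency(incidence: float) -> tuple[float, float, float]:
    """Estimate (q, p, 2pq) for an autosomal recessive disorder.

    Assumes Hardy-Weinberg equilibrium, so the incidence of affected
    (homozygous recessive) individuals equals q**2.
    """
    if not 0 < incidence < 1:
        raise ValueError("incidence must be strictly between 0 and 1")
    q = sqrt(incidence)     # frequency of the recessive allele
    p = 1 - q               # frequency of the wild-type allele (p + q = 1)
    return q, p, 2 * p * q  # 2pq = frequency of heterozygous carriers


def risk_affected_child(carrier_prob_1: float, carrier_prob_2: float) -> float:
    """Probability that one child of two unaffected parents is affected.

    Each argument is the probability that that parent is a carrier;
    two carriers (Aa x Aa) have a 1/4 chance of an aa child per birth.
    """
    return carrier_prob_1 * carrier_prob_2 * 0.25


if __name__ == "__main__":
    # Tay-Sachs among Ashkenazi Jews: about 1 affected birth in 3600
    q, p, carriers = carrier_frequency(1 / 3600)
    print(f"Tay-Sachs: q = {q:.3f}, p = {p:.3f}, 2pq = {carriers:.3f}")
    # prints: Tay-Sachs: q = 0.017, p = 0.983, 2pq = 0.033

    # AGA (case study): about 1 affected birth in 70,000
    _, _, aga_carriers = carrier_frequency(1 / 70_000)
    print(f"AGA carriers: about 1 in {1 / aga_carriers:.0f}")

    # Two known carriers, e.g. parents who already have an affected child
    print(risk_affected_child(1.0, 1.0))  # 0.25
```

Working with unrounded values gives $2pq \approx 0.0328$ for Tay–Sachs, about 1 in 30.5; the "about 1 in 30" in the worked solution comes from rounding q to 0.017 first. For two parents already known to be carriers, the risk is 1/4 for each pregnancy, however rare the allele is in the population.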
INSIGHTS AND SOLUTIONS
1. Tay–Sachs disease is caused by loss-of-function mutations in a gene on chromosome 15 that encodes a lysosomal enzyme. Tay–Sachs is inherited as an autosomal recessive condition. Among Ashkenazi Jews of Central European ancestry, about 1 in 3600 children is born with the disease. What fraction of the individuals in this population are carriers?
Solution: If we let p represent the frequency of the wild-type enzyme allele and q the total frequency of recessive loss-of-function alleles, and if we assume that the population is in Hardy–Weinberg equilibrium, then the frequencies of the genotypes are given by $p^2$ for homozygous normal, $2pq$ for carriers, and $q^2$ for individuals with Tay–Sachs. The frequency of Tay–Sachs alleles is thus
$q = \sqrt{q^2} = \sqrt{1/3600} = 1/60 \approx 0.017$
Since $p + q = 1$, we have
$p = 1 - q = 1 - 0.017 = 0.983$
Therefore, we can estimate that the frequency of carriers is
$2pq = 2(0.983)(0.017) = 0.033$, or about 1 in 30.
2. A single plant twice the size of others in the same population suddenly appears. Normally, plants of that species reproduce by self-fertilization and by cross-fertilization. Is this new giant plant simply a variant, or could it be a new species? How would you determine which it is?
Solution: One of the most widespread mechanisms of speciation in higher plants is polyploidy, the multiplication of entire sets of chromosomes. The result of polyploidy is usually a larger plant with larger flowers and seeds. There are two ways of testing the new variant to determine whether it is a new species. First, the giant plant should be crossed with a normal-sized plant to see whether the giant plant produces viable, fertile offspring. If it does not, then the two different types of plants would appear to be reproductively isolated. Second, the giant plant should be cytogenetically screened to examine its chromosome complement. If it has twice the chromosome number of its normal-sized neighbors, it is a tetraploid that may have arisen spontaneously. If the chromosome number differs by a factor of two and the new plant is reproductively isolated from its normal-sized neighbors, it is a new species.

Problems and Discussion Questions
1. HOW DO WE KNOW? Population geneticists study changes in the nature and amount of genetic variation in populations, the distribution of different genotypes, and how forces such as selection and drift act on genetic variation to bring about evolutionary change in populations and the formation of new species. From the explanation given in the chapter, what answers would you propose to the following fundamental questions?
(a) How do we know how much genetic variation is in a population?
(b) How do geneticists detect the presence of genetic variation as different alleles in a population?
(c) How do we know whether the genetic structure of a population is static or dynamic?
(d) How do we know when populations have diverged to the point that they form two different species?
(e) How do we know the age of the last common ancestor shared by two species?
2. CONCEPT QUESTION Review the Chapter Concepts on page 457. All these pertain to the principles of population genetics and the evolution of species. Write a short essay describing the roles of mutation, migration, and selection in bringing about speciation.
3. Price et al. (1999. J. Bacteriol. 181: 2358–2362) conducted a genetic study of the toxin transport protein (PA) of Bacillus anthracis, the bacterium that causes anthrax in humans. Within the 2294-nucleotide gene in 26 strains, they identified five point mutations (two missense and three synonymous) among different isolates. Necropsy samples from an anthrax outbreak in 1979 revealed a novel missense mutation and five unique nucleotide changes among ten victims. The authors concluded that these data indicate little or no horizontal transfer between different B. anthracis strains.
(a) Which types of nucleotide changes (missense or synonymous) cause amino acid changes?
(b) What is meant by horizontal transfer?
(c) On what basis did the authors conclude that evidence of horizontal transfer is absent from their data?
4. The genetic difference between two Drosophila species, D. heteroneura and D. sylvestris, as measured by nucleotide diversity, is about 1.8 percent. The difference between chimpanzees (P. troglodytes) and humans (H. sapiens) is about the same, yet the latter species are classified in different genera. In your opinion, is this valid? Explain why.
5. The use of nucleotide sequence data to measure genetic variability is complicated by the fact that the genes of higher eukaryotes are complex in organization and contain 5′ and 3′ flanking regions as well as introns. Researchers have compared the nucleotide sequence of two cloned alleles of the γ-globin gene from a single individual and found a variation of percent. Those differences include 13 substitutions of one nucleotide for another and short DNA segments that have been inserted in one allele or deleted in the other. None of the changes takes place in the gene's exons (coding regions). Why do you think this is so, and should it change our concept of genetic variation?
6 Calculate the frequencies of the AA, Aa, and aa genotypes after one generation if the initial population consists of 0.2 AA, 0.6 Aa, and 0.2 aa genotypes and meets the requirements of the Hardy–Weinberg relationship. What genotype frequencies will occur after a second generation?

7 Consider rare disorders in a population caused by an autosomal recessive mutation. From the frequencies of the disorder in the population given, calculate the percentage of heterozygous carriers. (a) 0.0064 (b) 0.000081 (c) 0.09 (d) 0.01 (e) 0.10

8 What must be assumed in order to validate the answers in Problem 7?

9 In a population that meets the Hardy–Weinberg equilibrium assumptions, 81% of the individuals are homozygous for a recessive allele. What percentage of the individuals would be expected to be heterozygous for this locus in the next generation?

10 In a population of cattle, the following color distribution was noted: 36% red (RR), 48% roan (Rr), and 16% white (rr). Is this population in Hardy–Weinberg equilibrium? What will be the distribution of genotypes in the next generation if the Hardy–Weinberg assumptions are met?

11 Consider a population in which the frequency of allele A is p = 0.7 and the frequency of allele a is q = 0.3, and where the alleles are codominant. What will be the allele frequencies after one generation if the following occurs? (a) wAA = 1, wAa = 0.9, and waa = 0.8 (b) wAA = 1, wAa = 0.95, and waa = 0.9 (c) wAA = 1, wAa = 0.99, and waa = 0.98 (d) wAA = 0.8, wAa = 1, and waa = 0.8

12 In a population of 10,000 individuals, where 3600 are MM, 1600 are NN, and 4800 are MN, what are the frequencies of the M and N alleles?

13 Under what circumstances might a lethal dominant allele persist in a population?

14 A certain form of albinism in humans is recessive and autosomal. Assume that 1% of the individuals in a given population are albino. Assuming that the population is in Hardy–Weinberg equilibrium, what percentage of the individuals in this population is expected to be heterozygous?

15 One of the first Mendelian traits identified in humans was a dominant condition known as brachydactyly. This gene causes an abnormal shortening of the fingers or toes (or both). At the time, some researchers thought that the dominant trait would spread until 75 percent of the population would be affected (because the phenotypic ratio of dominant to recessive is 3:1). Show that the reasoning was incorrect.

16 What is the original source of genetic variation in a population? Which natural factors affect changes in this original variation?

17 Achondroplasia is a dominant trait that causes a characteristic form of dwarfism. In a survey of 50,000 births, five infants with achondroplasia were identified. Three of the affected infants had affected parents, while two had normal parents. Calculate the mutation rate for achondroplasia and express the rate as the number of mutant genes per given number of gametes.

18 A recent study examining the mutation rates of 5669 mammalian genes (17,208 sequences) indicates that, contrary to popular belief, mutation rates among lineages with vastly different generation lengths and physiological attributes are remarkably constant (Kumar, S., and Subramanian, S. 2002. Proc. Natl. Acad. Sci. [USA] 99: 803–808). The average rate is estimated at 2.2 × 10⁻⁹ per bp per year. What is the significance of this finding in terms of mammalian evolution?

19 A form of dwarfism known as Ellis–van Creveld syndrome was first discovered in the late 1930s, when Richard Ellis and Simon van Creveld shared a train compartment on the way to a pediatrics meeting. In the course of conversation, they discovered that they each had a patient with this syndrome. They published a description of the syndrome in 1940. Affected individuals have a short-limbed form of dwarfism and often have defects of the lips and teeth, and polydactyly (extra fingers). The largest pedigree for the condition was reported in an Old Order Amish population in eastern Pennsylvania by Victor McKusick and his colleagues (1964). In that community, about per 1000 births are affected, and in the population of 8000, the observed frequency is per 1000. All affected individuals have unaffected parents, and all affected cases can trace their ancestry to Samuel King and his wife, who arrived in the area in 1774. It is known that neither King nor his wife was affected with the disorder. There are no cases of the disorder in other Amish communities, such as those in Ohio or Indiana. (a) From the information provided, derive the most likely mode of inheritance of this disorder. Using the Hardy–Weinberg law, calculate the frequency of the mutant allele in the population and the frequency of heterozygotes, assuming Hardy–Weinberg conditions. (b) What is the most likely explanation for the high frequency of the disorder in the Pennsylvania Amish community and its absence in other Amish communities?

20 List the barriers that prevent interbreeding and give an example of each.

21 Present a rationale for using DNA sequence polymorphisms as an index of genetic diversity. Is genetic diversity directly proportional to evolutionary (phylogenetic) diversity?

22 Are there nucleotide substitutions that will not be detected by electrophoretic studies of a gene’s protein product?

23 In a recent study of cichlid fish inhabiting Lake Victoria in Africa, Nagl et al. (1998. Proc. Natl. Acad. Sci. [USA] 95: 14,238–14,243) examined suspected neutral sequence polymorphisms in noncoding genomic loci in 12 species and their putative river-living ancestors. At all loci, the same polymorphism was found in nearly all of the tested species from Lake Victoria, both lacustrine and riverine. Different polymorphisms at these loci were found in cichlids at other African lakes. (a) Why would you suspect neutral sequences to be located in noncoding genomic regions? (b) What conclusions can be drawn from these polymorphism data in terms of cichlid ancestry in these lakes?

24 What genetic changes take place during speciation?

25 Some critics have warned that the use of gene therapy to correct genetic disorders will affect the course of human evolution. Evaluate this criticism in light of what you know about population genetics and evolution, distinguishing between somatic gene therapy and germ-line gene therapy.

26 Comparisons of Neanderthal mitochondrial DNA with that of modern humans indicate that they are not related to modern humans and did not contribute to our mitochondrial heritage. However, because Neanderthals and modern humans are separated by at least 25,000 years, this does not rule out some forms of interbreeding causing the modern European gene pool to be derived from both Neanderthals and early humans (called Cro-Magnons). To resolve this question, Caramelli et al. (2003. Proc. Natl. Acad. Sci. [USA] 100: 6593–6597) analyzed mitochondrial DNA sequences from 25,000-year-old Cro-Magnon remains and compared them to four Neanderthal specimens and a large dataset derived from modern humans. The results are shown in the graph.
Graph: average genetic distance between samples and modern humans (y-axis, 0.00 to 0.16) versus age of specimens in 10³ years (x-axis), for modern humans, Cro-Magnons, and Neanderthals.

The x-axis represents the age of the specimens in thousands of years; the y-axis represents the average genetic distance. Modern humans are indicated by filled squares; Cro-Magnons, open squares; and Neanderthals, diamonds. (a) What can you conclude about the relationship between Cro-Magnons and modern Europeans? What about the relationship between Cro-Magnons and Neanderthals? (b) From these data, does it seem likely that Neanderthals made any mitochondrial DNA contributions to the Cro-Magnon gene pool or the modern European gene pool?

SPECIAL TOPICS IN MODERN GENETICS
Epigenetics

The somatic cells of the human body contain 20,000 to 25,000 genes. In the more than 200 cell types present in the body, different cell-specific gene sets are transcribed, while the rest of the genome is transcriptionally inactive. During development, as embryonic cells gradually become specialized cells and exhibit adult phenotypes, programs of gene expression become increasingly restricted. Until recently, it was thought that most regulation of gene expression is coordinated by cis-regulatory elements as well as DNA-binding proteins and transcription factors, and that this regulation can occur at any of the steps in gene expression (see Chapter 14). However, as we have learned more about genome organization and the regulation of gene expression, it is clear that classical regulatory mechanisms cannot fully explain how some phenotypes arise or why phenotypes change during the life cycle. For example, monozygotic twins have identical genotypes but often develop different phenotypes. In addition, although one allele of each gene is inherited maternally and one is inherited paternally, in some cases only the maternal or the paternal allele is expressed, while the other is transcriptionally silent.

The newly emerging field of epigenetics is providing us with a basis for understanding how heritable changes other than those in DNA sequence can influence phenotypic variation (ST Figure 1–1). These advances greatly extend our understanding of the molecular basis of gene regulation and have application in wide-ranging areas including genetic disorders, cancer, and behavior. An epigenetic trait is a stable, mitotically and meiotically heritable phenotype that results from changes in gene expression without alterations in the DNA sequence. Epigenetics is the study of the ways in which these changes alter cell- and tissue-specific patterns of gene expression (see Box 1). Epigenetic regulation of gene expression uses reversible modifications of DNA and chromatin structure to mediate the interaction of the genome with a variety of environmental factors, and it generates changes in the patterns of gene expression in response to these factors. The epigenome refers to the epigenetic state of a cell. During its life span, an organism has one genome, but this genome can be modified in diverse cell types at different times to produce many epigenomes.

Current research efforts are focused on several aspects of epigenetics: how an epigenome arises in developing and differentiated cells, and how these epigenomes are transmitted via mitosis and meiosis, making them heritable traits. In addition, because epigenetically controlled alterations to the genome are associated with common diseases such as cancer, diabetes, and asthma, efforts are also being directed toward developing drugs that can modify or reverse disease-associated epigenetic changes in cells. Here we will focus on the association of epigenetics with some heritable genetic disorders, cancer, and environment–genome interactions. Because epigenetic changes are potentially reversible, we will also examine how knowledge of the molecular mechanisms of epigenetics is being used to develop drugs and treatments for human diseases.

Epigenetic Alterations to the Genome
Unlike the genome, which is identical in all cell types of an organism, the epigenome is cell-type specific and changes throughout the life cycle in response to environmental cues. Like the genome, the epigenome can be transmitted to daughter cells by mitosis and to future generations by meiosis. In the following sections, we will examine mechanisms of epigenetic changes and their role in development, aging, cancer, and environment–genome interactions, providing a snapshot of the many roles played by this recently discovered mechanism of gene regulation.

There are three major epigenetic mechanisms: (1) reversible modification of DNA by the addition or removal of methyl groups; (2) remodeling of chromatin by the addition or removal of chemical groups to histone proteins; and (3) regulation of gene expression by small, noncoding RNA molecules.

ST Figure 1–1 The phenotype of an organism is the product of interactions between the genome and the epigenome. The genome is a constant from fertilization throughout life, but cells, tissues, and the organism develop different epigenomes as a result of epigenetic reprogramming of gene activity in response to environmental stimuli. These reprogramming events lead to phenotypic changes through the life cycle.

BOX 1 The Beginning of Epigenetics
C. H. Waddington coined the term epigenetics in the 1940s to describe how environmental influences on developmental events can affect the phenotype of the adult. He showed that environmental alterations during development induced alternative phenotypes in organisms with identical genotypes. Using Drosophila melanogaster, Waddington found that wing vein patterns could be altered by administering heat shocks during pupal development. Offspring of flies with these environmentally induced changes showed the alternative phenotype without the need for continued environmental stimulus. He called this phenomenon “genetic assimilation.” In other words, interactions between the environment and the genome during certain stages of development produced heritable phenotypic changes. In the 1970s, Holliday and Pugh proposed that changes in the program of gene expression during development depend on the methylation of specific bases in DNA, and that altering methylation patterns affects the resulting phenotype. Waddington’s pioneering work, the methylation model of Holliday and Pugh, and the discovery that expression of genes from both the maternal and paternal genomes is required for normal development all helped set the stage for the birth of epigenetics and epigenomics as fields of scientific research.

In mammals, DNA methylation takes place after replication and during cell differentiation. This process involves the addition of a methyl group (–CH3) to cytosine, a reaction catalyzed by a family of enzymes called methyltransferases. Methylation takes place almost exclusively on cytosine bases located adjacent to a guanine base, a combination called a CpG dinucleotide. Many of these dinucleotides are clustered in regions called CpG islands, which are located in and near promoter sequences adjacent to genes (ST Figure 1–2). In euchromatin, CpG islands and promoters adjacent to essential genes (housekeeping genes) and cell-specific genes are unmethylated, making these genes available for transcription. In heterochromatic regions of the genome, genes with adjacent methylated CpG islands and methylated promoters are transcriptionally silenced. The methyl groups in CpG dinucleotides occupy the major groove of DNA and block the binding of transcription factors necessary to form transcription complexes. During development, methylation of CpG islands is a normal process and plays a role in the inactivation of X chromosomes in females (see Chapter for a detailed discussion of X inactivation). In addition, DNA methylation plays a role in parent-specific allele expression, a process called imprinting.

ST Figure 1–2 Methylation patterns of CpG dinucleotides in promoters control activity of the adjacent genes. (a) The promoter is unmethylated and the gene can be transcribed. (b) The promoter is methylated and the gene is silenced. CpG islands outside and within genes also have characteristic methylation patterns, contributing to the overall level of genome methylation.

Histone Modification and Chromatin Configuration
In addition to DNA methylation, chromatin remodeling is an important epigenetic mechanism of gene regulation. Recall that chromatin is a dynamic structure composed of DNA wound around a core of histone proteins to form nucleosomes. Several sets of proteins are involved in modifying histones: “writers” and “erasers” that add or remove chemical groups, and “readers” that interpret these modifications. The N-terminal region of each histone molecule extends beyond the nucleosome, and the amino acids in these tails can be chemically modified by the addition of acetyl, methyl, and phosphate groups (ST Figure 1–3). These modifications remodel chromatin structure, loosen the binding between histones and DNA, and shift the spacing of nucleosomes, making genes accessible for transcription [ST Figure 1–4(a)]. Chromatin remodeling is a reversible process: the removal of chemical groups changes chromatin from an “open” to a “closed” configuration and silences genes by making them unavailable for transcription [ST Figure 1–4(b)].

We are learning that specific combinations of histone modifications control the transcriptional status of a chromatin region. For example, histone methylation can either increase or decrease gene activity, depending on which amino acids are methylated and how many methyl groups are added. A large number of combinations of histone modifications is possible, and the sum of the complex patterns and interactions of histone modifications that alter chromatin structure and gene expression is called the histone code. These changes allow differentiated cells to carry out cell-specific patterns of gene transcription and to respond to external signals that modify these patterns without any changes in DNA sequence.

ST Figure 1–3 Clusters of histones in nucleosomes have their N-terminal tails covalently modified in epigenetic modifications that alter patterns of gene expression. Ac = acetyl groups, Me = methyl groups, P = phosphate groups.

MicroRNAs and Long Noncoding RNAs
In addition to messenger RNA (mRNA), genome transcription produces several classes of noncoding RNAs. Two of these, microRNAs (miRNAs) and long noncoding RNAs (lncRNAs), play important roles in epigenetic regulation of gene expression. The structure and function of these RNAs are discussed in Special Topic Chapter 2: Emerging Roles of RNA.

miRNAs are involved in controlling pattern formation in developing embryos, the timing of developmental events, and physiological processes such as cell signaling. Recent evidence shows that miRNAs also play roles in the development of cardiovascular disease and cancer. Primary miRNA transcripts are processed into precursor molecules about 70–100 nucleotides long, containing a double-stranded stem loop and single-stranded regions. Processing removes the single-stranded regions, and the stem-loop precursors move to the cytoplasm, where they are altered further. The resulting double-stranded RNA is incorporated into a protein complex in which one RNA strand is removed and degraded, forming a mature RNA-Induced Silencing Complex (RISC) containing the remaining single miRNA strand. RISCs act as posttranscriptional repressors of gene expression by binding to and destroying target mRNA molecules carrying sequences complementary to the RISC miRNA.
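The outcome of RISC binding depends on how completely the target mRNA pairs with the miRNA. Full complementarity leads to destruction of the target, and partial complementarity leads to reduced translation. The toy Python sketch below encodes that rule. It is a simplification under stated assumptions, not a target-prediction method:

- sequences are RNA written 5′→3′;
- the target site has the same length as the miRNA and pairs with it antiparallel;
- only Watson–Crick pairs count, so G·U wobble pairs are scored as mismatches;
- the 0.6 cutoff for "partial" pairing and all function names are hypothetical.

Real miRNA target prediction also weights the short "seed" region near the miRNA's 5′ end, which this sketch ignores.

```python
# Watson-Crick partners for RNA bases. G.U wobble pairs, which real
# miRNA-target duplexes often contain, are treated as mismatches here.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the 5'->3' RNA sequence that pairs perfectly with `rna` (also 5'->3')."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def paired_fraction(mirna: str, site: str) -> float:
    """Fraction of positions at which an equal-length target site pairs with the miRNA."""
    if len(mirna) != len(site):
        raise ValueError("miRNA and target site must have the same length")
    ideal = reverse_complement(mirna)
    matches = sum(a == b for a, b in zip(ideal, site.upper()))
    return matches / len(mirna)


def predicted_outcome(mirna: str, site: str, min_partial: float = 0.6) -> str:
    """Toy rule: full pairing -> cleavage; partial pairing -> translational repression.

    `min_partial` is a hypothetical cutoff, not a value from the text.
    """
    fraction = paired_fraction(mirna, site)
    if fraction == 1.0:
        return "mRNA cleavage and degradation"
    if fraction >= min_partial:
        return "translational repression"
    return "no predicted interaction"
```

For an arbitrary 22-nucleotide miRNA, a site equal to `reverse_complement(mirna)` is classified as fully complementary. Changing 4 of its 22 bases leaves about 82 percent of positions paired, which this toy rule treats as partial complementarity.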
mRNAs that are partially complementary to the RISC miRNA are modified, making these target mRNAs less likely to be translated by ribosomes and resulting in downregulation of gene expression. In addition to forming RISC complexes, miRNA can also associate with a different set of proteins to form RNA-Induced Transcriptional Silencing (RITS) complexes. RITS complexes reversibly convert euchromatic chromosome regions into facultative heterochromatin, silencing the genes located within these newly created heterochromatic regions. Unlike the constitutive heterochromatin at telomeres and centromeres, facultative heterochromatin can be reversibly converted to euchromatin, making genes in this region once again accessible for transcription.

Long noncoding RNAs (lncRNAs) share properties with mRNAs: they often have 5′ caps and 3′ poly-A tails, and they are spliced. What distinguishes lncRNAs from coding (mRNA) transcripts is the lack of an extended open reading frame that codes for the insertion of amino acids into a polypeptide. As epigenetic modulators, lncRNAs bind to chromatin-modifying enzymes and direct their activity to specific regions of the genome. At these sites, lncRNAs direct chromatin modification, altering the pattern of gene expression.

ST Figure 1–4 panels: (a) “Open” configuration: DNA is unmethylated and histones are acetylated; genes can be transcribed. (b) “Closed” configuration: DNA is methylated at CpG islands and histones are deacetylated; genes cannot be transcribed.

As information has accumulated about the molecular mechanisms associated with epigenetic regulation of gene expression, it has become clear that mutations in genes associated with these mechanisms are the basis for human genetic disorders. Dozens of such diseases have been identified, including Rett syndrome, a disorder of the nervous system caused by mutations associated with DNA methylation. Weaver syndrome (a growth
disorder) is associated with mutation of histone modification genes, and fragile-X syndrome (see Chapter 6) is associated with defects in miRNA processing.

In summary, epigenetic modifications alter chromatin structure by several mechanisms, including DNA methylation, histone acetylation, histone deacetylation, and the action of miRNAs, without changing the sequence of DNA. These epigenetic changes create an epigenome that, in turn, can regulate normal development or generate physiological responses to environmental signals.

Epigenetics and Development: Imprinting
Some epigenetic modifications are heritable and are transmitted to daughter cells at cell division. This ensures that during development, cell- and tissue-specific patterns of gene transcription are initiated and maintained. In this way, developing nerve cells maintain their identity as they divide and form components of the central and peripheral nervous systems.

The DNA carried by sperm and eggs is highly methylated. However, shortly after fertilization, most of the methylation marks associated with differentiated cells are erased. This resets embryonic cells to a pluripotent state, allowing them to undergo new epigenetic modifications to form the more than 200 cell types found in the adult body. At about the time the embryo implants in the wall of the uterus, cells take on tissue-specific epigenetic identities, and methylation patterns and histone modifications change rapidly to reflect those seen in differentiated cells.

Some genomic regions, however, escape these rounds of global demethylation and remethylation. The genes contained in these regions remain imprinted with the methylation marks of the maternal and paternal chromosomes. This inherited pattern of methylation in CpG-rich regions and in promoter sequences produces allele-specific imprinting: depending on the gene, either the maternally or the paternally inherited copy remains transcriptionally silent during embryogenesis and later stages of development. In humans,
imprinted genes are usually found in clusters that can occupy more than 1,000 kb of DNA. Because these genes are located near each other at a limited number of sites in the genome, mutation in one imprinted gene can affect the function of adjacent imprinted genes, amplifying the mutation’s phenotypic impact. Defects in imprinted genes can arise from changes in the DNA sequence or from dysfunctional epigenetic changes, called epimutations; both can cause heritable changes in gene activity.

ST Figure 1–4 Epigenetic modifications to the genome alter the spacing of nucleosomes and alter the availability of genes for transcription.

Later in development, a second wave of epigenetic reprogramming occurs in the germ cells, which are located in the fetal gonads. In this wave, all epigenetic modifications, even those on imprinted genes, are completely removed from the germ-cell chromosomes. Once this process is accomplished, sex-appropriate epigenetic imprinting modifications are added to the germ-cell chromosomes. Germ cells in the developing ovary are programmed with female imprints, and in male germ cells the chromosomes receive male imprints (ST Figure 1–5).

Most human disorders associated with imprinting originate during fetal growth and development. Imprinting defects cause Prader–Willi syndrome, Angelman syndrome, Beckwith–Wiedemann syndrome, and several other diseases (ST Table 1.1). However, given the number of candidate genes and the possibility that additional imprinted genes remain to be discovered, the overall number of imprinting-related genetic disorders may be much higher. In humans, most known imprinted genes encode growth factors or other growth regulators.

An autosomal dominant disorder of imprinting, Beckwith–Wiedemann syndrome (BWS), offers insight into how disruptions of epigenetically imprinted genes lead to an abnormal phenotype. BWS is a prenatal overgrowth disorder with
abdominal wall defects, enlarged organs, large birth weight, and predisposition to cancer. In most cases, BWS is not caused by mutation, nor is it associated with a chromosomal aberration. Instead, it is a disorder of imprinting caused by abnormal methylation patterns and the resulting altered patterns of gene expression. Genes associated with BWS are located in a cluster of imprinted genes on the short arm of chromosome 11. All genes in this cluster regulate growth during prenatal development. One of the genes in this cluster is IGF2 (insulin-like growth factor 2). Normally, the paternal allele of IGF2 is expressed and the maternal allele is silenced. In many individuals with BWS, the maternal IGF2 allele is not silenced. As a result, both the maternal and paternal alleles are transcribed, resulting in the overgrowth of tissues that is characteristic of this disease.

ST Figure 1–5 Imprinting patterns in germ cells are reprogrammed each generation to form sex-specific patterns of epigenetic modifications. (Diagram: zygote, embryo, and adult female and male; imprints are erased in germ cells and reprogrammed to a female imprint in the female gamete and a male imprint in the male gamete.)

ST Table 1.1 Some Imprinting Disorders in Humans
Disorder | Locus
Albright hereditary osteodystrophy | 20q13
Angelman syndrome | 15q11-q15
Beckwith–Wiedemann syndrome | 11p15
Prader–Willi syndrome | 15q11-q15
Silver–Russell syndrome | Chromosome
Uniparental disomy 14 | Chromosome 14

The known number of imprinted genes represents only a small fraction (less than percent) of the mammalian genome, but they play major roles in regulating growth during prenatal development. Because they act so early in life, external or internal factors that disturb the epigenetic pattern of imprinting or the expression of imprinted genes can have serious phenotypic consequences.

In the United States, assisted reproductive technologies (ART), including in vitro fertilization (IVF), are now used in over percent of all births. Over the past decade, several studies have suggested that children born through the use of ART have an increased risk for imprinting errors (epimutations) created by the manipulation of gametes or embryos. For example, the use of ART is associated with a four- to nine-fold increased risk of BWS, as well as increased risks for Prader–Willi syndrome and Angelman syndrome. Studies of children with Angelman syndrome or BWS conceived by IVF have shown reduced levels or loss of maternal-specific methylation at known imprinting sites in the genome, confirming the role of epigenetics in these cases. Although imprinting errors are uncommon in the general population (BWS occurs in about 1 in 15,000 births), epimutations may be a significant risk factor for those conceived by ART. However, large-scale and longitudinal studies will be needed to assess the relationship among imprinting abnormalities, growth disorders, and ART.

Epigenetics and Cancer
Until recently, the conventional view has been that cancer is clonal in origin and begins in a single cell that has accumulated a suite of dominant and recessive mutations in genes that promote (proto-oncogenes) or inhibit (tumor-suppressor genes) cell division, allowing it to escape control of the cell cycle. Subsequent mutations allow cells of the tumor to become metastatic, spreading the cancer to other locations in the body where new malignant tumors appear. As far back as the 1980s, however, researchers observed that cancer cells had much lower levels of methylation than normal cells derived from the same tissue. Subsequent research by many investigators showed that complex changes in DNA methylation patterns are associated with cancer. These studies showed that genomic hypomethylation is a property of all cancers examined to date. DNA hypomethylation reverses transcriptional inactivation, leading to unrestricted transcription of many gene sets, including those associated with the development of cancer. It also relaxes control over imprinted genes, causing cells to acquire new growth properties. In addition, selective hypermethylation of CpG islands and subsequent gene silencing are also properties of cancer cells, indicating that aberrant patterns of DNA methylation are a universal property of cancer. As a result, cancer is now viewed as a disease that results from the accumulation of both epigenetic and genetic changes that lead to alterations in gene expression and the development of malignancy (ST Figure 1–6).

ST Figure 1–6 The development and maintenance of malignant growth in cancer involves gene mutations, hypomethylation, hypermethylation, overexpression of oncogenes, and the silencing of tumor-suppressor genes. Panels: (a) genetic mutations and promoter and CpG island hypermethylation acting to turn tumor-suppressor genes and oncogenes off or on; (b) global hypomethylation, genomic and chromosomal instability, and chromosome translocations.

Hypomethylation of repetitive DNA sequences in heterochromatic regions is associated with an increase in chromosome rearrangements and changes in chromosome number, both of which are characteristic of cancer cells. In addition, hypomethylation of repetitive sequences leads to transcriptional activation of transposable DNA sequences such as LINEs and SINEs, further increasing genomic instability.

While widespread hypomethylation is a hallmark of cancer cells, hypermethylation at CpG islands and promoters silences certain genes, including tumor-suppressor genes (ST Table 1.2), often in a tumor-specific pattern. For example, BRCA1 is hypermethylated and inactivated in breast and ovarian cancer, and MLH1 is hypermethylated in some forms of colon cancer. Inactivation of tumor-suppressor genes by hypermethylation is thought to play an important complementary role to the mutational changes that accompany the transformation of normal cells into malignant cells. For example, in a bladder cancer cell line, one allele of the cell-cycle control gene CDKN2A is mutated, and the other, normal allele is inactivated by hypermethylation. Because both alleles are inactivated (although by different mechanisms), cells are able to escape control of the cell cycle and divide continuously. In many clinical cases, a combination of mutation and epigenetic hypermethylation occurs in familial forms of cancer. However, genes other than those controlling the cell cycle are also hypermethylated in some cancers; these include genes that control or participate in DNA repair, differentiation, apoptosis, and drug resistance.

ST Table 1.2 Some Cancer-Related Genes Inactivated by Hypermethylation in Human Cancers
Gene | Locus | Function | Related cancers
BRCA1 | 17q21 | DNA repair | Breast, ovarian
APC | 5q21 | Nucleocytoplasmic signaling | Colorectal, duodenal
MLH1 | 3p21 | DNA repair | Colon, stomach
RB1 | 13q14 | Cell-cycle control point | Retinoblastoma, osteosarcoma
AR | Xq11-12 | Nuclear receptor for androgen; transcriptional activator | Prostate
ESR1 | 6q25 | Nuclear receptor for estrogen; transcriptional activator | Breast, colorectal

In addition to dysregulation of DNA methylation, many cancers also have altered patterns of chromatin conformation. Chromatin remodeling is controlled by the reversible covalent modification of histone proteins in nucleosome cores. This process involves three classes of proteins: “writers” that add chemical groups to histones, “erasers” that remove these groups, and “readers” that recognize and interpret the chemical marks. Abnormal regulation of each of these classes results in disrupted histone profiles and is associated with a variety of cancer subtypes. Acetylation of histones is strongly correlated with the activation of gene transcription: it reduces the strength of the histone–DNA interaction, making promoters available to RNA polymerase. Mutations in genes of the histone acetyltransferase (HAT) family and genes of the histone deacetylase (HDAC) family,
which encode enzymes that remove acetyl groups and induce transcriptional repression, are linked to the development of cancer. For example, individuals with Rubinstein–Taybi syndrome inherit a germ-line mutation that produces a dysfunctional HAT and have a greater than 300-fold increased risk of cancer. Abnormalities in histone deacetylation have been identified as an early event in the transformation of normal cells into cancer cells. HDAC complexes are selectively recruited to tumor-suppressor genes by mutated, oncogenic DNA-binding proteins. Action of the HDAC complexes at these genes converts the chromatin to a closed configuration and inhibits transcription, causing the cell to lose control of the cell cycle and undergo uncontrolled division.

Many of the mechanisms that cause epigenetic changes in cancer cells are not well understood, partly because they take place very early in the conversion of a normal cell to a cancerous one and partly because, by the time the cancer is detected, alterations in the methylation pattern have already occurred. The fact that such changes occur very early in the transformation process has led to the proposal that epigenetic changes leading to cancer may occur within adult stem cells in normal tissue. Three lines of evidence support this idea: (1) epigenetic mechanisms can replace mutations as a way of silencing individual tumor-suppressor genes or activating oncogenes; (2) global hypomethylation may cause genomic instability and the large-scale chromosomal changes that are a characteristic feature of cancer; and (3) epigenetic modifications can silence multiple genes, making them more effective than sequential mutations of single genes in transforming normal cells into malignant cells.

In addition to changing ideas about the origins of cancer, the role of epigenetic mechanisms in cancer has opened the way to new classes of drugs for chemotherapy. The focus of epigenetic therapy is the reactivation of genes silenced by methylation
or histone modification, essentially reprogramming the pattern of gene expression in cancer cells. Several epigenetic drugs have been approved by the U.S. Food and Drug Administration (FDA), and another 12 to 15 drugs are in clinical trials. One new drug, Vidaza (azacitidine), is used to treat myelodysplastic syndrome, a precursor to leukemia, and acute myeloid leukemia. This drug is an analog of cytidine and is incorporated into DNA during replication in the S phase of the cell cycle. Methylation enzymes (methyltransferases) bind irreversibly to the incorporated analog, which prevents them from methylating DNA at other sites and effectively reduces the amount of methylation in cancer cells. Other drugs that inhibit histone deacetylases (HDACs) have been approved by the FDA for use in epigenetic cancer therapy. The HDAC inhibitor Zolinza is used to treat some forms of lymphoma. Inhibiting HDAC activity reactivates tumor-suppressor genes, bringing the tumor cells back under cell-cycle control and halting uncontrolled cell division.

Epigenetic cancer therapy drugs discovered to date are only moderately effective and are best used in combination with conventional chemotherapy drugs. To develop more effective epigenetic therapy, several basic questions must be answered (Box 2). Research into the mechanisms and genomic locations of epigenetic modifications in cancer cells will help promote the design of more potent drugs for epigenetic chemotherapy.

Epigenetics and the Environment
The epigenome receives and integrates intracellular, extracellular, and environmental signals with the information encoded in the genome to generate programs of gene expression in response to these signals. This means that organisms are able to adapt and respond to internal and external stimuli throughout their life span. One of the most important sources of such signals is the environment. Environmental agents, including nutrition, chemicals, and medical or recreational drugs, as well as social interactions,
stress, and exercise, exert effects on the epigenome. In addition, physical factors, including seasons of the year, storms, and temperature, can generate changes in gene expression mediated through the epigenome. Many of these epigenomic changes are heritable and influence gene expression in future generations.

BOX What More We Need to Know about Epigenetics and Cancer
The discovery that epigenetic changes may be as important as genetic changes in the origin, maintenance, and metastasis of cancers has opened new avenues of cancer research. Key discoveries about epigenetic mechanisms include the finding of tumor-specific deregulation of genes by altered DNA methylation profiles and histone modifications, the discovery that epigenetic changes in histones and in DNA methylation are interconnected, and the recognition that epigenetic changes can affect hundreds of genes in a single cancer cell. These advances were made in the span of a few years, and while it is clear that epigenetics plays a key role in cancer, many questions remain to be answered before we can draw conclusions about the relative contributions of genetics and epigenetics to the development of cancer. Some of these questions are as follows:
• Do these changes arise primarily in stem cells or in differentiated cells?
• Is global hypomethylation in cancer cells a cause or an effect of the malignant condition?
• Once methylation alterations begin, what triggers hypermethylation in cancer cells?
• Is hypermethylation a process that targets certain gene classes, or is it a random event?
• Can we target specific genes for reactivation, while leaving others inactive?
• Can we develop drugs that target cancer cells and reverse tumor-specific epigenetic changes?

In humans it is difficult to determine directly the relative contributions of environmental factors and learned behavior to changes in the epigenome, but there is evidence that environmental factors such as maternal nutrition and exposure to agents that affect the developing fetus can have detrimental effects during adulthood. Women who were pregnant during the 1944–1945 famine in the Netherlands had children with increased risks for obesity, diabetes, and coronary heart disease. In addition, as adults, these individuals had significantly increased risks for schizophrenia and other neuropsychiatric disorders. Members of the F2 generation also had abnormal patterns of weight gain and growth.

The most direct evidence for the role of environmental factors in modifying the epigenome comes from studies in experimental animals. A low-protein diet fed to pregnant rats results in permanent changes in the expression of several genes in both the F1 and F2 offspring. Increased expression of liver genes is associated with hypomethylation of their respective promoter regions. Other evidence indicates that the epigenetic changes triggered by this dietary modification were gene-specific.

A dramatic example of how epigenome modifications affect the phenotype comes from the study of coat color in mice, where color is controlled by the dominant allele Agouti (A). In homozygous AA mice, the gene is active only at a specific stage of hair development, resulting in a yellow band on an otherwise black hair shaft and producing the agouti phenotype. A mutant allele (Avy) causes yellow pigment formation along the entire hair shaft, producing a yellow phenotype. This allele results from the insertion of a transposable element near the transcription start site of the Agouti gene, and a promoter element within the transposon is responsible for this change in gene expression. Researchers found that the degree of methylation in the transposon’s promoter is related to the amount of yellow pigment deposited in the hair shaft and that the amount of methylation varies from individual to individual. The result is variation in coat-color phenotypes even in genetically identical mice (ST Figure 1–7). In these mice, coat colors range from yellow (unmethylated promoter) to pseudoagouti (highly methylated promoter). In addition to a gradation in coat color, there is also a gradation in body weight: yellow mice are more obese than the brown, pseudoagouti mice.

[ST Figure 1–7: coat-color classes yellow, slightly mottled, mottled, heavily mottled, and pseudoagouti.] Variable expression of the yellow phenotype in mice caused by diet-related epigenetic changes in the genome.

To evaluate the role of environmental factors in modifying the epigenome, the diet of pregnant Avy mice was supplemented with methylation precursors, including folic acid, vitamin B12, and choline. In the offspring, coat-color variation was reduced and shifted toward the pseudoagouti (highly methylated) phenotype. The shift in coat color was accompanied by increased methylation of the transposon’s promoter.

These findings have applications to epigenetic diseases in humans. For example, the risk of colorectal cancer is linked directly to dietary folate deficiency and to differences in the activity of enzymes that synthesize methyl donors. In addition to foods that mediate epigenetic changes in gene expression via methylation, it has recently been reported that miRNAs from some plant foods such as rice and potatoes enter the human bloodstream and regulate expression of target genes. Follow-up studies have not confirmed these findings, but further work should carefully explore the possibility that
genetic information can be transferred by dietary intake.

Epigenetics and Behavior
A growing body of evidence shows that epigenetic changes, including alterations in DNA methylation and histone modification, have important effects on behavioral phenotypes. In mice, two regions of the brain show preferential expression of maternal or paternal alleles. More than 1000 genes in the developing brain are imprinted, supporting the idea that epigenetic mechanisms operating in different regions of the brain may represent a major form of behavioral regulation.

In humans, epigenetic changes have been documented during the progression of neurodegenerative disorders and in neuropsychiatric diseases, both of which show altered behavioral phenotypes. Epigenetic changes to the nervous system occur in Alzheimer disease, Parkinson disease, and Huntington disease, and in schizophrenia and bipolar disorder. However, because the phenotypes in these disorders are influenced by a number of factors, including genetic predispositions, events in prenatal development, and prenatal and postnatal environmental effects, it is not yet possible to establish a cause-and-effect relationship between epigenomic changes and the onset and intensity of neural disorders.

One of the most significant findings in the epigenetics of behavior is that stress-induced epigenetic changes that occur prenatally or early in life can influence behavior (and physical health) in adulthood and can potentially be transmitted to future generations. For example, an early study showed that newborn rats that experienced reduced levels of maternal nurturing (low-MN) did not adapt well to stress and to anxiety-inducing situations in adulthood. In rats and humans, the hypothalamic region of the brain mediates stress reactions by controlling levels of glucocorticoid hormones via the action of glucocorticoid receptors (GRs). In rats exposed to normal levels of nurturing care early in life (high-MN), GR expression is increased and
adults are stress-adaptive. In contrast, low-MN rats had reduced levels of GR transcription and were less able to adapt to stress. The key observation was that the differences in GR expression were associated with differences in DNA methylation and histone acetylation levels in the GR gene promoter: low-MN rats had significantly higher levels of methylation than high-MN rats. Subsequent research showed that differences in DNA methylation are present in hundreds of genes across the genome, all of which show differential expression in low-MN and high-MN adults. Significantly, in low-MN adults, administering drugs that lower methylation levels reversed the effect of poor early-life nurturing and improved their stress responses. Later studies showed that these phenotypes can be transmitted across generations: female rats raised by more nurturing mothers are more attentive to their own newborns, whereas those raised by less nurturing mothers are much less attentive and less nurturing to their offspring.

Similar epigenetic changes triggered by prenatal or early childhood environmental factors may alter later behavior in humans. For example, it is known that a history of child abuse increases the risk of suicide later in life. One study examined epigenomic differences in brain tissue from two classes of suicide victims and from others who died suddenly of unrelated causes. One class of suicide victims had experienced childhood abuse, and the other had no history of child abuse; those who died suddenly of unrelated causes also had no history of child abuse. High levels of GR gene promoter methylation were found in suicide victims with a history of child abuse, but not in the other two groups. These results are consistent with those found in experimental animals and suggest that parental care, epigenomic variation, GR expression, and adult behavior are linked in both rats and humans. Further research may lead to the development of drugs to treat depression and help prevent suicide in humans.
As the role of the epigenome (the epigenetic state of a cell) in disease has become increasingly clear, researchers across the globe have formed multidisciplinary projects to map all the epigenetic changes that occur in the normal genome and to study the role of the epigenome in specific diseases. These include:
• NIH Roadmap Epigenomics Project
• Human Epigenome Project (HEP)
• International Human Epigenome Consortium (IHEC)
• International Cancer Genome Consortium (ICGC)

To conclude, we discuss some of these projects and their goals. The NIH Roadmap Epigenomics Project focuses on how epigenetic mechanisms controlling stem cell differentiation and organ formation generate biological responses to external and internal stimuli that result in disease. Part of this project, the Human Epigenome Atlas, collects and catalogs detailed information about epigenomic modifications at specific loci in different cell types, different physiological states, and different genotypes. These data allow researchers to perform integrative and comparative analyses of epigenomic data across genomic regions or entire genomes.

The Human Epigenome Project is a multinational, public/private consortium organized to identify, map, and establish the functional significance of all DNA methylation patterns in the human genome across all major tissue types in the body. Analysis of these methylation patterns may show that genetic responses to environmental cues mediated by epigenetic changes are a pathway to disease.

The International Human Epigenome Consortium (IHEC) is a global program established to determine how the epigenome has altered human populations in response to environmental factors. The consortium is cataloging the epigenomes of 1000 people from different populations around the world and will also include the epigenomes of 250 different cell types. Although these projects are in the early stages of development, the information already available
strongly suggests that we are on the threshold of a new era in genetics, one in which we can study the development of disease at the genomic level and understand the impact of environmental factors on gene expression. The results of these projects may help explain how environmental settings in early life can affect predisposition to diseases in adulthood.

Visit the Study Area in MasteringGenetics for a list of further readings on this topic, including journal references and selected web sites.

Review Questions
• What are the major mechanisms of epigenetic genome modification?
• What parts of the genome are reversibly methylated? How does this affect gene expression?
• What are the roles of proteins in histone modification?
• Describe how reversible chemical changes to histones are linked to chromatin modification.
• What is the histone code?
• What is the difference between silencing genes by imprinting and silencing by epigenetic modifications?
• Why are changes in nucleosome spacing important in changing gene expression?
• How do microRNAs regulate epigenetic mechanisms during development?
• What is the role of imprinting in human genetic disorders?

Discussion Questions
• Imprinting disorders do not involve changes in DNA sequence, but only changes in the methylation state of the DNA or the modification of histones. Does it seem likely that imprinting disorders could be treated prenatally or prevented by controlling the maternal environment in some way, perhaps by dietary changes?
• Should fertility clinics be required by law to disclose that some assisted reproductive technologies (ART) can result in epigenetic diseases? How would you and your partner balance the risks of ART with the desire to have a child?
• How can the role of epigenetics in cancer be reconciled with the idea that cancer is caused by the accumulation of mutations in tumor-suppressor genes and proto-oncogenes?
• If true, would the knowledge that plant miRNAs can affect gene expression in your body affect your food choices?
SPECIAL TOPICS IN MODERN GENETICS
Emerging Roles of RNA

In 1958, Francis Crick proposed his theory for the central dogma of molecular biology, which described how genetic information flows from DNA to RNA to protein. Proteins, as the end products of the gene, were viewed as the functional units of the cell, while DNA was considered a stable molecule that stores genetic information. RNA was considered the temporary message between DNA and protein. This insightful model was the basis for the emerging field of molecular biology and is still considered accurate today. However, this model did not anticipate the many other types of noncoding RNAs (ncRNAs) that are now known to play important roles in genetic processes. Advancing rapidly in the early 1990s and continuing to this day, studies have revealed that ncRNAs are more abundant, more diverse, and more important than we could ever have imagined. While it has been known for well over a decade that only ∼2 percent of the 3.2 billion base pairs making up the human genome encode proteins, we have recently learned that more than 75 percent of the human genome is in fact transcribed into RNA. Among these many transcripts are numerous examples of ncRNAs that have experimentally determined biological roles.

Earlier in the text, we introduced several examples. You may recall from studying DNA replication that RNA serves as a primer for DNA synthesis and as a template molecule for reverse transcription of telomere sequences by telomerase (Chapter 10). And small nuclear RNAs (snRNAs) bound by several proteins (snRNPs) catalyze the splicing of pre-mRNAs as they are converted to mature transcripts (Chapter 12). Still another example is the relatively recent finding that the 23S prokaryotic and 28S eukaryotic rRNAs within the ribosome are responsible for catalyzing peptide bond formation during translation (Chapter 13).

The key to RNA’s versatility lies in its structural diversity. While RNA is transcribed as a single strand that can serve as an informational molecule such as mRNA, it can also form complementary bonds with other nucleic acids, creating DNA/RNA duplexes and RNA/RNA duplexes (double-stranded RNA), or it can fold back on itself to form hairpin or stem-loop structures. As with proteins, such three-dimensional folding imparts enormous structural and thus functional diversity to RNA molecules. Additionally, RNAs can associate with proteins and modify their activity or recruit them to other nucleic acids through complementary base pairing.

As our knowledge of the diversity of RNA structure and function has increased, it has shifted several paradigms for how we think about life at the molecular level. Whereas we once thought only proteins could be enzymes, we are now searching for ways to use catalytic RNAs as therapeutics to fight disease. Whereas we once thought that most of the human genome was “junk DNA,” we are now discovering that most noncoding sequences can be transcribed and even display important anticancer functions. Whereas we never imagined that RNA could regulate gene expression, we now know that it does so at the level of both transcription and translation. As you will soon learn, RNA discoveries have even changed our views on how life began. In this Special Topics chapter we will examine the versatility of RNA and highlight some modern applications of RNAs as research tools and therapeutic agents.

Catalytic Activity of RNAs: Ribozymes and the Origin of Life
In the 1960s, several scientists, including Carl Woese, Francis Crick, and Leslie Orgel, postulated that RNAs could be catalysts because RNAs can form complex secondary structures. However, the first examples of RNAs acting as biological catalysts, or ribozymes, came many years later, in the early 1980s, from two different studies. Tom Cech’s lab at the University of Colorado at Boulder discovered that introns (Group I introns) in some RNAs could be spliced out in the complete absence of any proteins. Such self-splicing introns demonstrated that RNAs could break and form phosphodiester bonds (see Chapter 12). Sidney Altman’s lab at Yale University discovered that the M1 RNA moiety of RNase P is a ribozyme. RNase P cleaves the 5′ leader sequence of precursor tRNAs during tRNA biogenesis. Cech and Altman were awarded the 1989 Nobel Prize in Chemistry “for their discovery of catalytic properties of RNA.”

The discovery of ribozymes has had major implications for theories about the molecular origins of life. Since DNA encodes proteins and proteins are required to replicate DNA, this presents a “chicken or the egg” paradox as to which may have preceded the other as life originated on Earth. However, since RNA can both encode information and catalyze reactions, it is attractive to hypothesize that RNA was the precursor to cellular life. At the center of this RNA World Hypothesis, a term coined by Walter Gilbert in 1986, is the possibility that RNAs can be self-replicating. Can an RNA molecule catalyze the synthesis of an identical RNA molecule?
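Part of the difficulty in this question is that template-directed copying by base pairing produces the complementary strand, not a duplicate. The toy Python sketch below illustrates that point under loud assumptions: the function names and the example sequence are hypothetical, pairing is assumed to be perfectly faithful, and no ribozyme chemistry is modeled. It shows that two rounds of copying are needed to regenerate the original sequence.

```python
# Toy model of template-directed RNA copying by Watson-Crick base pairing.
# Assumptions: perfect fidelity, A-U and G-C pairs only, no catalysis modeled.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def copy_template(template: str) -> str:
    """Return the strand synthesized on `template`, written 5'->3'.

    The new strand is antiparallel to its template, so when both are
    written 5'->3' it is the reverse complement of the template.
    """
    return "".join(PAIR[base] for base in reversed(template))


def replicate(rna: str) -> str:
    """Copy twice: the first copy is complementary, the second is identical."""
    minus_strand = copy_template(rna)    # round 1: complementary (minus) strand
    return copy_template(minus_strand)   # round 2: regenerates the plus strand


if __name__ == "__main__":
    plus = "GGCAUCCAGU"                  # hypothetical example sequence
    print(copy_template(plus))           # ACUGGAUGCC
    print(replicate(plus) == plus)       # True
```

Writing both strands 5′→3′ turns the antiparallel geometry into a simple reversal, which is why the copy is a reverse complement rather than a plain complement.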
If we examine the naturally occurring ribozymes, the repertoire of ribozyme-catalyzed reactions is fairly limited. Most ribozymes catalyze the cleavage or formation of phosphodiester bonds. One notable exception is the ribosome, as discussed earlier; the 23S prokaryotic and 28S eukaryotic large-subunit rRNAs are ribozymes that catalyze peptide bond formation. However, naturally occurring ribozymes are limited in their catalytic activity in that they generally are only able to catalyze a reaction once. For example, self-splicing introns cut themselves out and ligate the exons together but can neither repeat this reaction on other molecules nor reverse it. Thus far, there are only three examples of naturally occurring ribozymes that are capable of “multiple turnover”: the ribosome, snRNPs, and RNase P. So the naturally occurring ribozymes fall far short of the self-replicating molecules postulated in the RNA World Hypothesis. Nonetheless, one may predict that self-replicating RNAs did once exist but that, during evolution, they were usurped by the more stable informational storage of DNA and the more efficient catalysis of enzymes.

Genetic Engineering of Ribozymes
To better understand the potential of ribozymes, scientists have turned toward genetically engineering them. By starting with a large collection of lab-synthesized RNA molecules of varying sequences, scientists have been able to identify and isolate molecules with specific catalytic activity. Using this process, dubbed in vitro evolution, lab-synthesized ribozymes with novel functions have been created. These functions include RNA and DNA phosphorylation, carbon–carbon bond formation, RNA aminoacylation, and other reactions. Importantly, ribozymes with RNA polymerase activity have also been isolated. Recently, the Holliger lab at the University of Cambridge reported a ribozyme polymerase called tC19Z, which can reliably copy RNA molecules up to 95 nucleotides long (ST Figure 2–1). One template RNA that tC19Z copied successfully was that of another ribozyme, which was subsequently shown to be functional. Such a finding lends further credibility to the RNA World Hypothesis, but it remains an open question how complex molecules such as the ribonucleotide building blocks of RNAs could have been synthesized in a prebiotic world.

[ST Figure 2–1: sequence and secondary-structure diagram of the tC19Z ribozyme, labeled with its ssC19 segment, a primer, and a template RNA.] A ribozyme polymerase. The lab-synthesized tC19Z ribozyme is capable of copying RNA templates up to 95 nucleotides long, with a sequence complementary to the ssC19 5′ end of the ribozyme.

Genetically engineered ribozymes are also being investigated for therapeutic use. Angiozyme, designed to block angiogenesis, or the growth of new blood vessels, was the first ribozyme to be tested in clinical trials. Cancer cells often secrete a protein called vascular endothelial growth factor (VEGF), which signals through VEGF receptors on blood vessel cells to induce vascularization of tumors. This provides the cancer cells with a supply of oxygen and nutrients as well as a route for spreading to other parts of the body. To battle this problem, Angiozyme targets VEGF receptor mRNAs for destruction to prevent formation of the receptor proteins, thus stopping angiogenesis. Angiozyme showed promising results in animal tests and progressed to phase II clinical trials in patients with metastatic breast cancer. Although analysis of patient blood serum showed a significant reduction in VEGF receptor levels, Angiozyme was not effective at fighting cancer, and further development of the drug was halted. Because specific target inhibition was achieved, this study suggests that ribozymes have potential as drugs, but it also demonstrates the complexity of fighting cancer. There are ongoing clinical trials of ribozymes engineered against multiple targets associated with HIV infection.
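The logic of in vitro evolution (select the most active molecules from a random pool, amplify them, introduce variation, and repeat) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a laboratory protocol: “activity” is replaced by a hypothetical score (the number of positions matching an arbitrary target motif), and amplification with random point substitutions stands in for error-prone copying. Real selections assay catalytic function chemically, and all names and parameter values below are illustrative.

```python
import random

BASES = "ACGU"


def activity(seq: str, motif: str) -> int:
    """Hypothetical stand-in for catalytic activity: positions matching `motif`."""
    return sum(a == b for a, b in zip(seq, motif))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Redraw each base at random with probability `rate` (error-prone copying)."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)


def in_vitro_evolution(motif: str, pool_size: int = 200, keep: int = 20,
                       rounds: int = 15, rate: float = 0.05,
                       seed: int = 1) -> list[int]:
    """Run select-amplify-diversify cycles; return the best score in each round."""
    rng = random.Random(seed)
    # Start from a random pool of sequences the same length as the motif.
    pool = ["".join(rng.choice(BASES) for _ in motif) for _ in range(pool_size)]
    best_per_round = []
    for _ in range(rounds):
        # Selection: keep only the highest-scoring ("most active") molecules.
        pool.sort(key=lambda s: activity(s, motif), reverse=True)
        survivors = pool[:keep]
        best_per_round.append(activity(survivors[0], motif))
        # Amplification: one faithful copy of each survivor plus mutated copies.
        copies = pool_size // keep
        pool = []
        for s in survivors:
            pool.append(s)
            pool.extend(mutate(s, rate, rng) for _ in range(copies - 1))
    return best_per_round


if __name__ == "__main__":
    print(in_vitro_evolution("GGAUCCAGUACGUUAGCAUC"))  # hypothetical target motif
```

Keeping one unmutated copy of each survivor means the best score can never drop from one round to the next, while the mutated copies supply the variation that selection acts on in later rounds.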
Small Noncoding RNAs Play Regulatory Roles in Prokaryotes
Prokaryotic small noncoding RNAs (sRNAs) were discovered decades ago, but their regulatory functions are still being elucidated, and new sRNAs are still being discovered. E. coli is thought to contain roughly 80–100 sRNAs, and other species are reported to have three times that number. sRNAs are generally between 50 and 500 nucleotides long and are involved in gene regulation and the modification of protein function.

sRNAs involved in gene regulation are often transcribed from loci that partially overlap the coding genes they regulate. However, they are transcribed from the opposite strand of DNA and in the opposite direction, making them complementary to mRNAs transcribed from that locus. In other cases, sRNAs are complementary to target mRNAs but are transcribed from loci that do not overlap the target genes. sRNAs regulate gene expression by binding to mRNAs (usually at the 5′ end) that are being transcribed. In some cases, sRNA binding blocks translation of the mRNA by masking the ribosome binding site (RBS). In other cases, binding enhances translation by preventing the formation of secondary structures in the mRNA that would otherwise block translation, often by masking the RBS (ST Figure 2–2). Thus, sRNAs can be both negative and positive regulators of gene expression.

sRNAs have been shown to play important roles in gene regulation in response to changing environmental conditions or stress. For example, the sRNA DsrA of E. coli is upregulated in response to low temperature and promotes the expression of genes that enable the long-term survival of the cell under stressful conditions (stationary phase). DsrA binds to rpoS mRNA to promote translation of the RpoS stress-response sigma factor (see Chapter 12), which is the primary transcriptional regulator of genes that promote stationary phase. In contrast, RyhB sRNA from E. coli is a negative regulator of gene expression. In response to low iron levels,
RyhB is transcribed to inhibit the translation of several nonessential iron-containing enzymes so that the more critical iron-containing enzymes can use what little iron is present in the cytoplasm.

[ST Figure 2–2: panels “Negative regulation” (sRNA pairing inhibits ribosome binding; translation repressed) and “Positive regulation” (sRNA pairing unmasks the RBS; translation proceeds).] Bacterial small noncoding RNAs regulate gene expression. Bacterial sRNAs can be negative regulators of gene expression, binding to mRNAs and preventing translation by masking the ribosome binding site (RBS), or positive regulators, binding to mRNAs and preventing the formation of secondary structures that would otherwise mask an RBS, thereby enabling translation.

In addition to regulating mRNA translation, sRNAs can modulate protein activity. The E. coli 6S RNA is a 184-nucleotide sRNA that accumulates during stationary phase and represses the transcription of genes that mediate vegetative growth. It accomplishes this by binding to RNA polymerases associated with the RpoD sigma factor, which promotes the transcription of genes required for vegetative growth. 6S RNA has a hairpin structure that binds to the active site of the RNA polymerase and mimics double-stranded DNA, thus blocking the RNA polymerase from binding to promoter regions of target genes.

Prokaryotes Have an RNA-Guided Viral Defense Mechanism
Biological warfare between viruses (bacteriophages) and prokaryotes as a result of their coevolution has led to a diversity of defense mechanisms. For example, bacteria express endonucleases (restriction enzymes), which cleave specific DNA sequences. Such restriction enzymes destroy foreign bacteriophage DNA, while the host’s genomic DNA is protected by DNA methylation. These same restriction enzymes have been adopted by molecular biologists for use in recombinant DNA technology (see Chapter 17). Bacteria can also defend against bacteriophage attack by blocking bacteriophage adsorption, blocking DNA insertion, and inducing suicide in infected cells. All of these defense mechanisms are innate
because they are not tailored to a specific pathogen. In contrast, recent research has identified an adaptive viral defense strategy in which previous infection by a pathogen provides immunity to that cell and its descendants. In 2007, researchers at Danisco, a Danish food science company, sought to create a strain of Streptococcus thermophilus that was more resistant to bacteriophage attack, which would make it more efficient for use in the production of yogurt and cheese. The Danisco researchers found that when they exposed S. thermophilus to a particular bacteriophage, it essentially became immune to it. The Danisco research group and others identified an adaptive defense mechanism that uses RNA as a guide for destroying viral DNA.

The adaptive viral defense mechanism depends on a genomic feature called a CRISPR, so named because it contains clustered regularly interspaced short palindromic repeats (ST Figure 2–3). CRISPR sequences were first identified in 1987 in the E. coli genome, based on a simple description of repeated DNA sequences with unique spacer sequences between them. These spacers remained a mystery until 2005. That year, three independent studies demonstrated that the spacer sequences of CRISPRs were identical to bacteriophage sequences. We now know that insertion of fragments of phage DNA into CRISPR loci is required for adaptive immunity. CRISPR-associated genes (cas) are found adjacent to CRISPR loci and are required for various aspects of adaptive immunity (ST Figure 2–3). There are many different types of Cas proteins. They include proteins with DNase, RNase, and helicase domains, as well as proteins of unknown function.

The CRISPR/Cas mechanism for adaptive immunity has three discrete steps (ST Figure 2–3). The first step, the integration of invading phage DNA into CRISPR loci, is often referred to as acquisition. It is not completely clear how the host cell fragments phage DNA and inserts it into these specific CRISPR loci. However, one of the Cas proteins, Cas1 from E. coli, is a double-stranded DNase and is thought to be involved in this process. Certain sequences in phage genomes also appear to be targeted for CRISPR integration.

After acquisition, the second step is the transcription of CRISPR loci and the processing of CRISPR-derived RNAs (crRNAs). This step, referred to as crRNA biogenesis, also requires assistance from Cas proteins. CasE from E. coli is an endoribonuclease that cleaves long precursor crRNAs (pre-crRNAs) into mature crRNAs. Each mature crRNA contains a spacer sequence (determined by the integrated phage DNA fragment) flanked by repeat sequences. In some species, CRISPR repeat sequences form secondary structures such as stem-loops that are required for processing. In other species, a short RNA called a trans-activating CRISPR RNA (tracrRNA) is also required for processing. tracrRNAs are complementary to CRISPR repeat sequences in the pre-crRNA and bind to them to form secondary structures.

The third and final step, the targeting and cleavage of phage DNA sequences complementary to crRNAs, is referred to as interference. Mature crRNAs associate with one or more Cas proteins to form CRISPR-associated ribonucleoprotein complexes, which bind to foreign DNA through crRNA–DNA complementary interactions. When a complementary crRNA–DNA match is made, the complex catalyzes cleavage of both strands of the foreign DNA within the region of complementarity. Cas3, found in many species, has both helicase and DNase domains and is important for target DNA cleavage.

Remarkably, this system is able to distinguish "self" DNA from foreign DNA. Otherwise, the CRISPR/Cas system would cleave the CRISPR spacers in the cell's own genome, since they contain bacteriophage sequences. "Self" identification is possible because the crRNAs contain CRISPR repeat sequences that are complementary to the "self" DNA. Complementarity outside of the spacer region inhibits DNA cleavage, presumably because of conformational changes in the CRISPR-associated ribonucleoprotein complexes.

ST Figure 2–3 The CRISPR/Cas system mediates an adaptive immune response in prokaryotes. CRISPR loci contain spacer sequences derived from foreign viral DNA, separated by repeat sequences. cas genes, located nearby in the genome, are involved in CRISPR-mediated adaptive immunity. The CRISPR/Cas mechanism for adaptive immunity has three steps:
- acquisition (viral DNA is inserted into CRISPR loci)
- crRNA biogenesis (CRISPR loci are transcribed and processed into short crRNAs)
- interference (crRNAs direct Cas proteins to viral DNA to destroy it)

The details of the mechanism have been studied in only a few species. Even so, it is already apparent that the CRISPR/Cas system varies in several respects: the number of CRISPR loci, the number of repeats, the length of the repeats, the length of the spacers, and the cas genes. It is also apparent, however, that some variation of the CRISPR/Cas adaptive defense system is present in roughly 48 percent of sequenced bacterial species and roughly 95 percent of sequenced archaea. This suggests that it is an ancient and successful defense strategy. Furthermore, evidence suggests that this defense strategy is active not just against bacteriophages but against plasmid DNA as well.

CRISPR/Cas has been harnessed as an important tool for molecular biology. By expressing components of the CRISPR/Cas system in eukaryotic cells, scientists are able to specifically and efficiently target genes for mutation. The CRISPR/Cas system of Streptococcus pyogenes has been used most extensively for this purpose because only three things are
needed to target a specific DNA sequence: the mature crRNA, the Cas9 endonuclease, and a tracrRNA (see above) required for crRNA interaction with Cas9. Moreover, the crRNA and tracrRNA can be genetically engineered and expressed as a single hybrid small guide RNA (sgRNA), which provides the secondary structure needed for activity. Thus, by introducing Cas9 and an sgRNA, one can target a specific gene for disruption. This can be done in either of two ways: by inserting the genes encoding Cas9 and the sgRNA, or by injecting in vitro transcribed sgRNA and mRNA for Cas9. The technique was published in 2012. Since then, it has been used successfully to target genes for mutation in Drosophila, C. elegans, zebrafish, Arabidopsis, mice, and cultured human cells. This demonstrates both its broad applicability and the fast pace at which it is being adopted as a research tool.

CRISPR/Cas technology also has potential for specific genome editing in human embryos. In fact, this has already been achieved. A team of researchers in China led by Dr. Junjiu Huang used CRISPR/Cas technology to create specific edits in the genomes of human embryos. However, these embryos were abnormal triploids, formed in vitro when two sperm fertilized the same oocyte. Such triploid embryos are not viable beyond early development. The study showed that targeted genome editing in human embryos is indeed possible with CRISPR/Cas. It also showed, however, that the intended edits occurred at low efficiency and that the genomes suffered nonspecific, off-target alterations. The fast pace of CRISPR/Cas genome-editing technology has led many researchers to call for a moratorium on CRISPR/Cas editing of human DNA in any scenario in which the targeted changes would be inherited by the next generation. This does, however, leave open the possibility of using CRISPR/Cas for gene therapy on somatic cells, whose changes are not inherited (see Box 1).

Small Noncoding RNAs Mediate the Regulation of Eukaryotic Gene Expression

As discussed
above, small noncoding RNAs (sRNAs) regulate gene expression and modulate protein activity in prokaryotes. Small noncoding RNAs are present in eukaryotes as well. Although both are called small noncoding RNAs, the prokaryotic and eukaryotic varieties differ in four respects: length, biosynthesis, mechanism of action, and repertoire of regulatory activities. To keep this distinction clear, scientists use the abbreviation sRNA for the prokaryotic variety and sncRNA for small noncoding RNAs in eukaryotes. sncRNAs are short (20–31 nucleotides long) double-stranded RNAs with a 2-nucleotide 3′ overhang. They silence gene expression at the transcriptional or posttranscriptional level through a process called RNA-induced gene silencing.

There are three different types of sncRNAs, classified according to their biogenesis and function:
- Small interfering RNAs (siRNAs) protect the cell from exogenous RNAs, such as retroviral genomes. In addition, exploitation of the siRNA mechanism has already proved indispensable for research and applications, and it still has enormous potential.
- MicroRNAs (miRNAs) play important roles in regulating gene expression, and their malfunction is associated with human diseases (see Chapter 15 for additional coverage of siRNAs and miRNAs).
- The recently identified and poorly understood Piwi-interacting RNAs (piRNAs) protect germ cells from the harmful effects of mobile DNA elements.

We will consider siRNAs and miRNAs in more depth below.

siRNAs and RNA Interference

In 1998, Nobel Prize-winning research by Andrew Fire and Craig Mello showed that double-stranded RNA (dsRNA) introduced into the cells of C. elegans leads to a "potent and specific" degradation of complementary mRNAs. This process is now called RNA interference (RNAi). We now know a great deal about the molecular basis of RNAi and know that it functions broadly in eukaryotes. We also know how to harness it as a
tool for research and as a therapeutic agent.

When double-stranded RNAs (dsRNAs) are present in the cytoplasm of a eukaryotic cell, they are recognized by an RNase III protein called Dicer. Dicer cleaves them into ∼22-nucleotide-long siRNAs with a 2-nucleotide 3′ overhang. These siRNAs then associate with the RNA-induced silencing complex (RISC), which contains an Argonaute family protein that binds RNA and has endonuclease, or "slicer," activity. RISC cleaves and evicts one of the two strands of the siRNA. It retains the other strand as an siRNA "guide" that recruits RISC to a complementary mRNA. RISC then cleaves the mRNA in the middle of the region of siRNA/mRNA complementarity (see ST Figure 2–4). The resulting mRNA fragments each lack either a methylated cap or a poly-A tail, and nucleases in the cell quickly degrade them.

The discovery of RNAi in 1998 led to an almost immediate revolution in the investigation of gene function. As long as a gene's sequence is known, one can quickly synthesize dsRNA corresponding to that sequence to inhibit, or "knock down," that gene's function. RNAi became a research tool very quickly. Five years after its discovery, the Ahringer lab at the University of Cambridge used RNAi on a massive scale to determine the loss-of-function phenotypes for 86 percent of the genes in the C. elegans genome. Subsequently, many genome-wide RNAi screens have been performed to test genes, one at a time, for roles in molecular, cellular, and developmental biology pathways.

siRNAs are also important tools for biomedical research and have great potential as therapeutic agents (see Chapter 15). Thus far, there have been over 30 clinical trials of siRNA drugs to treat diseases such as asthma, cancer, and vision impairment. There is substantial evidence that siRNAs are effective against a broad range of gene targets. Even so, delivering siRNAs to target cells remains difficult: siRNAs are rapidly degraded when injected into the bloodstream, and they cannot passively cross the plasma membrane. Because of these delivery problems, several pharmaceutical companies, such as Novartis, Pfizer, and Roche, abandoned research and clinical trials involving siRNA drugs. Since 2011, however, siRNA drug development has seen a resurgence. This follows reports of successful siRNA delivery using either nanoparticles or cell-penetrating peptides. Clearly, an effective delivery solution will need to be identified before siRNAs gain widespread use as therapeutic agents.

As a promising start, Alnylam Pharmaceuticals recently reported successful phase II clinical trials with Patisiran. Patisiran is an siRNA drug and nanoparticle delivery system that treats familial amyloidotic polyneuropathy (FAP). FAP is a disorder characterized by nervous system and cardiac problems due to aggregation of a mutant form of the transthyretin (TTR) protein. Importantly, Patisiran treatment reduced TTR protein levels in blood serum by 80% and halted the progression of nervous system problems in FAP patients.

The initial studies of RNAi involved the introduction of lab-synthesized dsRNAs. Naturally occurring dsRNAs have been described as well, such as those arising from bidirectional transcription of genomic loci and from viral genomes. siRNAs produced without the introduction of lab-synthesized RNAs are called endogenous siRNAs (endo-siRNAs). In fact, the RNAi pathway is thought to have evolved, in part, as a defense strategy against viruses, many of which have dsRNA genomes or produce dsRNA during replication. In that model, viral dsRNA would be chopped into siRNAs, which then guide RISC to viral transcripts to destroy them.

ST Figure 2–4 RNA interference pathways. Dicer processes double-stranded RNA into short interfering RNAs (siRNAs). siRNAs then associate with the RNA-induced silencing complex (RISC), which contains an Argonaute (AGO) family protein. RISC unwinds the siRNAs into single-stranded siRNAs and cleaves mRNAs complementary to the siRNA. MicroRNA (miRNA) genes are transcribed as primary miRNAs (pri-miRNAs). Drosha trims the 5′ and 3′ ends of pri-miRNAs to form pre-miRNAs. Pre-miRNAs are exported to the cytoplasm and processed by Dicer. These miRNAs then associate with RISC, and single-stranded miRNAs target RISC to mRNAs. If the miRNA and mRNA are perfectly complementary, the mRNA is destroyed. If there is only a partial match, translation is inhibited.

Box 1: RNA-Guided Gene Therapy with CRISPR/Cas Technology

The CRISPR/Cas system has proved to be an efficient and specific genome-editing tool, and scientists have exploited it to induce mutations in target genes of interest. Building on this success, a company called Editas Medicine, based in Cambridge, Massachusetts, aims "to translate its genome editing technology into a novel class of human therapeutics that enable precise and corrective molecular modification to treat the underlying cause of a broad range of diseases at the genetic level." Simply put, Editas Medicine hopes to develop CRISPR/Cas technology for gene therapy. The company was founded by five academic scientists with expertise in CRISPR/Cas technology and is supported by $43 million in venture capital investment. Editas Medicine thinks that CRISPR/Cas technology may solve some of the problems of traditional gene therapy strategies.

We do not know the company's precise strategies or its intended target diseases, but one can speculate on how CRISPR/Cas could be used for gene therapy. The CRISPR/Cas system uses RNA as a guide to direct the Cas9 endonuclease to complementary DNA sequences in the genome. Small guide RNAs (sgRNAs) may therefore be engineered to target disease-causing mutant alleles. For recessive genetic disorders, CRISPR/Cas can be used to induce a double-stranded break in the target gene, which activates the DNA repair machinery. Suppose a normal copy of the gene is introduced into cells along with the CRISPR/Cas components. That copy can serve as a template for correcting the mutant allele by homologous recombination (see Chapter 14). This approach has a distinct advantage over traditional gene therapy methods: the gene is corrected in its native location in the genome. In traditional methods, the gene is instead randomly inserted or not integrated into a chromosome at all.

Traditional gene therapy methods are not useful for treating dominant genetic diseases. CRISPR/Cas may be effective against them, however, and thus extend the reach of gene therapy to a greater range of diseases. Most individuals afflicted with dominant disorders are heterozygous: they have a dominant mutant allele and a normal allele. In theory, CRISPR/Cas may be used to specifically inactivate the dominant mutant allele, rendering it a recessive, loss-of-function allele. In many cases, the undisrupted allele would then be able to restore normal gene function.

The foundational studies on CRISPR/Cas genome editing are encouraging, and the effort has bright minds and financial backing behind it. Still, there are some important hurdles to overcome before Editas Medicine can achieve its goals. For example, how will sgRNAs and Cas9 (or the genes encoding these components) be delivered to the appropriate target cells in the patient effectively and without damaging side effects? Delivery is a difficult problem for all gene therapy strategies. A concern specific to CRISPR/Cas, however, is that some studies found that the Cas9 endonuclease cleaves DNA at off-target loci in the genome. It will be important to achieve absolute target specificity before this strategy will be safe for therapeutic applications.

miRNAs Regulate Posttranscriptional Gene Expression

Another type of sncRNA present in eukaryotes is the miRNA (see Chapter 15). Unlike siRNAs, miRNAs are derived from the self-complementary transcripts of miRNA genes. These initial transcripts are called primary miRNAs (pri-miRNAs). Like messenger RNAs, they are "capped and tailed," and some contain introns. Because of their self-complementary sequences, however, pri-miRNAs form hairpin structures. A nuclear enzyme called Drosha removes the non-self-complementary 5′ and 3′ ends to produce pre-miRNAs. These single-stranded hairpins are then exported to the cytoplasm and cleaved by Dicer to produce mature double-stranded miRNAs. miRNAs are very similar to siRNAs and associate with RISC to target complementary mRNAs. If the miRNA/mRNA match is perfect (common in plants), RISC cleaves the target mRNA. If the match is only partial (common in animals), RISC blocks translation by the ribosome (ST Figure 2–4).

miRNAs are found in animals, plants, viruses, and possibly fungi. They regulate genes involved in diverse cellular processes, such as stress responses in plants, development in C. elegans, and cell-cycle control in mammalian cells. Why is miRNA-mediated posttranscriptional regulation so widespread? Wouldn't it be more efficient for the cell to repress a gene's transcription than to transcribe a second gene to destroy or inhibit the mRNA of the first gene?
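The perfect-versus-partial targeting rule for RISC can be sketched as a toy Python model. This is a hypothetical simplification, not a published algorithm, and it rests on several assumptions:
- Sequences are RNA strings written 5′ to 3′.
- Only Watson–Crick pairs count.
- For a perfect match, the cut is placed at the midpoint of the paired region, following the siRNA description above.
- A "partial" match is modeled as a match to a seed region of guide nucleotides 2–8. This convention comes from the miRNA literature and is not part of the text above.

The names reverse_complement and risc_outcome are illustrative only.

```python
from typing import Optional, Tuple

# Watson-Crick complements for RNA bases; G-U wobble pairs are ignored
# in this simplified, hypothetical model.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the 5'->3' sequence that pairs antiparallel with `rna`."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def risc_outcome(guide: str, mrna: str) -> Tuple[str, Optional[int]]:
    """Classify what RISC loaded with `guide` does to `mrna` (toy model).

    ("cleavage", cut_index): some mRNA segment is perfectly complementary
        to the whole guide; the cut is placed mid-way through that segment.
    ("translational inhibition", site_index): only the assumed seed
        (guide nucleotides 2-8) finds a complementary site.
    ("no effect", None): neither kind of site is present.
    """
    full_site = reverse_complement(guide)
    start = mrna.find(full_site)
    if start != -1:
        return "cleavage", start + len(guide) // 2

    seed_site = reverse_complement(guide[1:8])  # 1-based nucleotides 2-8
    start = mrna.find(seed_site)
    if start != -1:
        return "translational inhibition", start
    return "no effect", None


if __name__ == "__main__":
    guide = "UGAGGUAGUAGGUUGUAUAGUU"  # hypothetical 22-nt guide strand
    perfect = "GGG" + reverse_complement(guide) + "CCC"
    seed_only = "AAAA" + reverse_complement(guide[1:8]) + "AAAA"
    print(risc_outcome(guide, perfect))       # ('cleavage', 14)
    print(risc_outcome(guide, seed_only))     # ('translational inhibition', 4)
    print(risc_outcome(guide, "AAAAAAAAAA"))  # ('no effect', None)
```

The sketch checks for a perfect match first, mirroring the statement that full complementarity leads to mRNA cleavage. Real target recognition is more permissive and context-dependent (for example, G·U wobble pairs and site accessibility), and the model ignores this.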
Part of the answer to why miRNA-mediated regulation is so widespread is that mRNAs may continue to be translated many times after transcription stops. To achieve an efficient and rapid change in gene expression, a cell can turn off transcription and use an miRNA to target the existing mRNAs in the cytoplasm. For example, miRNAs are key regulators of mammalian embryonic stem cells (ES cells), the cells of the embryo that give rise to all differentiated cell types of the organism. ES cells express the "stemness genes" Oct4, Sox2, KLF4, Lin28, and Myc, which suppress differentiation and promote stem cell maintenance, or self-renewal. Loss of these genes results in premature differentiation of ES cells. Persistent activity of these genes, by contrast, results in an inability to produce specialized cells. Both scenarios are lethal. To differentiate, the daughter cells of ES cells express the miRNAs miR-145 and let-7, which target and downregulate stemness mRNAs (ST Figure 2–5). A better understanding of miRNA regulation of cellular differentiation is likely to improve stem cell therapy. It could help ensure that patients receive cells that properly differentiate into the desired cell types, rather than cells that fail to differentiate and pose a risk of tumor formation.

RNA-Induced Transcriptional Silencing

Heterochromatic regions of the eukaryotic chromosome, characterized by tightly packaged and transcriptionally repressed DNA, play critical structural roles (see Chapter 11). For example, centromeres are heterochromatic and are important for attaching chromosomes to spindle microtubules during cell division. Heterochromatic sequences such as centromeres are characterized by modifications of histone proteins within nucleosomes that are important for packaging and
transcriptional repression. For example, nucleosomes associated with heterochromatin are generally methylated at lysine 9 of histone H3 (H3K9me). Proteins associated with chromatin condensation and transcriptional repression, such as heterochromatin protein 1 (HP1), bind to H3K9me and mediate heterochromatin formation. H3K9 methylation depends on the activity of a histone methyltransferase (HMT). Several studies have identified DNA-binding proteins that form complexes with HMTs and direct them to specific sequences. Importantly, studies have also found that sncRNAs can direct heterochromatin formation to specific sites in the genome. RNAi inhibits gene expression posttranscriptionally. In contrast, sncRNAs involved in heterochromatin formation use a mechanism known as RNA-induced transcriptional silencing (RITS).

RITS was first described in the fission yeast Schizosaccharomyces pombe. There, deleting several genes in the RNAi pathway, such as Argonaute and Dicer, caused loss of H3K9 methylation at centromeric nucleosomes and loss of centromere function. In 2004, the Moazed lab at Harvard University identified a complex including an Argonaute protein, a heterochromatin-associated protein, and siRNAs produced by Dicer. The siRNAs associated with this RITS complex are complementary to centromeric DNA. They are required both for recruiting the complex to the centromere and for heterochromatin formation.

Paradoxically, transcription from heterochromatic regions such as centromeres is important for transcriptional silencing by RITS. The siRNAs that target the RITS complex to centromeres are derived from centromeric transcription that occurs during the S phase of the cell cycle. When centromeric transcription occurs, an RNA-directed RNA polymerase (RdRP) binds to nascent transcripts and catalyzes the formation of dsRNA from the single-stranded transcripts. The dsRNA is then a substrate for Dicer, leading to the formation of siRNAs complementary to
centromeric DNA. The RITS complex associates with these siRNAs and targets centromeric transcripts. Through an unknown mechanism, this complex at centromeric loci leads to recruitment of HMTs and H3K9 methylation, which triggers heterochromatin formation (ST Figure 2–6). In support of this model, tethering the RNAi machinery to other chromosomal loci leads to heterochromatin formation and transcriptional silencing of those loci. RITS was characterized in S. pombe, but RNAi machinery is also known to be involved in transcriptional silencing and heterochromatin formation in plants, Drosophila, C. elegans, and mammals. Further studies will help to elucidate how similar these mechanisms are in diverse species.

ST Figure 2–5 miRNAs regulate embryonic stem cell differentiation. Several genes associated with embryonic stem cell identity (Sox2, Oct4, KLF4, Lin28, and Myc) are repressed in differentiating cells by miR-145 and let-7. This repression enables cells to acquire specialized functions and inhibits self-renewal.

ST Figure 2–6 RNA-induced transcriptional silencing in fission yeast (S. pombe). Bidirectional transcription of centromeric DNA (or RdRP activity on single-stranded transcripts) produces dsRNA, which Dicer processes into siRNAs. These siRNAs associate with the RITS complex and recruit it to nascent transcripts. Histone methyltransferase (HMT) associated with RITS mediates H3K9 methylation of nearby nucleosomes, which leads to heterochromatin assembly.

Long Noncoding RNAs Are Abundant and Have Diverse Functions

In addition to the small noncoding RNAs discussed above, eukaryotic genomes also encode many long noncoding RNAs (lncRNAs). One obvious distinction is that lncRNAs are longer than small noncoding RNAs; they are often arbitrarily defined as longer than 200 nucleotides. lncRNAs are produced in a similar fashion to mRNAs. Following transcription they are modified with a methylated cap and a poly-A tail, and they can be spliced. In contrast to mRNAs, however, they have no start and stop codons, indicating that they do not encode protein. A conservative estimate is that the human genome encodes ∼17,000 lncRNAs. Studies of just a small fraction of these have demonstrated that lncRNAs have diverse cellular functions.

lncRNAs Mediate Transcriptional Repression by Interacting with Chromatin-Regulating Complexes

An early clue to the function of lncRNAs came from their location in the cell. Mature mRNAs are most frequently found in the cytoplasm, whereas mature lncRNAs are most commonly found in the nucleus. Consistent with this finding, many lncRNAs regulate genes by recruiting chromatin-regulating complexes to specific loci in the genome. A classic example is X chromosome inactivation in mammals (see Chapter 5). The X-inactive specific transcript (Xist) gene on the X chromosome encodes a 17-kb lncRNA that is critical to X chromosome inactivation. Mammalian females have two X chromosomes. The one randomly chosen for inactivation expresses Xist lncRNAs that coat the chromosome. Xist recruits the polycomb repressor complex (PRC2) to the inactivated X. PRC2 mediates chromosome-wide trimethylation of lysine 27 on histone H3 of nucleosomes, leading to chromatin condensation and transcriptional silencing. It is unclear how Xist spreads across the inactivated X. However, recent studies demonstrate that the first loci on the X chromosome to acquire Xist lncRNAs are those located near the Xist gene in the three-dimensional architecture of the chromosome. Surprisingly, Xist lncRNA localization appears to depend more on proximity to the site of transcription than on any specific sequence.

Similar mechanisms of lncRNA-mediated transcriptional repression are found in diverse eukaryotes. They include gene-specific as well as chromosome-wide transcriptional silencing. For example, Hotair is a 2.2-kb lncRNA expressed from human chromosome 12 that represses transcription of several target genes on another chromosome. Like Xist, Hotair recruits PRC2 to target genes, which causes H3K27 methylation and transcriptional silencing (ST Figure 2–7). The normal role of Hotair is not specifically known. However, overexpression of Hotair in cancer cells is linked with a poor prognosis because of an increased chance of metastasis. Future studies should help to elucidate how Hotair overexpression affects specific genes to promote cancer metastasis.

ST Figure 2–7 The Hotair long noncoding RNA regulates chromatin. The lncRNA Hotair is expressed from human chromosome 12 and recruits the polycomb repressor complex (PRC2) to several target genes on another chromosome. PRC2 catalyzes methylation of nearby nucleosomes (H3K27me3), which mediates chromatin condensation. Overexpression of Hotair in cancer indicates a poor prognosis and likelihood of metastasis, presumably due to repression of the target genes.

lncRNAs Regulate Transcription Factor Activity

lncRNAs can also associate with transcription factors and regulate their activity. For example, the androgen receptor (AR) both binds testosterone and serves as a transcription factor to activate target genes important for male development. Recent studies have demonstrated that two lncRNAs, PRNCR1 and PCGEM1, regulate AR's function as a transcription factor. AR binds to enhancer sequences called androgen response elements (AREs) that are located at a distance from the target genes they enhance. To activate transcription of the target genes, the DNA is looped to bring the enhancer and AR closer to the target gene's promoter and the transcriptional machinery. PRNCR1 and PCGEM1 play important roles as "liaisons" in making this DNA loop by physically bridging the gap between AR and the promoter (ST Figure 2–8). Without these lncRNAs, AR cannot activate transcription of target genes. This work has added significance because androgen signaling is known to promote the growth of prostate cancer. Even eliminating testosterone by castration only delays cancer progression. The reason is that cancer cells often evolve mutant forms of AR that do not require testosterone for activity. Preliminary studies show that inhibiting PRNCR1 and PCGEM1 blocks the growth of both androgen-dependent and androgen-independent prostate cancer cells. Blocking these lncRNAs may therefore be a new strategy for fighting prostate cancer.

ST Figure 2–8 Long noncoding RNAs mediate transcriptional activation. The lncRNAs PCGEM1 and PRNCR1 act as liaisons between the androgen receptor (AR) and the promoter to activate the transcription of target genes. PRNCR1 binds to AR and recruits DOT1L, an enzyme that methylates AR. PCGEM1 binds methylated AR and recruits Pygo2, a protein that binds methylated H3K4 at promoters.

Box: Do Extracellular RNAs Play Important Roles in Cellular Communication?

RNAs are abundant and diverse within cells, and they are also found outside of cells. They are prevalent in blood, sweat, tears, and other body fluids. These extracellular RNAs (exRNAs) include mRNAs, miRNAs, siRNAs, lncRNAs, and tRNAs. They are often enclosed in vesicles that protect them from degradation by ubiquitous RNases. Since RNAs are secreted from cells, other cells might take them up and use them as a form of cell–cell communication. There is not yet convincing evidence that this occurs in mammals. However, studies in plants (A. thaliana) and C. elegans clearly show that RNAs passed from cell to cell are indeed functional.

Experiments in A. thaliana demonstrated that exRNAs influence cells at a long distance from the cells that secrete them. In these experiments, roots expressing a transgene carrying an inverted repeat RNA that targets green fluorescent protein (GFP) were grafted to a plant expressing GFP in its leaves. GFP expression was silenced in the leaves, and siRNAs derived from the inverted repeats were found in the vascular tissue of the plant's stem (ST Figure 2–9). This strongly suggests that siRNAs can be transported within plants and affect gene expression in distant cells. One attractive implication concerns how plants respond to their surroundings. A plant may sense environmental conditions (such as stress) in one part of its body and mount an adaptive response in another part.

Studies in C. elegans have demonstrated that the effects of RNAi are both systemic and, in some cases, transgenerational. When worms are fed dsRNA, a membrane protein called SID-2 mediates the endocytosis of dsRNA into intestinal cells. These cells in turn likely secrete the dsRNAs into the body cavity. A second dsRNA channel (transport) protein, called SID-1, mediates the uptake of dsRNAs from the body cavity into most of the other cells of the worm. There, Dicer cleaves the dsRNAs into siRNAs that target mRNAs for destruction. Even cells of the germ line receive dsRNAs through this mechanism, thus passing the effect on to offspring. This mechanism is thought to have evolved as part of an immune response. In that model, a damaged viral particle spills its viral dsRNA, which triggers a systemic RNAi response against that virus before an infection takes hold.

To learn more about exRNAs and the roles they play in humans, the National Institutes of Health (NIH) launched a $17 million grant program in 2013 to specifically fund research in this emerging field. As NIH Director Francis Collins announced: "Expanding our understanding of this emerging scientific field could help us determine the role extracellular RNA plays in health and disease, and unlocking its mysteries may provide our nation's scientists with new tools to better diagnose and treat a wide range of diseases."

ST Figure 2–9 RNAs are transported between cells in plants. An A. thaliana scion (stem and leaves) expressing green fluorescent protein (GFP) was grafted onto a rootstock expressing an inverted repeat of a partial GFP sequence. Transcription of the inverted repeats in roots created hairpin RNAs, which Dicer processed into siRNAs. The siRNAs spread through the plant and, with RISC, inhibited GFP expression in the leaves, demonstrating that siRNAs can signal between cells. GFP-positive regions are green. Non-GFP (silenced) regions are red due to autofluorescence of chlorophyll. Note that silencing is more prominent along vascular tissue, through which the siRNAs are transported.

mRNA Localization and Translational Regulation in Eukaryotes

In this final section we shift from noncoding RNAs to messenger RNAs (mRNAs). We have known since 1960 that mRNAs encode proteins, but we are still learning how mRNAs are posttranscriptionally regulated in eukaryotes. It has become clear that mRNAs are not simply translated as soon as they reach the cytoplasm. Rather, mRNAs can be localized to discrete locations within the cell and then locally translated. mRNA localization and localized translational control create asymmetric protein distributions within the cell that define cellular regions with distinct functions. For example, proteins localized to the highly branched dendrites of a neuron enable it to receive sensory information. Different proteins present in the axon mediate the release of neurotransmitters that signal to other cells. Since mRNA localization and localized translational control are common and important in diverse cell types, it is important to understand how mRNAs are localized and translationally regulated.

As soon as mRNAs are transcribed, they begin to associate with a class of proteins called RNA-binding proteins (RBPs). RBPs influence splicing, nuclear export, localization, stability, degradation, and translation. Much of the research on mRNA localization and translational regulation has focused on RBPs and their interactions with mRNAs. One of the best-described RBP/mRNA interactions is the localization and translational control of actin mRNAs in crawling cells. Following injury, fibroblasts migrate to the site of the wound and assist in wound healing. Fibroblasts and many other types of migrating cells control their direction of movement by controlling where within the cell they polymerize new actin microfilaments. The "leading edge" of the cell, where this actin polymerization occurs, is called the lamellipodium.

A series of elegant studies by the Singer lab at the Albert Einstein College of Medicine showed that actin mRNA is localized to lamellipodia. This localization depends on a 54-nucleotide element, termed a zip code, in the 3′ untranslated region of the actin mRNA. The actin zip code sequence is a binding site for an RBP called zip code binding protein (ZBP1). ZBP1 binds actin mRNAs, prevents translation initiation, and promotes transport of the mRNA to the lamellipodia. Once the mRNAs arrive at their final destination, a kinase called Src phosphorylates ZBP1, which disrupts RNA binding and allows translation initiation (ST Figure 2–10). Src activity is limited to the cell periphery. This mechanism therefore allows cells to carry mRNAs to the periphery in a translationally repressed state, so that they are translated only at the site of actin polymerization. Consistent with this model, mouse fibroblasts lacking ZBP1 have reduced actin mRNA localization and reduced directional motility.

ST Figure 2–10 Localization and translational regulation of actin messenger RNA. The RNA-binding protein ZBP1 associates with actin mRNA in the nucleus and escorts it to the cytoplasm. ZBP1 binds cytoskeleton motor proteins (MP), which transport ZBP1 and actin mRNA to the cell periphery. At the cell periphery, Src phosphorylates ZBP1, which then dissociates from actin mRNA, allowing it to be translated. Actin translation and polymerization at the leading edge direct cell movement.

There is some evidence suggesting that this actin mRNA localization mechanism is used in other cell types as well. Local translation of actin in the neurites (cellular projections) of neurons is required for neurite outgrowth and axon guidance decisions. Mouse neurons lacking ZBP1 have reduced neurite length and exhibit defects in axon guidance.

mRNA localization and translational control are likely to be widespread mechanisms in diverse species and cell types. For example, Drosophila nanos mRNA, which encodes a translational repressor, is localized to the posterior pole of Drosophila embryos and to the dendrites of some neurons. Loss of nanos mRNA localization causes two kinds of defects. Embryos develop with anterior/posterior patterning defects, and neurons have a reduced number of dendritic branches. Interestingly, larvae mutant for nanos also exhibit defects in sensory function. The Gavis lab at Princeton University tagged nanos mRNAs with a fluorescent molecule in living cells. Using this approach, they demonstrated that nanos mRNAs are transported in ribonucleoprotein (RNP) particles to dendrites, where they are locally translated. This transport is dependent on the motor protein dynein, which
“walks” along microtubules of the cytoskeleton These studies and many others have demonstrated how mRNA localization and localized protein synthesis are critical for neuronal function In fact, defects in mRNA localization in neurons have been implicated in human disorders such as fragile-X syndrome, spinal muscular atrophy, and spinocerebellar ataxia Visit the Study Area in MasteringGenetics for a list of further readings on this topic, including journal references and selected Web sites SPECIAL TOPIC Review Questions What are some of the different roles that RNA plays in biological systems? What arguments support the RNA World Hypothesis? What types of reactions ribozymes catalyze? What types of chemical bonds are formed or broken? How is bacterial DNA methylation and expression of restriction enzymes an innate defense strategy, whereas the CRISPR/Cas system is an adaptive defense strategy? What are the three types of small noncoding RNAs in eukaryotic cells, and how are they different from one another? The mechanism for RNA-induced transcriptional silencing of heterochromatic DNA is paradoxical For example, how does it make sense that centromeric DNA in fission yeast first needs to be transcribed before it can be transcriptionally silenced? Although exRNAs are found in many fluids within plants and animals, why are they usually found within vesicles or bound by proteins? How and why are eukaryotic mRNAs transported and localized to discrete regions of the cell? Discussion Questions The RNA World Hypothesis suggests that the earliest forms of life used RNA as a genome instead of DNA Why then we not see organisms alive today with RNA genomes? Bacterial sRNAs can bind to mRNAs through complementary binding to regulate gene expression What determines whether the sRNA/mRNA binding will promote or repress mRNA translation? 
In many cases, ncRNAs serve as “guides” to direct proteins to other nucleic acids. What are some examples of ncRNAs acting as “guides,” and what purposes do they serve?
Prokaryotes and eukaryotes have both evolved mechanisms to defend against viral or other foreign nucleic acids. How are these mechanisms similar, and how are they different?
Extracellular RNAs are abundant in human bodily fluids, but there is currently very little evidence for their potential functions in the body. Speculate on the roles that exRNAs might play in human biology.
How is it possible that a given mRNA is found throughout the cytoplasm of a cell, but the protein that it encodes is found only in a few specific regions of the cytoplasm? Cite a few different possibilities.

SPECIAL TOPICS IN MODERN GENETICS
DNA Forensics

Genetics is arguably the most influential science today, dramatically affecting technologies in fields as diverse as agriculture, archaeology, medical diagnosis, and disease treatment. One of the areas that has been most profoundly altered by modern genetics is forensic science. Forensic science (or forensics) uses technological and scientific approaches to answer questions about the facts of criminal or civil cases. Prior to 1986, forensic scientists had a limited array of tools with which to link evidence to specific individuals or suspects. These included some reliable methods, such as blood typing and fingerprint analysis, but also many unreliable methods, such as bite mark comparisons and hair microscopy.

Since the first forensic use of DNA profiling in 1986 (Box 1), DNA forensics (also called forensic DNA fingerprinting or DNA typing) has become an important method for police to identify sources of biological materials. DNA profiles can now be obtained from saliva left on cigarette butts or postage stamps, pet hairs found at crime scenes, or bloodspots the size of pinheads. Even biological samples that are degraded by fire or time are yielding DNA profiles that help the legal system determine identity, innocence, or guilt. Investigators now scan large databases of stored DNA profiles in order to match profiles generated from crime scene evidence. DNA profiling has proven the innocence of hundreds of people who were convicted of serious crimes and even sentenced to death. Forensic scientists have used DNA profiling to identify victims of mass disasters such as the Asian Tsunami of 2004 and the September 11, 2001, terrorist attacks in New York. They have also used forensic DNA analysis to identify endangered species and animals trafficked in the illegal wildlife trade. The power of DNA forensic analysis has captured the public imagination, and DNA forensics is featured in several popular television series.

The applications of DNA profiling extend beyond forensic investigations. These include paternity and family relationship testing, identification of plant materials, verification of military casualties, and evolutionary studies.

It is important for all of us to understand the basics of forensic DNA analysis. As informed citizens, we need to monitor its uses and potential abuses. Although DNA profiling is well validated as a technique and is considered the gold standard of forensic identification, it is not without controversy and the need for legislative oversight. In this Special Topic chapter, we will explore how DNA profiling works and how the results of profiles are interpreted. We will learn about DNA databases, the potential problems associated with DNA profiling, and the future of this powerful technology.

Box 1 The Pitchfork Case: The First Criminal Conviction Using DNA Profiling

In the mid-1980s, the bodies of two schoolgirls, Lynda Mann and Dawn Ashworth, were found in Leicestershire, England. Both girls had been raped and strangled, and their bodies had been left in the bushes. In the absence of useful clues, the police questioned a local porter with learning disabilities named Richard Buckland. During interrogation, Buckland confessed to the murder of Dawn Ashworth; however, police did not know whether he was also responsible for Lynda Mann’s death. In 1986, to determine who had killed Lynda Mann, the police asked Dr. Alec Jeffreys of the University of Leicester to try a new method of DNA analysis called DNA fingerprinting. Dr. Jeffreys had developed a method of analyzing DNA regions called variable number of tandem repeats (VNTRs), which vary in length between members of a population. Dr. Jeffreys’s VNTR analysis revealed a match between the DNA profiles from semen samples obtained from both crime scenes, suggesting that the same person was responsible for both rapes. However, neither of the DNA profiles matched the profile from a blood sample taken from Richard Buckland. Having eliminated their only suspect, the police embarked on the first mass DNA dragnet in history, requesting blood samples from every adult male in the region. Although 4000 men offered samples, one did not. Colin Pitchfork, a bakery worker, paid a friend to give a blood sample in his place, using forged identity documents. Their plan was detected when their conversation was overheard at a local pub. The conversation was reported to police, who then arrested Pitchfork, obtained his blood sample, and sent it for analysis. His DNA profile matched the profiles from the semen samples left at both crime scenes. Pitchfork confessed to the murders, pleaded guilty, and was sentenced to life in prison. The Pitchfork Case was not only the first criminal case resolved by forensic DNA profiling, but also the first case in which DNA profiling led to the exoneration of an innocent person.

DNA Profiling Methods

VNTR-Based DNA Fingerprinting

The era of DNA-based human identification began in 1985, with Dr. Alec Jeffreys’s publication on DNA loci known as minisatellites, or variable number of tandem repeats (VNTRs). As described earlier in the text (see Chapter 11), VNTRs are located in noncoding regions of the genome and are made up of DNA sequences between 15 and 100 bp long, with each unit repeated a number of times. The number of repeats found at each VNTR locus varies from person to person, and hence VNTR alleles can be up to 20 kilobases (kb) in length, depending on the person. For example, the VNTR

5′-GACTGCCTGCTAAGAT GACTGCCTGCTAAGAT GACTGCCTGCTAAGAT-3′

is composed of three tandem repeats of a 16-nucleotide sequence (the repeat units are separated by spaces here for clarity).

VNTRs are useful for DNA profiling because there can be as many as 30 different alleles (repeat lengths) at a VNTR locus in a population. This creates a large number of possible genotypes. A locus with 20 possible alleles, for example, has 210 possible genotypes (20 homozygous and 190 heterozygous combinations). If one examined four different VNTR loci within a population, each with 20 possible alleles, there would be more than 1.9 billion (210⁴) possible genotypes in this four-locus profile.

To create a VNTR profile (also known as a DNA fingerprint), scientists extract DNA from a tissue sample and digest it with a restriction enzyme that cleaves on either side of the VNTR repeat region (ST Figure 3–1). The digested DNA is separated by gel electrophoresis and subjected to Southern blot analysis (which is described in detail in Chapter 17). Briefly, separated DNA is transferred from the gel to a membrane and hybridized with a radioactive probe that recognizes DNA sequences within the VNTR region. After the membrane is exposed to X-ray film, the pattern of bands is measured: larger VNTR alleles remain near the top of the gel, and smaller VNTR alleles, which migrate more rapidly through the gel, lie closer to the bottom. The pattern of bands is the same for a given individual, no matter what tissue is used as the source of the DNA.
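To make the two VNTR ideas above concrete, here is a minimal Python sketch that (1) counts the tandem copies of the 16-nucleotide unit in the example sequence and (2) reproduces the genotype count for four loci with 20 alleles each. The function names are hypothetical, chosen only for this illustration, and the sketch assumes a diploid genome and independent loci. In the laboratory, repeat number is inferred from fragment length on a Southern blot rather than read from sequence.

```python
from math import comb


def count_tandem_repeats(seq: str, unit: str) -> int:
    """Count consecutive copies of `unit` at the start of `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count, pos = 0, 0
    while seq.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count


def genotypes_per_locus(n_alleles: int) -> int:
    """Diploid genotypes at one locus: n homozygotes + C(n, 2) heterozygotes."""
    return n_alleles + comb(n_alleles, 2)


def multilocus_genotypes(n_alleles: int, n_loci: int) -> int:
    """Genotype combinations across independent loci with the same allele count."""
    return genotypes_per_locus(n_alleles) ** n_loci


# The 5'->3' VNTR example from the text (three copies of a 16-nt unit).
vntr = "GACTGCCTGCTAAGATGACTGCCTGCTAAGATGACTGCCTGCTAAGAT"
print(count_tandem_repeats(vntr, "GACTGCCTGCTAAGAT"))  # 3
print(genotypes_per_locus(20))                          # 210
print(multilocus_genotypes(20, 4))                      # 1944810000
```

The same genotypes_per_locus function returns 3 for a two-allele locus, which illustrates why many more biallelic loci are needed to match the discriminating power of highly multiallelic repeat loci.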
If enough VNTRs are analyzed, each person’s DNA profile will be unique (except, of course, for identical twins) because of the huge number of possible VNTRs and alleles. In practice, scientists analyze about five or six loci to create a DNA profile. A significant limitation of VNTR profiling is that it requires a relatively large sample of DNA (10,000 cells or about 50 μg of DNA), more than is usually found at a typical crime scene. In addition, the DNA must be relatively intact (nondegraded). As a result, VNTR profiling has been used most frequently when large tissue samples are available, such as in paternity testing. Although VNTR profiling is still used in some cases, it has mostly been replaced by more sensitive methods, as described next.

ST Figure 3–1 DNA fingerprint at two VNTR loci for two individuals. VNTR alleles at two loci (A and B) are shown for two different individuals. Arrows mark restriction-enzyme cutting sites that flank the VNTRs. Restriction-enzyme digestion produces a series of fragments that can be separated by gel electrophoresis and detected as bands on a Southern blot (bottom). The number of repeats at each locus is variable, so the overall pattern of bands is distinct for each individual. The DNA fingerprint profile shows that these individuals share one allele (B2).

Autosomal STR DNA Profiling

ST Figure 3–2 Chromosomal positions of the 13 core STR loci used for forensic DNA profiling. The AMEL (Amelogenin) locus is included with the 13 core loci and is used to determine the sex of the person providing the DNA sample. The AMEL locus on the X chromosome contains a 6-nucleotide deletion compared to that on the Y chromosome.

ST Figure 3–3 Relative size ranges and fluorescent dye labeling
colors of 16 STR products generated by a commercially available DNA profiling kit. The DNA fragments shown in orange at the bottom of the diagram are DNA size markers. The AMEL locus is indicated as an A.

The development of the polymerase chain reaction (PCR) revolutionized DNA profiling. PCR methods are described in detail earlier in the text (see Chapter 17). Using PCR-amplified DNA samples, scientists are able to generate DNA profiles from trace samples (e.g., the bulb of a single hair or a few cells from a bloodstain) and from samples that are old or degraded (such as a bone found in a field or an ancient Egyptian mummy).

The majority of human forensic DNA profiling is now done using commercial kits that amplify and analyze regions of the genome known as microsatellites, or short tandem repeats (STRs). STRs are similar to VNTRs, but the repeated motif is shorter, between two and nine base pairs, repeated up to 40 times. For example, one locus known as D8S1179 is made up of the four-base-pair sequence TCTA, repeated up to 20 times, depending on the allele. There are 19 possible alleles of this locus that are found within a population.

Although hundreds of STR loci are present in the human genome, only a subset is used for DNA profiling. At the present time, the FBI and other U.S. law enforcement agencies use 13 STR loci as a core set for forensic analysis (ST Figure 3–2). Most European countries now use 12 STR loci as a core set.

Several commercially available kits are currently used for forensic DNA analysis of STR loci. The methods vary slightly, but they generally share the same basic steps. As shown in ST Figure 3–3, each primer set is tagged with one of four fluorescent dyes: blue, green, yellow, or red. Each primer set is designed to amplify DNA fragments whose sizes vary depending on the number of repeats within the region amplified. For example, the primer sets that amplify the D19S433, vWA, TPOX, and D18S51 STR loci are all labeled with a yellow fluorescent tag. The sizes of the amplified DNA fragments allow scientists to differentiate between the yellow-labeled products: the amplified products from the D19S433 locus range from about 100 to 150 bp in length, whereas those from the vWA locus range from about 150 to 200 bp, and so on.

After amplification, the DNA sample will contain a small amount of the original template DNA and a large amount of fluorescently labeled amplification products (ST Figure 3–4). The sizes of the amplified fragments are measured by capillary electrophoresis. This method uses thin glass tubes that are filled with a polyacrylamide gel material similar to that used in slab gel electrophoresis. The amplified DNA sample is loaded onto the top of the capillary tube, and an electric current is passed through the tube. The negatively charged DNA fragments migrate through the gel toward the positive electrode according to their sizes: short fragments move more quickly through the gel, and larger ones more slowly. At the bottom of the tube, a laser detects each fluorescent fragment as it migrates through the tube. The data are analyzed by software that calculates both the sizes of the fragments and their quantities, and these are represented as peaks on a graph (ST Figure 3–5). Typically, automated capillary electrophoresis systems analyze as many as 16 samples at a time, and the analysis takes approximately 30 minutes.

ST Figure 3–4 Steps in the PCR amplification and analysis of one STR locus (D8S1179). In this example, the person is heterozygous at the D8S1179 locus: one allele has 10 TCTA repeats, and the other allele has a different number of repeats. Primers are specific for sequences flanking the STR locus and are labeled with a blue fluorescent dye. The double-stranded DNA is denatured, the primers are annealed, and each allele is amplified by PCR in the presence of all four dNTPs and Taq DNA polymerase. After amplification, the labeled products are separated according to size by capillary electrophoresis, followed by fluorescence detection.

After DNA profiling, the profile can be directly compared to a profile from another person, from crime scene evidence, or from other profiles stored in DNA profile databases (ST Figure 3–6). The STR profile genotype of an individual is expressed as the number of times the STR sequence is repeated. For example, the profiles shown in ST Figure 3–6 would be expressed as shown in ST Table 3.1. Scientists interpret STR profiles using statistics, probability, and population genetics; these methods are discussed in the section Interpreting DNA Profiles.

Y-Chromosome STR Profiling

In many forensic applications, it is important to differentiate the DNA profiles of two or more people in a mixed sample. For example, vaginal swabs from rape cases usually contain a mixture of female somatic cells and male sperm cells. In addition, some crime samples may contain evidence material from a number of male suspects. In these types of cases, STR profiling of Y-chromosome DNA is useful. There are more than 200 STR loci on the Y chromosome that are useful for DNA profiling; however, fewer than 20 of these are used routinely for forensic analysis. PCR amplification of Y-chromosome STRs uses specific primers that do not amplify DNA on the X chromosome.
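To connect fragment size and repeat number, here is a minimal Python sketch that converts capillary-electrophoresis peak sizes at a tetranucleotide locus such as D8S1179 into repeat numbers and a genotype. The flanking length (FLANK_BP) and the peak sizes are hypothetical values chosen for illustration, not specifications of any kit; real analysis software calibrates sizes against an allelic ladder and handles stutter peaks and partial repeats, which this sketch ignores.

```python
from __future__ import annotations

# Hypothetical calibration for a tetranucleotide STR such as D8S1179 (motif TCTA).
MOTIF_LEN = 4      # length of the TCTA repeat unit, in bp
FLANK_BP = 80      # hypothetical: bases contributed by primers and flanking sequence


def repeats_from_size(size_bp: float, motif_len: int = MOTIF_LEN, flank_bp: int = FLANK_BP) -> int:
    """Convert a measured fragment size (bp) into a whole number of repeats."""
    n = (size_bp - flank_bp) / motif_len
    repeats = round(n)
    # Reject sizes that fall far from a whole repeat (artifacts or partial repeats).
    if abs(n - repeats) > 0.25:
        raise ValueError(f"{size_bp} bp does not fit a whole number of repeats")
    return repeats


def call_genotype(peak_sizes: list[float]) -> tuple[int, int]:
    """One peak -> homozygote (a, a); two peaks -> heterozygote (a, b)."""
    alleles = sorted(repeats_from_size(s) for s in peak_sizes)
    if len(alleles) == 1:
        return (alleles[0], alleles[0])
    if len(alleles) == 2:
        return (alleles[0], alleles[1])
    # More than two peaks at one locus suggests a mixed sample (or an artifact).
    raise ValueError("more than two peaks: possible mixture")


print(call_genotype([124.0, 136.1]))  # (11, 14)
print(call_genotype([120.0]))         # (10, 10)
```

Raising an error on a third peak mirrors the mixed-sample problem described above: a single-source sample shows at most two alleles per autosomal locus.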
ST Figure 3–5 An electropherogram showing the results of a DNA profile analysis using the 16-locus STR profile kit shown in ST Figure 3–3. Heterozygous loci show up as double peaks and homozygous loci as single, higher peaks. The size of each allele can be calculated from the peak locations relative to the size axis shown at the top of each panel. The single peak for the AMEL (A) locus indicates that this DNA profile is that of a female individual, as described in ST Figure 3–2.

One limitation of Y-chromosome DNA profiling is that it cannot differentiate between the DNA from fathers and sons or from male siblings. This is because the Y chromosome is inherited directly from the father by his sons as a single unit. Apart from its small pseudoautosomal regions, the Y chromosome does not undergo recombination, so less genetic variability exists on the Y chromosome than on the autosomes. Therefore, all patrilineal relatives share the same Y-chromosome profile. Even two apparently unrelated males may share the same Y profile if they also share a distant male ancestor. Although these features of Y-chromosome profiles present limitations for some forensic applications, they are useful for identifying missing persons when a male relative’s DNA is available for comparison. They also allow researchers to trace paternal lineages in genetic genealogy studies.

ST Figure 3–6 Electropherogram showing the STR profiles of four samples from a rape case. Three STR loci (D3S1358, vWA, and FGA) were examined in samples taken from a suspect, a victim, and two fractions (epithelial cell and sperm) of a vaginal swab taken from the victim. The x-axis shows the DNA size ladder, and the y-axis indicates relative fluorescence intensity. The number below each allele indicates the number of repeats in that allele, as measured against the DNA size ladder. Notice that the STR profile of the sperm fraction taken from the victim matches that of the suspect.

ST Table 3.1 STR Profile Genotypes from the Four Profiles Shown in ST Figure 3–6

STR Locus   Suspect   Victim   Epithelial Cells   Sperm Fraction
D3S1358     15, 18    16, 17   16, 17             15, 18
vWA         15, 18    16, 16   16, 16             15, 18
FGA         22, 25    21, 26   21, 26             22, 25

Mitochondrial DNA Profiling

Another important addition to DNA profiling methods is mitochondrial DNA (mtDNA) analysis. Between 200 and 1700 mitochondria are present in each human somatic cell. Each mitochondrion contains one or more copies of a 16-kb circular DNA chromosome. Mitochondria divide within cells and are distributed to daughter cells after cell division. Mitochondria are passed from the human egg cell to the zygote during fertilization; however, because sperm cells contribute few, if any, mitochondria to the zygote, they do not pass these organelles on to the next generation. Therefore, all cells in an individual contain multiple copies of identical mitochondria derived from the mother. Like Y-chromosome DNA, mtDNA undergoes little, if any, recombination and is inherited as a single unit.

Scientists create mtDNA profiles by amplifying regions of mtDNA that show variability between unrelated individuals and populations. Two commonly used regions are known as hypervariable segments I and II (HVSI and HVSII). After PCR amplification, the DNA sequence within these regions is determined by automated DNA sequencing. Scientists then compare the sequence with sequences from other individuals or crime samples to determine whether or not they match.

The fact that mtDNA is present in high copy numbers in cells makes its analysis useful in cases where crime samples are small, old, or degraded. mtDNA profiling is particularly useful for identifying victims of mass murders or disasters, such as the Srebrenica massacre of 1995 and the World Trade Center attacks of 2001, where reference samples from relatives are available. The main disadvantage of mtDNA profiling is that it cannot differentiate between the mtDNA of maternal relatives or of siblings. Like Y-chromosome profiles, mtDNA profiles may be shared by two apparently unrelated individuals who also share a distant ancestor, in this case a maternal ancestor. Researchers use mtDNA profiles in scientific studies of genealogy, evolution, and human population migrations.

Mitochondrial DNA analyses have also been useful in wildlife forensics cases. Billions of dollars are generated by the illegal wildlife trade throughout the world. Often, identification of the species or origin of plant and animal material is the key to successful prosecution of wildlife trafficking cases. A case of illegal smuggling of bird eggs in Australia, solved by mitochondrial sequence analysis, is presented in Box 2.

Single-Nucleotide Polymorphism Profiling

Single-nucleotide polymorphisms (SNPs) are single-nucleotide differences between two DNA molecules. They may be base-pair changes or small insertions or deletions (ST Figure 3–7). SNPs occur randomly throughout the genome and on mtDNA, approximately every 500 to 1000 nucleotides. This means that there are potentially millions of loci in the human genome that can be used for profiling. However, because SNPs usually have only two alleles, many SNPs (50 or more) must be used to create a DNA profile that can distinguish between two individuals as efficiently as STRs.

Scientists analyze SNPs by using specific primers to amplify the regions of interest. The amplified DNA regions are then analyzed by a number of different methods, such as automated DNA sequencing or hybridization to immobilized probes on DNA microarrays that distinguish between DNA molecules with single-nucleotide differences.

Forensic SNP profiling has one major advantage over STR profiling. Because a SNP involves only one nucleotide of a DNA molecule, the theoretical size of DNA required for a PCR reaction is the size of the two primers plus one more nucleotide (i.e., about 50 nucleotides). This feature makes SNP analysis suitable for analyzing DNA samples that are severely degraded. Despite this advantage, SNP profiling has not yet become routine in forensic applications. More frequently, researchers use SNP profiling of Y-chromosome and mtDNA loci for lineage and evolution studies.

ST Figure 3–7 Example of a single-nucleotide polymorphism (SNP) from an individual who is heterozygous at the SNP locus. The arrows indicate the locations of PCR primers used to amplify the SNP region prior to DNA sequence analysis. If this SNP locus had only two known alleles, the C and T alleles, there would be three possible genotypes in the population: CC, TT, and CT. The individual in this example has the CT genotype.

Box 2 The Pascal Della Zuana Case: DNA Barcodes and Wildlife Forensics

On August 2, 2006, a freelance photographer named Pascal Della Zuana was stopped by customs officers at Australia’s Sydney International Airport. While questioning him about his flight from Thailand to Australia, officers noticed that he was wearing an unusual white vest under his outer clothing. Inside the vest, they discovered 23 concealed bird eggs. Because of Australia’s strict quarantine regulations, the eggs had to be treated with radiation in order to sterilize them. Unable to hatch the eggs, authorities turned to DNA typing in an attempt to identify the origin and species of the eggs.

The eggs were sent to Dr. Rebecca Johnson at the DNA Laboratory at the Australian Museum for forensic identification. Dr. Johnson took a small sample from each egg and extracted the DNA. She used PCR methods to amplify an approximately 650-bp region of the mitochondrial genome within the cytochrome c oxidase gene. She then organized these sequences into a format known as a DNA barcode. In order to identify the species, Dr. Johnson compared each DNA barcode to barcode entries in a large DNA barcode database compiled at the University of Guelph in Canada. The database contains mitochondrial DNA barcode sequences from hundreds of universities and museums throughout the world, cataloging more than 70,000 different species.

The results of Dr. Johnson’s barcode sequence comparisons were dramatic. Della Zuana’s vest had concealed eggs of exotic bird species such as macaws, African grey and Eclectus parrots, as well as a rare threatened species, the Moluccan cockatoo. On January 20, 2007, Pascal Della Zuana was found guilty of contravening the Convention on International Trade in Endangered Species (CITES), as well as three Australian Customs and Quarantine Acts. He was fined $10,000 and sentenced to two years in prison. During the court case, it was learned that, if hatched, the birds would have fetched about $250,000 on the black market. The worldwide smuggling of wildlife and wildlife parts is thought to be worth as much as US$150 billion each year, surpassed only by drugs and arms in terms of illegal profit.

Interpreting DNA Profiles

After a DNA profile is generated, its significance must be determined. In a typical forensic investigation, a profile derived from a suspect is compared to a profile from an evidence sample or to profiles already present in a DNA database. If the suspect’s profile does not match the evidence profile or database entries, investigators can conclude that the suspect is not the source of the sample(s) that generated the other profile(s). However, if the suspect’s profile matches the evidence profile or a database entry, the interpretation becomes more complicated. In this case, one could conclude either that the two profiles came from the same person or that they came from two different people who share the same DNA profile by chance. To determine the significance of any DNA profile match, it is necessary to estimate the probability that the two profiles are a random match.
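The compare-then-estimate logic just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the genotypes are copied from ST Table 3.1 and the allele frequencies from ST Table 3.2, the function names are hypothetical, and the probability step assumes Hardy–Weinberg proportions and independent loci (the product rule). Real casework calculations add refinements, such as corrections for population substructure, that this sketch omits.

```python
from __future__ import annotations

from math import prod

# Genotypes (repeat numbers) copied from ST Table 3.1.
suspect = {"D3S1358": (15, 18), "vWA": (15, 18), "FGA": (22, 25)}
victim = {"D3S1358": (16, 17), "vWA": (16, 16), "FGA": (21, 26)}
sperm_fraction = {"D3S1358": (15, 18), "vWA": (15, 18), "FGA": (22, 25)}


def profiles_match(a: dict, b: dict) -> bool:
    """True only if the profiles share loci and every shared locus has the same genotype."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(sorted(a[k]) == sorted(b[k]) for k in shared)


def genotype_frequency(p: float, q: float | None = None) -> float:
    """Hardy-Weinberg genotype frequency: p*p for a homozygote, 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q


# Allele frequencies for the five-locus profile in ST Table 3.2
# (one value for a homozygous locus, two for a heterozygous locus).
table_3_2 = {
    "D5S818": (0.361, 0.141),
    "TPOX": (0.243,),
    "D8S1179": (0.305, 0.031),
    "CSF1PO": (0.217,),
    "D19S433": (0.253, 0.369),
}


def random_match_probability(freqs: dict) -> float:
    """Product rule: multiply the per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*f) for f in freqs.values())


print(profiles_match(suspect, sperm_fraction))       # True  -> suspect not excluded
print(profiles_match(suspect, victim))               # False -> excluded
print(f"{random_match_probability(table_3_2):.1e}")  # 1.0e-06
```

Because the sketch multiplies unrounded per-locus frequencies, its product (about 9.99 × 10⁻⁷) differs slightly from multiplying the three-decimal values shown in ST Table 3.2, but both come to roughly 1 in a million.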
The profile probability, or random match probability, method gives a numerical probability that a person chosen at random from a population would share the same DNA profile as the evidence or suspect profiles. The following example demonstrates how to arrive at a profile probability (ST Table 3.2).

The first locus examined in this DNA profile (D5S818) has two alleles: 11 and 13. Population studies show that the 11 allele of this locus appears at a frequency of 0.361 in this population and the 13 allele appears at a frequency of 0.141. In population genetics, the frequencies of two different alleles at a locus are designated p and q, following the Hardy–Weinberg law described earlier in the text (see Chapter 22). We assume that the person having this DNA profile received the 11 and 13 alleles at random, one from each parent. Therefore, the probability that this person received allele 11 from the mother and allele 13 from the father is p × q = pq. The probability that the person received allele 11 from the father and allele 13 from the mother is also pq. Hence, the total probability that this person would have the 11, 13 genotype at this locus by chance is 2pq. As we see from ST Table 3.2, 2pq is 0.102, or approximately 10 percent. Clearly, a DNA profile based on only one locus would not be very informative, as about 10 percent of the population would also have the D5S818 11, 13 genotype. The discriminating power of a DNA profile increases as more loci are added to the analysis.

The next locus of this person's DNA profile (TPOX) has two identical alleles, both 11. Allele 11 appears at a frequency of 0.243 in this population. The probability of inheriting the 11 allele from each parent is p × p = p². As we see in the table, the genotype frequency at this locus is 0.059, or about 6 percent of the population. If this DNA profile contained only the first two loci, we could calculate how frequently a person chosen at random from this population would have the genotype shown in the table by multiplying the two genotype probabilities together: 0.102 × 0.059 = 0.006. In other words, about 6 persons in 1000 (or 1 person in 166) would have this two-locus genotype.

The method of multiplying the genotype frequencies at each locus is known as the product rule. It is the most frequently used method of DNA profile interpretation and is widely accepted in U.S. courts. By multiplying all the genotype probabilities at the five loci, we arrive at the genotype frequency for this DNA profile: about 1 × 10⁻⁶. This means that approximately 1 person in a million, chosen at random from this population, would share this 5-locus DNA profile.

ST Table 3.2 A Profile Probability Calculation Based on Analysis of Five STR Loci

STR Locus   Alleles from Profile   Allele Frequency from Population Database*   Genotype Frequency Calculation
D5S818      11, 13                 0.361, 0.141                                 2pq = 2 × 0.361 × 0.141 = 0.102
TPOX        11, 11                 0.243, 0.243                                 p² = 0.243 × 0.243 = 0.059
D8S1179     13, 16                 0.305, 0.031                                 2pq = 2 × 0.305 × 0.031 = 0.019
CSF1PO      10, 10                 0.217, 0.217                                 p² = 0.217 × 0.217 = 0.047
D19S433     13, 14                 0.253, 0.369                                 2pq = 2 × 0.253 × 0.369 = 0.187

Genotype frequency from this 5-locus profile = 0.102 × 0.059 × 0.019 × 0.047 × 0.187 ≈ 0.000001 = 1 × 10⁻⁶

*A U.S. Caucasian population database (Butler, J.M., et al. 2003. J. Forensic Sci. 48: 908–911). © 2003 John Wiley & Sons, Inc.

The Uniqueness of DNA Profiles

As we increase the number of loci analyzed in a DNA profile, we obtain smaller probabilities of a random match. Theoretically, if a sufficient number of loci were analyzed, we could be almost certain that the DNA profile was unique. At present, law enforcement agencies in North America use a core set of 13 STR loci to generate DNA profiles. A hypothetical genotype composed of the most common allele at each of the core STR loci would be expected to occur only once in a population of 10 billion people; that is, the frequency of this profile would be 1 in 10 billion. Although this suggests that most DNA profiles generated from the 13 core STR loci would be unique on the planet, several situations can alter this interpretation. For example, identical twins share the same DNA, so their DNA profiles are identical. Identical twins occur at a frequency of about 1 in 250 births. In addition, siblings can share one allele at any DNA locus in about 50 percent of cases and both alleles at a locus in about 25 percent of cases. Parents and children also share alleles, but they are less likely than siblings to share both alleles at a locus. When DNA profiles come from two people who are closely related, the profile probabilities must be adjusted to take this into account.

The allele frequencies and calculations described here are based on the assumptions that the population is large and has little relatedness or inbreeding. If a DNA profile is analyzed from a person in a small, interrelated group, these allele frequency tables and calculations may not apply.

The Prosecutor's Fallacy

It is sometimes stated, by both the legal profession and the public, that "the suspect must be guilty given that the chance of a random match to the crime scene sample is 1 in 10 billion, greater than the population of the planet." This type of statement is known as the prosecutor's fallacy because it equates guilt with a numerical probability derived from one piece of evidence, in the absence of other evidence. A match between a suspect's DNA profile and crime scene evidence does not necessarily prove guilt, for many reasons, such as human error, contamination of samples, or even deliberate tampering. Conversely, a DNA profile that does not match the evidence does not necessarily mean that the suspect is innocent. For example, a suspect's profile may not match that from a semen sample at a rape scene, but the suspect could still have been
involved in the crime, perhaps by restraining the victim. For these and other reasons, DNA profiles must be interpreted in the context of all the evidence in a case. Problems with DNA profiles are described in more detail in the next section.

DNA Profile Databases

Many countries throughout the world maintain national DNA profile databases. The first of these databases was established in the UK in 1995 and now contains several million profiles, representing almost 10 percent of the population. In the United States, both state and federal governments have DNA profile databases. The entire system of databases, along with tools to analyze the data, is known as the Combined DNA Index System (CODIS) and is maintained by the FBI. As of August 2013, more than 11 million DNA profiles were stored within the CODIS system. The two main databases in CODIS are the convicted offender database, which contains DNA profiles from individuals convicted of certain crimes, and the forensic database, which contains profiles generated from crime scene evidence. In addition, some states have DNA profile databases containing profiles from suspects, from unidentified human remains, and from missing persons. Suspects who are not convicted can request that their profiles be removed from the databases.

DNA profile databases have proven their value in many different situations. As of August 2013, use of the CODIS databases had resulted in more than 200,000 profile matches that assisted criminal investigations and missing persons searches (Box 3). Despite this value, DNA profile databases remain a concern for many people, who weigh the privacy and civil liberties of individuals against the needs of the state.

BOX 3 The Kennedy Brewer Case: Two Bite-Mark Errors and One Hit

In 1992 in Mississippi, Kennedy Brewer was arrested and charged with the rape and murder of his girlfriend's 3-year-old daughter, Christine Jackson. Although a semen sample had been obtained from Christine's body, there was not sufficient DNA for profiling. Forensic scientists were also unable to identify the ABO blood group from the bloodstains left at the crime scene. The prosecution's only evidence came from a forensic bite-mark specialist who testified that the 19 "bite marks" found on Christine's body matched imprints made by Brewer's two top teeth. Even though the specialist had recently been discredited by the American Board of Forensic Odontology, and the defense's expert dentistry witness testified that the marks on Christine's body were actually postmortem insect bites, the court convicted Brewer of capital murder and sexual battery and sentenced him to death.

In 2001, more sensitive DNA profiling was conducted on the 1992 semen sample. The profile excluded Brewer as the donor of the semen sample. It also excluded two of Brewer's friends, and Y-chromosome profiles excluded Brewer's male relatives. Despite these test results, Brewer remained in prison for another five years, awaiting a new trial. In 2007, the Innocence Project took on Brewer's case and retested the DNA samples. The profiles matched those of another man, Justin Albert Johnson, who had a history of sexual assaults and had been one of the original suspects in the case. Johnson subsequently confessed to Christine Jackson's murder, as well as to another rape and murder: that of a 3-year-old girl named Courtney Smith. Levon Brooks, the ex-boyfriend of Courtney's mother, had been convicted of murder in the Smith case, also based on bite-mark testimony by the same discredited expert witness. On February 15, 2008, all charges against Kennedy Brewer were dropped, and he was exonerated of the crimes. Levon Brooks was subsequently exonerated of the Smith murder in March 2008.

Since 1989, more than 320 people in the United States have been exonerated of serious crimes based on DNA profile evidence. Seventeen of these people had served time on death row. In more than 100 of these exoneration cases, the true perpetrator has been identified, often through searches of DNA databases.

Technical and Ethical Issues Surrounding DNA Profiling

Although DNA profiling is sensitive, accurate, and powerful, it is important to be aware of its limitations. One limitation is that most criminal cases have either no DNA evidence for analysis or DNA evidence that would not be informative to the case. In some cases, potentially valuable DNA evidence exists but remains unprocessed and backlogged.

Another serious problem is human error. There are cases in which innocent people have been convicted of violent crimes based on DNA samples that had been inadvertently switched during processing. DNA evidence samples from crime scenes are often mixtures derived from any number of people present at the crime scene, or even from people who were not present but whose biological material (such as hair or saliva) was indirectly introduced to the site (Box 4). Crime scene evidence is often degraded, yielding partial DNA profiles that are difficult to interpret.

One of the most disturbing problems with DNA profiling is its potential for deliberate tampering. DNA profiling technologies are so sensitive that profiles can be generated from only a few cells, or even from fragments of synthetic DNA. There have been cases in which criminals have introduced biological material to crime scenes in an attempt to affect forensic DNA profiles. It is also possible to manufacture artificial DNA fragments that match the STR loci of a person's DNA profile. In 2010, a research paper¹ reported methods for synthesizing DNA of a known STR profile, mixing the DNA with body fluids, and depositing the sample on crime scene items. When subjected to routine forensic analysis, these artificial samples generated perfect STR profiles. In the future, it may be necessary to develop methods to detect the presence of synthetic or cloned DNA in crime scene samples. It has been suggested that such detection could be based on the fact that natural DNA carries epigenetic markers, such as methylation.

¹Frumkin, D., et al. 2010. Authentication of forensic DNA samples. Forensic Sci. Int. Genetics 4: 95–103.

BOX 4 A Case of Transference: The Lukis Anderson Story

On November 30, 2012, police discovered the body of Raveesh Kumra at his home in Monte Sereno, California. Kumra's house had been ransacked, and he had suffocated from the tape used to gag him. Police collected DNA samples from the crime scene and performed DNA profiling. Several suspects were identified through matches to DNA database entries. One match, to a sample taken from Kumra's fingernails, was that of Lukis Anderson, a homeless man who was known to police. Based on the DNA profile match, Anderson was arrested, charged with murder, and jailed. He remained in jail, facing a possible death sentence, for the next five months. The authorities believed that they had a solid case: the crime scene DNA profile was a perfect match to Anderson's DNA profile, and the lab results were accurate. Prosecutors planned to pursue the death penalty.

The only problem for the prosecution was that Anderson could not have been involved in the murder, or even present at the crime scene. On the night of the murder, Anderson had been intoxicated and barely conscious on the streets of San Jose and had been taken to the hospital, where he remained for the next 12 hours. Given his iron-clad alibi, authorities were forced to release Anderson. But they remained baffled about how an innocent person's DNA could have been found on a murder victim, one whom Anderson had never even met.

Several months after Anderson's release, prosecutors announced that they had solved the puzzle. The paramedics who had treated Anderson and taken him to the hospital had then responded to the call at Kumra's house, where they inadvertently transferred Anderson's DNA onto Kumra's fingernails. It is not clear how the transfer occurred, but Anderson's DNA was likely present on the paramedics' equipment or clothing. If Lukis Anderson had not been in the hospital with an irrefutable alibi, he might have faced the death sentence based on DNA evidence. His story illustrates how too much confidence in the power of DNA evidence can lead to false accusations. It also points to the robustness of DNA, which can remain intact, survive disinfection, and be transferred from one location to another under unlikely circumstances.

Many of the ethical questions related to DNA profiling involve the collection and storage of biological samples and DNA profiles. Such questions deal with who should have their DNA profiles stored in a database and whether police should be able to collect DNA samples without a suspect's knowledge or consent.

Another ethical question involves the use of DNA profiles that only partially match. In some cases, investigators search for partial matches between a crime scene DNA profile and other profiles in a DNA database. On the assumption that the two profiles come from two genetically related individuals, law enforcement agencies pursue relatives of the person whose profile is stored in the database. Testing in these cases is known as familial DNA testing. Should such searches be considered scientifically valid, or even ethical?

It is now possible to accurately predict the eye and hair color of persons based on information in their DNA sample, a method known as DNA phenotyping. In addition, scientists are devising DNA-based tests that could provide estimates of age, height, racial ancestry, hairline, facial width, and nose size. Should this type of information be used to identify or convict a suspect?
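Familial searching relies on the fact that relatives share alleles far more often than unrelated people do; as noted earlier, siblings share one allele at a locus in about 50 percent of cases and both alleles in about 25 percent. The Python simulation below is a minimal illustration of where those figures come from, assuming simple Mendelian transmission at a single autosomal locus. The allele labels and the function name `sibling_sharing` are hypothetical, and the sketch counts alleles shared identical by descent; real familial searches typically rely on statistical kinship calculations across many loci rather than a simple count like this.

```python
import random
from collections import Counter


def sibling_sharing(n_pairs, seed=1):
    """Return the fraction of simulated full-sibling pairs that share 0, 1, or 2
    alleles identical by descent (IBD) at a single autosomal locus."""
    rng = random.Random(seed)
    # Hypothetical labels: the four parental alleles are treated as distinct, so a
    # shared label means both siblings inherited the same parental copy.
    mother = ("M1", "M2")
    father = ("F1", "F2")
    counts = Counter()
    for _ in range(n_pairs):
        # Mendelian transmission: one randomly chosen allele from each parent.
        sib1 = (rng.choice(mother), rng.choice(father))
        sib2 = (rng.choice(mother), rng.choice(father))
        # A maternal allele can only match the other sibling's maternal allele,
        # and likewise for paternal alleles.
        shared = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
        counts[shared] += 1
    return {k: counts[k] / n_pairs for k in (0, 1, 2)}


if __name__ == "__main__":
    for k, frac in sibling_sharing(100_000).items():
        print(f"share {k} allele(s) by descent: {frac:.3f}")
```

With 100,000 simulated pairs, the printed fractions should land close to 0.25, 0.50, and 0.25. Because unrelated alleles can also coincide by chance, real siblings will often appear to share somewhat more alleles identical in size (by state) than these descent-based values suggest.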
As DNA profiling becomes more sophisticated and prevalent, we should carefully consider both the technical and ethical questions that surround this powerful technology.

Visit the Study Area in MasteringGenetics for a list of further readings on this topic, including journal references and selected web sites.

Review Questions

1. What is VNTR profiling, and what are the applications of this technique?
2. Why are short tandem repeats (STRs) the most commonly used loci for forensic DNA profiling?
3. Describe capillary electrophoresis. How does this technique distinguish between input DNA and amplified DNA?
4. What are the advantages and limitations of Y-chromosome STR profiling?
5. How does the AMEL gene locus allow investigators to tell whether a DNA sample comes from a male or a female?
6. Explain why mitochondrial DNA profiling is often the method of choice for identifying victims of massacres and mass disasters.
7. What is a "profile probability," and what information is required in order to calculate it?
8. Describe the database system known as CODIS. What determines whether a person's DNA profile will be entered into the CODIS system?
9. What is DNA barcoding, and what types of cases use this profiling method?
10. Why is it important to understand the prosecutor's fallacy?

Discussion Questions

Given the possibility that synthetic DNA could be purposely introduced to a crime scene in order to implicate an innocent person, what methods could be developed to distinguish between synthetic and natural DNA?

Different countries and jurisdictions have different regulations regarding the collection and storage of DNA samples and profiles. What are the regulations within your region? Do you think that these regulations sufficiently protect individual rights?

If you were acting as a defense lawyer in a murder case that used DNA profiling as evidence against the defendant, how would you explain to the jury the limitations that might alter their interpretation of the crime scene DNA profile?
The phenomena of somatic mosaicism and chimerism are more prevalent than most people realize. For example, pregnancy and bone marrow transplantation may lead to a person's genome becoming a mixture of two different genomes. Describe how DNA forensic analysis may be affected by chimerism and what measures could be used to mitigate any confusion during DNA profiling. Find out more about genetic chimerism in an article by Zimmer, C., DNA double take, New York Times, September 16, 2013.

SPECIAL TOPICS IN MODERN GENETICS
Genomics and Personalized Medicine

Physicians have always practiced personalized medicine in order to make effective treatment decisions for their patients. Doctors take into account a patient's symptoms, family history, lifestyle, and data derived from many types of medical tests. However, within the last 20 years, personalized medicine has taken a new and potentially powerful direction based on genetics and genomics. Today, the phrase personalized medicine is used to describe the application of information from a patient's unique genetic profile in order to select effective treatments that have minimal side-effects and to detect disease susceptibility prior to development of the disease.

Despite the immense quantities of medical information and pharmaceuticals that are available, the diagnosis and treatment of human disease remain imperfect. It is sometimes difficult or impossible to accurately diagnose some conditions. In addition, some patients do not respond to treatments, while others may develop side-effects that can be annoying or even life-threatening. Because much of the basis for disease susceptibility and for the variation that patients exhibit toward drug treatments is genetically determined, progress in genetics, genomics, and molecular biology has the potential to significantly advance medical diagnosis and treatment.

The sequencing of the human genome, the cataloging of genetic sequence variants, and the linking of sequence variants with disease susceptibility form the basis of the newly emerging field of personalized medicine. In addition, a rapidly growing list of genetic tests helps physicians determine whether a patient will have an adverse drug reaction and whether a particular pharmaceutical will be effective for that patient.

Although much of the promise of personalized medicine remains in the future, significant progress is underway. As genome technologies advance and the cost of sequencing personal genomes declines, it is becoming easier to examine a patient's unique genomic profile in order to diagnose diseases and prescribe treatments. Proponents of personalized medicine foresee a future in which each person will have his or her genome sequence determined at birth and stored in a digital form within a personal computerized medical file. Medical practitioners will use automated methods to scan the sequence information within these data files for clues to disease susceptibility and reactions to drugs. In the future, genomic profiling and personalized medicine may allow physicians to predict which diseases you will develop, which therapeutics will work for you, and which drug dosages are appropriate.

In this Special Topic chapter, we will outline the current uses of genetic and genomic-based personalized medicine in disease diagnosis and drug selection. In addition, we will outline the future directions for personalized medicine, as well as some ethical and technical challenges associated with it.

Personalized Medicine and Pharmacogenomics

Perhaps the most developed area of personalized medicine is in the field of pharmacogenomics. Pharmacogenomics is the study of how an individual's entire genetic makeup determines the body's response to drugs. The term pharmacogenomics is used interchangeably with pharmacogenetics, which refers to the study of how sequence variation within specific candidate genes affects an individual's drug responses.

In pharmacogenomics, scientists take into account many aspects of drug metabolism and how genetic traits affect these aspects. When a drug enters the body, it interacts with various proteins, including carriers, cell-surface receptors, transporters, and metabolizing enzymes. These proteins affect a drug's target site of action, absorption, pharmacological response, breakdown, and excretion. Because so many interactions occur between a drug and proteins within the patient, many genes and many different genetic polymorphisms can affect a person's response to a drug. In this subsection, we examine two ways in which genomics and personalized medicine are changing the field of pharmacogenomics: by optimizing drug therapies and by reducing adverse drug reactions.

Optimizing Drug Therapies

When it comes to drug therapy, it is clear that "one size does not fit all." On average, a drug will be effective in only about 50 percent of patients who take it (ST Figure 4–1). As a result, physicians often must switch their patients from one drug to another until they find one that is effective. Not only does this waste time and resources, but it may also be dangerous to the patient, who is exposed to a variety of different pharmaceuticals and may not receive appropriate treatment in time to combat a progressive illness. Pharmacogenomics increases the efficacy of drugs by targeting them to the subpopulations of patients who will benefit.

One of the most common current applications of personalized pharmacogenomics is in the diagnosis and treatment of cancers. Large-scale sequencing studies show that each tumor is genetically unique, even though it may fall into a broad category based on cytological analysis or knowledge of its tissue origin. Given this
genomic variability, it is important to understand each patient's mutation profile in order to select an appropriate treatment, particularly the newer treatments that are based on the molecular characteristics of tumors (Box 1).

ST Figure 4–1 Variations in patient response to drugs. This figure gives a general summary of the percentages of patients for whom a particular class of drugs is ineffective: antidepressants (SSRIs), 38%; asthma drugs, 40%; diabetes drugs, 43%; arthritis drugs, 50%; Alzheimer's drugs, 70%; cancer drugs, 75%. (© 2011 Personalized Medicine Coalition)

BOX 1 The Story of Pfizer's Crizotinib

In 2007, Beverly Sotir was diagnosed with advanced non-small cell lung cancer (NSCLC). Beverly, a 68-year-old grandmother and nonsmoker, received standard chemotherapy, but her cancer continued to proliferate. She was given six months to live.

At the same time, an apparently unrelated study was underway at the pharmaceutical company Pfizer. Pfizer had developed a compound called crizotinib, which was designed to inhibit the activity of MET, a tyrosine kinase that is abnormal in a number of tumors. Although crizotinib also inhibited another kinase called ALK (anaplastic lymphoma kinase), scientists did not consider this significant. After clinical trials for crizotinib began, an article was published* describing a chromosomal translocation found in a small number of NSCLCs. This translocation fused the ALK gene to another gene called EML4, leading to production of a fusion protein that stimulated cancer cell growth. Pfizer immediately changed its clinical trial to include NSCLC patients.

Beverly's doctors at the Dana-Farber Cancer Institute in Boston tested her tumors, discovered that they contained the ALK/EML4 fusion gene, and enrolled Beverly in the trials. The results were dramatic. Within six months, Beverly's tumors shrank by more than 50 percent, and some disappeared entirely. As of 2011, Beverly continued to do well.

Results of the clinical trials for crizotinib showed that tumors shrank or stabilized in 90 percent of the 82 patients whose tumors contained the ALK fusion gene. Those patients who responded well to treatment had positive responses for up to 15 months. Scientists report that the ALK fusion gene tends to occur most frequently in young NSCLC patients who have never smoked. Only a small percentage of patients with NSCLC have this translocation in their tumor cells; even so, about 45,000 people a year, worldwide, may be eligible for this treatment. Crizotinib is now approved in the United States for treatment of ALK-positive NSCLC.

*Soda, M., et al. 2007. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature 448: 561–566.

One of the first success stories in personalized medicine was that of the HER-2 gene and the use of the drug Herceptin® in breast cancer. The human epidermal growth factor receptor (HER-2) gene is located on chromosome 17 and codes for a transmembrane tyrosine kinase receptor protein called HER-2. These receptors are located within the cell membranes of normal breast epithelial cells and, when bound to an extracellular growth factor (ligand), send signals to the cell nucleus that result in the transcription of genes involved in cell growth and division. In about 25 percent of invasive breast cancers, the HER-2 gene is amplified and the protein is overexpressed on the cell surface. In some breast cancers, the HER-2 gene is present in as many as 100 copies per cell. HER-2 overexpression is associated with increased tumor invasiveness, metastasis, and cell proliferation, as well as a poorer patient prognosis.

Using recombinant DNA technology, Genentech Corporation in California developed a monoclonal antibody known as trastuzumab (Herceptin) that is designed to bind specifically to the extracellular region of the HER-2 receptor. When bound to the receptor, Herceptin appears to inhibit the signaling capability of HER-2 and may also flag the HER-2-expressing cell for destruction by the patient's immune system. In cancer cells that overexpress HER-2, Herceptin treatment causes cell-cycle arrest and, in some cases, death of the cancer cells.

Because Herceptin acts only on breast cancer cells that have amplified HER-2 genes, it is important to know the HER-2 phenotype of each cancer. In addition, Herceptin has potentially serious side-effects, so its use must be limited to those who could benefit from the treatment. A number of molecular assays have been developed to determine the gene and protein status of breast cancer cells. Two types of tests are used routinely to determine the amount of HER-2 overexpression in cancer cells: immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH). In IHC assays, an antibody that binds to the HER-2 protein is added to fixed tissue on a slide. The presence of bound antibody is then detected with a stain and observed under the microscope [ST Figure 4–2(a)]. The FISH assay (which is described in Chapter 9) assesses the number of HER-2 genes by comparing the fluorescence signal from a HER-2 probe with a control signal from another gene that is not amplified in the cancer cells [ST Figure 4–2(b)].

ST Figure 4–2 Protein and gene-amplification assays to determine HER-2 levels in cancer cells. (a) Normal and breast cancer cells within a biopsy sample, stained by HER-2 immunohistochemistry. Cell nuclei are stained blue. Cancer cells that overexpress HER-2 protein stain brown at the cell membrane. (b) Cancer cells assayed for HER-2 gene copy number by fluorescence in situ hybridization. Cell nuclei are stained blue, HER-2 gene DNA appears bright red, and chromosome 17 centromeres stain green. The degree of HER-2 gene amplification is expressed as the ratio of red-staining foci to green-staining foci.

Herceptin has had a major effect on the treatment of HER-2-positive breast cancers. When Herceptin is used in combination with chemotherapy, there is a 25 to 50 percent increase in survival compared with the use of chemotherapy alone. Herceptin is now one of the biggest-selling biotechnology products in the world, generating more than $5 billion in annual sales.

There are now dozens of drugs whose prescription and use depend on the genetic status of the target cells. Approximately 10 percent of FDA-approved drugs have labels that include pharmacogenomic information (ST Table 4.1). For example, about 40 percent of colon cancer patients respond to the drugs Erbitux® (cetuximab) and Vectibix® (panitumumab). These two drugs are monoclonal antibodies that bind to epidermal growth factor receptors (EGFRs) on the surface of cells and inhibit the EGFR signal transduction pathway. For these drugs to work, the cancer cells must express EGFR on their surfaces and must also have a wild-type K-RAS gene. The presence of EGFR can be assayed using a staining test and observation of cancer cells under a microscope. Mutations in the K-RAS gene can be detected using assays based on the polymerase chain reaction (PCR) method, which is described earlier in the text (see Chapter 17).

ST Table 4.1 Examples of Personalized Medicine Drugs and Diagnostics

Therapy                    Gene Test                          Description
Herceptin® (trastuzumab)   HER-2 amplification                Breast cancer test to accompany Herceptin use
Erbitux® (cetuximab)       EGFR expression, K-RAS mutations   Protein and mutation analysis prior to treatment
Gleevec® (imatinib)        BCR/ABL fusion                     Gleevec used in treatment of Philadelphia chromosome-positive chronic myelogenous leukemia
Gleevec® (imatinib)        C-KIT                              Gleevec used in stomach cancers expressing mutated C-KIT
Tarceva® (erlotinib)       EGFR expression                    Lung cancer for EGFR-positive tumors
Drugs/surgery              MLH1, MSH2, MSH6                   Gene mutations related to colon cancers
Hormone/chemotherapies     Oncotype DX® test                  Selection of breast cancer patients for chemotherapy
Chemotherapies             Aviara Cancer TYPE ID®             Classifies 39 tumor types using gene-expression assays

© 2011 Personalized Medicine Coalition

Another example of treatment decisions being informed by genetic tests is the Oncotype DX® Assay (Genomic Health Inc.). This assay analyzes the expression (amount of mRNA) of 21 genes in breast cancer samples in order to help physicians select appropriate treatments and predict the course of the disease. These genes were chosen because their levels of expression correlate with breast cancer recurrence after initial treatment. Based on the mRNA expression levels revealed in the assay results, scientists calculate a "Recurrence Score," estimating the likelihood that the cancer will recur within a ten-year period. Patients with a low-risk score would likely not benefit from adding chemotherapy to their treatment regimens and so can be treated with hormones alone. Those with higher risk scores would likely benefit from more aggressive therapies.

Reducing Adverse Drug Reactions

Every year, about 2 million people in the United States have serious side-effects from pharmaceutical drugs, and approximately 100,000 people die. The costs associated with these adverse drug reactions (ADRs) are estimated to be $136 billion annually. Although some ADRs result from drug misuse, others result from a patient's inherent physiological reactions to a drug.

Sequence variations in a large number of genes can affect drug responsiveness. Of particular significance are the genes that encode the cytochrome P450 family of enzymes, whose members are encoded by 57 different genes. People with some cytochrome P450 gene variants metabolize and eliminate drugs slowly, which can lead to accumulation of the drug and overdose side-effects. In contrast, other people have variants that cause drugs to be eliminated quickly, leading to reduced effectiveness. An example of gene variants that affect drug responses is that of the CYP2D6 gene.
This member of the cytochrome P450 family encodes the debrisoquine hydroxylase enzyme, which is involved in the metabolism of approximately 25 percent of all pharmaceutical drugs, including acetaminophen, clozapine, beta blockers, tamoxifen, and codeine. There are more than 70 variant alleles of this gene. Some mutations in this gene reduce the activity of the encoded enzyme, and others can increase it. Approximately 80 percent of people are homozygous or heterozygous for the wild-type CYP2D6 gene and are known as extensive metabolizers. Approximately 10 to 15 percent of people are homozygous for alleles that decrease activity (poor metabolizers), and the remainder of the population carries duplicated copies of the gene (ultra-rapid metabolizers). Poor metabolizers are at increased risk for ADRs, whereas ultra-rapid metabolizers may not receive sufficient dosages to have an effect on their conditions.

In 2005, the FDA approved a microarray gene test called the AmpliChip® CYP450 assay (Roche Diagnostics) that detects 29 genetic variants of two cytochrome P450 genes, CYP2D6 and CYP2C19. This test detects single-nucleotide polymorphisms (SNPs) as well as gene duplications and deletions. The AmpliChip CYP450 assay is an example of a genotyping microarray, such as those described earlier in the text (see Chapter 19). After the microarray is read by an automated scanner, the data are analyzed by computer software, and the individual's CYP2D6/CYP2C19 genotype is generated.

Another example of pharmacogenomics in personalized medicine involves the CYP2C9 and VKORC1 genes and the drug warfarin. Warfarin (also known as Coumadin) is an anticoagulant drug that is prescribed to prevent blood clots after surgery and to aid people with cardiovascular conditions who are prone to clots. Warfarin inhibits the vitamin K-dependent synthesis of several clotting factors. The dose of warfarin that produces a therapeutic response varies more than ten-fold between patients. In the past, physicians attempted to
adjust the doses of warfarin through a trial-and-error process during the first year of treatment. If the dosage of warfarin is too high, the patient may experience serious hemorrhaging; if it is too low, the patient may develop life-threatening blood clots. It is estimated that 20 percent of patients are hospitalized during their first six months of treatment because of warfarin side effects.
Variations in warfarin activity are affected by polymorphisms in several genes, particularly CYP2C9 and VKORC1. Two single-nucleotide polymorphisms in CYP2C9 lead to reduced elimination of warfarin and increased risk of hemorrhage. About 25 percent of Caucasians are heterozygous for one of these polymorphisms, and … percent appear to be homozygous. About … percent of patients of Asian and African descent carry these variants. Patients who are heterozygous or homozygous for some alleles of CYP2C9 require a 10 to 90 percent lower dose of warfarin. The FDA recommends the use of CYP2C9 and VKORC1 genetic tests to predict the likelihood that a patient may have an adverse reaction to warfarin. Several companies offer tests to detect polymorphisms in these genes, using methods based on PCR amplification and allele-specific primers. It is estimated that the use of warfarin genetic tests could prevent 17,000 strokes and 85,000 serious hemorrhages per year, and the savings in health care could reach $1.1 billion per year.
Pharmacogenomic tests and treatments, and the genetic information on which they are based, are advancing rapidly. Updated information on all aspects of pharmacogenomics can be found on the Pharmacogenomics Knowledge Base, which is described in Box 2.
BOX 2 The Pharmacogenomics Knowledge Base (PharmGKB): Genes, Drugs, and Diseases on the Web
The Pharmacogenomics Knowledge Base (PharmGKB) is a publicly available Internet database and information source developed by Stanford University. It is funded by the National Institutes of Health (NIH) and forms part of the NIH Pharmacogenomics Research Network, a U.S. research consortium. The goal of PharmGKB is to provide researchers and the general public with information that will increase the understanding of how genetic variation contributes to an individual’s reaction to drugs. On the PharmGKB Web site (see ST Figure 4–3), you may search for genes and variants that affect drug reactions, information on a large number of drugs, diseases and their genetic links, pharmacogenomic pathways, gene tests, and relevant publications. Visit the PharmGKB Web site at http://www.pharmgkb.org.
[ST Figure 4–3 The PharmGKB Knowledge Pyramid. A visual representation of the types of information available at www.pharmgkb.org. Tier labels include Clinical Implementation; Clinical Interpretation; Knowledge Annotation, Aggregation & Integration; Knowledge Extraction; and Primary Pharmacogenomic Literature.]
Personalized Medicine and Disease Diagnostics
The ultimate goal of personalized medicine is to apply information from a patient’s full genome to help physicians diagnose disease and select treatments tailored to that particular patient. This information will come not only from genome sequencing but also from gene-expression data derived from transcriptomic, proteomic, metabolomic, and epigenetic tests.
At the present time, the most prevalent use of genomic information for disease diagnostics is genetic testing that examines specific disease-related genes and gene variants. Most existing genetic tests detect
the presence of mutations in single genes that are known to be linked to a disease. Currently, more than 1600 such genetic tests are available. A comprehensive list of genetic tests can be viewed on the NIH Genetic Testing Registry at www.ncbi.nlm.nih.gov/gtr/. The technologies used in many of these genetic tests are presented earlier in the text (see Chapter 19).
Genetic tests are classified according to their uses, and each test falls into one or more of the following groups:
Diagnostic tests are designed to detect the presence or absence of gene variants or mutations linked to a suspected genetic disorder in a symptomatic patient.
Predictive tests detect mutations and variants in patients with a family history of a known genetic disorder, for example, Huntington disease or BRCA-linked breast cancer.
Carrier tests help physicians identify patients who carry a gene mutation linked to a disorder that might be passed on to their offspring, such as Tay–Sachs disease or cystic fibrosis.
Preimplantation tests are performed on early embryos in order to select for implantation those embryos that do not carry a suspected disease-causing mutation.
Prenatal tests detect potential genetic diseases in a fetus. The test for Down syndrome is a well-known example.
Over the last decade, genome sequencing methods have progressed rapidly in speed, accuracy, and cost-effectiveness. In addition, other “omics” technologies such as transcriptomics and proteomics are providing major insights into how DNA sequences lead to gene expression and, ultimately, to phenotype. (Refer to Chapter 18 for descriptions of techniques and data emerging from human “omics” technologies.)
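To make the carrier-test category concrete, the short Python sketch below computes Mendelian offspring probabilities for a single autosomal recessive gene, such as the cystic fibrosis gene named among the carrier tests above. It is an illustrative sketch under simplifying assumptions, not a clinical tool: it assumes one gene, full penetrance, and equal transmission of each parental allele. The allele labels ("A" for the normal allele, "a" for the disease-associated allele) and the function names are hypothetical choices made only for this example.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Probability of each child genotype from a single-gene cross.

    Each parent genotype is a two-letter string such as "Aa". Every allele is
    assumed to be transmitted with probability 1/2 (Mendelian segregation).
    """
    counts: dict[str, int] = {}
    # Pair every allele of parent 1 with every allele of parent 2 (a Punnett square).
    for allele1, allele2 in product(parent1, parent2):
        # Sort so "aA" and "Aa" count as the same genotype; in ASCII,
        # uppercase letters sort before lowercase ones.
        genotype = "".join(sorted(allele1 + allele2))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = len(parent1) * len(parent2)
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def affected_risk(parent1: str, parent2: str, disease_allele: str = "a") -> Fraction:
    """Probability that a child is homozygous for the recessive disease allele."""
    probs = offspring_genotype_probs(parent1, parent2)
    return probs.get(disease_allele * 2, Fraction(0))


if __name__ == "__main__":
    # Two carriers: expect 1/4 AA, 1/2 Aa (carriers), 1/4 aa (affected).
    for genotype, p in sorted(offspring_genotype_probs("Aa", "Aa").items()):
        print(genotype, p)
    print("carrier x carrier, affected:", affected_risk("Aa", "Aa"))
    print("carrier x non-carrier, affected:", affected_risk("Aa", "AA"))
```

When both parents are carriers, the sketch gives a 1/4 chance of an affected child and a 1/2 chance of a carrier child; when only one parent is a carrier, the risk of an affected child is zero, although half of the children are expected to be carriers. This is why carrier testing is most informative when both prospective parents are tested.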
As these technologies become faster and more cost-effective, they will begin to make important contributions to personalized medicine. Although the application of “omics” to personalized medicine has not yet entered routine medical care, several proof-of-principle cases illustrate how whole-genome analysis may be used in the future. They also reveal some of the limitations that must be overcome before genome-based medicine becomes commonplace and practical. In the next two sections, we describe several of these studies as they pertain to the diagnosis of cancers and other diseases.
Personal Genomics and Cancer
As we learned earlier in the text (see Chapter 16), cancer is a genetic disease at the level of somatic cells. High-throughput sequencing of normal and cancer genomes, along with RNA sequencing and protein profiling of normal and cancer cells, has revealed many of the mutations and gene rearrangements associated with specific cancers. Studies such as the Cancer Genome Atlas project are amassing data equivalent to 20,000 genome projects on normal and tumor DNA from patients with more than 20 different types of cancer. Such studies are revealing that cancers once classified in general terms (such as “prostate cancer”) are in fact many different diseases, distinguished by their genetic profiles. For example, in the past, blood cancers were categorized into two large groups: leukemias and lymphomas. Today, we know that each category can be broken down into dozens of different types, based on gene mutation and expression characteristics. Similarly, breast cancer is now thought to be at least 10 separate diseases, based on genomic and gene-expression data. The recognition that tumors differ significantly in gene expression will likely be used in the future to tailor therapies that attack or modify specific gene-expression aberrations.
Another significant discovery from cancer genome research is that every tumor is genetically unique, even though common cellular pathways
are involved. This realization indicates that each cancer may require a personalized treatment and that the genomic “net” cast to detect altered gene function must be wide enough to capture all relevant defects within each cancer.
The potential for whole-genome sequencing and gene-expression assays in cancer diagnosis and treatment is illustrated by a case described in Box 3. This story illustrates the enormous resources involved in genomic sequencing and gene-expression assays, as well as in the interpretation of the resulting data. It also shows that genomic sequencing alone may not be sufficient to detect the most important defects in cancer cells, including those that would be suitable targets for therapy. The story points out that few gene-specific drugs are currently available and that those that exist are expensive and may not be covered by medical insurance. The patient in this story was fortunate that a key defect in his cancer was detectable using genomic techniques and could be targeted by an existing drug. Because most cancers contain dozens to hundreds of genetic and gene-expression defects, and more than one gene product may drive the cells to form cancers, developing drugs for each of these defects remains a challenging goal. Despite these challenges, the story holds future promise for cancer diagnosis and gene-specific treatments in personalized medicine.
BOX 3 Personalized Cancer Diagnostics and Treatments: The Lukas Wartman Story1
During his final year of medical school in 2002, Dr. Lukas Wartman began to experience symptoms of fatigue, fever, and bone pain. After months of tests, he was given a diagnosis of adult acute lymphoblastic leukemia (ALL). Following two years of chemotherapy, his cancer went into remission for three years. When the ALL recurred, his doctors treated him with intensive chemotherapy and a bone marrow transplant, which put him back into remission for another three years. After his second relapse, all attempts at treatment failed and he was rapidly deteriorating.
At the time of his second relapse, Dr. Wartman was working as a physician-scientist at Washington University, researching the genetics of leukemias. His colleagues, including Dr. Timothy Ley, associate director of the Washington University Genome Institute, decided to mount a last-minute effort to save him. Using the university’s sequencing facilities and supercomputers, the research team sequenced the entire genomes of his normal and cancer cells. They also analyzed his RNA types and expression levels using RNAseq technologies.
As they had expected, Dr. Wartman’s cancer cells contained many gene mutations. Unfortunately, there were no known drugs that would attack the products of these mutated genes. The RNA sequence analysis, however, revealed unexpected results. It showed that the fms-related tyrosine kinase 3 (FLT3) gene, although having a normal DNA sequence, was overexpressed in his cancer cells, perhaps due to mutations in the gene’s regulatory regions. The FLT3 gene encodes a protein kinase that is involved in normal hematopoietic cell growth and differentiation, and its overexpression would be a potentially important contributor to Dr. Wartman’s cancer. Equally interesting, and fortunate, was that the drug sunitinib (Sutent) was known to inhibit the FLT3 kinase and had been approved for use in the treatment of some kidney and gastrointestinal cancers.
Dr. Wartman decided to try sunitinib. Unfortunately, the drug cost $330 per day, and Dr. Wartman’s insurance company refused to pay for it. In addition, the drug company Pfizer refused to supply the drug to him under its compassionate use program. Despite these setbacks, he collected enough money to buy a week’s worth of sunitinib. Within days of starting treatment, his blood counts were approaching normal. Within two weeks his bone marrow was free of cancer cells. At this point Pfizer reversed its decision and supplied Dr. Wartman with the drug. In addition, he underwent a second bone marrow transplant to help ensure that the cancer would not return. Although Dr. Wartman’s long-term prognosis is still uncertain, his successful experience with personalized cancer treatment has given him hope and has spurred research into the regulation of the FLT3 gene in other cancers.
1 Kolata, G. In treatment for leukemia, glimpses of the future. New York Times, July 7, 2012.
Personal Genomics and Disease Diagnosis: Analyzing One Genome
In 2010, the journal Lancet published a report illustrating the type of information that we can currently obtain from personal whole-genome sequencing.1 The personal genome sequence in this study was the first one to be sequenced using a method known as true single-molecule sequencing (tSMS™). Some high-throughput methods, such as those described earlier in the text (see Chapters 17 and 18), require cloning or PCR amplification of template DNA prior to sequencing. In contrast, the tSMS method directly sequences individual genomic DNA strands with minimal processing. The sequencing of this genome took about a week, was performed with one machine, used the services of three people, and cost $48,000.
1 Ashley, E.A., et al. 2010. Clinical assessment incorporating a personal genome. Lancet 375: 1525–1535.
The genome sequence was that of Dr. Stephen Quake, a Stanford University professor who developed the technology and headed the research group. He was a healthy 40-year-old male who had a family history of arthritis, aortic aneurysm, coronary artery disease, and sudden cardiac death. By comparing Dr. Quake’s DNA sequence with other human genome sequences in databases, the researchers discovered a total of 2.6 million SNPs and 752 copy number variations. They then sorted through the genome sequence data to determine which of these variants might have an effect on phenotype. This was accomplished by searching for known SNPs in several large databases, manually creating their own disease-associated SNP database, and calculating likelihood ratios for various disease risks. The analysis required the combined efforts of more than two dozen scientists and clinicians over a period of about a year, and drew on information from more than a dozen sequence databases, new and existing sequence analysis tools, and hundreds of individually accessed research papers.
To determine how Dr. Quake might respond to pharmaceutical drugs, the researchers searched the PharmGKB database (see Box 2) for the presence of known variants within pharmacogenomically important genes. He was found to have 63 clinically relevant SNPs within genes associated with drug reactions. In addition, his genome contained six previously unknown SNPs that could alter amino acid sequences in drug-response genes. For example, the genome sequence revealed that Dr. Quake was heterozygous for a null mutation in the CYP2C19 gene. This mutation could make him sensitive to a range of drugs, including those used to treat aspects of heart disease. He would also be more sensitive than normal to warfarin, based on SNPs within his VKORC1 and CYP4F2 genes. In contrast, Dr. Quake’s DNA sequence contained gene variants associated with good responses to statins; however, other gene variants suggested that he might require higher-than-normal statin dosages.
The search for mutations within genes that directly affect disease conditions revealed several potentially damaging variants. Dr. Quake was heterozygous for a SNP within the CFTR gene that would change a glycine to arginine at position 458. This mutation could lead to cystic fibrosis if it was passed on to a son or daughter who also inherited a defective CFTR gene from the other parent. Similarly, he was heterozygous for a recessive mutation in the hereditary haemochromatosis protein precursor gene (HFE), which is associated with the development of haemochromatosis, a serious
condition leading to toxic accumulations of iron. Dr. Quake was also heterozygous for a recessive mutation in the solute carrier family 3 (SLC3A1) gene. This mutation is linked to cystinuria, an inherited disorder in which the kidneys fail to reabsorb cystine, leading to high levels of cystine in the urine and the formation of kidney stones. The scientists also discovered a heterozygous SNP within the parafibromin (CDC73) gene that would create a prematurely terminated protein. This gene is a tumor-suppressor gene linked to the development of hyperparathyroidism and parathyroid tumors. The presence of this SNP increased the risk that Dr. Quake might develop these types of tumors if any of his cells lost the remaining normal copy of the gene through a loss-of-heterozygosity event.
The analysis of Dr. Quake’s genome sequence for the purpose of predicting future development of multifactorial disease was more challenging. Genome-wide association studies have revealed large numbers of sequence variants that are associated with complex diseases; however, each of these variants usually contributes only a small part of the susceptibility to disease. Because not all variants have been discovered or characterized, it is difficult to establish a numerical risk score for each of these diseases based on the presence of one or more SNPs. As an example, the researchers discovered SNPs within three genes (TMEM43, DSP, and MYBPC3) that may be associated with sudden cardiac death. However, the exact effects of two of these SNPs are still unclear, and the third SNP had not previously been described. Dr. Quake had five SNPs in genes associated with an increased risk of developing myocardial infarction and two SNPs associated with a lower risk. Among the SNPs associated with increased risk, a variant in the apolipoprotein A precursor (LPA) gene is associated with a five-fold increase in plasma lipoprotein(a) concentration and a two-fold increased risk of coronary artery disease. By taking into consideration the simultaneous potential effects of many
different SNPs, as well as the patient’s own environmental and personal lifestyle factors, the researchers concluded that Dr. Quake’s genetics contributed to a significantly increased risk for eight conditions (such as Type 2 diabetes, obesity, and coronary artery disease) and a decreased risk for seven conditions (such as Alzheimer disease). Dr. Quake was offered the services of clinical geneticists, counselors, and clinical lab directors to help interpret the information generated from the genome sequence. Genetic counseling covered areas such as the psychological and reproductive implications of genetic disease risk, the possibility of discrimination based on genetic test results, and the uncertainties in risk assessments.
In 2012, another study of personal genome analysis was reported (Box 4). This study combined data from whole-genome sequencing, transcriptomics, proteomics, and metabolomics profiles from a single patient at multiple time points over a 14-month period. This in-depth multilevel personal profiling allowed the patient to be monitored through both healthy and diseased states, as he contracted two virus infections and developed a period of Type 2 diabetes. This research points out how complex changes in gene expression may affect phenotype and shows the importance of looking beyond the raw sequence of an individual’s DNA. It also indicates that gene-expression profiles can be monitored by current technologies and may be applied in the future as part of personalized medical testing.
Technical, Social, and Ethical Challenges
There are still many technical hurdles to overcome before personalized medicine becomes a standard part of medical care. The technologies of genome sequencing, “omics” profiling, microarray analysis, and SNP detection need to be faster, more accurate, and cheaper. Scientists expect that these challenges will be overcome in the near future; however, genome analysis needs to be used with caution until the technology becomes highly accurate and
reliable. Even a low rate of error in genetic sequences or test results could lead to misdiagnoses and inappropriate treatments.
Another challenge will be the storage and interpretation of vast amounts of genomic sequence data. Each personal genome generates the letter-equivalent of 200 large phone books, which must be stored in databases and mined for relevant sequence variants, and a meaning must then be assigned to each variant. To undertake these kinds of analyses, scientists need to gather data from large-scale population genotyping studies that will link sequence variants to phenotype, disease, or drug responses. Experts suggest that such studies will require the coordinated efforts of public and private research teams and more than a decade to complete. Scientists will also need to develop efficient automated systems and algorithms to deal with this massive amount of information. Moreover, these data analyses will have to consider that genetic variants contribute only partially to personal phenotype. Personalized medicine will also need to integrate information about environmental, personal lifestyle, and epigenetic factors.
Another technical challenge for personalized medicine is the development of automated health information technologies. Health-care providers will need to use electronic health records to store, retrieve, and analyze each patient’s genomic profile, as well as to compare this information with constantly advancing knowledge about genes and disease. Currently, fewer than 10 percent of hospitals and physicians in the United States have access to these types of information technologies.
BOX 4 Beyond Genomics: Personal Omics Profiling
A study published by a research team led by Dr. Michael Snyder of Stanford University provides an example of how multiple “omics” technologies can be used to examine one person’s healthy and diseased states.1 Blood samples were taken from a healthy individual (Dr. Snyder) at 20 time points over a 14-month study period. Dr. Snyder’s whole genome was sequenced using two different methods, backed up by exome sequencing using three different methods. In addition, his genome sequence was compared to that of his mother. Concurrently, whole-transcriptome sequencing, proteomic profiling, and metabolomic assays were performed.
Dr. Snyder’s genome sequence revealed a number of SNPs that are known to be associated with elevated risks for coronary artery disease, basal cell carcinoma, hypertriglyceridemia, and Type 2 diabetes. A mutation in the TERT gene, which is involved in telomere replication, gave an increased risk for aplastic anemia. These data were followed by a series of medical tests. Dr. Snyder had no signs of aplastic anemia, and his telomere lengths were close to normal. Similarly, his mother, who shared his mutation in the TERT gene, had no symptoms of aplastic anemia. Medical tests revealed that he did have elevated triglyceride levels, which he subsequently controlled using medication. Blood glucose levels were initially normal but became abnormally high after he became infected with respiratory syncytial virus (RSV). In response to these data, Dr. Snyder modified his diet and exercise regime and later brought his blood glucose down to normal levels. An analysis of drug-response gene variants revealed that he should have good responses to diabetic drugs.
Using RNAseq technologies, the researchers monitored the numbers and types of more than 19,000 mRNAs and miRNAs transcribed from more than 12,000 genes over 20 time points. The data showed that sets of genes were coordinately regulated in response to conditions such as RSV infection and glucose levels. The researchers also found that RNA species underwent differential splicing and editing during changes in physiological states. Editing events included changes of adenosine to inosine and cytidine to uridine, and many of these RNA edits altered the amino acid sequences of translated proteins. The researchers also profiled the levels of more than 6000 proteins and metabolites over the time course of the study. Like the RNA data, the protein and metabolite data showed coordinated changes that occurred during virus infections and changes in glucose levels. Some of these changes were shared between RNA, protein, and metabolites, and others were unique to each category. The medical significance of these patterns will be addressed in future studies.
1 Chen, R., et al. 2012. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148: 1293–1307.
Personalized medicine has a number of societal implications. To make personalized medicine available to everyone, the costs of genetic tests, as well as the genetic counseling that accompanies them, must be reimbursed by insurance companies, even in cases where there are no prior diseases or symptoms. Regulatory changes are required to ensure that genetic tests and genomic sequencing are accurate and that the data generated are reliably stored in databases that guarantee the patient’s privacy. At the present time, less than … percent of genetic tests are regulated by agencies such as the FDA.
Personalized medicine also requires changes to medical education. In the future, physicians will be expected to use genomic information as part of their patient management. For this to be possible, medical schools will need to train future physicians to interpret and explain genetic data. In addition, more genetic counselors and genomics specialists will be required. These specialists will need to understand genomics and disease, as well as be able to manipulate bioinformatic data. As of 2010, there were only about 2500 genetic counselors and 1100 clinical geneticists in North America.
The ethical aspects of the new personalized medicine are also diverse and challenging. For example, it is sometimes argued that the costs involved in the development of genomics and personalized medicine are a misallocation of limited resources. Some argue that science should solve larger problems facing humanity, such as the distribution of food and clean water, before embarking on personalized medicine. Similarly, some critics argue that such highly specialized and expensive medical care will not be available to everyone and represents a worsening of economic inequality. There are also concerns about how we will protect the privacy of genome information that is contained in databases and private health-care records. In addition, there need to be effective ways to prevent discrimination in employment or insurance coverage based on information derived from genomic analysis.
Most experts agree that we are at the beginning of a personalized medicine revolution. Information from genetics and genomics research is already increasing the effectiveness of drugs and enabling health-care providers to predict diseases before they occur. In the future, personalized medicine will touch almost every aspect of medical care. By addressing the upcoming challenges of the new personalized medicine, we can guide its use for the maximum benefit to the greatest number of people.
Visit the Study Area in MasteringGenetics for a list of further readings on this topic, including journal references and selected web sites.
Review Questions
What is pharmacogenomics, and how does it differ from pharmacogenetics?
Describe how the drug Herceptin works. What types of gene tests are ordered prior to treatment with Herceptin?
What is the Oncotype DX Assay, and how is it used?
How do the cytochrome P450 proteins affect drug responses? Give two examples.
What types of genetic tests are currently available, and how are they classified?
Give two examples of how genomic studies have altered our understanding of cancers.
Why is it necessary to examine gene-expression profiles, in addition to genome sequencing, for effective personalized medicine?
Using the PharmGKB database, explain the relationship between CYP2D6 variants and the response of patients to the breast cancer drug tamoxifen.
Discussion Questions
In this chapter, we present three case studies that use personalized genomics analysis to predict and treat diseases. Although these cases have shown how personalized medicine may evolve in the future, they have triggered controversy. What are some objections to these types of studies, and how can these objections be addressed?
What are the biggest challenges that must be overcome before personalized medicine becomes a routine component of medical care? What do you think is the most difficult of these challenges, and why?
How can we ensure that a patient’s privacy is maintained as genome information accumulates within medical records? How would you feel about allowing your genome sequence to be available for use in research?
As gene tests and genomic sequences become more commonplace, how can we prevent the emergence of “genetic discrimination” in employment and medical insurance?
SPECIAL TOPICS IN MODERN GENETICS
Genetically Modified Foods
Throughout the ages, humans have used selective breeding techniques to create plants and animals with desirable genetic traits. By selecting organisms with naturally occurring or mutagen-induced variations and breeding them to establish the phenotype, we have evolved varieties that now feed our growing populations and support our complex civilizations.
Although we have had tremendous success shuffling genes through selective breeding, the process is a slow one. When recombinant DNA technologies emerged in the 1970s and 1980s, scientists realized that they could modify agriculturally significant organisms in a more precise and rapid way by identifying and cloning genes that confer desirable traits, then introducing these genes into organisms. Genetic engineering of animals and plants promised an exciting new phase in scientific agriculture, with increased productivity, reduced pesticide use, and enhanced flavor and nutrition.
Beginning in the 1990s, scientists created a large number of genetically modified (GM) food varieties. The first one, approved for sale in 1994, was the Flavr Savr tomato, a tomato that stayed firm and ripe longer than non-GM tomatoes. Soon afterward, other GM foods were developed: papaya and zucchini with resistance to virus infection, canola containing the tropical oil laurate, corn and cotton plants with resistance to insects, and soybeans and sugar beets with tolerance to agricultural herbicides. By 2012, more than 200 different GM crop varieties had been created. Worldwide, GM crops are planted on 170 million hectares of arable land, with a global value of $15 billion for GM seeds.
Although many people see great potential for GM foods to help address malnutrition in a world facing a growing human population and climate change, others question the technology, oppose GM food development, and sometimes resort to violence to stop the introduction of GM varieties (ST Figure 5–1). Even Golden Rice, a variety of rice that contains a vitamin A precursor and was developed on a humanitarian nonprofit basis to help alleviate vitamin A deficiencies in the developing world, has been the target of opposition and violence.
Some countries have outright bans on all GM foods, whereas others embrace the technologies. Opponents cite safety and environmental concerns, whereas some scientists and commercial interests extol the almost limitless virtues of GM foods. The topic of GM food attracts hyperbole and exaggerated rhetoric, information, and misinformation on both sides of the debate.
So, what are the truths about GM foods? In this Special Topic chapter, we will introduce the science behind GM foods and examine the promises and problems of the new technologies. We will look at some of the controversies and present information to help us evaluate the complex questions that surround this topic.
What Are GM Foods?
GM foods are derived from genetically modified organisms (GMOs), specifically plants and animals of agricultural importance. GMOs are defined as organisms whose genomes have been altered in ways that do not occur naturally. Although the definition of GMOs sometimes includes organisms that have been genetically modified by selective breeding, the most commonly used definition refers to organisms modified through genetic engineering or recombinant DNA technologies. Genetic engineering allows one or more genes to be cloned and transferred from one organism to another, either between individuals of the same species or between those of unrelated species. It also allows an organism’s endogenous genes to be altered in ways that lead to enhanced or reduced expression levels. When genes are transferred between unrelated species, the resulting organism is called transgenic. The term cisgenic is sometimes used to describe gene transfers within a species. In contrast, the term biotechnology is a more general one, encompassing a wide range of methods that manipulate organisms or their components, such as isolating enzymes or producing wine, cheese, or yogurt. Genetic modification of plants or animals is one aspect of biotechnology.
[ST Figure 5–1 Anti-GM protesters attacking a field of genetically modified maize in southwestern France. In July 2004, hundreds of activists opposed to GM crops destroyed plants being tested by the U.S. biotech company Pioneer Hi-Bred International.]
In 2012, it was estimated that GM crops were grown in approximately 30 countries on 11 percent of the arable land on Earth. The majority of these GM crops (almost 90 percent) are grown in five countries: the United States, Brazil, Argentina, Canada, and India. Of these five, the United States accounts for approximately half of the acreage devoted to GM crops. According to the U.S. Department of Agriculture, 93 percent of soybeans and 90 percent of maize grown in the United States are from GM crops. In the United States, more than 70 percent of processed foods contain ingredients derived from GM crops.
Soon after the release of the Flavr Savr tomato in the 1990s, agribusinesses devoted less energy to designing GM foods to appeal directly to consumers. Instead, the market shifted toward farmers, with crops designed to increase productivity. By 2012, approximately 200 different GM crop varieties were approved for use as food or livestock feed in the United States. However, only about two dozen are widely planted. These include varieties of soybeans, corn, sugar beets, cotton, canola, papaya, and squash. ST Table 5.1 lists some of the common GM food crops available for planting in the United States. Of these GM crops, by far the most widely planted are varieties that are herbicide tolerant or insect resistant.
At the time of writing this chapter, no GM food animal was approved for consumption, although a GM salmon variety was nearing market approval in the United States (Box 1). A number of agriculturally important animals such as goats and sheep have been genetically modified to produce pharmaceutical products in their milk. The use of transgenic animals as bioreactors is discussed earlier in the text (see Chapter 19).
ST TABLE 5.1 Some GM Crops Approved for Food, Feed, or Cultivation in the United States*
Crop | Number of Varieties | GM Characteristics
Soybeans | 19 | Tolerance to glyphosate herbicide; tolerance to glufosinate herbicide; reduced saturated fats; enhanced oleic acid; enhanced omega-3 fatty acid
Maize | 68 | Tolerance to glyphosate herbicide; tolerance to glufosinate herbicide; Bt insect resistance; enhanced ethanol production
Cotton | 30 | Tolerance to glyphosate herbicide; Bt insect resistance
Potatoes | 28 | Bt insect resistance
Canola | 23 | Tolerance to glyphosate herbicide; tolerance to glufosinate herbicide; enhanced lauric acid
Papaya | 4 | Resistance to papaya ringspot virus
Sugar beets | 3 | Tolerance to glyphosate herbicide
Rice | 3 | Tolerance to glufosinate herbicide
Zucchini squash | 2 | Resistance to zucchini, watermelon, and cucumber mosaic viruses
Alfalfa | 2 | Tolerance to glyphosate herbicide
Plum | 1 | Resistance to plum pox virus
* Information from the International Service for the Acquisition of Agri-Biotech Applications, www.isaaa.org
Herbicide-Resistant GM Crops
Weed infestations destroy about 10 percent of crops worldwide. To combat weeds, farmers often apply herbicides before seeding a crop and between rows after the crops are growing. Because the most efficient broad-spectrum herbicides also kill crop plants, herbicide use may be difficult and limited. Farmers also use tillage to control weeds; however, tillage damages soil structure and increases erosion.
Herbicide-tolerant varieties are the most widely planted of GM crops, making up approximately 70 percent of all GM crops. The majority of these varieties contain a bacterial gene that confers tolerance to the broad-spectrum herbicide glyphosate, the active ingredient in commercial herbicides such as Roundup®. Studies have shown that glyphosate is effective at low concentrations, is degraded rapidly in soil and water, and is not toxic to humans. Farmers who plant glyphosate-tolerant crops can treat their fields with glyphosate, even while the GM crop is growing. This approach is more efficient and economical than mechanical weeding and reduces soil damage caused by repeated tillage. It is suggested that there is less environmental
BOX 1 The Tale of GM Salmon: Downstream Effects?
I escaped into the wild, could have longterm effects on wild populations A study published in 2013 shows that it is possible for the AquAdvantage salmon to breed successfully with a close relative, the brown trout.* In laboratory conditions, the hybrids grew more quickly than either the GM or non-GM varieties, and in closed stream-like systems, the hybrids outcompeted both parental fish varieties for food supplies The authors point out that these results should be taken into account during environmental assessments, although it is still not known whether the hybrid salmon—trout variety could successfully breed in the wild If GM salmon could escape, breed, and introduce transgenes into wild populations, there could be unknown negative downstream effects on fish ecosystems * Oke, K.B., et al 2013 Hybridization between genetically modified Atlantic salmon and wild brown trout reveals novel ecological interactions Proc R Soc B 280 (1763): 20131047 The AquAdvantage salmon grows twice as fast as a non-GM Atlantic salmon, reaching market size in half the time impact when using glyphosate, compared with having to apply higher levels of other, more toxic, herbicides Recently, evidence suggests that some weeds may be developing resistance to glyphosate, thereby reducing the effectiveness of glyphosate-tolerant crops (This and other concerns about herbicide-tolerant GM plants are described later in this chapter.) 
One method used to engineer a glyphosate-tolerant plant is described in the next section Insect-Resistant GM Crops The second most prevalent GM modifications are those that make plants resistant to agricultural pests Insect damage is one of the most serious threats to worldwide food production Farmers combat insect pests using crop rotation and predatory organisms, as well as applying insecticides The most widely used GM insect-resistant crops are the Bt crops Bacillus thuringiensis (Bt) is group of soildwelling bacterial strains that produce crystal (Cry) proteins that are toxic to certain species of insects These Cry proteins are encoded by the bacterial cry genes and form crystal structures during sporulation The Cry proteins are toxic to Lepidoptera (moths and butterflies), Diptera (mosquitoes and flies), Coleoptera (beetles), and Hymenoptera (wasps and ants) Insects must ingest the bacterial spores or Cry proteins in order for the toxins to act Within the high pH of the insect gut, the crystals dissolve and are cleaved by insect protease enzymes The Cry proteins bind SPECIAL TOPIC t took 18 years and about $60 million, but the first GM animal to be approved as human food—the AquAdvantage salmon—may soon hit the U.S market The AquAdvantage salmon is an Atlantic salmon that is genetically modified to grow twice as fast as its non-GM cousins, reaching marketable size in one and a half years rather than the usual three years Scientists at AquaBounty Technologies in Massachusetts created the variety by transforming an Atlantic salmon with a single gene encoding the Chinook salmon growth hormone The gene was cloned downstream of the antifreeze protein gene promoter from an eel This promoter stimulates growth hormone synthesis in the winter, a time when the fish’s own growth hormone gene is not expressed The rapid growth of the GM salmon allows fish farmers to double their productivity AquaBounty intends to sell GM fish eggs to two facilities—one in Canada and one in 
Panama—that will raise the salmon and market them To ensure that the fish will not escape the facilities, the company promises to sell only fertilized eggs that are female, triploid, and sterile The facilities are to be approved only if the tanks are located inland and have sufficient filters to ensure that eggs and small fish cannot escape Despite these assurances, environmental groups are planning to fight the sale of GM salmon Some grocery chains in the United States have banned GM fish, and legislators in several western U.S states are trying to block the approval of the AquAdvantage salmon based on fears that the accidental release of these fish could contaminate wild salmon populations with transgenes and disturb normal ecosystems Supporters of GM fish point out that the GM salmon are very unlikely to escape their facilities, and if any did escape, they would be poorly adapted to wild conditions Critics of the new GM salmon point out that the technique used to create sterile triploids (pressure-shocking the fertilized eggs) still allows a small percentage of fertile diploids to remain in the stock They state that even a few fertile fish, if they 526 SP E CIA L TOP IC: GE NE T IC A L LY MOD IFIED F OOD S to receptors on the gut wall, leading to breakdown of the gut membranes and death of the insect Each insect species has specific types of gut receptors that will match only a few types of Bt Cry toxins As there are more than 200 different Cry proteins, it is possible to select a Bt strain that will be specific to one pest type Bt spores have been used for decades as insecticides in both conventional and organic gardening, usually applied in liquid sprays Sunlight and soil rapidly break down the Bt insecticides, which have not shown any adverse effects on groundwater, mammals, fish, or birds Toxicity tests on humans and animals have shown that Bt causes few negative effects To create Bt crops, scientists introduce one or more cloned cry genes into plant cells 
using methods described in the next section The GM crop plants will then manufacture their own Bt Cry proteins, which will kill the target pest species when it eats the plant tissues Although Bt crops have been successful in reducing crop damage, increasing yields, and reducing the amounts of insecticidal sprays used in agriculture, they are also controversial Early studies suggested that Bt crops harmed Monarch butterfly populations, although more recent studies have drawn opposite conclusions Other concerns still exist, and these will be discussed in subsequent sections of this chapter BOX SPECIAL TOPIC The Success of Hawaiian GM Papaya I n the mid-1990s, the papaya ringspot virus (PRSV) spread rapidly throughout Hawaii’s papaya fields and threatened to destroy the industry within a few years To try to stop the destruction of Hawaiian papaya, a team of scientists from the University of Hawaii, the USDA Agricultural Research Center in Hawaii, and the Upjohn Company cloned the coat protein gene of PRSV and introduced it into cultured papaya cells using biolistic transformation The goal was to create PRSV resistance using a mechanism known as pathogen-derived resistance The presence of virus coat proteins within the plant is thought to interfere with the disassembly and movement of an infecting virus, slowing or preventing infection Researchers tested resistance to PRSV in the transformed papaya plants and developed two GM varieties—SunUp and Rainbow SunUp was homozygous for the PRSV coat protein gene, and Rainbow was an F1 hybrid of SunUp and a non-GM variety Kapoho After three years of field testing and two years of moving through federal regulatory processes, GM papaya was approved for use Seeds were given for free to farmers, who immediately planted them to replace their virus-devastated fields Within three years, papaya harvests in Hawaii doubled and consumer acceptance was positive Virus-resistant GM papaya is credited with saving the Hawaiian papaya industry 
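The SunUp and Rainbow varieties follow a simple Mendelian pattern: crossing a parent homozygous for the coat protein transgene (SunUp) with a non-GM parent (Kapoho) should give F1 plants (Rainbow) that all carry exactly one copy of the transgene. The Python sketch below checks this with a Punnett-square calculation. It assumes, as a simplification not stated in the box, that the transgene behaves as a single insertion at one locus; the allele symbols 'T' (insert present) and '-' (insert absent) and the function name cross are hypothetical notation chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Return offspring genotype frequencies for a single-locus cross.

    Each parent is written as two alleles, e.g. "TT" or "T-".
    'T' = chromosome carrying the transgene insert (assumed notation),
    '-' = the same locus without the insert.
    """
    # Pair each allele of parent1 with each allele of parent2; every pairing
    # is equally likely, exactly as in a Punnett square.
    offspring = Counter(
        # Sort so "T-" and "-T" count as one genotype, with 'T' written first.
        "".join(sorted(a + b, reverse=True))
        for a, b in product(parent1, parent2)
    )
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}


SUNUP = "TT"   # homozygous for the PRSV coat protein transgene
KAPOHO = "--"  # non-GM variety, no transgene

rainbow = cross(SUNUP, KAPOHO)
print("Rainbow (SunUp x Kapoho):", rainbow)
# Every F1 plant carries one copy: {'T-': Fraction(1, 1)}

selfed = cross("T-", "T-")
print("Rainbow selfed:", selfed)
# 1/4 TT, 1/2 T-, 1/4 -- under the single-locus assumption
carriers = sum(f for g, f in selfed.items() if "T" in g)
print("Offspring carrying at least one copy:", carriers)  # 3/4
```

Under the same single-locus assumption, seed saved from self-pollinated Rainbow plants would segregate 1:2:1, so roughly one quarter of those offspring would lack the transgene entirely.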
An interesting side effect of the presence of GM papaya in Hawaii was the recovery of non-GM and organically grown papaya. Because PRSV levels declined due to the presence of virus-resistant fields and the abandonment of infected fields, some growers can now produce non-GM papaya, albeit on a smaller scale than before the virus spread throughout Hawaii. At the present time, more than 70 percent of Hawaiian papaya is genetically modified. GM papaya is approved for sale in the United States, Canada, and Japan.

Since the development of GM papaya in Hawaii, efforts to develop similar varieties in other parts of the world have stalled because of increasing public resistance to GM foods. Since 2010, thousands of GM papaya trees in Hawaii have been cut down and destroyed by anonymous attackers. Efforts to introduce GM papaya in Thailand have failed, and the government recently banned GM foods. Japan has approved the sale of GM papaya, but only if it is labeled as genetically modified.

A papaya fruit, grown on a non-GM papaya plant infected with PRSV.

GM Crops for Direct Consumption

To date, most GM crops have been designed to help farmers increase yields. Also, most GM food crops are not consumed directly by humans, but are used as animal feed or as sources of processed food ingredients such as oils, starches, syrups, and sugars. For example, 98 percent of the U.S. soybean crop is used as livestock feed. The remainder is processed into a variety of food ingredients, such as lecithin, textured soy proteins, soybean oil, and soy flours. However, a few GM foods have been developed for direct consumption. Examples are rice, squash, and papaya (Box 2).

One of the most famous and controversial examples of GM foods is Golden Rice, a rice variety designed to synthesize beta-carotene (the precursor to vitamin A) in the rice grain endosperm. Vitamin A deficiency is a serious health problem in more than 60 countries, particularly countries in Asia and Africa. The World Health Organization estimates that 190 million children and 19 million pregnant women are vitamin A deficient. Between 250,000 and 500,000 children with vitamin A deficiencies become blind each year, and half of these will die within a year of losing their sight. As vitamin A is also necessary for immune system function, deficiencies lead to increases in many other conditions, including diarrhea and virus infections. The most seriously affected people live in the poorest countries and have a basic starch-centered diet, often mainly rice.

Vitamin A is normally found in dairy products and can be synthesized in the body from beta-carotene found in orange-colored fruits and vegetables and in green leafy vegetables. Several approaches are being taken to alleviate vitamin A deficiency in developing countries. These include supplying high-dose vitamin A supplements and growing fresh fruits and vegetables in home gardens. These initiatives have had partial success, but the expense of delivering education and supplementation has impeded the effectiveness of these programs.

In the 1990s, scientists began to apply recombinant DNA technology to help solve vitamin A deficiencies in people with rice-based diets. Although the rice plant naturally produces beta-carotene in its leaves, it does not produce it in the rice grain endosperm, which is the edible part of the rice. The beta-carotene precursor, geranylgeranyl diphosphate, is present in the endosperm, but the enzymes that convert it to beta-carotene are not synthesized (ST Figure 5–2).

ST Figure 5–2 Beta-carotene pathway in Golden Rice. [Pathway shown: geranylgeranyl diphosphate → phytoene (phytoene synthase, psy gene) → zeta-carotene (phytoene desaturase, pds gene) → lycopene (zeta-carotene desaturase, zds gene) → beta-carotene (lycopene β-cyclase, lcy gene); inserted genes: psy gene from maize, crtI gene from bacteria.] Rice plant enzymes and genes involved in beta-carotene synthesis are shown on the right. The enzymes that are not expressed in rice endosperm are indicated with an “X.” The genes inserted into Golden Rice are shown on the left.

In the first version of Golden Rice, scientists introduced the genes phytoene synthase (psy), cloned from the daffodil plant, and carotene desaturase (crtI), cloned from the bacterium Erwinia uredovora, into rice plants. The bacterial crtI gene was chosen because the enzyme encoded by this gene can perform the functions of two of the missing rice enzymes, thereby simplifying the transformation process. The resulting plant produced rice grains that were yellow due to the presence of beta-carotene (ST Figure 5–3). This strain synthesized modest levels of beta-carotene, but only enough to potentially supply 15–20 percent of the recommended daily allowance of vitamin A. In the second version of the GM plant, called Golden Rice 2, the daffodil psy gene was replaced with the psy gene from maize. Golden Rice 2 produced beta-carotene levels that were more than 20-fold greater than those in Golden Rice. In the next section we describe the methods used to create Golden Rice.

ST Figure 5–3 Non-GM and Golden Rice. Golden Rice contains high levels of beta-carotene, giving the rice endosperm a yellow color. The intensity of the color reflects the amount of beta-carotene in the endosperm.

Clinical trials have shown that the beta-carotene in Golden Rice is efficiently converted into vitamin A in humans and that about 150 grams of uncooked Golden Rice (which is close to the normal daily rice consumption of children aged 4–8 years) would supply all of the childhood daily requirement for vitamin A.

At the present time, Golden Rice is undergoing field, biosafety, and efficacy testing in preparation for approval by government regulators in Bangladesh and the Philippines. If Golden Rice proves useful in alleviating vitamin A deficiencies and is approved for use, seed will be made available at the same price as non-GM seed and
farmers will be allowed to keep and replant seed from their own crops.

Despite the promise of Golden Rice 2, controversies remain. Critics of GM foods suggest that Golden Rice could make farmers too dependent on one type of food or might have long-term health or environmental effects. These and other controversies surrounding GM foods are discussed in subsequent sections of this chapter.

Methods Used to Create GM Plants

Most GM plants are created using one of two approaches: the biolistic method or Agrobacterium tumefaciens-mediated transformation technology. Both methods target plant cells that are growing in vitro. Scientists can generate plant tissue cultures from various types of plant tissues, and these cultured cells will grow either in liquid cultures or on the surface of solid growth media. When grown in the presence of specific nutrients and hormones, these cultured cells will form clumps of cells called calluses, which, when transferred to other types of media, will form roots. When the rooted plantlets are mature, they are transferred to soil medium in greenhouses, where they develop into normal plants.

The biolistic method is a physical method of introducing DNA into cells. Particles of heavy metals such as gold are coated with the DNA that will transform the cells; these particles are then fired at high speed into plant cells in vitro, using a device called a gene gun. Cells that survive the bombardment may take up the DNA-coated particles, and the DNA may migrate into the cell nucleus and integrate into a plant chromosome. Plants that grow from the bombarded cells are then selected for the desired phenotype.

ST Figure 5–4 Structure of the Ti plasmid. [Map shown: left border, T-DNA region (auxin, cytokinin, and opine genes), right border; opine catabolism region; virulence region; origin of replication (ORI).] The 250-kb Ti plasmid from Agrobacterium tumefaciens inserts the T-DNA portion of the plasmid into the host cell’s nuclear genome and induces tumors. Genes within the virulence region code
for enzymes responsible for transfer of T-DNA into the plant genome. The T-DNA region contains genes for the synthesis of the hormones auxin and cytokinin, which are responsible for cell growth and tumor formation. The opine genes encode enzymes that make opines, compounds used as energy sources by the bacterium. The T-DNA region of the Ti plasmid is replaced with the gene of interest when the plasmid is used as a transformation vector.

Although biolistic methods are successful for a wide range of plant types, a much higher transformation rate is achieved using Agrobacterium-mediated technology. Agrobacterium tumefaciens (also called Rhizobium radiobacter) is a soil microbe that can infect plant cells and cause tumors. These characteristics are conferred by a 200-kb tumor-inducing plasmid called a Ti plasmid. After infection with Agrobacterium, the Ti plasmid integrates a segment of its DNA known as transfer DNA (T-DNA) into random locations within the plant genome (ST Figure 5–4). To use the Ti plasmid as a transformation vector, scientists remove the T-DNA segment and replace it with cloned DNA of the genes to be introduced into the plant cells.

In order to have the newly introduced gene expressed in the plant, the gene must be cloned next to an appropriate promoter sequence that will direct transcription in the required plant tissue. For example, the beta-carotene pathway genes introduced into Golden Rice were cloned next to a promoter that directs transcription of the genes in the rice endosperm. In addition, the transformed gene requires appropriate transcription termination signals, as well as signal sequences that direct the encoded protein into the correct cell compartment.

Selectable Markers

The rates at which T-DNA successfully integrates into the plant genome and becomes appropriately expressed are low. Often, only one cell in 1000 or more will be successfully transformed. Before growing cultured plant cells into mature plants to test their
phenotypes, it is important to eliminate the background of nontransformed cells. This can be done using either positive or negative selection techniques.

An example of negative selection involves use of a marker gene such as the hygromycin-resistance gene. This gene, together with an appropriate promoter, can be introduced into plant cells along with the gene of interest. The cells are then incubated in culture medium containing hygromycin, an antibiotic that also inhibits the growth of eukaryotic cells. Only cells that express the hygromycin-resistance gene will survive. It is then necessary to verify that the resistant cells also express the cotransformed gene. This is often done by techniques such as PCR amplification using gene-specific primers. Plants that express the gene of interest are then tested for other characteristics, including the phenotype conferred by the introduced gene of interest.

An example of positive selection involves the use of a selectable marker gene such as that encoding phosphomannose isomerase (PMI). This enzyme is common in animals but is not found in most plants. It catalyzes the interconversion of mannose 6-phosphate and fructose 6-phosphate. Plant cells that express the pmi gene can survive on synthetic culture medium that contains only mannose as a carbon source. Cells that are cotransformed with the gene of interest and the pmi gene (under control of an appropriate promoter) can be positively selected by growing the plant cells on a mannose-containing medium. This type of positive selection was used to create Golden Rice. Studies have shown that purified PMI protein is easily digested, nonallergenic, and nontoxic in mouse oral toxicity tests.

A variation in positive selection involves use of a marker gene whose expression results in a visible phenotype, such as deposition of a colored pigment.

The following descriptions illustrate the methods used to engineer two GM crops: Roundup-Ready soybeans and Golden Rice.

Roundup-Ready® Soybeans

The Roundup-Ready soybean GM variety received market approval in the United States in 1996. It is a GM plant with resistance to the herbicide glyphosate, the active ingredient in Roundup, a commercially available broad-spectrum herbicide. Glyphosate interferes with the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), which is present in all plants and is necessary for plant synthesis of the aromatic amino acids phenylalanine, tyrosine, and tryptophan. EPSPS is not present in mammals, which obtain aromatic amino acids from their diets.

To produce a glyphosate-resistant soybean plant, researchers cloned an epsps gene from the Agrobacterium strain CP4. This gene encodes an EPSPS enzyme that is resistant to glyphosate. They then cloned the CP4 epsps gene downstream of a constitutively expressed promoter from the cauliflower mosaic virus to allow gene expression in all plant tissues. In addition, a sequence encoding a short peptide known as a chloroplast transit peptide (in this case from petunias) was fused to the 5′ end of the epsps coding sequence. This allowed newly synthesized EPSPS protein to be inserted into the soybean chloroplast (ST Figure 5–5). The final plasmid contained two CP4 epsps genes and, for the initial experiments, a beta-glucuronidase (GUS) gene from E. coli. The GUS gene acted as a positive marker, as cells that expressed the plasmid after transformation could be detected by the presence of a blue precipitate. The final cell line chosen for production of Roundup-Ready soybeans did not contain the GUS gene.

The plasmids were introduced into cultured soybean cells using biolistic bombardment. Afterward, cells were treated with glyphosate to eliminate any nontransformed cells (ST Figure 5–6). The resulting calluses were grown into plants, which were then field tested for glyphosate resistance and a large number of other parameters, including composition, toxicity, and allergenicity.

ST Figure 5–5 Portion of plasmid pV-GMGT04 used to create Roundup-Ready soybeans. [Construct shown: E35S promoter, ctp4, CP4 epsps, nos.] A 1365-bp fragment encoding the EPSPS enzyme from Agrobacterium CP4 was cloned downstream from the cauliflower mosaic virus E35S promoter and the petunia chloroplast transit peptide signal sequence (ctp4). CTP4 signal sequences direct the EPSPS protein into chloroplasts, where aromatic amino acids are synthesized. The CP4 epsps coding region was cloned upstream of the nopaline synthase (nos) transcription termination and polyadenylation sequences. The CP4 epsps sequences encode a 455-amino-acid, 46-kDa EPSPS protein.

ST Figure 5–6 Method for creating Roundup-Ready soybeans. [Steps shown: load pV-GMGT04 plasmids into gene gun; bombard cells with plasmids; select transformed cells with glyphosate; grow surviving cells into plantlets.] Plasmids were loaded into the gene gun and fired at high pressure into cells growing in tissue cultures. Cells were grown in the presence of glyphosate to select those that had integrated and expressed the epsps gene. Surviving cells were stimulated to form calluses and to grow into plantlets.

Golden Rice

To create Golden Rice 2, scientists cloned three genes into the T-DNA region of a Ti plasmid. The Ti plasmid, called pSYN12424, is shown in ST Figure 5–7. The first gene was the carotene desaturase (crtI) gene from Erwinia uredovora, fused between the rice glutelin gene promoter (Glu) and the nos gene terminator region (nos). The Glu promoter directs transcription of the fusion gene specifically in the rice endosperm. The nos terminator was cloned from the Agrobacterium tumefaciens nopaline synthase gene and supplies the transcription termination and polyadenylation sequences required at the 3′ end of plant genes.

The second gene was the phytoene synthase (psy) gene cloned from maize. The maize psy gene has approximately 90 percent sequence similarity to the rice psy gene and is involved in carotenoid synthesis in maize endosperm. This gene was also fused to the Glu promoter and the nos terminator sequences in order to obtain proper transcription initiation and termination in rice endosperm.

The third gene was the selectable marker gene phosphomannose isomerase (pmi), cloned from E. coli. In the Golden Rice Ti plasmid, the pmi gene was fused to the maize polyubiquitin gene promoter (Ubi1) and the nos terminator sequences. The Ubi1 promoter is a constitutive promoter, directing transcription of the pmi gene in all plant tissues.

To introduce the pSYN12424 plasmid into rice cells, researchers established embryonic rice cell cultures and infected them with Agrobacterium tumefaciens that contained pSYN12424 (ST Figure 5–8). The cells were then placed under selection, using culture medium containing only mannose as a carbon source. Surviving cells expressing the pmi gene were then stimulated to form calluses that were grown into plants. To confirm that all three genes were present in the transformed rice plants, samples were taken and analyzed by the polymerase chain reaction (PCR) using gene-specific primers. Plants that contained one integrated copy of the transgenic construct and synthesized beta-carotene in their seeds were selected for further testing.

ST Figure 5–7 T-DNA region of Ti plasmid pSYN12424. [Cassettes shown: Glu, crtI, nos; Glu, psy, nos; Ubi1, pmi, nos.] The Ti plasmid used to create Golden Rice contained the carotene desaturase (crtI) gene cloned from bacteria, the phytoene synthase (psy) gene cloned from maize, and the phosphomannose isomerase (pmi) gene cloned from E. coli. The glutelin (Glu) gene promoter directs transcription in rice endosperm, and the polyubiquitin (Ubi1) promoter directs transcription in all tissues. Transcription termination signals were provided by the nopaline synthase (nos) gene 3′ region.

ST Figure 5–8 Method for creating Golden Rice. [Steps shown: introduce Ti plasmid pSYN12424 into Agrobacterium tumefaciens; infect cultured cells; select by growing on mannose medium; grow into calluses and plants; select plants for high endosperm color (beta-carotene).] Rice plant cells were transformed by pSYN12424 and selected on mannose-containing medium, as described in the text. Plants that produced high levels of beta-carotene in rice grain endosperm (++), based on the intensity of the grain’s yellow color, were selected for further analysis.

GM Foods Controversies

GM foods may be the most contentious of all products of modern biotechnology. Advocates of GM foods state that the technologies have increased farm productivity, reduced pesticide use, preserved soils, and have the potential to feed growing human populations. Critics claim that GM foods are unsafe for both humans and the environment; accordingly, they are applying pressure on regulatory agencies to ban or severely limit the extent of GM food use. These campaigns have affected regulators and politicians, resulting in a patchwork of regulations throughout the world. Often the debates surrounding GM foods are highly polarized and emotional, with both sides exaggerating their points of view and selectively presenting the data. So, what are the truths behind these controversies?

One point that is important to make as we try to answer this question is that it is not possible to make general statements about all “GM foods.” Each GM crop or organism contains different genes from different sources, attached to different expression sequences, accompanied by different marker or selection genes, and inserted into the genome in different ways and in different locations. GM foods are created for different purposes and are used in ways that are both planned and unplanned. Each construction is unique and therefore needs to be assessed separately. We will now examine two of the main GM foods controversies: those involving human health and safety, and those involving environmental effects.

Health and Safety

GM food advocates often state that there is no evidence that GM foods currently on the market have any adverse health effects, either from the presence of toxins or from potential allergens. These conclusions are based on two observations. First, humans have consumed several types of GM foods for more than 20 years, and no reliable reports of adverse effects have emerged. Second, the vast majority of toxicity tests in animals, which are required by government regulators prior to approval, have shown no negative effects. A few negative studies have been published, but these have been criticized as poorly executed or nonreproducible.

Critics of GM foods counter the first observation in several ways. First, as described previously, few GM foods are eaten directly by consumers. Instead, most are used as livestock feed, and the remainder form the basis of purified food ingredients. Although no adverse effects of GM foods in livestock have been detected, the processing of many food ingredients removes most, if not all, plant proteins and DNA. Hence, ingestion of GM food-derived ingredients may not be a sufficient test for health and safety. Second, GM food critics argue that there have been few human clinical trials to directly examine the health effects of most GM foods. One notable exception is Golden Rice 2, which has undergone two small clinical trials. They also say that the toxicity studies that have been completed were performed in animals (primarily rats and mice), and most of these are short-term toxicity studies.

Supporters of GM foods answer these criticisms with several other arguments. The first argument is that short-term toxicity studies in animals are well-established methods for detecting toxins and allergens. The regulatory processes required prior to approval of any GM food demand data from animal toxicity studies. If any negative effects are detected, approval is not given. Supporters also note that several dozen long-term toxicity studies have been published that deal with GM crops such as glyphosate-resistant soybeans and Bt corn, and none of these has shown long-term negative effects on test animals. A few studies that report negative long-term effects have been criticized as poorly designed and unreliable.

GM food advocates note that human clinical trials are not required for any other food derived from other genetic modification methods such as selective breeding. During standard breeding of plants and animals, genomes may be mutagenized with radiation or chemicals to enhance the possibilities of obtaining a desired phenotype. This type of manipulation has the potential to introduce mutations into genes other than the ones that are directly selected. Also, plants and animals naturally exchange and shuffle DNA in ways that cannot be anticipated. These include interspecies DNA transfers, transposon integrations, and chromosome modifications. These events may result in unintended changes to the physiology of organisms, changes that could potentially be as great as those arising in GM foods.

Environmental Effects

Critics of GM foods point out that GMOs that are released into the environment have both documented and potential consequences for the environment, and hence may indirectly affect human health and safety. GM food advocates argue that these potential environmental consequences can be identified and managed. Here, we will describe two different aspects of GM foods as they may affect the natural environment and agriculture.

Monsanto detected large numbers of pink bollworms with resistance to the toxin expressed from the cry1Ac gene in one variety of Bt cotton. In order to slow down the development of Bt resistance, several strategies are being followed. The first is to develop varieties of GM crops that express two Bt toxins simultaneously. Several of these varieties are already on the market and are replacing varieties that express only one Bt cry gene. The second strategy involves the use of “refuges” surrounding fields that grow Bt crops. These refuges contain non-GM crops. Insect pests grow easily within the refuges, which place no evolutionary pressure on the insects for resistance to Bt toxins. The idea is for these nonselected insects to mate with any resistant insects that appear in the Bt crop region of the field. The resulting hybrid offspring will be heterozygous for any resistance gene variant. As long as the resistance gene variant is recessive, the hybrids will be killed by eating the Bt crop. In fields that use refuges and plant GM crops containing two Bt genes, resistance to Bt toxins has been delayed or is absent. As with emerging herbicide resistance, farmers are also encouraged to combine the use of Bt crops with conventional pest control methods.

ST Figure 5–9 Herbicide-resistant weeds. Water hemp weeds, resistant to glyphosate herbicide, growing in a field of Roundup-Ready soybeans.

Emerging herbicide and insecticide resistance

Many published studies report that the planting of herbicide-tolerant and insect-resistant GM crops has reduced the quantities of herbicides and insecticides that are broadly applied to agricultural crops. As a result, the effects of GM crops on the environment have been assumed to be positive. However, these positive effects may be transient, as herbicide and insecticide resistance is beginning to emerge (ST Figure 5–9). Since glyphosate-tolerant crops were introduced in the mid-1990s, more than 24 glyphosate-resistant weed species have appeared in the United States. Resistant weeds have been found in 18 other countries, and in some cases the presence of these weeds is affecting crop yields. One reason for the rapid rise of resistant weeds is that farmers have abandoned other weed-management practices in favor of using a single broad-spectrum herbicide. This strong selection pressure has driven the rapid evolution of weed species bearing gene variants that confer herbicide resistance. In response, biotechnology companies are developing new GM crops with tolerance to multiple herbicides. However,
scientists argue that weeds will also develop resistance to the use of multiple herbicides, unless farmers vary their weed management practices and incorporate tillage, rotation, and other herbicides along with using the GM crop Scientists point out that herbicide resistance is not limited to the use of GM crops Weed populations will evolve resistance to any herbicide used to control them, and the speed of evolution will be affected by the extent to which the herbicide is used Since 1996, more than eight different species of insect pests have evolved some level of resistance to Bt insecticidal proteins For example, in 2011 scientists reported the first cases of resistance of the western corn rootworm to Bt maize expressing the cry3Bb1 gene, in maize fields in Iowa In 2010, scientists from The spread of GM crops into non-GM crops There have been several documented cases of GM crop plants appearing in uncultivated areas in the United States, Canada, Australia, Japan, and Europe For example, GM sugar beet plants have been found growing in commercial top soils GM canola plants have been found growing in ditches and along roadways, railway tracks, and in fill soils, far from the fields in which they were grown A 2011 study1 found “feral” GM canola plants growing in 288 of 634 sample sites along roadways in North Dakota Of these plants, 41 percent contained the CP4 EPSPS protein (conferring glyphosate resistance), and 39 percent contained the PAT protein (conferring resistance to the herbicide glufosinate) In addition, two of the plants (0.7 percent of the sample) expressed both proteins (resistant to both herbicides) GM plants that express both proteins have not been created by genetic modification and were assumed to have arisen by cross-fertilization of the other two GM crops The researchers who conducted this survey were not surprised to find GM canola along transportation routes, as seeds are often spilled during shipping More surprising was the extent of the 
distribution and the presence of hybridized GM canola plants.

¹ Schafer, M. G., et al. 2011. PLoS One 6: e25736.

One of the major concerns about the escape of GM crop plants from cultivation is the possibility of outcrossing, or gene flow: the transfer of transgenes from GM crops into sexually compatible non-GM crops or wild plants, conferring undesired phenotypes on those plants. Gene flow between GM crops and adjacent non-GM crops is of particular concern for farmers who want to market their crops as “GM-free” or “organic” and for farmers who grow seed for planting. Gene flow of GM transgenes has been documented in GM and non-GM canola as well as sugar beets, and in experiments using rice, wheat, and maize.

GM critics often refer to controversial studies of GM outcrossing in Oaxaca, Mexico. In the first study, in 2001, it was reported that local maize crops contained transgenes from Monsanto’s Roundup-Ready and Bt insect-resistant maize. As GM crops were not approved for use in Mexico, it was thought that the transgenes came from maize that had been imported from the United States as a foodstuff and then planted by farmers who were not aware that the seeds were transgenic. Over the next ten years, subsequent studies reported mixed results: in some, the transgenes were not detected, and in others, the same transgenes were detected. There is still no consensus about whether gene flow has occurred between GM and non-GM maize in Mexico.

It is thought that the presence of glyphosate-resistant transgenes in wild plant populations is not likely to be an environmental risk and would confer no fitness benefit on the hybrids. The presence of glyphosate-resistance genes in wild populations would, however, make the plants more difficult to eradicate. This is illustrated by a case of escaped GM bentgrass in Oregon, where the plants have been difficult to get rid of because the relatively safe herbicide glyphosate can no longer be used against them. The potential for environmental damage may be greater if the transgenes did confer an advantage, such as insect resistance or tolerance to drought or flooding.

In an attempt to limit the spread of transgenes from GM crops to non-GM crops, regulators are considering a requirement to separate the crops so that pollen would be less likely to travel between them. Each crop plant would require a different isolation distance to account for the dynamics of its pollen dispersal. Several other methods are being considered. For example, one proposal is to make all GM plants sterile using RNAi technology. Another is to introduce the transgenes into chloroplasts; because chloroplasts are inherited maternally, their genomes would not be transferred via pollen. All of these containment methods are still in development and may take years to reach the market.

The Future of GM Foods

Over the last 20 years, GM foods have revealed both promise and problems. GM advocates are confident that the next generation of GM foods will show even more promising prospects and may also address many of the problems.

Research is continuing on ways to fortify staple crops with nutrients to address diet problems in poor countries. For example, Australian scientists are adding genes to bananas that will not only provide resistance to Panama disease, a serious fungal disease that can destroy crops, but also increase the levels of beta-carotene and other nutrients, including iron. Other GM crops in the pipeline include plants engineered to resist drought, high salinity, nitrogen starvation, and low temperatures.

Scientists hope that new genome information and more precise technologies will allow them to accurately edit a plant’s endogenous genes, decreasing, increasing, or eliminating expression of one or more of the plant’s genes in order to create a desirable phenotype. These approaches avoid the use of transgenes and address some of the concerns about GM foods. The current techniques that researchers use to introduce genes into plant cells result in random insertions into the genome. New techniques are being devised that will allow genes to be inserted at precise locations in the genome, avoiding some of the potential unknown effects of disrupting a plant’s normal genome with random integrations.

Researchers are also devising more creative ways to protect plants from insects and diseases. One intriguing project involves introducing into wheat a gene that encodes a pheromone that acts as a chemical alarm signal to aphids. If successful, this approach could protect wheat plants from aphids without using toxins. Another project involves cassava, a staple crop for many Africans that is afflicted by two viral diseases, cassava mosaic virus and brown streak virus, which stunt growth and cause root rot. Although some varieties of cassava are resistant to these viruses, the life cycle of cassava is so long that it would be difficult to introduce resistance into other varieties using conventional breeding techniques. Scientists plan to transform plants with genes from resistant cassava. This type of cisgenic gene transfer is more comparable to traditional breeding than transgenic techniques are.

In the future, GM foods will likely include additional GM animals. As described in Box 1, a transgenic Atlantic salmon variety is likely to receive marketing approval in the near future. In another project, scientists have introduced a DNA sequence into chickens that protects the birds from spreading avian influenza. The sequence encodes a hairpin RNA molecule with similarity to a normal viral RNA that binds to the viral polymerase. The presence of the hairpin RNA inhibits the activity of the viral polymerase and interferes with viral propagation. If this strategy proves useful in vivo, the use of these GM chickens would not only reduce the incidence of avian influenza in poultry production but also reduce the transmissibility of avian influenza viruses to humans.

Although these and other GM foods show promise for increasing agricultural productivity and decreasing disease, political pressure from anti-GM critics remains a powerful force. An understanding of the science behind these technologies will help us all to evaluate the future of GM foods.

Visit the Study Area in MasteringGenetics for a list of further readings on this topic, including journal references and selected websites.

Review Questions
1. How do genetically modified organisms compare with organisms created through selective breeding?
2. Can current GM crops be considered transgenic or cisgenic? Why?
3. Of the approximately 200 GM crop varieties that have been developed, only a few are widely used. What are these varieties, and how prevalent are they?
4. How does glyphosate work, and how has it been used with GM crops to increase agricultural yields?
5. Describe the mechanisms by which the Cry proteins from Bacillus thuringiensis act as insecticides.
6. What measures have been taken to alleviate vitamin A deficiencies in developing countries? To date, how successful have these strategies been?
7. What is Golden Rice 2, and how was it created?
8. Describe how plants can be transformed using biolistic methods. How does this method compare with Agrobacterium tumefaciens–mediated transformation?
9. How do positive and negative selection techniques contribute to the development of GM crops?
10. Describe how the Roundup-Ready soybean variety was developed, and what genes were used to transform the soybean plants.

Discussion Questions
1. What are the laws regulating the development, approval, and use of GM foods in your region and nationally?
2. Do you think that foods containing GM ingredients should be labeled as such? What would be the advantages and disadvantages of such a strategy?
3. One of the major objections to GM foods is that they may be harmful to human health. Do you agree or disagree, and why?
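The refuge strategy described under Emerging herbicide and insecticide resistance rests on a simple population-genetic argument: if resistance is recessive, susceptible insects from the refuge mate with the rare resistant insects, so most resistance alleles end up in heterozygotes that the Bt crop still kills. The Python sketch below makes that argument concrete with a deliberately simplified, deterministic single-locus model. Everything in it is an illustrative assumption rather than data from the text: random mating across the whole field, a refuge holding a fixed fraction of the insect population, complete killing of susceptible homozygotes on the Bt crop, and no fitness cost of resistance. The function names and parameter values (a starting resistance-allele frequency of 0.001, a 20 percent refuge) are hypothetical.

```python
"""Toy model of the Bt refuge strategy (illustrative assumptions only)."""


def next_generation(p: float, refuge_fraction: float, het_survival: float = 0.0) -> float:
    """Return the resistance-allele frequency after one generation.

    Hypothetical single-locus model: allele R (resistant) at frequency p,
    allele S (susceptible) at 1 - p. Random mating across refuge and Bt field
    gives Hardy-Weinberg genotype frequencies. A fraction `refuge_fraction` of
    insects develops in the refuge, where every genotype survives. On the Bt
    crop, RR always survives, SS never survives, and RS survives with
    probability `het_survival` (0.0 = fully recessive resistance).
    """
    q = 1.0 - p
    rr, rs, ss = p * p, 2.0 * p * q, q * q
    f = refuge_fraction

    # Survivors of each genotype: the refuge share plus the Bt-field share.
    surv_rr = rr * (f + (1.0 - f) * 1.0)
    surv_rs = rs * (f + (1.0 - f) * het_survival)
    surv_ss = ss * f  # susceptible homozygotes survive only in the refuge

    total = surv_rr + surv_rs + surv_ss
    if total == 0.0:
        raise ValueError("no survivors: population eliminated")
    # RR survivors carry two R alleles; RS survivors carry one.
    return (surv_rr + 0.5 * surv_rs) / total


def simulate(p0: float, refuge_fraction: float, generations: int,
             het_survival: float = 0.0) -> list[float]:
    """Return resistance-allele frequencies for generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_generation(freqs[-1], refuge_fraction, het_survival))
    return freqs


if __name__ == "__main__":
    scenarios = [
        ("no refuge, recessive resistance", 0.0, 0.0),
        ("20% refuge, recessive resistance", 0.2, 0.0),
        ("20% refuge, dominant resistance", 0.2, 1.0),
    ]
    for label, refuge, het in scenarios:
        final = simulate(0.001, refuge, generations=10, het_survival=het)[-1]
        print(f"{label}: R frequency after 10 generations = {final:.4f}")
```

In this toy model, a field with no refuge leaves only resistant homozygotes alive on the Bt crop, so resistance is fixed at once. With a 20 percent refuge and recessive resistance, the allele frequency barely changes over ten generations, whereas making resistance dominant (`het_survival=1.0`) greatly weakens the refuge’s effect and the frequency rises past one-half within ten generations. Real pest populations also involve finite population size, fitness costs, partial dominance, and nonrandom mating, all of which this sketch ignores.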
SPECIAL TOPICS IN MODERN GENETICS 6
Gene Therapy

Although drug treatments can be effective in controlling symptoms of genetic disorders, the ideal outcome of medical treatment is to cure a disease. This is the goal of gene therapy: the delivery of therapeutic genes into a patient’s cells to correct genetic disease conditions caused by a faulty gene or genes. The earliest attempts at gene therapy focused on delivering normal, therapeutic copies of a gene, expressed in such a way as to override or negate the effects of the disease gene and thus minimize or eliminate the symptoms of the genetic disease. In recent years, however, newer methods for inhibiting or silencing defective genes, and even approaches for the targeted removal of defective genes, have increasingly emerged as potential mechanisms for gene therapy.

Gene therapy is one of the goals of translational medicine: taking a scientific discovery, such as the identification of a disease-causing gene, and translating the finding into an effective therapy, thus moving from the laboratory bench to a patient’s bedside to treat a disease. In theory, the delivery of a therapeutic gene is rather simple, but in practice, gene therapy has been very difficult to execute. In spite of over 20 years of trials, the field has not lived up to its expectations. However, gene therapy is currently experiencing a fast-paced resurgence of sorts, with several high-profile new successes and potentially exciting new technologies on the horizon. It is hoped that gene therapy will soon become part of mainstream medicine. The treatment of a human genetic disease by gene therapy is the ultimate application of genetic technology. In this Special Topic chapter we will explore how gene therapy is executed, highlight selected examples of successes and failures, and discuss new approaches to gene therapy. Finally, we will consider ethical issues regarding gene therapy.

What Genetic Conditions Are Candidates for Treatment by Gene Therapy?

Two essential criteria for gene therapy are that the gene or genes involved in causing a particular disease have been identified and that the gene can be cloned or synthesized in a laboratory. As a result of the Human Genome Project, the identification of human disease genes and their specific DNA sequences has greatly increased the number of candidate genes for gene therapy trials. Almost all of the early gene therapy trials, and most gene therapy approaches, have focused on treating conditions caused by a single gene, because in theory it is technically easier to affect one gene than to treat disease conditions caused by multiple genes and potentially multiple mutations.

The cells affected by the genetic condition must also be readily accessible for treatment by gene therapy. For example, blood disorders such as leukemia, hemophilia, and other conditions have been major targets of gene therapy because it is relatively routine to manipulate blood cells outside of the body and return them to the body, compared with treating cells in the brain and spinal cord, skeletal or cardiac muscle, or organs with heterogeneous populations of cells such as the pancreas.

In the past decade, every major category of genetic diseases has been targeted by gene therapy (ST Figure 6–1). A majority of recently approved clinical trials are for cancer treatment. Gene therapy approaches are currently being investigated for the treatment of hereditary blindness; neurological (neurodegenerative) diseases, including Alzheimer disease, Parkinson disease, and amyotrophic lateral sclerosis (ALS); cardiovascular disease; muscular dystrophy; hemophilia; a variety of cancers; and infectious diseases such as HIV, among many other conditions, including depression and drug and alcohol addiction. Over 2000 approved gene therapy clinical trials have occurred or have recently been initiated worldwide. In the United States, proposed gene therapy clinical trials must first be approved by review boards at the institution where they will be carried out, and then the protocols must be approved by the Food and Drug Administration (FDA).

How Are Therapeutic Genes Delivered?

In general, there are two broad approaches for delivering therapeutic genes to a patient being treated by gene therapy: ex vivo gene therapy and in vivo gene therapy (ST Figure 6–2). In ex vivo gene therapy, cells from a person with a particular genetic condition are removed and treated in a laboratory by adding either normal copies of a therapeutic gene or a DNA or RNA sequence that will inhibit expression of a defective gene. These cells are then transplanted back into the person, where the therapeutic gene will be expressed to produce normal copies of the required protein. Genetically altered cells treated in this manner can be transplanted back into the patient without fear of immune system rejection because the cells were derived from the patient initially. In vivo gene therapy does not involve removal of a person’s cells. Instead, therapeutic DNA is introduced directly into affected cells of the body. One of the major challenges of in vivo gene therapy is restricting the delivery of therapeutic genes to only the intended tissues and not to all tissues throughout the body.

Viral Vectors for Gene Therapy

For both in vivo and ex vivo approaches, the key to successful gene therapy is having a delivery system to transfer genes into a patient’s cells. Because of the relatively large molecular size and electrical charge of DNA, most human cells do not take up DNA easily, so delivering therapeutic DNA molecules into human cells is challenging. Since the early days of gene therapy, genetically engineered viruses used as vectors have been the main tools for delivering therapeutic genes into human cells. Viral
vectors for gene therapy are engineered to carry therapeutic DNA as their payload, so that the virus infects target cells and delivers the therapeutic DNA without damaging the cells.

In a majority of gene therapy trials around the world, scientists have used genetically modified retroviruses as vectors. Recall from earlier in the text (see Chapter 9) that retroviruses (HIV is a retrovirus) contain an RNA genome that serves as the template for reverse transcriptase to synthesize a complementary DNA molecule. Retroviral vectors are created by removing replication and disease-causing genes from the virus and replacing them with a cloned human gene. After the altered RNA has been packaged into the virus, the recombinant viral vector containing the therapeutic human gene is used to infect a patient’s cells. Technically, the virus particles carry RNA copies of the therapeutic gene. Once inside a cell, the virus cannot replicate itself, but the therapeutic RNA is reverse transcribed into DNA, which enters the nucleus and integrates into the host cell’s chromosomes. If the inserted therapeutic gene is properly expressed, it produces a normal gene product that may be able to ameliorate the effects of the mutation carried by the affected individual.

One advantage of retroviral vectors is that they provide long-term expression of delivered genes, because they integrate the therapeutic gene into the genome of the patient’s cells. But a major problem with retroviral vectors is that in some cases they have produced severe toxicity due to insertional mutations. Retroviral vectors generally integrate their genome into the host-cell genome at random sites, so retroviral integration can randomly inactivate genes or gene-regulatory regions such as a promoter sequence.

Adenovirus vectors were used in many early gene therapy trials. An advantage of these vectors is that they are capable of carrying large therapeutic genes. But because many humans produce antibodies to adenovirus, patients can mount immune reactions that render the virus and its therapeutic gene ineffective or cause significant side effects.

A related virus called adeno-associated virus (AAV) is now widely used as a gene therapy vector [ST Figure 6–3(a)]. In its native form, AAV infects about 80–90 percent of humans during childhood, causing symptoms associated with the common cold. Disabled forms of AAV are popular for gene therapy because the virus is nonpathogenic, so it usually does not elicit a major response from the immune system of treated patients. AAV also does not typically integrate into the host-cell genome, so there is little risk of the insertional mutations that have plagued retroviruses, although modified forms of AAV have been used to deliver genes to specific sites on individual chromosomes. Most forms of AAV deliver genes into the host-cell nucleus, where they form small circles of DNA called episomes that are expressed under the control of promoter sequences contained within the viral genome. But because therapeutic DNA delivered by AAV does not usually become incorporated into the genome, it is not replicated when host cells divide, so this gene therapy approach may require repeated, ongoing applications to be successful [ST Figure 6–3(a)].

Work with lentivirus vectors is an active area of gene therapy research [ST Figure 6–3(b)]. Lentivirus is a retrovirus that can accept relatively large pieces of genetic material. Another positive feature of lentivirus is that it is capable of infecting nondividing cells, whereas other viral vectors often infect cells only when they are dividing. It is still not possible to control where lentivirus integration occurs in the host-cell genome, but the virus does not appear to gravitate toward gene-regulatory regions the way that other retroviruses do. Thus the likelihood of causing insertional mutations appears to be much lower than for other vectors.

The human immunodeficiency virus (HIV) responsible for acquired immunodeficiency syndrome (AIDS) is a type of lentivirus. It may surprise you that HIV could be used as a vector for gene therapy. For any viral vector, scientists must be sure that the vector has been genetically engineered to render it inactive, so that the virus cannot produce disease or spread throughout the body and infect other tissues. In the case of HIV, modified strains lacking the genes necessary to reconstitute fully functional viral particles are being used in gene therapy trials. HIV has evolved to infect certain types of T lymphocytes (T cells) and macrophages, making it a good vector for delivering therapeutic genes into the bloodstream.

Increasingly, viral vectors and nonviral vector approaches are being used to deliver therapeutic genes into stem cells, usually in vitro. The stem cells are then either reintroduced into the patient or differentiated in vitro into mature cell types before being transplanted into the correct organ of the patient being treated.

BOX ClinicalTrials.gov
One of the best resources on the Web for learning about ongoing clinical trials, including current gene therapy trials, is ClinicalTrials.gov. The site can easily be searched to find a wealth of resources about ongoing gene therapy trials throughout the United States that are of interest to you. To find a gene therapy clinical trial, use the “Search for Studies” box and type in the name of a disease and “gene therapy.” This search string will take you to a page listing active gene therapy clinical trials, with links to detailed information about each trial.

ST Figure 6–1 Graphic representation of the different genetic conditions being treated by gene therapy clinical trials worldwide, with the number in each category in parentheses: cancer diseases (1186), monogenic diseases (161), cardiovascular disease (155), infectious diseases (147), neurological diseases (36), ocular diseases (28), inflammatory diseases (13), others (25), gene marking (50), and healthy volunteers (42). Notice that cancers are the major target for treatment.

ST Figure 6–2 Ex vivo and in vivo gene therapy for a patient with a liver disorder who lacks the gene for a blood clotting protein. Ex vivo gene therapy involves isolating cells from the patient (here, from a small portion of the liver), growing them in culture, introducing normal copies of a therapeutic gene (encoding a blood clotting protein in this example) into these cells using viruses as vectors, and then returning the cells to the body, where they will produce the required clotting protein. In vivo approaches involve introducing DNA directly into cells while they are in the body.

ST Figure 6–3 Delivering therapeutic genes. (a) Nonintegrating viruses such as modified adeno-associated virus (AAV) deliver therapeutic genes without integrating them into the genome of target cells. Delivered DNA resides as minichromosomes (episomes), but over time, as cells divide, these nonintegrating circles of DNA are gradually lost. (b) Integrating viruses include lentivirus, an RNA retrovirus that delivers therapeutic genes into the cytoplasm, where reverse transcriptase converts RNA into DNA. The DNA then integrates into the genome, ensuring that therapeutic DNA will be passed into daughter cells during cell division.

Nonviral Delivery Methods
Scientists continue to experiment with various in vivo and ex vivo strategies for delivering so-called naked DNA into cells without the use of viral vectors. Nonviral methods being used to transfer genes into cells include chemically assisted transfer of genes across cell membranes, nanoparticle delivery of therapeutic genes, and fusion of cells with artificial lipid vesicles called liposomes that contain cloned DNA sequences. Short-term expression of genes through “gene pills” is also being explored. In this concept, a pill delivers therapeutic DNA to the intestines, where the DNA is absorbed by intestinal cells that then express the therapeutic protein and secrete it into the bloodstream.

The First Successful Gene Therapy Trial
In 1990 the FDA approved the first human gene therapy trial, which began with the treatment of a young girl named Ashanti DeSilva [ST Figure 6–4(a)], who has a heritable disorder called severe combined immunodeficiency (SCID). Individuals with SCID have no functional immune system and usually die from what would normally be minor infections. Ashanti has an autosomal form of SCID caused by a mutation in the gene encoding the enzyme adenosine deaminase (ADA). Her gene therapy began when clinicians isolated some of her white blood cells, called T cells [ST Figure 6–4(b)]. These cells, which are key components of the immune system, were mixed with a retroviral vector carrying an inserted copy of the normal ADA gene. The virus infected many of the T cells, and a normal copy of the ADA gene was inserted into the genome of some of them. After being mixed with the vector, the T cells were grown in the laboratory and analyzed to make sure that the transferred ADA gene was expressed (ST Figure
6–4). Then a billion or so genetically altered T cells were injected into Ashanti’s bloodstream. Repeated treatments were required to produce a sufficient number of functioning T cells. In addition, Ashanti periodically received injections of purified ADA protein throughout this process, so the exact effects of gene therapy were difficult to discern. Ashanti continues to receive supplements of the ADA enzyme to allow her to lead a normal life.

ST Figure 6–4 The first successful gene therapy trial. (a) Ashanti DeSilva, the first person to be successfully treated by gene therapy. (b) To treat SCID using gene therapy, a cloned human ADA gene is transferred into a viral vector, which is then used to infect white blood cells removed from the patient. The transferred ADA gene is incorporated into a chromosome and becomes active, and the cells are grown in culture to ensure that the ADA gene is active. After growth to enhance their numbers, the cells are inserted back into the patient, where they produce ADA, allowing the development of an immune response.

Subsequent gene therapy treatments for SCID have focused on using bone marrow stem cells and in vitro approaches to increase the number of ADA-producing T cells. To date, gene therapy has restored the health of about 20 children affected by SCID, and SCID treatment is still considered the most successful example of gene therapy.

Gene Therapy Setbacks
From 1990 to 1999, more than 4000 people underwent gene therapy for a variety of genetic disorders. These trials often failed, leading to a loss of confidence in gene therapy. In the United States, confidence in gene therapy plummeted even further in 1999, when teenager Jesse Gelsinger died while taking part in a trial testing the safety of gene therapy for a liver disease called ornithine transcarbamylase (OTC) deficiency. Large numbers of adenovirus vectors bearing the OTC gene were injected into his hepatic artery. The vectors were expected to target his liver, enter liver cells, and trigger the production of OTC protein, which, it was hoped, might correct his genetic defect and cure his liver disease. Researchers had previously treated 17 people with the therapeutic virus, and early results from these patients were promising. But Jesse Gelsinger, the 18th patient, developed a massive immune reaction within hours of his first treatment. He developed a high fever, his lungs filled with fluid, multiple organs shut down, and he died four days later of acute respiratory failure. Jesse’s severe response to the adenovirus may have resulted from how his body reacted to a previous exposure to the virus used as the vector for this protocol.

In the aftermath of the tragedy, several government and scientific inquiries were conducted. Investigators learned that scientists in the clinical trial had not reported other adverse reactions to gene therapy and that some of the scientists were affiliated with private companies that could benefit financially from the trials. It was also determined that serious side effects seen in animal studies had not been explained to patients during informed-consent discussions. The FDA subsequently scrutinized gene therapy trials across the country, halted a number of them, and shut down several gene therapy programs. Other groups voluntarily suspended their gene therapy studies. Tighter restrictions on clinical trial protocols were imposed to correct some of the procedural problems that emerged from the Gelsinger case. Jesse’s death dealt a severe blow to the struggling field of gene therapy, a blow from which it was still reeling when a second tragedy hit. The
outlook for gene therapy brightened momentarily in 2000, when a group of French researchers reported what was hailed as the first large-scale success in gene therapy. Children with a fatal X-linked form of SCID (X-SCID, also known as “bubble boy” disease) developed functional immune systems after being treated with a retroviral vector carrying a normal gene. But elation over this study soon turned to despair when it became clear that five of the 20 patients in the trial had developed leukemia as a direct result of their therapy. One of these patients died as a result of the treatment, while the other four went into remission. In two of the children examined, the cancer cells contained the retroviral vector, inserted near or into a gene called LMO2. This insertional mutation activated the LMO2 gene, causing uncontrolled white blood cell proliferation and the development of leukemia. The FDA immediately halted 27 similar gene therapy clinical trials, and once again gene therapy underwent a profound reassessment. On a positive note, long-term survival data from trials in the UK to treat X-SCID and SCID using hematopoietic stem cells from the patients’ bone marrow for gene therapy have shown that 14 of 16 children have had their immune system restored at least years after the treatment. These children formerly had life expectancies of less than 20 years. Nevertheless, the above events had major negative impacts on the progress of gene therapy.

Problems with Gene Therapy Vectors
Critics of gene therapy have berated research groups for undue haste, conflicts of interest, sloppy clinical trial management, and promising much but delivering little. Most of the problems associated with gene therapy, including the Jesse Gelsinger case and the French X-SCID trial, have been traced to the viral vectors used to transfer therapeutic genes into cells. These vectors have been shown to have several serious drawbacks:
• First, integration of retroviral genomes, including the human
therapeutic gene, into the host cell’s genome occurs only if the host cells are replicating their DNA. In the body, only a small number of cells in any tissue are dividing and replicating their DNA.
• Second, the injection of large amounts of most viral vectors, but particularly adenovirus vectors, can cause an adverse immune response in the patient, as happened in Jesse Gelsinger’s case.
• Third, insertion of viral genomes into host chromosomes can activate or mutate an essential gene, as in the case of the French patients. Viral integrase, the enzyme that integrates the viral genome into the host genome, interacts with chromatin-associated proteins, often steering integration toward transcriptionally active genes.
• Fourth, AAV vectors cannot carry DNA sequences larger than about 5 kb, and retroviruses cannot carry DNA sequences much larger than 10 kb. Many human genes exceed the 5–10 kb size range.
• Finally, a fully infectious virus could be created if the inactivated vector were to recombine with another, unaltered viral genome already present in the host cell.
To overcome these problems, new viral vectors and strategies for transferring genes into cells are being developed to improve the action and safety of vectors. Fortunately, gene therapy has experienced a resurgence, in part because of several promising new trials and successful treatments.

Recent Successful Trials
Treating Retinal Blindness
In recent years, patients being treated for blindness have greatly benefited from gene therapy approaches. Congenital retinal blinding conditions affect about 1 in 2000 people worldwide, and many of them result from a wide range of genetic defects. Over 165 different genes have been implicated in various forms of retinal blindness. Successful gene therapy has been achieved in subsets of patients with Leber congenital amaurosis (LCA), a degenerative disease of the retina that affects 1 in 50,000 to 1 in 100,000 infants each year
and causes severe blindness. Gene therapy treatments for LCA were originally pioneered in dogs. Based on the success of these treatments, the protocols were adapted and applied to human gene therapy trials. LCA is caused by mutations in any of 18 or more genes that affect the photoreceptor cells (rods and cones), the light-sensitive cells of the retina. One gene in particular, RPE65, has been the gene therapy target of choice. The protein product of the RPE65 gene metabolizes retinol, a form of vitamin A that allows the rod and cone cells of the retina to detect light and transmit electrical signals to the brain. In one of the earliest trials, young adult patients with defects in the RPE65 gene were given injections of the normal gene. Several months after a single treatment, many adult patients, while still legally blind, could detect light, and some of them could read lines of an eye chart.

This treatment approach for LCA was based on injecting AAV carrying RPE65 at the back of the eye, directly under the retina. The therapeutic gene enters about 15 to 20 percent of cells in the retinal pigment epithelium, the layer of cells just beneath the visual cells of the retina. Adults treated by this approach have shown substantial improvements in a variety of visual function tests, but the greatest improvement has been in children, all of whom have gained sufficient vision to be ambulatory. Researchers think the success in children has occurred because younger patients have not lost as many photoreceptor cells as older patients. Over two dozen gene therapy trials have been completed or are ongoing for various forms of blindness, including age-related degenerative causes of blindness. Because the eye is small and relatively few cells need to be treated, the prospects for gene therapy to become a routine treatment for eye disorders appear to be very good. Retinal cells are also very long-lived, so AAV delivery approaches can be successful for long periods of time even if the gene does not integrate.

Successful Treatment of Hemophilia B
A very encouraging gene therapy trial in England successfully treated a small group of adults with hemophilia B, a blood disorder caused by a deficiency in the coagulation protein human factor IX. Currently, hemophilia B patients are treated several times each week with infusions of concentrated doses of the factor IX protein. In the gene therapy trial, six adult patients received, in vivo, a single dose of an adeno-associated virus vector (AAV8) carrying normal copies of the human factor IX gene, which was delivered to liver cells. Of the six patients treated, four were able to stop factor IX infusion treatments after the gene therapy trial. Several other trials of this AAV treatment approach are underway, and expectations are high that a gene therapy cure for hemophilia B is close to becoming a routine reality.

HIV as a Vector Shows Promise in Recent Trials
Researchers at the University of Paris and Harvard Medical School reported that two years after gene therapy treatment for β-thalassemia, a blood disorder involving the β-globin gene that reduces the production of hemoglobin, a young man no longer needed transfusions and appeared to be healthy. A modified, disabled HIV was used to carry a copy of the normal β-globin gene. Although this trial resulted in activation of a gene called HMGA2, reminiscent of what occurred in the French X-SCID trials, HMGA2 activation did not result in an overproduction of hematopoietic cells or create a condition of preleukemia.

In 2013, researchers at the San Raffaele Telethon Institute for Gene Therapy in Milan, Italy, reported two studies using lentivirus vectors derived from HIV in combination with hematopoietic stem cells (HSCs) to successfully treat children with either metachromatic leukodystrophy (MLD) or Wiskott-Aldrich syndrome (WAS). MLD is a neurodegenerative lysosomal storage disorder caused by mutation in the arylsulfatase A (ARSA) gene, which results in an accumulation of fats called sulfatides. These are toxic to neurons, causing progressive loss of the myelin sheath (demyelination) surrounding neurons in the brain and leading to a loss of cognitive functions and motor skills. There is no cure for MLD. Children with MLD appear healthy at birth but eventually develop MLD symptoms. In this trial, researchers used an ex vivo approach with a lentivirus vector to introduce a functional ARSA gene into bone marrow-derived HSCs from each patient and then infused the treated HSCs back into the patients.

Three years after the start of a trial involving a total of 16 patients (10 with MLD and 6 with WAS), data from six patients analyzed 18 to 24 months after gene therapy indicated that the trials are safe and effective. These initial reports are based on three children from each study, because they were the first patients for whom enough time had passed after treatment to draw significant conclusions about the safety and effectiveness of the trials. It took over 15 years of research to get to this point. These trials involved a team of over 70 people, including researchers and clinicians, which is typical of the teamwork approach of gene therapy trials. In three children with MLD, gene therapy halted disease progression for 18 to 24 months after therapy, as determined by magnetic resonance images of the brain and through tests of cognitive and motor skills. Because disease onset is predicted at to 21 months, scientists are very encouraged by the outcomes of this trial. The trial was technically complicated because it required that HSCs travel through the bloodstream and release ARSA protein that is taken up into neurons. A major challenge was to create enough engineered cells to produce a sufficient quantity of therapeutic ARSA protein to counteract the neurodegenerative process. Similar results were reported for treating patients with WAS, an X-linked condition resulting in defective platelets that make patients more vulnerable to infections, frequent bleeding, autoimmune diseases, and cancer. Genome sequencing of MLD and WAS patients treated in these trials showed no evidence of genome integration near oncogenes. Similarly, patients showed no evidence of hematopoietic stem cell overproduction, suggesting that this lentivirus protocol produced safe and stable delivery of the therapeutic genes.

BOX Glybera Is the First Commercial Gene Therapy to Be Approved in the West
In late 2012, a gene therapy product called Glybera (alipogene tiparvovec) made history when the European Medicines Agency of the European Union approved it as the first gene therapy to win commercial approval in the Western world. Glybera is an AAV vector system for delivering therapeutic copies of the LPL gene to treat patients with a rare disease called lipoprotein lipase deficiency (LPLD, also called familial hyperchylomicronemia). LPLD patients have high levels of triglycerides in their blood. Elevated serum triglycerides are toxic to the pancreas and cause a severe form of pancreatic inflammation called pancreatitis. Glybera was developed by the Amsterdam-based company uniQure BV, and it is still unclear whether it will be approved by the U.S. FDA. Nonetheless, the success of Glybera trials in Europe signals what many gene therapy researchers hope will be the beginning of a wave of approvals for gene therapy treatments in Europe and the United States.

Targeted Approaches to Gene Therapy
The gene therapy approaches and examples we have highlighted thus far have focused on the addition of a therapeutic gene that functions along with the defective gene. However, two other approaches are being developed: the removal, correction, and/or replacement of a mutated gene, and silencing expression of a defective gene. Rapid progress is being made with both.

DNA-Editing Nucleases for Gene Targeting
For nearly 20 years, scientists have been working on modifications of restriction enzymes and other nucleases to engineer proteins capable of gene targeting or gene editing, that is, replacing specific genes in the genome. The concept is to combine a nuclease with a sequence-specific DNA-binding domain so that cutting can be precisely targeted to a chosen site. In 1996 researchers fused DNA-binding proteins containing a zinc-finger motif to the DNA-cutting domain of the restriction enzyme FokI to create enzymes called zinc-finger nucleases (ZFNs; ST Figure 6–5). The zinc-finger motif is found in many transcription factors and consists of a cluster of two cysteine and two histidine residues that bind zinc atoms and interact with specific DNA sequences. By coupling zinc-finger motifs to the DNA-cutting portion of a polypeptide, ZFNs provide a mechanism for modifying the genome in a sequence-specific, targeted way. The DNA-binding domain of the ZFN can be engineered to attach to any sequence in the genome. ZFNs act in pairs: the zinc fingers of two ZFNs bind sites separated by a spacer of 5–7 nucleotides, and the nuclease domains cleave between the binding sites.

Another category of DNA-editing nucleases, called TALENs (transcription activator-like effector nucleases), was created by attaching to nucleases a DNA-binding motif from transcription factors of plant pathogenic bacteria, known as transcription activator-like effectors (TALEs). TALENs also cleave as dimers. The DNA-binding domain is a tandem array of amino acid repeats, with each repeat binding to a single specific base pair. The nuclease domains then cut the sequence between the two bound TALENs, a stretch that spans about 13 bp. ZFNs and TALENs have shown promise in animal models and cultured cells for gene replacement approaches that involve removing a defective gene from the genome. These enzymes can create site-specific cleavage in the genome. When coupled with
certain integrases, ZFNs and TALENs may lead to gene editing by cutting out defective sequences and using recombination to introduce homologous sequences into the genome that replace the defective sequences.

ST Figure 6–5 Zinc-finger nucleases and TALENs bind and cut DNA at specific sequences.

Although this technology has not yet advanced sufficiently for reliable use in humans, there have been several promising trials. For example, ZFNs are actively being used in clinical trials for treating patients with HIV. Scientists are exploring ways to deliver immune system-stimulating genes that could make individuals resistant to HIV infection or cripple the virus in HIV-positive persons. In 2007, Timothy Brown, a 40-year-old HIV-positive American, had a relapse of acute myeloid leukemia and received a stem cell (bone marrow) transplant. Because Brown was HIV-positive, his physician selected a donor with a mutation in both copies of the CCR5 gene, which encodes a coreceptor on the surface of T cells (specifically CD4+ cells) that HIV must bind to enter these cells. People with naturally occurring mutations in both copies of the CCR5 gene are resistant to most forms of HIV. Brown relapsed again and received another stem cell transplant from the CCR5-mutant donor. Eventually, the cancer was contained, and by 2010, levels of HIV in his body were still undetectable even though he was no longer receiving antiretroviral treatment. Brown is generally considered to be the first person to have been cured of an HIV infection. This example encouraged researchers to press forward with a gene therapy approach to modify the CCR5 gene of HIV patients. In one promising trial, T cells were removed from HIV-positive men, and ZFNs were used to disrupt the CCR5 gene. The modified cells were then reintroduced into the patients. In five of six patients treated, immune-cell counts rose substantially and viral loads decreased following the therapy. What percentage of immune cells would have to be treated this way to significantly inhibit spread of the virus is not known, but initial results are very promising.

Recently, researchers working with human cells used TALENs to remove defective copies of the COL7A1 gene, mutations in which cause recessive dystrophic epidermolysis bullosa (RDEB), an incurable and often fatal disease that causes excessive blistering of the skin, pain, and severely debilitating skin damage. Researchers at the University of Minnesota used a TALEN to cut DNA near a mutation in the COL7A1 gene in skin cells taken from an RDEB patient. These cells were then converted into a type of stem cell called induced pluripotent stem cells (iPSCs). The iPSCs were treated with therapeutic copies of the COL7A1 gene and then differentiated into skin cells that expressed the correct protein. This is a promising result, and researchers now plan to transplant these skin cells into patients in an attempt to cure them of RDEB. Another group has recently taken a similar approach, using TALENs in cultured cells to correct the mutation in Duchenne muscular dystrophy (DMD). Researchers are optimistic that this approach can soon be adapted to treat patients.

CRISPR/Cas Method Revolutionizes Gene Editing Applications
No gene targeting method has created more excitement than the gene editing technique known as CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins), or simply CRISPR. Identified in bacterial cells, the CRISPR system provides bacteria and archaea with immunity against invading bacteriophages and foreign plasmids. Since the method was first introduced in 2013, a CRISPR craze has unfolded that has revolutionized genome-engineering applications, including gene editing for gene therapy. Because CRISPR works in bacterial, animal, and plant cells, the method offers diverse applications for genetic engineering by targeted gene editing.

CRISPR is based on delivering a single guide RNA (sgRNA) that is complementary to the target gene sequence in the genome and is complexed with the endonuclease Cas9 (ST Figure 6–6). Compared with TALENs, sgRNAs are relatively easy to design and synthesize. Along with the sgRNA, a DNA template strand coding for a replacement sequence is delivered. The sgRNA-Cas9 complex binds to the target DNA sequence, and Cas9 generates a blunt, double-stranded break in the DNA. CRISPR recognition of DNA cleavage sites is determined by RNA-DNA base pairing and by a protospacer-adjacent motif (PAM), a three-nucleotide sequence adjacent to the complementary sequence. As cells repair the DNA damage caused by Cas9, repair enzymes incorporate the template DNA into the genome at the cut site, thus replacing the target DNA sequence.

ST Figure 6–6 The CRISPR/Cas system allows for gene editing by targeting specific sequences in the genome.

Part of the power of CRISPR is that editing can be done directly in a living, adult animal. Within months of the technique becoming widely available, researchers around the world used CRISPR to target specific genes in human cells, mice, rats, bacteria, fruit flies, yeast, zebrafish, and dozens of other organisms. A team from the Massachusetts Institute of Technology (MIT) recently cured mice of a rare liver disorder, type I tyrosinemia, through gene editing by CRISPR. In tyrosinemia, a condition affecting about 1 in 100,000 people, mutation of the Fah gene encoding the enzyme fumarylacetoacetase prevents
breakdown of the amino acid tyrosine. After a one-time in vivo treatment, roughly 1 in 250 liver cells accepted the CRISPR-delivered replacement of the mutant gene with a normal copy. But about 1 month later these cells had proliferated and replaced diseased cells, taking over about one-third of the liver, which was sufficient to allow the mice to metabolize tyrosine and show no effects of the disease. The mice were subsequently taken off a low-protein diet and a drug normally used to disrupt tyrosine production. Other headline-grabbing examples of successful CRISPR applications in mice and humans include therapies for β-thalassemia, cancer genes, and HIV. Given the rapid development and effectiveness of the CRISPR method, CRISPR is clearly the most promising current tool for future gene editing. Stay tuned!

RNA Silencing for Gene Inhibition
Attempts have been made to use antisense oligonucleotides to inhibit translation of mRNAs from defective genes, but this approach to gene therapy has generally not yet proven to be reliable. Nonetheless, the emergence of RNA interference as a powerful gene-silencing tool has reinvigorated gene therapy approaches based on gene silencing. As you learned earlier in the text (see Chapter 15), RNA interference (RNAi) is a form of gene-expression regulation. In animals, short, double-stranded RNA molecules are delivered into cells, where the enzyme Dicer chops them into 21- to 25-nucleotide pieces called small interfering RNAs (siRNAs). The siRNAs then join with an enzyme complex called the RNA-induced silencing complex (RISC), which shuttles them to their target mRNA, where they bind by complementary base pairing. RISC can block siRNA-bound mRNAs from being translated into protein or can lead to degradation of siRNA-bound mRNAs so that they cannot be translated (ST Figure 6–7).

A main challenge to RNAi-based therapeutics so far has been in vivo delivery of double-stranded RNA or siRNA. RNAs degrade quickly in the body, and it is hard to get RNA to penetrate cells in the target tissue. For example, how does one deliver RNA-based therapies to cancer cells but not to noncancerous, healthy cells? Two common delivery approaches are to inject the siRNA directly or to deliver it via a DNA plasmid vector that is taken in by cells and transcribed to make double-stranded RNA, which Dicer can cleave into siRNAs. Lentiviruses, liposomes, and attachment of siRNAs to cholesterol and fatty acids are other approaches being used to deliver siRNAs (ST Figure 6–7).

ST Figure 6–7 Antisense RNA and RNA interference (RNAi) approaches to silence genes for gene therapy. Antisense RNA technology and RNAi are two ways to silence gene expression and turn off disease genes.

More than a dozen clinical trials involving RNAi are underway in the United States, and several RNAi clinical trials to treat blindness are showing promising results. One RNAi strategy to treat a form of blindness called macular degeneration targets a gene called VEGF. The VEGF protein promotes blood vessel growth. Overexpression of this gene causes excessive production of blood vessels in the retina, which leads to impaired vision and eventually blindness. Many expect that this disease will soon become the first condition to receive approval for treatment by RNAi therapy. Other disease candidates for treatment by RNAi include several different cancers, diabetes, liver diseases, multiple sclerosis, and arthritis.

Future Challenges and Ethical Issues
Despite the progress that we have noted thus far, many questions remain to be answered before we can hope for widespread application of gene therapy in the treatment of genetic disorders:
• What is the proper route for gene delivery in different kinds of disorders? For example, what is the best way to treat brain or muscle tissues?
• What percentage of cells in an organ or a tissue need to express a therapeutic gene to alleviate the effects of a genetic disorder?
• What amount of a therapeutic gene product must be produced to provide lasting improvement of the condition, and how can sufficient production be ensured? Currently, many approaches provide only short-lived delivery of the therapeutic gene and its protein.
• Will it be possible to use gene therapy to treat diseases that involve multiple genes?
• Can expression or the timing of expression of therapeutic genes be controlled in a patient so that genes can be turned on or off at a particular time or as necessary?
• Will targeted gene delivery approaches become more widely used for gene therapy trials?
For many people, the question remains whether gene therapy can ever recover from past setbacks and fulfill its promise as a cure for genetic diseases. Clinical trials for any new therapy are potentially dangerous, and animal studies often do not accurately reflect how individual humans will react to the methods used to deliver new genes. However, as the history of similar struggles with life-saving developments such as antibiotics and organ transplants has shown, there will be setbacks and even tragedies, but step by small step, we will move toward a technology that could, someday, provide reliable and safe treatment for severe genetic diseases.

Ethical Concerns Surrounding Gene Therapy
Gene therapy raises several ethical concerns, and many forms of gene therapy are sources of intense debate. At present, all gene therapy trials are restricted to using somatic cells as targets for gene transfer. This form of gene therapy is called somatic gene therapy; only one individual is affected, and the therapy is done with the permission and informed consent of the patient or family. Two other forms of gene therapy have not been approved, primarily because of the unresolved ethical issues surrounding them. The first is called germ-line therapy, whereby germ cells (the cells that give rise to the gametes, i.e., sperm and eggs) or mature gametes are used as targets for gene transfer. In this approach, the transferred gene is incorporated into all the future cells of the body, including the germ cells. As a result, individuals in future generations will also be affected, without their consent. Is this kind of procedure ethical? Do we have the right to make this decision for future generations? Thus far, the concerns have outweighed the potential benefits, and such research is prohibited.

Gene doping, discussed in the box on athletic performance, is an example of enhancement gene therapy, whereby people may be “enhanced” for some desired trait. This is another unapproved form of gene therapy, one that is extremely controversial and strongly opposed by many people. Should genetic technology be used to enhance human potential? For example, should it be permissible to use gene therapy to increase height, enhance athletic ability, or extend intellectual potential? Presently, the consensus is that enhancement therapy, like germ-line therapy, is an unacceptable use of gene therapy. However, there is an ongoing debate, and many issues are still unresolved.

Gene therapy is currently a fairly expensive treatment. But what is the right price for a cure? It remains to be seen how health-care insurance providers will view gene therapy. But if gene therapy treatments provide a health-care option that drastically improves the quality of life for patients who have few other options, it is likely that insurance companies will reimburse patients for treatment costs.

Finally, whom to treat by gene therapy is yet another ethically provocative consideration. In the Jesse Gelsinger case mentioned earlier, the symptoms of his OTC deficiency were minimized by a low-protein diet and drug treatments. Whether it was necessary to treat Jesse by gene therapy is a question that has been widely debated.

BOX Gene Doping for Athletic Performance?
Gene therapy is intended to provide treatments or cures for genetic diseases, but it could also be used by those seeking genetic enhancements to improve athletic performance. As athletes seek a competitive edge, will gene therapy as a form of “gene doping” to improve performance be far behind? We already know that in animal models enhanced muscle function can be achieved by gene addition. For example, adding copies of the insulin-like growth factor (IGF-1) gene to mice improves aspects of muscle function. The kidney hormone erythropoietin (EPO) increases red blood cell production, which leads to a higher oxygen content of the blood and thus improved endurance. Synthetic forms of EPO are banned in Olympic athletes. Several groups have proposed using gene therapy to deliver the EPO gene into athletes “naturally.” Since 2004 the World Anti-Doping Agency (WADA) has included gene doping through gene therapy as a prohibited method in sanctioned competitions. However, methods to detect gene doping are not well established. If techniques for gene therapy become more routine, many feel it is simply a matter of time before gene doping through gene therapy becomes the next generation of performance-enhancement treatments. Obviously, many legal and ethical questions will arise if gene doping becomes a reality.

Visit the Study Area in MasteringGenetics for a list of further readings on this topic, including journal references and selected web sites.

Review Questions
What is gene therapy?
Compare and contrast ex vivo and in vivo gene therapy as approaches for delivering therapeutic genes.
When treating a person by gene therapy, is it necessary that the therapeutic gene become part of a chromosome (integration) when inserted into cells? Explain your answer.
Describe two ways that therapeutic genes can be delivered into cells.
Explain how viral vectors can be used for gene therapy, and provide two examples of commonly used viral vectors.
What are some of the major challenges that must be overcome to develop safer and more effective viral vectors for gene therapy?
During the first successful gene therapy trial, in which Ashanti DeSilva was treated for SCID, did the therapeutic gene delivered to Ashanti replace the defective copy of the ADA gene? Why were white blood cells chosen as the targets for the therapeutic gene?
Explain an example of a successful gene therapy trial. In your answer be sure to consider: a description of the disease condition that was treated, the mutation or disease gene affected, the therapeutic gene delivered, and the method of delivery used for the therapy.
What is targeted gene therapy or gene editing, and how does this approach differ from traditional gene therapy approaches? How do ZFNs work?
10 Describe two gene-silencing techniques and explain how they may be used for gene therapy.

Discussion Questions
Discuss the challenges scientists face in making gene therapy a safe, reliable, and effective technique for treating human disease conditions.
Who should be treated by gene therapy? What criteria are used to determine if a person is a candidate for gene therapy?
Should gene therapy be used for cosmetic purposes or to improve athletic performance?
Describe future challenges and ethical issues associated with gene therapy.

APPENDIX Solutions to Selected Problems and Discussion Questions

Chapter 1
Solutions to Problems and Discussion Questions
Your essay should include a description of the impact of recombinant DNA technology on the following: plant and animal husbandry and production, drug development, medical advances, forensics, and understanding gene function.
The genotype of an organism is defined as the specific allelic or genetic constitution of an organism or, often, the allelic composition of one or a limited number of genes under investigation. The observable feature of those genes is called the phenotype. A gene variant is called an allele.
A gene is a portion of DNA that encodes the information required to make a specific protein. The DNA is transcribed to make a messenger RNA, which is then translated by ribosomes into a chain of amino acids. The chain then folds up into a specific structure and is processed into the finished protein.
The central dogma of molecular genetics refers to the relationships among DNA, RNA, and proteins. The processes of transcription and translation are integral to understanding these relationships. Because DNA and RNA are discrete chemical entities, they can be isolated, studied, and manipulated in a variety of experiments that define modern genetics.
10 Restriction enzymes (endonucleases) cut double-stranded DNA at particular base sequences. When a vector is cleaved with the same enzyme, complementary ends are created, so that ends, regardless of their origin, can be combined and ligated to form intact double-stranded structures. Such recombinant forms are often useful for industrial, research, and/or pharmaceutical efforts.
12 Unique transgenic plants and animals can be patented, as ruled by the United States Supreme Court in 1980. Supporters of organismic patenting argue that it is needed to encourage innovation and allow the costs of discovery to be recovered. Capital investors
assume that there is a good chance their investments will yield positive returns. Others argue that natural substances should not be privately owned and that, once a small number of companies own them, free enterprise will be stifled.
14 All life has a common origin, so genes with similar functions tend to be similar in structure and nucleotide sequence in different organisms. For this reason, what scientists learn by studying the genetics of model organisms can be used to understand human diseases. Model organisms like the mouse (Mus musculus) also allow scientists to carry out genetic studies that would be unethical in humans, for example, setting up specific matings or creating knockouts or transgenic individuals.
16 Advances in bioinformatics are limited only by advances in information technology in general, which is expanding exponentially because of extensive financial investment in research by industry. Policies and legislation on the ethical issues will always lag behind, because they are formed in response to public opinion about new advances in biotechnology.
Chapter 2
Solutions to Problems and Discussion Questions
Mitosis maintains chromosomal constancy. Meiosis, by contrast, reduces the chromosome number and provides an opportunity to exchange genetic material between homologous chromosomes. In mitosis, the two daughter cells show no change in chromosome number or kind. In meiosis, numerous potentially different haploid (n) cells are produced.
During oogenesis, only one of the four meiotic products is functional. In spermatogenesis, however, all four meiotic products are potentially functional.
Homologous chromosomes share many properties. These include overall length, position of the centromere (metacentric, submetacentric, acrocentric, telocentric), banding patterns, type and location of genes, and autoradiographic pattern.
Diploidy is a term often used in conjunction with the symbol 2n. It means that both members of a homologous
pair of chromosomes are present. Haploidy refers to the presence of a single copy of each homologous chromosome (n).
12 During meiosis I, chromosome number is reduced to haploid complements. This is achieved by synapsis of homologous chromosomes and their subsequent separation. If homologous chromosomes paired in mitosis, it would be mechanically more difficult to produce genetically identical daughter cells. Because chromosomes are unpaired at metaphase of mitosis, only centromere division is required for the daughter cells to receive identical chromosomal complements.
14 The stages of the cell cycle follow a specific order, and a new stage never begins before the prior stage is complete. This is due to the sequential activation of cyclins and cyclin-dependent kinases (CDKs). For example, mitotic cyclin/CDKs are active only in mitosis and never in G1, S, or G2. Several mechanisms establish this regulation: cell-cycle-specific transcription, post-translational modifications, regulation of the cellular localization of cyclin/CDK partners, and cyclin/CDK inhibitors.
16 There would be 16 combinations with the addition of another chromosome pair.
18 One-half of each tetrad will have a maternal homolog: (1/2)^10.
24 50, 50, 50, 100, 200
26 Duplicated chromosomes Am, Ap, Bm, Bp, Cm, and Cp will align at metaphase. The centromeres divide, and sister chromatids go to opposite poles at anaphase.
28 If you accounted for eight possible combinations in the previous problem, this problem adds no new ones.
30 The products of nondisjunction of chromosome C at the end of meiosis I are as follows:
[Figure: after meiosis I, one cell receives Am or Ap, Bm or Bp, and both C chromosomes; the other receives Am or Ap, Bm or Bp, and no Cm or Cp.]
The problem states that the C chromosomes separate as dyads instead of monads during meiosis II. Under that assumption, one possible result at the end of meiosis II is monads for the A and B chromosomes and dyads for both C chromosomes, coming from the cell that received both C chromosomes.
Chapter 2
Answers to Now Solve This
2-1 32 chromatids; 16 chromosomes moving to each pole
2-2 (a) eight tetrads (b) eight dyads (c) eight monads
2-3 Not necessarily. If crossing over occurred in meiosis I, then the chromatids in the secondary oocyte are not identical.
Chapter 3
Answers to Now Solve This
3-1 P = checkered; p = plain. Checkered is tentatively assigned as dominant because checkered types are more likely to be produced than plain types, especially in cross (b).
P1 cross: progeny (checkered; plain)
(a) PP × PP: PP; none
(b) PP × pp: Pp; none
(c) pp × pp: none; pp
(d) PP × pp: Pp; none
(e) Pp × pp: Pp; pp
(f) Pp × Pp: PP, Pp; pp
(g) PP × Pp: PP, Pp; none
3-2 Symbols: W = round seeds, w = wrinkled seeds; G = yellow cotyledons, g = green cotyledons. Examine each characteristic (seed shape and cotyledon color) separately. (a) Seed shape shows a 3:1 ratio, so the cross is Ww × Ww. No green cotyledons appear, so the cross is GG × GG or GG × Gg. Combining the two characteristics gives WwGG × WwGG or WwGG × WwGg. (b) WwGg × wwGg (c) WwGg × WwGg (d) WwGg × wwgg
3-3 (a) Genotypes and ratios: AABBCC 1/16, AABBCc 1/16, AABbCC 1/16, AABbCc 1/16, AaBBCC 2/16, AaBBCc 2/16, AaBbCC 2/16, AaBbCc 2/16, aaBBCC 1/16, aaBBCc 1/16, aaBbCC 1/16, aaBbCc 1/16. Phenotypes: A_B_C_ = 12/16; aaB_C_ = 4/16.
(b) Genotypes and ratios: AaBBCC 1/8, AaBBCc 2/8, AaBBcc 1/8, aaBBCC 1/8, aaBBCc 2/8, aaBBcc 1/8. Phenotypes: A_BBC_ = 3/8; A_BBcc = 1/8; aaBBC_ = 3/8; aaBBcc = 1/8.
(c) Each parent produces eight (2^n, where n = 3) different kinds of gametes, which gives a 64-box Punnett square. For the phenotypic frequencies, use the forked-line method. Multiply 3/4 for each dominant class and 1/4 for each recessive class: A_B_C_ = 27/64; A_B_cc = 9/64; A_bbC_ = 9/64; A_bbcc = 3/64; aaB_C_ = 9/64; aaB_cc = 3/64; aabbC_ = 3/64; aabbcc = 1/64.
3-4 (a) χ² = 0.47. This χ² value corresponds to a probability greater than 0.90 for 3 degrees of freedom (there are four classes in the χ² test). We fail to reject the
null hypothesis and conclude that the observed values do not differ significantly from the expected values.
For parts (b) and (c), the monohybrid ratios are easier to see if the observed phenotypes are listed: round, yellow 315; round, green 108; wrinkled, yellow 101; wrinkled, green 32.
(b) For the round:wrinkled monohybrid component, the round types total 423 (315 + 108) and the wrinkled types total 133 (101 + 32). The χ² value is 0.35. For 1 degree of freedom, the p value is greater than 0.50 and less than 0.90. We fail to reject the null hypothesis and conclude that the observed values do not differ significantly from the expected values.
(c) For the yellow:green component, there are 416 yellow plants (315 + 101) and 140 green plants (108 + 32). The χ² value is 0.01. For 1 degree of freedom, the p value is greater than 0.90. We fail to reject the null hypothesis.
Solutions to Problems and Discussion Questions
Your essay should include the following points:
Factors occur in pairs.
Some genes have dominant and recessive alleles.
Alleles segregate from each other during gamete formation. When homologous chromosomes separate at anaphase I, alleles go to opposite poles of the meiotic apparatus.
One gene pair separates independently from other gene pairs. Different gene pairs on the same homologous pair of chromosomes (if far apart), or on nonhomologous chromosomes, separate independently from each other during meiosis.
(a) A cross with a normal parent (mm) will yield some abnormal (Mm) and some normal (mm) progeny only if the abnormal parent is Mm. (b) Since all the children are normal, the abnormal parent has to be heterozygous (Mm). Hence, the male is heterozygous and the female is homozygous recessive.
Pisum sativum is easy to cultivate. It is naturally self-fertilizing, but it can be crossbred. It has several visible features (e.g., tall or short, red flowers
or white flowers) that are consistent under a variety of environmental conditions yet contrast because of genetic circumstances. Seeds could be obtained from local merchants.
Three of Mendel's postulates are illustrated in this problem. Unit factors occur in pairs (postulate 1) and show dominance/recessiveness relationships (postulate 2). The separation of these unit factors from each other during gamete formation illustrates postulate 3 (segregation).
10 Factors occur in pairs. Some genes have dominant and recessive alleles. Alleles segregate from each other during gamete formation: when homologous chromosomes separate at anaphase I, alleles go to opposite poles of the meiotic apparatus. One gene pair separates independently from other gene pairs.
12 (a) 3^5 = 243 (b) 1/243 (c) no (d) yes
14 Symbols: seed shape, W = round, w = wrinkled; seed color, G = yellow, g = green. F1 (WwGg) × wwgg gives:
1/4 WwGg (round, yellow)
1/4 Wwgg (round, green)
1/4 wwGg (wrinkled, yellow)
1/4 wwgg (wrinkled, green)
16 (a) For a 3:1 ratio, the expected numbers are 270 for the dominant phenotype and 90 for the recessive phenotype. Solving $\chi^2 = \sum \frac{(o - e)^2}{e}$ gives χ² = 72.5. With df = 1, this value corresponds to p < 0.05. Hence, at a critical p value of 0.05, the null hypothesis is rejected. (b) Solve as in part (a), but for a 1:1 ratio.
18 1/8
20 3/4
22 Unit factors in pairs; dominance and recessiveness; segregation
24 (a) There are two possibilities. If the trait is dominant, I-1 is heterozygous, and so are II-2 and II-3. If the trait is recessive, I-1 is homozygous and I-2 is heterozygous; II-1 and II-4 would then be heterozygous, and II-2 and II-3 would be homozygous. (b) Recessive: parents Aa, Aa (c) Recessive: parents Aa, Aa (d) Recessive or dominant: if recessive, parents AA (probably), aa
Second pedigree: recessive or dominant, not sex-linked (because this topic has not been covered
as yet); if recessive, parents Aa, aa.
26 (a) In cross #1, the ratio of straight wings to curled wings is 3:1, and the ratio of short bristles to long bristles is also 3:1. This indicates that straight is dominant to curled and short is dominant to long. Possible symbols, using standard Drosophila symbolism, are: straight wings = w+, curled wings = w; short bristles = B+, long bristles = B.
(b) Cross #1: w+/w; B+/B × w+/w; B+/B
Cross #2: w+/w; B/B × w+/w; B/B
Cross #3: w/w; B/B × w+/w; B+/B
Cross #4: w+/w+; B+/B × w+/w+; B+/B (one parent could be w+/w)
Cross #5: w/w; B+/B × w+/w; B+/B
Chapter 4
Answers to Now Solve This
4-1 (a) c^k c^a × c^d c^a → 2/4 sepia; 1/4 cream; 1/4 albino
(b) The cream guinea pig had two sepia parents (c^k c^d × c^k c^d or c^k c^d × c^k c^a), so the cream parent could be c^d c^d or c^d c^a:
c^k c^a × c^d c^d → 1/2 sepia; 1/2 cream (if the cream parent is assumed to be homozygous), or
c^k c^a × c^d c^a → 1/2 sepia; 1/4 cream; 1/4 albino
(c) Possible crosses (various other possibilities exist, depending on the genotypes of the parents):
c^k c^k × c^d c^d → all sepia
c^k c^k × c^d c^a → all sepia
c^k c^d × c^d c^d → 1/2 sepia; 1/2 cream
c^k c^d × c^d c^a → 1/2 sepia; 1/2 cream
c^k c^a × c^d c^d → 1/2 sepia; 1/2 cream
c^k c^a × c^d c^a → 1/2 sepia; 1/4 cream; 1/4 albino
(d) Possible crosses (other possibilities exist, depending on the genotypes of the parents):
c^k c^a × c^d c^d → 1/2 sepia; 1/2 cream
c^k c^a × c^d c^a → 1/2 sepia; 1/4 cream; 1/4 albino
4-2 A = pigment; a = pigmentless (colorless); B = purple; b = red. AaBb × AaBb: A_B_ = purple; A_bb = red; aaB_ = colorless; aabb = colorless.
4-3 Let a represent the mutant gene and A represent its normal allele. (a) This pedigree is consistent with an X-linked recessive trait because the male would contribute an X chromosome carrying the a mutation to the aa daughter. The mother would have to be heterozygous Aa. (b) This pedigree is consistent with an X-linked recessive trait because the mother could be Aa and transmit
her a allele to one son (a/Y) and her A allele to her other son. (c) This pedigree is not consistent with an X-linked mode of inheritance because the aa mother has an A/Y son.
Solutions to Problems and Discussion Questions
Your essay should describe alleles that do not function independently of each other or that reduce the viability of a class of offspring. With multiple alleles, a given gene has more than two alternative forms.
20 (large leaves), 40 (medium leaves), 20 (small leaves); incomplete dominance
Flower color: RR = red; Rr = pink; rr = white. Flower shape: P = personate; p = peloric. Plant height: D = tall; d = dwarf.
(a) RRPPDD × rrppdd (b) 2/4 pink × 3/4 personate × 3/4 tall = 18/64 Rr P_ D_ (pink, personate, tall)
(a) Each parent produces only one type of gamete (CrP or cRp). Hence, all the progeny are CcRrPp; that is, all are purple. (b) Each progeny inherits at least one copy of C and one copy of R, and C, R, and P are always inherited together. Hence, all the progeny are red. (c) The forked-line method shows that 9/32 are purple, 9/32 are red, and 14/32 are colorless.
10 (a) 1/4 (b) 1/2 (c) 1/4 (d) zero
12 This situation is similar to sex-influenced pattern baldness in humans. Consider two autosomal alleles and let:
BB = beardless in both sexes
Bb = beardless in females; bearded in males
bb = bearded in both sexes
P1: female bb (bearded) × male BB (beardless)
F1: Bb = beardless females; bearded males
Because half of the offspring are males and half are females, the F2 can be rewritten by sex for clarity:
1/2 females: 1/4 BB → 1/8 beardless; 2/4 Bb → 2/8 beardless; 1/4 bb → 1/8 bearded
1/2 males: 1/4 BB → 1/8 beardless; 2/4 Bb → 2/8 bearded; 1/4 bb → 1/8 bearded
The model could be tested by crossing F1 (heterozygous) beardless females with bearded (homozygous) males. Comparing these results with the reciprocal cross would support the model if the distributions of sexes with phenotypes were the
same in both crosses.
14 (a) F1: carrier females and affected males; F2: normal males, affected males, carrier females, and affected females. (b) F1: carrier females and normal males; F2: normal females, carrier females, normal males, and affected males. If the hemophilia gene were autosomal, hh individuals would be affected, while Hh and HH individuals would be normal.
16 (a) P1: XvXv; +/+ × X+/Y; bw/bw
F1: 1/2 X+Xv; +/bw (female, normal); 1/2 Xv/Y; +/bw (male, vermilion)
F2: 3/16 female, normal; 1/16 female, brown eyes; 3/16 female, vermilion eyes; 1/16 female, white eyes; 3/16 male, normal; 1/16 male, brown eyes; 3/16 male, vermilion eyes; 1/16 male, white eyes
(b) P1: X+X+; bw/bw × Xv/Y; +/+
F1: 1/2 X+Xv; +/bw (female, normal); 1/2 X+/Y; +/bw (male, normal)
F2: 6/16 female, normal; 2/16 female, brown eyes; 3/16 male, normal; 1/16 male, brown eyes; 3/16 male, vermilion eyes; 1/16 male, white eyes
(c) P1: XvXv; bw/bw × X+/Y; +/+
F1: 1/2 X+Xv; +/bw (female, normal); 1/2 Xv/Y; +/bw (male, vermilion)
F2: 3/16 female, normal; 1/16 female, brown eyes; 3/16 female, vermilion eyes; 1/16 female, white eyes; 3/16 male, normal; 1/16 male, brown eyes; 3/16 male, vermilion eyes; 1/16 male, white eyes
18 (a) The denominator in the ratios is 64, which suggests that three independently assorting gene pairs are operating in this problem. However, there are only two characteristics (eye color and croaking). One might therefore hypothesize that two gene pairs control one trait and one gene pair controls the other.
(b) There is a 48:16 (or 3:1) ratio of rib-it to knee-deep and a 36:16:12 (or 9:4:3) ratio of blue to green to purple eye color. From these relationships, croaking is due to one (dominant/recessive) gene pair, while eye color is due to two gene pairs. The 9:4:3 ratio for eye color indicates some gene interaction (epistasis).
(c) Symbolism: croaking, R_ = utterer and rr = mutterer. Eye color: since the most frequent phenotype is blue eye, let A_B_ represent those genotypes. The purple class, a 3/16 group, uses the A_bb genotypes. The 4/16 class (green) would be the aaB_ and aabb groups.
(d) The cross involving a blue-eyed, mutterer frog and a purple-eyed, utterer frog would have the genotypes AABBrr × AAbbRR. This cross would produce an F1 of AABbRr, which would be blue-eyed and utterer. The A locus is homozygous while the B and R loci are heterozygous, so the F2 will follow a 9:3:3:1 ratio:
9/16 AAB_R_ = blue-eyed, utterer
3/16 AAB_rr = blue-eyed, mutterer
3/16 AAbbR_ = purple-eyed, utterer
1/16 AAbbrr = purple-eyed, mutterer
20 A 12:3:1 ratio is obtained, which is a clear sign that epistasis has modified a typical 9:3:3:1 ratio. In this case, cattle in one of the 3/16 classes have the same phenotype as cattle in the 9/16 class. The 9/16 class typically has the genotype A_B_, so the following genotypic classifications seem reasonable:
A_B_ = solid white (9/16)
aaB_ = solid white (3/16)
A_bb = black and white spotted (3/16)
aabb = solid black (1/16)
The choice of bb as the genotype giving the spotted phenotype is arbitrary. One could obtain true-breeding AAbb black and white spotted cattle.
22 Symbolism: A_B_ = black; A_bb = golden; aaB_ = brown; aabb = golden. The bb combination is epistatic to the A locus.
(a) AAB_ × aaBB (other configurations are possible, but each must give all offspring A and B dominant alleles) (b) AaB_ × aaBB (other configurations are possible, but both parents cannot be Bb) (c) AABb × aaBb (d) AABB × AAbb (e) AaBb × Aabb (f) AaBb × aabb (g) aaBb × aaBb (h) AaBb × AaBb
The genotypes that will breed true are: black = AABB; golden = all bb genotypes; brown = aaBB.
24 (a) C^ch C^ch = chestnut; C^c C^c = cremello; C^ch C^c = palomino (b) Matings between cremello and chestnut horses would be expected to produce an F1 that is all palomino.
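The palomino result and the modified dihybrid ratios above, such as the 12:3:1 for the cattle in problem 20, both come from enumerating every gamete combination. The Python sketch below does this by brute force. It assumes independent assortment and full penetrance. The one-letter allele codes ("h" for C^ch, "c" for C^c) and the function names are hypothetical choices for illustration and are not part of the original solutions.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Equally likely gametes for a genotype given as one two-letter string per gene.

    Assumes independent assortment: a gamete takes one allele from each gene pair.
    """
    return list(product(*genotype))


def cross(parent1, parent2, phenotype):
    """Count offspring phenotypes over every gamete combination (a Punnett square)."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # Sort each allele pair so that, e.g., "aA" and "Aa" count as one genotype.
        offspring = tuple("".join(sorted(a + b)) for a, b in zip(g1, g2))
        counts[phenotype(offspring)] += 1
    return counts


# Problem 24 (incomplete dominance). Hypothetical one-letter codes:
# "h" stands for the C^ch allele and "c" for the C^c allele.
HORSE = {"hh": "chestnut", "ch": "palomino", "cc": "cremello"}
palomino_f2 = cross(["ch"], ["ch"], lambda g: HORSE[g[0]])


def cattle(g):
    """Problem 20 model: any B_ is solid white; A_bb is spotted; aabb is solid black."""
    a_locus, b_locus = g
    if "B" in b_locus:
        return "solid white"
    return "black and white spotted" if "A" in a_locus else "solid black"


cattle_f2 = cross(["Aa", "Bb"], ["Aa", "Bb"], cattle)
print(palomino_f2)  # counts: 2 palomino, 1 chestnut, 1 cremello
print(cattle_f2)    # counts: 12 solid white, 3 black and white spotted, 1 solid black
```

To get the other modified ratios, pass a different phenotype function to `cross`. For example, A_B_ blue, A_bb purple, and aa__ green reproduces the 9:3:4 frog eye-color ratio of problem 18. Brute-force enumeration is used here for transparency, not speed.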
The F2 would be expected to fall in a 1:2:1 ratio, as in the third cross in part (a) above.
26 Cross #1 = (c); Cross #2 = (d); Cross #3 = (b); Cross #4 = (e); Cross #5 = (a). Each parental/offspring grouping can be used only once, so there are no other combinations.
28 The homozygous dominant type is lethal. Polled is caused by an independently assorting dominant allele, and horned is caused by the recessive allele.
30 Maternal-effect genes produce products that are not carried over for more than one generation, unlike organelle and infectious heredity. The following crosses could illustrate the transient nature of a maternal effect. Female Aa × male aa → all offspring of the A phenotype. Take a female of the A phenotype from this cross and conduct the following mating: female aa × male Aa. All offspring may be of the a phenotype, because all of the offspring reflect the genotype of the mother, not her phenotype. This cross illustrates that maternal effects last only one generation. However, depending on particular biochemical/developmental parameters, not all crosses may give these types of patterns.
32 (a, b) The reduced ratio is 12 white : 3 orange : 1 brown. In a dihybrid cross (AaBb × AaBb), the following would occur: 12 white (A_B_ or aaB_) : 3 orange (A_bb) : 1 brown (aabb).
34 Beatrice, Alice of Hesse, and Alice of Athlone are carriers. There is a 1/2 chance that Princess Irene is a carrier.
Chapter 5
Answers to Now Solve This
5-1 (a) Something is missing from the male-determining system of sex determination at the level of the genes, gene products, receptors, and so on, and the loss is correlated with CMD1. (b) The SOX9 gene, or its product, is probably involved in male development. Perhaps it is activated by SRY. (c) There is probably some evolutionary relationship between the SOX9 gene and SRY. There is considerable evidence that many other genes and pseudogenes are also
homologous to SRY. (d) Normal female sexual development does not require the SOX9 gene or gene product(s).
5-2 Because mammals undergo X-chromosome inactivation, scientists would want to know whether the nucleus taken from Rainbow (the donor) would continue to show such inactivation. Would the inactivated X chromosome retain the property of inactivation? X-chromosome inactivation is random, so random X inactivation alone would give CC a different patch pattern from her genetic mother.
Solutions to Problems and Discussion Questions
Your essay should cover the various aspects of sex chromosomes that contain genes responsible for sex determination. It should also mention organisms in which autosomes play a role in concert with the sex chromosomes.
In humans, sex is determined by the presence or absence of the SRY gene located on the Y chromosome. Therefore, XO individuals are females with ovaries, a uterus, and oviducts, but very few ova. In Drosophila, sex is determined by the balance of female determinants on the X chromosome(s) and male determinants on the autosomes, mediated by the Sxl gene. The Y chromosome does not determine sex but is required for sperm production. Therefore, XO individuals are sterile males.
The Y chromosome is male-determining in humans. Maleness is caused by a particular region of the Y chromosome, the sex-determining region (SRY). SRY encodes a product called the testis-determining factor (TDF), which causes the undifferentiated gonadal tissue to form testes. Individuals with the 47,XXY complement are males, while 45,XO produces females. In Drosophila, sex is determined by the balance between the number of X chromosomes and the number of haploid sets of autosomes. In contrast to humans, XO Drosophila are males and the XXY complement is female.
10 (a) Female XrwY × male X+X+
F1: females X+Y (normal); males XrwX+ (normal)
F2: females X+Y (normal) and XrwY (reduced wing); males XrwX+
(normal) and X+X+ (normal)
(b) Female XrwXrw × male X+Y
F1: females XrwX+ (normal); males XrwY (reduced wing)
F2: females XrwX+ (normal) and XrwXrw (reduced wing); males X+Y (normal) and XrwY (reduced wing)
12 The zygote develops masculinized characteristics because of the hormones and transcription factors produced through the activity of the Y chromosome. Even though the zygote is genetically female, it has masculinized reproductive organs.
14 Attached-X chromosomes are inherited from mother to daughter, and the father's X is transferred to the son. One would therefore see daughters with the white-eye phenotype and sons with the miniature-wing phenotype. There would also be rare (≤ 3%) metafemales (attached-X + X) with wild-type eye color. YY zygotes fail to develop into larvae.
16 A Barr body is a differentially staining chromosome seen in some interphase nuclei of mammals with two X chromosomes. The number of Barr bodies is one fewer than the number of X chromosomes. The Barr body is an X chromosome that is considered to be genetically inactive.
18 The Lyon hypothesis states that the inactivation of the X chromosome occurs at random early in embryonic development. Such X chromosomes are in some way "marked," so that all clonally related cells have the same X chromosome inactivated.
20 1/4 of the offspring will be calico.
22 Many organisms have evolved over millions of years under a fine balance of numerous gene products, usually produced from two copies (identical or similar) of each gene. Many genes required for normal cellular and organismal function in both males and females are located on the X chromosome, where males carry only one copy. These gene products have nothing to do with sex determination or sex differentiation, but their output must be balanced in some manner.
24 Nondisjunction could have occurred at either meiosis I or meiosis II in the mother, giving the XwXwY complement in the offspring.
26 In snapping turtles, sex determination is strongly
influenced by temperature, such that males are favored in the 26–34°C range. Lizards, on the other hand, appear to have their sex determined by factors other than temperature in the 20–40°C range.
Chapter 6
Answers to Now Solve This
6-1 If the father had hemophilia, the Turner syndrome individual most likely inherited the X chromosome from the father and no sex chromosome from the mother. Nondisjunction in the mother, during either meiosis I or meiosis II, can produce an egg with no X chromosome. See the text for a diagram of primary and secondary nondisjunction.
6-2 The sterility of interspecific hybrids is often caused by a high proportion of univalents in meiosis I. As a result, viable gametes are rare, and the likelihood of two such gametes "meeting" is remote. Even if partial homology of chromosomes allows some pairing, sterility is usually the rule. The horticulturist may attempt to reverse the sterility by treating the sterile hybrid with colchicine. If successful, such a treatment may double the chromosome number, so each chromosome would have a homolog with which to pair during meiosis.
6-3 In a paracentric inversion heterozygote, the rare double crossovers within the boundaries of the inversion produce only minor departures from the standard chromosomal arrangement, as long as the crossovers involve the same two chromatids. With two-strand double crossovers, the second crossover negates the first. However, three-strand and four-strand double crossovers lead to anaphase bridges as well as a high proportion of genetically unbalanced gametes.
Solutions to Problems and Discussion Questions
Your essay can draw on the many examples in the text of deletions, duplications, inversions, translocations, and copy number variations.
Some cases of Down syndrome are due to a translocation between chromosomes 14 and 21. The parents are phenotypically normal, yet the
translocation leads to trisomy 21 in offspring through the gametes it produces at meiosis.
Chromosomes can break spontaneously or because of chemicals. The broken ends can abnormally fuse with other nonhomologous chromosomes, leading to a loss or rearrangement of genetic material. Telomere shortening is another reason chromosome ends may fuse with each other.
Because of gene redundancy, polyploid plants grow faster. They are therefore larger than their diploid relatives and have larger flowers and fruits of higher economic value. The most significant disadvantage is infertility. The fertility of polyploid plants depends on their ability to generate balanced gametes, because two homologs of each chromosome are required for successful meiosis and fertilization.
10 Organisms with one inverted chromosome and one noninverted homolog are called inversion heterozygotes. Two such chromosomes can pair only by forming an inversion loop. If crossing over occurs within the inversion loop, abnormal chromatids carrying deletions and duplications are produced.
12 Modern globin genes resulted from a duplication event in an ancestral gene about 500 million years ago. Mutations occurred over time, and a chromosomal aberration separated the duplicated genes. This left the eventual α cluster on chromosome 16 and the eventual β cluster on chromosome 11.
14 The basic chromosome set here is nine unique chromosomes (a haploid complement). Other forms that are multiples of n are forms of autopolyploidy. In the illustration, the basic n set is multiplied to various levels, as in the autotetraploid in the example.
[Figure: basic set of nine unique chromosomes (n); autotetraploid (4n)]
Individual organisms with 27 chromosomes are triploids (3n). They are more likely to be sterile, because trivalents at meiosis I cause a relatively high number of unbalanced gametes to form.
16 The cross would be as follows: WWWW × wwww (assuming that the chromosomes pair at meiosis)
F1: WWww, which produces gametes in a ratio of 1 WW : 4 Ww : 1 ww
F2: 35 W_ _ _ : 1 wwww
18 Since two
Gl1 alleles and two ws3 alleles are present in the triploid, they must have come from the pollen parent. The wording of the problem implies that the pollen parent contributed an unreduced (2n) gamete. However, another explanation, dispermic fertilization, is possible: two Gl1 ws3 gametes could have fertilized the ovule.
20 (a) reciprocal translocation (b) [Figure: arrangement of chromosome segments A–H after the reciprocal translocation] (c) All chromosomal segments are present, and there is no apparent loss of chromosomal material. However, if the translocation breakpoints occurred within genes, an abnormal phenotype may result. In addition, a gene's function is sometimes influenced by its position, that is, by its neighboring genes. If such "position effects" occur, a different phenotype may result.
22 Instances of Down syndrome are due either to nondisjunction during meiosis in one of the parents, resulting in trisomy 21, or to a 14/21 (D/G) translocation in one of the parents (familial Down syndrome). The latter produces a zygote with 46 chromosomes but three copies of chromosome 21.
24 The following breakage/reunion events illustrate a translocation between the relatively small, similarly sized chromosomes 19 (metacentric) and 20 (metacentric/submetacentric). The case described here occurs before S-phase duplication. Such a translocation is fairly unlikely in a general population, so inbreeding probably played a significant role in allowing the translocation to "meet itself."
[Figure: chromosomes 19 and 20 before and after the translocation, shown after S phase]
The small reciprocal product may or may not have a centromere. Either way, it is often small and/or contains a significant amount of heterochromatin, so it tends to be lost during meiosis (it fails to pair properly).
26 This female will produce meiotic products of the following types:
normal: 18 + 21
translocated: 18/21
translocated plus 21: 18/21 + 21
deficient: 18 only
Note: The 18/21 + 18 gamete is not formed because it would require separation of primarily homologous chromosomes at anaphase I.
Fertilization with a normal 18 + 21 sperm cell will produce the following offspring:
normal: 46 chromosomes
translocation carrier: 45 chromosomes (18/21 + 18 + 21)
trisomy 21: 46 chromosomes (18/21 + 21 + 21)
monosomic: 45 chromosomes (18 + 18 + 21), lethal
Chapter 7
Answers to Now Solve This
7-1 (a) 1/4 AaBb, 1/4 Aabb, 1/4 aaBb, 1/4 aabb (b) 1/4 AaBb, 1/4 Aabb, 1/4 aaBb, 1/4 aabb (c) If the arrangement is AB/ab × ab/ab, the two types of offspring will be 1/2 AB/ab and 1/2 ab/ab. If A and B are not coupled, the symbolism would be Ab/aB × ab/ab, and the offspring would be 1/2 Ab/ab and 1/2 aB/ab.
7-2 The most frequent classes are PZ and pz. These are the parental (noncrossover) groups, so the original parental arrangement in the testcross was PZ/pz × pz/pz. Adding the crossover percentages together (6.9 + 7.1) gives 14 percent, so the two genes are 14 map units apart.
7-3 Examine the progeny list to see which types are not present. In this case, the double-crossover classes are + + c and a b +. (a, b) Gene b is in the middle, and the arrangement is + b c / a + +, with a–b = 7 map units and b–c = 2 map units. (c) The missing progeny phenotypes are + + c and a b +. From 1000 offspring, 1.4 of these (0.07 × 0.02 × 1000) would be expected. They were not observed, perhaps by chance or because of some other unknown selective factor.
Solutions to Problems and Discussion Questions
Your essay should cover methods of detection through crosses with appropriate, distinguishable markers. It should also note that, in most cases, the frequency of crossing over is directly related to the distance between genes. With some qualification, especially around the centromeres and telomeres, crossing over is somewhat randomly distributed over the length of the chromosome. Two loci that are far apart are
more likely to have a crossover between them than two loci that are close together. If the probability of one event is 1/X, the probability of two events occurring at the same time will be 1/X².
Genetic maps are developed from crossover frequencies, so each cross must be set up to reveal crossovers. Genetic heterogeneity must exist so that the different gene arrangements generated by crossing over can be distinguished. The heterozygous organism must be of the sex in which crossing over occurs. Lastly, the cross must be set up so that the phenotypes of the offspring readily reveal their genotypes.
10 The heterozygous parent in the testcross (RY/ry × ry/ry) carries the two dominant alleles on one chromosome and the two recessives on the homolog. The map distance between the R and Y loci would be 10 map units.
12 The map for parts (a) and (b) is as follows (positions in map units): d 31, b 48, pr 54, vg 67, c 75, adp 83. The expected distances are 44 map units between d and c, 36 between d and vg, and 52 between d and adp. However, any one cross can detect a theoretical maximum of 50 map units (50 percent recombination) between two loci. The observed distance between d and adp would therefore be below the 52 map units obtained by simple subtraction.
14 (a) P1: sc s v/sc s v × + + +/Y; F1: + + +/sc s v × sc s v/Y (b) To determine the map distances, first write the correct arrangement and sequence of genes, sc v s / + + +, and then compute the distance between each pair of genes: sc–v = 33 percent (33 map units); v–s = 10 percent (10 map units). (c, d) The coefficient of coincidence is 0.727, which indicates that there were fewer double crossovers than expected. Positive chromosomal interference is therefore present.
16 (a) The short gene is on the same chromosome as the black gene. (b) The parental cross is now females b sh p/+ + + × males b sh p/b sh p. Crossing over in the female would produce the new gametes b + and + sh. Because the gene p is assorting
independently, it is not important in this discussion. Fifteen percent of the offspring now contain these recombinant chromatids; therefore, the map distance between the two genes must be 15 map units.

18 The genetic distance is 22 cM (map units).

20 Assign the following symbols, for example: R = red, r = yellow, O = oval, o = long.
Progeny A: Ro/rO × rroo = 10 map units
Progeny B: RO/ro × rroo = 10 map units

A-8 SOLUTIONS TO SELECTED PROBLEMS AND DISCUSSION QUESTIONS

22 (a) There would be 2^n = 8 genotypic and phenotypic classes, and they would occur in a 1:1:1:1:1:1:1:1 ratio.
(b) There would be two classes, and they would occur in a 1:1 ratio.
(c) There are 20 map units between the A and B loci, and locus C assorts independently of both the A and B loci.

24 By having microscopically visible markers on the chromosomes, Creighton and McClintock were able to show that homologous chromosomes physically exchange segments during crossing over.

26 Because sister chromatids are genetically identical (with the exception of rare new mutations), crossing over between sisters provides no increase in genetic variability.

Chapter 8 Answers to Now Solve This

8-1 Hfr strain order …

… 5.23 × 10^11 nm^3 × 100 = about 36.3 percent

Solutions to Problems and Discussion Questions

Your essay should include a description of overall chromosomal configuration, such as linearity or circularity, as well as association with chromosomal proteins. In addition, it should describe higher-level structures, such as condensation in the case of eukaryotic chromosomes.

Polytene chromosomes are formed from numerous DNA replications, pairing of homologs, and absence of strand separation or cytoplasmic division. Each chromosome contains about 1000 to 5000 DNA strands in parallel register. They appear in specific tissues, such as salivary glands, of many dipterans such as Drosophila. They appear as comparatively long, wide fibers with sharp light and dark sections (bands) along their length. Such
bands (chromomeres) are useful in chromosome identification.

Lampbrush chromosomes are homologous pairs of chromosomes held together by chiasmata, with numerous loops of DNA protruding from a central axis of chromomeres. They are found in oocytes at the diplotene stage of the first meiotic prophase.

Digestion of chromatin with endonucleases, such as micrococcal nuclease, gives DNA fragments of approximately 200 base pairs or multiples of such segments. X-ray diffraction data indicated a regular spacing of DNA in chromatin. Regularly spaced bead-like structures (nucleosomes) were identified by electron microscopy.

10 As chromosome condensation occurs, a 300-Å fiber is formed. It appears to be composed of five or six nucleosomes coiled together; such a structure is called a solenoid. These fibers form a series of loops that further condense into the chromatin fiber, which is then coiled into the chromosome arms making up each chromatid.

14 SINE = short interspersed elements, a moderately repetitive sequence class; LINE = long interspersed elements. They are called “repetitive” because multiple copies exist in the genome.

16 In E. coli, the enzymes that control the number of supercoils in the double-stranded circular chromosome are called topoisomerase I and II. Topoisomerase I reduces the number of supercoils in the DNA, and topoisomerase II (DNA gyrase) increases it by binding to the DNA, twisting it, cleaving both strands, passing the end through the loop it has created, and reforming the phosphodiester bonds. The linking number is the number of times the two strands of the double helix wind around each other. To introduce five additional (negative) supercoils, you would need to reduce the linking number by five.

18 The finding that natural chemical modification of nucleosomal components, as indicated in the question, increases gene activity suggests that changes in the binding of nucleosomes to DNA make genes more accessible to factors that promote gene function. In addition, the finding that heterochromatin, containing fewer genes and
more repressed genes, is underacetylated further supports the suggestion that histone modification is functionally related to changes in gene activity.

20 Dividing 3 × 10^9 base pairs by 10^6 gives an average of 3000 base pairs (3 kb) between Alu sequences.

Chapter 12 Answers to Now Solve This

12-1 (a) Probabilities, with G = 3/4 and C = 1/4:
GGG = 3/4 × 3/4 × 3/4 = 27/64
GGC = 3/4 × 3/4 × 1/4 = 9/64
GCG = 3/4 × 1/4 × 3/4 = 9/64
CGG = 1/4 × 3/4 × 3/4 = 9/64
CCG = 1/4 × 1/4 × 3/4 = 3/64
CGC = 1/4 × 3/4 × 1/4 = 3/64
GCC = 3/4 × 1/4 × 1/4 = 3/64
CCC = 1/4 × 1/4 × 1/4 = 1/64
(b) Glycine: GGG and one G₂C triplet (adds up to 36/64)
Alanine: one G₂C and one C₂G triplet (adds up to 12/64)
Arginine: one G₂C and one C₂G triplet (adds up to 12/64)
Proline: one C₂G triplet and CCC (adds up to 4/64)
(c) Glycine: GGG, GGC
Alanine: CGG, GCC, CGC, GCG
Arginine: GCG, GCC, CGC, CGG
Proline: CCC, CCG

12-2 Because the code is a triplet code, a repeating trinucleotide sequence will, once initiated, remain in the same reading frame and produce the same codon all along the sequence, regardless of the initiation site. If a repeating tetranucleotide is used, such as ACGUACGUACGU, the possible readings are:
Codons: ACG UAC GUA CGU ACG; amino acids: thr tyr val arg thr
Codons: CGU ACG UAC GUA CGU; amino acids: arg thr tyr val arg
Codons: GUA CGU ACG UAC GUA; amino acids: val arg thr tyr val
Codons: UAC GUA CGU ACG UAC; amino acids: tyr val arg thr tyr

12-3 Apply complementary bases, substituting U for T:
(a) Sequence 1: 3'-GAAAAAACGGUA-5'
Sequence 2: 3'-UGUAGUUAUUGA-5'
Sequence 3: 3'-AUGUUCCCAAGA-5'
(b) Sequence 1: met-ala-lys-lys
Sequence 2: ser-tyr-[ter]
Sequence 3: arg-thr-leu-val
(c) Apply complementary bases: 3'-GAAAAAACGGTA-5'

Solutions to Problems and Discussion Questions

Your essay should include a description of the nature and structure of the genetic code, the enzymes and logistics of transcription, and the chemical nature of polymerization.

This sequence can be read as three possible repeating triplets, UUC, UCU, and CUU, depending on the initiation point. Hence, three different polypeptide homopolymers are produced, containing either phenylalanine (phe), serine (ser), or leucine
(leu).

Given that AGG = arg, information from the AG copolymer indicates that AGA also codes for arg and that GAG must therefore code for glu. Coupling this information with that of the AAG copolymer, one can see that GAA must also code for glu and AAG must code for lys.

List the substitutions, then from the code table apply the codons to the original amino acids. Select codons that provide single-base changes.
Substitutions:
threonine → alanine: AC(U, C, A, or G) → GC(U, C, A, or G)
glycine → serine: GG(U or C) → AG(U or C)
isoleucine → valine: AU(U, C, or A) → GU(U, C, or A)

Solutions to Problems and Discussion Questions

Your essay should include a general description of translation, in which a functional ribosome, in conjunction with mRNA, orders amino acids and forms peptide bonds between them. Amino acids are specifically and individually attached to the 3' end of tRNAs, which possess a three-base sequence (the anticodon) that can base-pair with three bases of mRNA (codons).

Messenger RNA contains a copy of the triplet codes stored in DNA. The sequences of bases in mRNA interact, three at a time, with the anticodons of tRNAs.

Enzymes involved in transcription include RNA polymerase (E. coli) and RNA polymerases I, II, and III (eukaryotes). Those involved in translation include aminoacyl tRNA synthetases, peptidyl transferase, and GTP-dependent release factors.

(a) The final output of a given gene may be influenced by the stability of its mRNA, and the stability of an mRNA is determined in part by its base content and sequence.
(b) Differential splicing of mRNA (actually mRNA precursors) can influence how much of a given product will be made from a gene.

The different regions present in a tRNA molecule are the anticodon arm and loop, acceptor arm, amino acid binding site, D arm, TΨC arm, and variable loop.

10 The quaternary level results from the associations of
individual polypeptide chains.

14 Having the precise intragenic location of mutations, as well as the ability to isolate the products (especially mutant products), allows scientists to compare the locations of lesions within genes. Mutations occurring nearer the initiation site in a gene will produce proteins with defects near the N-terminus. In this problem, the lesions cause chain termination; therefore, the nearer the mutation is to the 5' end of the mRNA, the shorter the polypeptide product. Relating the position of the mutation to the length of the product establishes the colinear relationship.

18 All of the substitutions involve one base change.

20 Because cross (a) is essentially a monohybrid cross, there would be no difference in the results if crossing over occurred (or did not occur) between the a and b loci.

Sequence 1: met-pro-asp-tyr-ser-(term)
Sequence 2: met-pro-asp-(term)
The 12th base (a uracil) is deleted from sequence 1, causing a frameshift mutation that introduces the terminating triplet UAA.

14 GCU, GCC, GCA, and GCG all code for alanine. Therefore, six single-base substitutions of the GCU codon will result in an amino acid substitution at position 180: ACU, UCU, CCU, GAU, GUU, and GGU.

16 Leu: UUA, UUG, CUU, CUC, CUA, CUG. Hence, there is a preponderance of these codons. … and lysine is positively charged. Given these significant charge changes, one would predict some, if not considerable, influence on protein structure and function.

10 The enzyme generally functions in the degradation of RNA; however, in an in vitro environment with high concentrations of ribonucleoside diphosphates, the direction of the reaction can be forced toward polymerization. In vivo, the concentration of ribonucleoside diphosphates is low, and the degradative process is favored.

12 Applying the coding dictionary, the following sequences are “decoded”:
Ala: GCU, GCC, GCA, GCG
Tyr: UAU, UAC

20 In an overlapping code, two amino acids will be affected; in a nonoverlapping
code, one amino acid will be affected.

22 Proline: C₃ and one of the C₂A triplets
Histidine: one of the C₂A triplets
Threonine: one C₂A triplet and one A₂C triplet
Glutamine: one of the A₂C triplets
Asparagine: one of the A₂C triplets
Lysine: A₃

24 (a, b) Alternative splicing occurs when pre-mRNAs are spliced in more than one way to yield various combinations of exons in the final mRNA product. Upon translation of a group of alternatively spliced mRNAs, a series of related proteins, called isoforms, is produced. It is likely that alternative splicing evolved to provide a variety of functionally related proteins in a particular tissue from one original source. Some tissues might be more prone to alternative splicing if they depend on a number of related protein functions.

Chapter 13 Answers to Now Solve This

13-1 One can conclude that the tRNA, and not the amino acid, is involved in recognition of the codon.

13-2 With the codons for valine being GUU, GUC, GUA, and GUG, single-base changes from glutamic acid's GAA and GAG can cause the glu → val switch. The same can be said for lysine with its AAG codon. The normal glutamic acid is a negatively charged amino acid, whereas valine carries no net charge.

22 (Sketch: translation on an mRNA, 5' to 3', with labels IF1, IF2 + GDP, and EF-Tu + GTP.)

Chapter 14 Answers to Now Solve This

14-1 The phenotypic influence of any base change depends on a number of factors, including its location in coding or noncoding regions, its potential for dominance or recessiveness, and its interaction with other base sequences in the genome. If a base change is located in a noncoding region, there may be no influence on the phenotype; however, some regions that are noncoding in the traditional sense may influence other genes and/or gene products.

14-2
If a gene is incompletely penetrant, it may be present in a population and express itself only under certain conditions. It is unlikely that the gene for hemophilia behaved in this manner. If a gene's expression is suppressed by another mutation in an individual, it is possible that offspring may inherit the gene but not its suppressor; such offspring would have hemophilia. Since all genetic variations must arise at some point, it is possible that the mutation in the Queen Victoria family was new, arising in a cell of the father. Lastly, if the mother was heterozygous, it may simply be by chance that no other individuals in her family were unlucky enough to receive the mutant gene.

14-3 Any agent that inhibits DNA replication, either directly or indirectly, through mutation and/or DNA crosslinking, will suppress the cell cycle and may be useful in cancer therapy. Since guanine alkylation often leads to mismatched bases, such lesions can often be repaired by a variety of mismatch repair mechanisms. DNA crosslinks, in turn, can be repaired by recombinational mechanisms; thus, for such agents to be successful in cancer therapy, suppressors of DNA repair systems are often used in conjunction with certain cancer drugs.

14-4
Ethyl methanesulfonate (EMS) alkylates the keto groups at the 6th position of guanine and at the 4th position of thymine. In each case, base-pairing affinities are altered and transition mutations result. Altered bases are not readily repaired, and once the transition to normal bases occurs through replication, such mutations avoid repair altogether.

Solutions to Problems and Discussion Questions

Your essay should include a brief description of the genomic differences between diploid and haploid organisms and should note that, with the exception of phenomena such as cell death, disease, and cancer, mutational circumstances apply to both groups of organisms.

Since a somatic mutation first appears in a single cell, it is highly unlikely that the organism will be sufficiently altered to respond to a screen, because none of the other cells in that organism will have the same mutation. That is not to say that somatic mutations cannot influence the organism: cancer cells generally originate from a single altered cell and can have a profound influence on the fate of an organism.

Diploid organisms have homologous chromosomes, so the wild-type gene may compensate for the mutated gene. Haploid organisms have a single set of chromosomes, so a mutation readily produces an observable phenotype.

Tautomeric shifts can result in mutations by allowing hydrogen bonding of normally noncomplementary bases, so that incorrect nucleotides may be added during DNA replication.

10 Frameshift mutations are likely to change more than one amino acid in a protein product, because as the reading frame is shifted, a different set of codons is generated. In addition, a nonsense triplet could be introduced, causing premature chain termination.

12 When DNA is damaged, mutations are likely, and in many cases such mutations are deleterious to the health of the organism. Several mechanisms have evolved to reduce the impact of such mutations: cell-cycle arrest to quarantine a cell line or
allow DNA repair, and programmed cell death (apoptosis). If damaged DNA cannot be repaired during cell-cycle arrest, programmed cell death is often activated to rid the cell population of mutant cell lines.

14 The polymerase would encounter cytosines more frequently, so cytosine would be the nucleotide misincorporated more frequently.

16 Xeroderma pigmentosum is a human disorder that predisposes to skin cancer; it is caused by mutations in any of several rare autosomal genes that interfere with the repair of damaged DNA. Since cancer is caused by mutations in several types of genes, interfering with DNA repair can enhance the occurrence of these types of mutations.

18 Mutations in MutH, MutL, and MutS in E. coli can adversely affect DNA mismatch repair. The equivalent genes in humans are hMSH2 and hMLH1.

20 Replication slippage is a process that generates small deletions and insertions during DNA replication. While it can occur anywhere in the genome, it is most prevalent in regions already containing repeated sequences; thus, repeated sequences are hypermutable.

22 Unscheduled DNA synthesis represents DNA repair.
Complementation groups: XP1, XP2, XP3, XP4, XP5, XP6, XP7
The groupings (complementation groups) indicate that there are at least three “genes” that form products necessary for unscheduled DNA synthesis. All of the cell lines that are in the same complementation group are defective in the same product.

26 It is probable that the IS element occupied or interrupted the normal function of a controlling region related to the galactose genes, which are in an operon with one controlling upstream element.

28 First, though less likely, one might suggest that transposons, for one reason or another, are more likely to insert in noncoding regions of the genome, or that they are more stable in such regions. Second, and more likely, transposons may insert rather randomly, with selection eliminating those that have interrupted coding regions of the genome.

Chapter 15 Answers to Now Solve This

15-1
(a) It is likely that either premature termination of translation will occur (from the introduction of a nonsense triplet in the reading frame) or the normal chain termination will be ignored. Either way, a mutant condition for the Z gene is likely. If such a cell is placed on a lactose medium, it will be incapable of growth because β-galactosidase is not available.
(b) If the deletion occurs early in the A gene, one might expect impaired function of the A gene product, but it will not influence the use of lactose as a carbon source.

15-2 (a) With no lactose and no glucose, the operon is off because the lac repressor is bound to the operator, and although CAP is bound to its binding site, it will not override the action of the repressor.
(b) With lactose added to the medium, the lac repressor is inactivated and the operon transcribes the structural genes. With no glucose, CAP is bound to its binding site, thus enhancing transcription.
(c) With no lactose present in the medium, the lac repressor is bound to the operator region, and since glucose inhibits adenylyl cyclase, the CAP protein will not interact with its binding site. The operon is therefore “off.”
(d) With lactose present, the lac repressor is inactivated; however, since glucose is also present, CAP will not interact with its binding site. Under this condition, transcription is severely diminished and the operon can be considered to be “off.”

15-3 Should hypermethylation occur in one of many DNA repair genes, the frequency of mutation would increase because the DNA repair system is compromised. The resulting increase in mutations might occur in tumor-suppressor genes or proto-oncogenes.

15-4
General transcription factors associate with a promoter to stimulate transcription of a specific gene. Some trans-acting factors, when bound to enhancers, interact with coactivators to enhance transcription by forming an enhanceosome that stimulates transcription initiation. Transcription can be repressed when certain proteins bind to silencer DNA elements and generate repressive chromatin structures. The same molecule may bind to a different chromosomal regulatory site (enhancer or silencer), depending on the molecular environment of a given tissue type.

Solutions to Problems and Discussion Questions

Your essay should include a description of the evolutionary advantages of responding efficiently to environmental resources and challenges (antibiotics, for example) when they are present. Having related functions in operons provides for coordinated responses.

Note that under negative control, the regulatory molecule interferes with transcription, whereas under positive control, the regulatory molecule stimulates transcription. In an inducible system, the repressor that normally interacts with the operator to inhibit transcription is inactivated by an inducer, thus permitting transcription. In a repressible system, a normally inactive repressor is activated by a corepressor, enabling the activated repressor to bind to the operator and inhibit transcription.

I+O+Z+ = inducible, because a repressor protein can interact with the operator to turn off transcription.

(b) Without a CAP binding site, there would be a reduction in the inducibility of the lac operon.

14 X is a repressor, Y is a protein that binds X and removes repression, and A induces the expression of S by binding Y.

16 Oil stimulates the production of a protein, which turns on (positive control) genes to metabolize oil. The different results in strains #2 and #4 suggest a cis-acting system. Because the operon by itself (when mutant, as in strain #3) gives constitutive synthesis of the structural genes,
a cis-acting system is also supported. The cis-acting element is most likely part of the operon.

20 Transcription factors are regulatory proteins that can increase or reduce the level of transcription initiation. They bind promoters, enhancers, and silencers.

22 Generally, one determines the influence of various regulatory elements by removing necessary elements or adding extra elements. In addition, examining the outcome of mutations within such elements often provides insight into their function.

24 (Sketch: Net and its target genes under neutral conditions; phosphorylated Net (Net–P) and its target genes, UV and heat shock. Sketches modified from Ducret et al. 1999, Molecular and Cellular Biology 19:7076–7087.)

I−O+Z+ = constitutive, because the repressor gene is mutant; therefore, no repressor protein is available.
I+OcZ+ = constitutive, because even though a repressor protein is made, it cannot bind to the mutant operator.
I−O+Z+/F′I+ = inducible, because even though there is one mutant repressor gene, the other I+ gene, on the F′ factor, produces a normal repressor protein that is diffusible and capable of interacting with the operator to repress transcription.
I+OcZ+/F′Oc = constitutive, because there is a constitutive operator (Oc) next to a normal Z gene. Constitutive synthesis of β-galactosidase will occur.
IsO+Z+ = repressed, because the product of the Is gene is insensitive to the inducer lactose and thus cannot be inactivated. The repressor will continually interact with the operator and shut off transcription regardless of the presence or absence of lactose.
IsO+Z+/F′I+ = repressed, because, as in the previous case, the product of the Is gene is insensitive to the inducer lactose and thus cannot be inactivated. The repressor will continually interact with the operator and shut off transcription regardless of the presence or absence of lactose. The fact that there is a normal I+ gene is of no consequence, because once a repressor from Is binds to an operator, the presence of normal repressor molecules makes no difference.

The mutations described are consistent with the structure of the lac repressor. The N-terminal portion of the repressor is involved in DNA binding, while the C-terminal portion is more involved in association with lactose and its analogs.

10 Generally, cooperative binding occurs when the final outcome is greater than the simple sum of its parts. In the case of transcription factors, each factor alone has little impact on transcription; however, when all components are present, a cooperative interaction (binding) occurs and a functional complex is made.

12 (a) Because activated CAP is a component of the cooperative binding of RNA polymerase to the lac promoter, absence of a functional crp gene would compromise the positive control exhibited by CAP.

26 Following is a sketch of several RNA polymerase molecules (filled circles) in what might be a transcription factory. In this diagram, eight RNA molecules are shown being transcribed, with nascent transcripts extending from the RNA polymerase molecules. For simplicity, only one promoter and one structural gene are shown.

Chapter 16 Answers to Now Solve This

16-1 Being able to distinguish leukemic cells from healthy cells allows one not only to target therapy to specific cell populations but also to quantify responses to therapy. Because such cells produce a hybrid protein, it may be possible to develop a therapy, perhaps an immunotherapy, based on the uniqueness of the BCR/ABL protein.

16-2 p53 is a tumor-suppressor gene that protects cells from multiplying with damaged DNA. It is present in its mutant state in more than 50 percent of all tumors. Since the immediate control of a critical and universal cell-cycle checkpoint is mediated by p53, mutation will influence a wide range of cell types; p53's action is not limited to specific cell types.

16-3
Even if a major “cancer-causing” gene is transmitted, other genes, often new mutations, are usually necessary to drive a cell toward tumor formation. Full expression of the cancer phenotype is likely to be the result of interplay among a variety of genes and therefore shows variable penetrance and expressivity.

Solutions to Problems and Discussion Questions

Your essay should describe the general influence of genetics in cancer. Since a variety of factors alter gene output and such output controls the cell cycle, it is likely that such factors could cause cancer.

Unlike normal cells, cancer cells do not require growth factors and proliferative signals to enter the mitotic cycle. Negative regulatory signals that stop normal cells from proliferating are also not effective in cancer cells. One of the determinants of entry into the cell cycle is the G1 restriction point, controlled by the retinoblastoma protein, pRB. pRB is a tumor suppressor that is commonly mutated in certain cancers; when it is mutated, the block at G1 entry is lifted, which makes cancer cells independent of growth factors. pRB-bound E2F transcription factors prevent the transcription of S-phase-specific early response genes (including cyclin D). pRB releases the E2Fs only in its hyperphosphorylated state, permitting entry into the cell cycle. pRB is a major tumor suppressor.

Apoptosis is a natural process involved in morphogenesis and a protective mechanism against cancer formation. During apoptosis, nuclear DNA becomes fragmented, cellular structures are disrupted, and the cells are dissolved. Caspases are involved in the initiation and progress of apoptosis.

10 The nonphosphorylated form of pRB binds to transcription factors such as E2F, causing their inactivation and suppression of the cell cycle. Phosphorylation of pRB activates the cell cycle by releasing the transcription factors (E2F) to advance the cell cycle. With the phosphorylation site inactivated in the PSMRB form, phosphorylation cannot occur, thereby leaving
the cell cycle in a suppressed state.

12 Various kinases can be activated by breaks in DNA. One kinase, called ATM, and/or a kinase called Chk2 phosphorylates BRCA1 and p53. The activated p53 arrests replication during the S phase to facilitate DNA repair. The activated BRCA1 protein, in conjunction with BRCA2, mRAD51, and other nuclear proteins, is involved in repairing the DNA.

14 In the mutant state, oncogenes induce or maintain uncontrolled cell division; that is, there is a gain of function. Generally, this gain of function takes the form of increased or abnormally continuous gene output. On the other hand, loss of function is generally attributed to mutations in tumor-suppressor genes, which function to halt passage through the cell cycle. When such genes are mutant, they have lost their capacity to halt the cell cycle. Such mutations are generally recessive.

16 It is less expensive, in terms of both human suffering and money, to seek preventive measures for as many diseases as possible. However, even with some understanding of the mechanisms of disease, in this case cancer, it must also be stated that no matter what preventive measures are taken, it will be impossible to completely eliminate disease from the human population. It is extremely important, however, that we increase efforts to educate and protect the human population from as many hazardous environmental agents as possible.

18 The p53 protein initiates several different responses to DNA damage, including cell-cycle arrest followed by DNA repair, and apoptosis if the DNA cannot be repaired. Cells with mutant p53 cannot mount these responses; as a result, they move unchecked through the cell cycle regardless of the condition of their DNA. Therefore, cells lacking p53 have high mutation rates and accumulate the types of mutations that lead to cancer.

20 DNA lesions brought about by natural radiation (X rays, ultraviolet light), dietary substances, and substances in the external environment can lead to cancer. In
addition, normal metabolism creates oxidative end products that can damage DNA, proteins, and lipids.

22 No, she will still have the general population risk of about 10 percent. In addition, it is possible that genetic tests will not detect all breast cancer mutations.

24 All cancer cells can proliferate to form tumors. However, if cells in the tumor also have the ability to break loose, enter the bloodstream, invade other tissues, and form secondary tumors (metastases), they become malignant.

26 As with many forms of cancer, a single gene alteration is not the only requirement. The authors (Bose et al.) state, “but only infrequently the cells acquire the additional changes necessary to produce leukemia in humans.” Some studies indicate that variations (often deletions) in the region of the breakpoints may influence expression of CML.

28 (a, b) Even though there are changes in the BRCA1 gene, they do not always have physiological consequences. Such neutral polymorphisms make screening difficult, in that one cannot always be certain that a mutation will cause problems for the patient.
(c) The polymorphism in PM2 is probably a silent mutation because the third base of the codon is involved.
(d) The polymorphism in PM3 is probably a neutral missense mutation because the first base is involved. However, because there is some degeneracy at the first codon position, it is possible for the mutation to be silent.

Chapter 17 Answers to Now Solve This

17-1 (a) Bacteria that have been transformed with the recombinant plasmid will be resistant to tetracycline; therefore, tetracycline should be added to the medium.
(b) Colonies that grow on a tetracycline medium only should contain the insert. Those bacteria that do not grow on the ampicillin medium probably contain the Drosophila DNA insert.
(c) Resistance to both antibiotics by a transformed bacterium could be explained in several ways. First, if cleavage with PstI was
incomplete, then no change in the biological properties of the uncut plasmids would be expected. Also, it is possible that the cut ends of the plasmid were ligated together in the original form, with no insert.

17-2 First, using the human nucleotide sequence, one can produce a probe to screen the library of the African okapi. Second, one can use the amino acid sequence and the genetic code to generate a complementary DNA probe for screening the library. Through hybridization, the probe identifies DNA that is complementary to it, allowing one to identify the library clone containing the DNA of interest. Cells with the desired clone are then picked from the original plate, and the plasmid is isolated from the cells.

Solutions to Problems and Discussion Questions

Your essay should include an appreciation for the relative ease with which sections of DNA can be inserted into various vectors, and for the amplification and isolation of such DNA. You should also include the possibilities of modifying recombinant molecules.

Even though the human gene coding for insulin contains a number of introns, a cDNA generated from insulin mRNA is free of introns. Plasmids containing insulin genes (from cDNA) are therefore free of introns, so no processing issue arises.

EcoRI should be used for the restriction site GAATTC. The complementary sequence is CTTAAG.

Plasmids were the first to be used as cloning vectors, and they are still routinely used to clone relatively small fragments of DNA. Because of their small size, they are relatively easy to separate from the host bacterial chromosome, and they have relatively few restriction sites. They can be engineered fairly easily (e.g., with polylinkers and reporter genes added). BACs (bacterial artificial chromosomes) can be engineered for certain qualities, such as carrying relatively large inserts. YACs (yeast artificial chromosomes) contain telomeres, an origin of replication, and a centromere, and they are extensively used to clone DNA in yeast. With
selectable markers (TRP1 and URA3) and a cluster of restriction sites, DNA inserts ranging from 100 kb to 1000 kb can be cloned and inserted into yeast. Yeast is a eukaryote and carries out many of the RNA- and protein-processing steps typical of more complex eukaryotes. It therefore offers numerous advantages for working with eukaryotic genes.

10 No. The tumor-inducing (Ti) plasmid used to produce genetically modified plants is specific to the bacterium Agrobacterium tumefaciens, which causes tumors in many plant species. There is no danger that this tumor-inducing plasmid will cause tumors in humans.

12 The total number of molecules after 15 cycles would be 16,384, or 2¹⁴.

14 A cDNA library provides DNAs copied from RNA transcripts and is therefore useful for identifying what are likely to be functional DNAs. If one wants to examine noncoding as well as coding regions, a genomic library would be more useful.

16 Assume that one knows the amino acid sequence of the protein product or the nucleotide sequence of the target nucleic acid. A degenerate set of DNA strands can then be prepared for cloning into an appropriate vector, or amplified by PCR. A variety of labeling techniques can then be used, through hybridization, to identify complementary base sequences contained in the genomic library. One must know at least a portion of the amino acid sequence of the protein product, or its nucleic acid sequence, for the procedure to be applied. Problems can arise from three sources:
- degeneracy in the genetic code, which may prevent construction of an appropriate probe;
- pseudogenes in the library, which produce hybridization with inappropriate related fragments;
- variability of DNA sequences in the library due to introns, which causes poor or background hybridization.

18 Taq polymerase comes from a bacterium called Thermus aquaticus, which typically lives in hot springs. It is heat stable, like some other enzymes used in PCR that are
isolated from thermal vents in the ocean floor.

20 A knockout animal has a piece of DNA missing, whereas a transgenic animal usually has a piece of DNA added.

22 Until the host organism carries the knockout gene in the homozygous state in its sex cells, the knockout gene cannot be faithfully transmitted at high frequency.

Chapter 18 Answers to Now Solve This

18-1 (a) To annotate a gene, one identifies three kinds of features:
- gene-regulatory sequences found upstream of genes (promoters, enhancers, and silencers);
- downstream elements (termination sequences);
- in-frame nucleotide triplets that are part of the coding region of the gene.
In addition, 5′ and 3′ splice sites, which distinguish exons from introns, and polyadenylation sites are also used in annotation. (b) Similarity to other annotated sequences often provides insight into a sequence’s function and may serve to substantiate a particular genetic assignment. Direct sequencing of cDNAs from various tissues and developmental stages aids in verification. (c) Take 20,000 as the estimated number of genes in the human genome. The 3141 genes on this chromosome then represent 3141/20,000, or about 15.7 percent, of the total. The chromosome therefore appears to be gene rich.

18-2 Structural and chemical factors determine the function of a protein. Several proteins may therefore share considerable amino acid sequence identity and yet not be functionally identical. The in vivo function of such a protein is determined by its secondary and tertiary structures and by local surface chemistries in active or functional sites. The nonidentical sequences may therefore have considerable influence on function. Note that the query matches different site positions within the target proteins. Other factors that suggest different functions include:
- associations with other molecules (cytoplasmic, membrane, or extracellular);
- the chemical nature and position of binding domains;
- posttranslational modifications;
- signal sequences.

18-3 Because
blood is relatively easy to obtain in a pure state, its components can be analyzed without fear of tissue-site contamination. Second, blood is intimately exposed to virtually all cells of the body and may therefore carry chemical markers of abnormal cells anywhere in the body. In theory, it is an ideal probe into the human body. However, when blood is removed from the body, its proteome changes, and those changes depend on a number of environmental factors. Thus, what might be a valid diagnostic for one condition might not be valid for other conditions. In addition, the serum proteome is subject to change depending on the genetic, physiologic, and environmental state of the patient. Age and sex are additional variables that must be considered.

Validation of a plasma proteome for a particular cancer would be strengthened by showing that the stage of the cancer correlates with a commensurate change in the proteome. This should be shown in a relatively large, statistically significant pool of patients. Second, the types of changes in the proteome should be reproducible and, at least until complexities are clarified, should involve tumorigenic proteins. Comparisons with archived samples taken from each individual at a disease-free time would also be helpful.

Solutions to Problems and Discussion Questions

Your essay should include a description of traditional recombinant DNA technology involving cutting and splicing genes, as well as modern methods of synthesizing genes of interest, PCR amplification, microarray analysis, and so on.

Whole-genome shotgun sequencing involves randomly cutting the genome into numerous smaller segments. Overlapping sequences are used to identify segments that were once contiguous, eventually producing the entire sequence. Difficulties in alignment often occur in repetitive regions of the genome. Map-based sequencing relies on known landmarks (genes, nucleotide polymorphisms, etc.)
to orient the alignment of cloned fragments that have been sequenced. Compared with whole-genome sequencing, the map-based approach is somewhat cumbersome and time consuming. Whole-genome sequencing has become the most common method for assembling genomes. Map-based cloning is used to resolve the problems often encountered during whole-genome sequencing.

The main goals of the Human Genome Project are to establish, categorize, and analyze functions for human genes. As stated in the text, the other goals are:
- to analyze genetic variations between humans, including the identification of single-nucleotide polymorphisms (SNPs);
- to map and sequence the genomes of several model organisms used in experimental genetics, including E. coli, S. cerevisiae, C. elegans, D. melanogaster, and M. musculus (the mouse);
- to develop new sequencing technologies, such as high-throughput computer-automated sequencers, to facilitate genome analysis;
- to disseminate genome information, both among scientists and to the general public.

One initial approach to annotating a sequence is to compare the newly sequenced genomic DNA to the known sequences already stored in various databases. The National Center for Biotechnology Information (NCBI) provides access to BLAST (Basic Local Alignment Search Tool), software that searches databanks of DNA and protein sequences. A segment of DNA can be compared with sequences in major databases such as GenBank to identify matches that align in whole or in part. For example, one might search with a sequence from mouse chromosome 11 and find that sequence, or similar ones, in a number of taxa. BLAST computes a similarity score or identity value that indicates how similar two sequences are. BLAST is one of many sequence alignment algorithms (RNA-RNA, protein-protein, etc.)
that may sacrifice sensitivity for speed.

10 Many repetitive regions of the genome are not directly involved in producing a phenotype. They therefore tend to be isolated from selection and show considerable variation in repeat number. Length variation in such repeats is unique to each individual (except identical twins). With various detection methods, this variation provides the basis for DNA fingerprinting. Single-nucleotide polymorphisms also occur frequently in the genome and can be used to distinguish individuals.

12 The Personal Genome Project (PGP) provides individual sequences of diploid genomes. Results of such projects indicate that the HGP may have underestimated genome variation by as much as fivefold. Genome variation between individuals may be 0.5 percent rather than the 0.1 percent estimated from the HGP. Because the PGP provides sequence information on individuals, fundamental questions about human diversity and evolution may become more answerable.

14 A number of new subdisciplines of molecular biology will provide the infrastructure for major advances in our understanding of living systems. The following terms identify specific areas within that infrastructure:
proteomics: proteins in a cell or tissue
metabolomics: enzymatic pathways
glycomics: carbohydrates of a cell or tissue
toxicogenomics: toxic chemicals
metagenomics: environmental issues
pharmacogenomics: customized medicine
transcriptomics: expressed genes
Many other “-omics” are likely to emerge in the future.

16 Most microarrays, also known as gene chips, consist of a glass slide that a robotic system coats with single-stranded DNA molecules. Some microarrays are coated with single-stranded expressed sequence tags or with DNA sequences that are complementary to gene transcripts. A single microarray can have as many as 20,000 different spots of DNA, each containing a unique sequence. Researchers use microarrays to compare patterns of gene expression in tissues under different conditions or to compare gene-expression
patterns in normal and diseased tissues. In addition, microarrays can be used to identify pathogens. Microarray databases allow investigators to compare any given pattern with others worldwide.

18 In general, one would expect extreme conditions (such as heat or salt) to favor the evolution of features that increase protein stability:
- the distribution of ionic interactions on the surface;
- the density of hydrophobic residues and interactions;
- the number of hydrogen and disulfide bonds.
The codon table shows that a high GC ratio would favor the amino acids Ala, Gly, Pro, Arg, and Trp, and would minimize the use of Ile, Phe, Lys, Asn, and Tyr. How codon bias influences actual protein stability is not yet understood. Most genomic sequences change through relatively gradual responses to mild selection over long periods of time. They strongly resemble patterns of common descent; that is, they are conserved. The same can be said for organisms adapted to extreme environments, but their extraordinary physiological demands may dictate unexpected sequence bias.

20 (a, b) With exon sequencing, one may get lucky and find a clinically relevant variant within a gene. Given the multitude of genetic variations among individuals, however, this may be like finding a needle in a haystack. Many significant genomic functions are regulated outside the exons. Introns and mitochondrial DNA are highly variable among individuals and may be relevant to some health-related conditions.

Chapter 19 Answers to Now Solve This

19-1.
Antigens are usually quite large molecules. During digestion they are sometimes broken down into smaller molecules, which makes them ineffective at stimulating the immune system. Some individuals are allergic to the food they eat, which shows that not all antigens are completely degraded or modified by digestion. In some cases, ingested antigens do stimulate the immune system (e.g., oral polio vaccine) and provide a route for immunization. Localized (intestinal) immunity can sometimes be stimulated by oral introduction of antigens, and in some cases this can offer immunity to ingested pathogens.

19-2. It will hybridize by base complementation to the normal DNA sequence.

Solutions to Problems and Discussion Questions

Your essay should describe genomic applications that relate to agriculture, health and welfare, scientific exploration, and appreciation of the earth’s flora and fauna. It should also address patent protection, personal privacy, and potential agricultural and environmental hazards.

From a purely scientific viewpoint, consuming cow’s milk from cloned animals poses no added danger. However, some individuals may have an aversion to organismic cloning and may object on moral grounds to supporting it by consuming products of cloned organisms. Public sentiment will likely push for labeling of “cloned products,” on the grounds that consumers should be able to make an informed choice about the origin of such products.

The Venter team compared a number of genomes, each with a small number of genes, and identified 256 genes that may represent the minimum number of genes required for life. The team also used transposon-based mutagenesis to determine the number of genes essential for life. Finally, it synthesized short DNA segments and assembled them into a synthetic genome that possessed
characteristics of living systems. The appearance of new characteristics in the recipient bacteria indicated that genome conversion had occurred.

Because both mutations occur in the CF gene, children who inherit both mutant alleles will have CF. With both parents heterozygous, each child has a 25 percent chance of developing CF.

10 Using restriction enzyme analysis to detect point mutations in humans is a tedious trial-and-error process. The human genome is very large, and the number of unique restriction enzymes is relatively small. The chance that a specific point mutation in a desired gene alters a restriction site, and can thus be distinguished from normal sequence variation, is therefore low.

12 A GWAS analyzes SNPs, specific differences in genes, CNVs, or changes in the epigenome, such as methylation patterns in particular regions of a chromosome. Scientists determine which SNPs, CNVs, or epigenome changes co-occur in individuals with the disease. From this, they can calculate the disease risk associated with each variation.

14 For haploinsufficient mutations, gene therapy holds promise. For “gain-of-function” mutations, however, the mutant gene’s activity or product must in all probability be suppressed. Adding a normal gene probably will not help unless it can outcompete the mutant gene product.

16 Providing physicians and patients with information about genetic testing is a strong argument in favor of wide distribution. Companies involved in genetic testing could also help by providing information specific to their operations. Any individual test results, however, must be held in strict confidence. It would also help if pooled statistical data were available to the public, including frequencies of false positives and false negatives and population or geographical distributions.

18 Having one’s genome sequenced is a personal decision, but in doing so one must be armed with information as to
the expected variability of the genome and the possibility of false positives. A publicly available genome might lead to employment bias, changes in personal relationships, and so on.

20 Raw genomic information is difficult to interpret. 23andMe now provides only ancestry information and has stopped giving health-related data. In November 2013, the FDA warned 23andMe to stop marketing its genomic service. The FDA has not reversed its 2013 decision. However, in February 2015, 23andMe received authorization to market a specific personal test for Bloom syndrome.

22 (a) A gene is a product of the natural world, so it does not meet section 101 of U.S. patent law, which defines patentable subject matter. (b) The direct-to-consumer test for the BRCA1 and BRCA2 genes and Venter’s “first-ever human-made life form” are both original in their process or development, so they should be patentable. The BRCA1 and BRCA2 genes themselves, however, are works of nature and should not be patentable.

Chapter 20 Answers to Now Solve This

20-1. Your screen may have been more inclusive than the screens of others; that is, it may have identified more subtle alterations. You may also have identified several different mutations (multiple alleles) in some of the same genes.

20-2. The engrailed product is absent in ftz/ftz embryos, so the ftz gene product must regulate en, either directly or indirectly. The ftz gene is expressed normally in en/en embryos, so the product of the engrailed gene does not regulate expression of ftz.

20-3.
Since her-1⁻ mutations cause males to develop into hermaphrodites, and tra-1⁻ mutations cause hermaphrodites to develop into males, one may hypothesize the following. The her-1⁺ gene produces a product that suppresses hermaphrodite development, and the tra-1⁺ gene product is needed for hermaphrodite development.

Solutions to Problems and Discussion Questions

Your essay should describe development in both plants and animals as depending on families of genes that are controlled by regulatory elements. It should also describe the regulatory elements that evolved in plants and animals, given their independent evolution.

The syncytial blastoderm forms as nuclei migrate to the egg’s outer margin, or cortex, where additional divisions take place. Plasma membranes then organize around each of the nuclei at the cortex, creating the cellular blastoderm.

Maternal genes regulate the expression of the first three groups of zygotic genes: gap, pair-rule, and segment polarity genes. Mutations in zygotic genes have embryo-lethal phenotypes.

Because the polar cytoplasm contains information to form germ cells, such a transplantation procedure would be expected to generate germ cells in the anterior region.

10 This experiment would involve designing suitably tagged antibodies that bind the protein and react with a substrate to produce a colored (chromogenic) product. The product can then be visualized using microscopy or imaging techniques.

12 A dominant gain-of-function mutation changes the specificity or expression pattern of a gene or gene product. The “gain-of-function” Antp mutation causes the wild-type Antennapedia gene product to be expressed in the eye-antenna disc, so mutant flies have legs on the head in place of antennae.

14 Given the information in the problem, this gene probably controls the expression of BX-C genes in all body segments. The wild-type esc product stored in the egg may be required to correctly interpret the information stored in the egg cortex.

16 The Polycomb gene family
induces changes in chromatin that influence Hox gene expression. A gene in Arabidopsis has significant homology to the Polycomb gene family and also works by altering chromatin structure. The cross-reactivity is thus related to the Polycomb product’s effect on chromatin. Such parallel functions indicate that mechanisms of regulation are conserved over vast evolutionary distances.

Chapter 21 Answers to Now Solve This

21-1 (a) Since 1/256 of the F2 plants are 20 cm and 1/256 are 40 cm, four gene pairs must be involved in determining flower size, because (1/4)⁴ = 1/256. (b) There are nine size classes, which again fits four gene pairs (2n + 1 = 9). One can therefore conduct the following backcross: AaBbCcDd × AABBCCDD. The frequency distribution in the backcross would be:
1/16 = 30 cm
4/16 = 32.5 cm
6/16 = 35 cm
4/16 = 37.5 cm
1/16 = 40 cm

21-2 (a) Taking the sum of the values and dividing by the number in the sample gives the following means:
mean sheep fiber length = 7.7 cm
mean fleece weight = 6.4 kg
The variance for each trait is:
variance of sheep fiber length = 6.097 cm²
variance of fleece weight = 3.12 kg²
The standard deviation is the square root of the variance:
standard deviation of sheep fiber length = 2.469 cm
standard deviation of fleece weight = 1.766 kg
(b, c) The covariance for the two traits is 30.36/7, or 4.34, and the correlation coefficient is +0.998. (d) There is a very high correlation between fleece weight and fiber length, and this correlation is unlikely to be due to chance. Correlation does not mean cause and effect. Even so, it seems logical that fleece weight would increase as fiber length increases. It is probably safe to say that the increase in fleece weight is directly related to an increase in fiber length.

21-3.
The roles of genetics and of the environment can be studied by comparing the expression of traits in monozygotic and dizygotic twins. A higher concordance value for monozygotic than for dizygotic twins indicates a significant genetic component for a given trait. For traits including blood type, eye color, and mental retardation, there is a fairly significant difference between the MZ and DZ groups. For measles, however, the difference is smaller, which indicates a greater role for the environment. Hair color has a significant genetic component, as do idiopathic epilepsy, schizophrenia, diabetes, allergies, cleft lip, and club foot. According to these data, the genetic component of mammary cancer is present but minimal.

Solutions to Problems and Discussion Questions

Your essay should compare the ratios typical of Mendelian genetics with the more blending, continuously varying expressions of neo-Mendelian modes of inheritance. It should contrast discontinuous and continuous patterns of inheritance.

(a) There are two alleles at each locus, for a total of four alleles. (b, c) Each gene (additive allele) contributes an equal unit amount to the phenotype, and the colors differ from each other in multiples of that unit amount. The number of additive alleles needed to produce each phenotype is given below:
1/16 = dark red = AABB
4/16 = medium–dark red = 2AABb, 2AaBB
6/16 = medium red = AAbb, 4AaBb, aaBB
4/16 = light red = 2aaBb, 2Aabb
1/16 = white = aabb
(d) F1 = all light red; F2 = 1/4 medium red, 2/4 light red, 1/4 white.

(a, b) There are four gene pairs involved. (c) The extremes differ by 24 cm, and there are eight additive alleles, so each increment (each additive allele) is 24 cm/8 = 3 cm. (d) A typical F1 cross that produces a “typical” F2 distribution is one in which all gene pairs are heterozygous (AaBbCcDd), independently assorting, and additive. There are many possible sets of parents that would give
an F1 of this type. The only limitation is that each parent must have a genotype that gives a height of 24 cm, as stated in the problem. Because the parents are inbred, they are expected to be fully homozygous. An example is AABBccdd × aabbCCDD. (e) The aabbccdd genotype gives a height of 12 cm, and each uppercase allele adds 3 cm. An 18 cm plant therefore carries two uppercase alleles, and there are many possibilities: AAbbccdd, AaBbccdd, aaBbCcdd, and so on. Any plant with seven uppercase alleles will be 33 cm tall, for example AABBCCDd, AABBCcDD, or AABbCCDD.

Height: average differences are similar between MZ twins reared together (1.7 cm) and MZ twins reared apart (1.8 cm), which indicates little environmental influence. Both are considerably smaller than the differences between DZ twins (4.4 cm) or sibs (4.5 cm) reared together. These data indicate that genetics plays a major role in determining height.

Weight: MZ twins reared together have a much smaller difference (1.9 kg) than MZ twins reared apart, which indicates that the environment has a considerable impact on weight. Comparing the weight differences of MZ twins reared apart with those of DZ twins and sibs reared together suggests that the environment influences weight almost as much as genetics does.

Ridge count: the differences between MZ twins reared together and those reared apart are small.

From the data in the table, ridge count and height appear to have the highest heritability values.

10 Monozygotic twins arise from the splitting of a single fertilized egg and are therefore genetically identical. When such twins are raised in the same versus different settings, the relative hereditary and environmental influences can often be estimated.

12 (a) H² and h² can be calculated as follows.
For back fat:
Broad-sense heritability = H² = 12.2/30.6 = 0.398
Narrow-sense heritability = h² = 8.44/30.6 = 0.276
For body length:
Broad-sense heritability = H² = 26.4/52.4 = 0.504
Narrow-sense heritability = h² = 11.7/52.4 = 0.223
(b) Of the two traits, selection for back fat would produce the greater response, because its narrow-sense heritability (0.276) is higher than that for body length (0.223).

14 (a) For vitamin A: hA² = VA/VP = VA/(VE + VA + VD) = 0.097. For cholesterol: hA² = 0.223. (b) Cholesterol content should be influenced to a greater extent by selection.

16 Using the equation h² = (M2 − M)/(M1 − M): 0.25 = (M2 − 20)/(24 − 20), so M2 = 21 g.

18 Using the equation h² = (M2 − M)/(M1 − M): 0.2 = (M2 − 52)/(61 − 52), so M2 = (0.2 × 9) + 52 = 53.8”.

20 (a) There are two ways to answer this question, a hard way and an easy way. The hard way is to take a big sheet of paper, make the cross (AaBbCcDdEeFf × AaBbCcDdEeFf), collect the genotypes, and calculate the ratios. This method is very laborious and error prone. The easy way is to reread the material on the binomial expansion and note the coefficients of each expression. Every number other than the 1’s equals the sum of the two numbers directly above it. Extending the series to six gene pairs (12 alleles) gives thirteen classes with the following frequencies:
3” = 1/4096
4” = 12/4096
5” = 66/4096
6” = 220/4096
7” = 495/4096
8” = 792/4096
9” = 924/4096
10” = 792/4096
11” = 495/4096
12” = 220/4096
13” = 66/4096
14” = 12/4096
15” = 1/4096
(b)
3” = 1/64
4” = 6/64
5” = 15/64
6” = 20/64
7” = 15/64
8” = 6/64
9” = 1/64

22 Blood sugar levels vary considerably from individual to individual, from day to day, and from hour to hour. At the population level, they show continuous variation. The diagnosis of type II diabetes, however, is based on relatively fixed criteria. A fasting blood sugar level of 126 mg/dL or higher, repeated on different days, is diagnostic of diabetes. A casual (non-fasting) blood sugar level of 200 mg/dL or higher is suggestive of diabetes. In either case, blood sugar is influenced by a variety of polygenic and environmental factors, yet the diagnosis classifies a person as either diabetic or not diabetic. Since there are only two
phenotypic classes (or three if one includes the prediabetic state), diabetes is referred to as a threshold trait.

Chapter 22 Answers to Now Solve This

22-1. Because the alleles follow a dominant/recessive mode, q is calculated as the square root of q². All other parts of the answer depend on q. The frequency of aa individuals is the number of nontasters (37) divided by the total number of individuals (125):
q² = 37/125 = 0.296
q = 0.544
p = 1 − q = 0.456
The genotype frequencies are then obtained from p² + 2pq + q²:
Frequency of AA = p² = (0.456)² = 0.208, or 20.8%
Frequency of Aa = 2pq = 2(0.456)(0.544) = 0.496, or 49.6%
Frequency of aa = q² = (0.544)² = 0.296, or 29.6%

22-2 (a) For the CCR5 analysis, first determine p and q. Since the frequencies of all the genotypes are known, p = 0.6 + 0.351/2 = 0.7755, and q = 0.049 + 0.351/2 = 0.2245. The equilibrium values are:
Frequency of 1/1 = p² = (0.7755)² = 0.6014, or 60.14%
Frequency of 1/Δ32 = 2pq = 2(0.7755)(0.2245) = 0.3482, or 34.82%
Frequency of Δ32/Δ32 = q² = (0.2245)² = 0.0504, or 5.04%
These equilibrium values are very close to the observed values, which strongly suggests that the sample comes from a population in Hardy–Weinberg equilibrium.
(b) For the AS (sickle-cell) analysis, first determine p and q. Since the frequencies of all the genotypes are known, p = 0.756 + 0.242/2 = 0.877, and q = 1 − 0.877 = 0.123. The equilibrium values are:
Frequency of AA = p² = (0.877)² = 0.7691, or 76.91%
Frequency of AS = 2pq = 2(0.877)(0.123) = 0.2157, or 21.57%
Frequency of SS = q² = (0.123)² = 0.0151, or 1.51%
Comparing these values with the observed values suggests that the sample may come from a population that is not in equilibrium: there are more heterozygotes than predicted and fewer SS individuals. Because the data are given as percentages, χ² values cannot be computed.

22-3 Given that
the recessive allele a is present in the homozygous state (q²) at a frequency of 0.0001, q = 0.01 and p = 0.99.
(a) q = 0.01
(b) p = 1 − q = 0.99
(c) 2pq = 2(0.01)(0.99) = 0.0198 (about 1/50)
(d) 2pq × 2pq = 0.0198 × 0.0198 = 0.000392 (about 1/2550)

22-4 The woman has no family history of CF, so the probability that she is heterozygous is 2pq = 2(1/50)(49/50) = 98/2500. The probability that the man is heterozygous is 2/3. The probability that two heterozygotes produce a child with CF is 1/4. Therefore, the overall probability that the couple produces a CF child is 98/2500 × 2/3 × 1/4 = 0.00653, or about 1/153.

Solutions to Problems and Discussion Questions

Your essay should discuss mutation as the original source of variation. It should also explain that migration can change gene frequencies in a population when the immigrants’ gene frequencies differ from those of the host population. Finally, it should describe selection as the biased passage of gametes and offspring to the next generation.

There must be evidence that gene flow does not occur among the groups being called different species. Classifications above the species level (genus, family, etc.)
are not based on such empirical data. Indeed, classification above the species level is somewhat arbitrary and rests on traditions that extend far beyond DNA sequence information. In addition, DNA sequence divergence is not always directly proportional to morphological, behavioral, or ecological divergence. The genus classifications given in this problem may seem invalid, but factors well beyond simple DNA sequence comparison must be considered in classification.

Calculate p and q, then apply the equation p² + 2pq + q² to determine the genotype frequencies in the next generation:
p = frequency of A = 0.2 + 0.3 = 0.5
q = 1 − p = 0.5
Frequency of AA = p² = 0.25, or 25%
Frequency of Aa = 2pq = 0.5, or 50%
Frequency of aa = q² = 0.25, or 25%
The initial population was not in equilibrium. After one generation of mating under Hardy–Weinberg conditions, however, the population is in equilibrium. It will remain so, without change, until one or more of the Hardy–Weinberg conditions is not met. Note that equilibrium does not require p and q to equal 0.5.

For the Hardy–Weinberg equations to apply, the population must be in Hardy–Weinberg equilibrium.

10 Yes, it is in equilibrium. The distribution will be the same in the next generation.

12 M = 0.6; N = 0.4

14 18% of the population will be heterozygous.

18 Because mutation rates are roughly similar among genes and lineages, they should give more credible estimates of species divergence times and allow broader interpretation of sequence comparisons. They also help explain the mutational processes that govern evolution among mammalian genomes. For instance, suppose the mutation rate is fairly constant even among lineages or cells with more rapid turnover. That would indicate that replication-related errors do not contribute significantly to mutation rates.

22 Because of degeneracy in the
code, some nucleotide substitutions, especially in the third base, do not change amino acids. In addition, if the overall charge of the protein does not change, electrophoresis will probably not separate the variants. If a positively charged amino acid is replaced by another positively charged amino acid, the overall charge on the protein is unchanged. The same applies to substitutions among negatively charged amino acids or among neutral amino acids.

24 In general, speciation involves the gradual accumulation of genetic changes until reproductive isolation occurs. Depending on environmental or geographic conditions, genetic changes may occur slowly or rapidly. They can involve point mutations or chromosomal changes.

26 (a, b) From the present back to about 25,000 years ago, modern humans and Cro-Magnons show an approximately constant number of genetic differences. In contrast, genetic distance increases abruptly when modern humans and Cro-Magnons are compared with Neanderthals. The results show a clear discontinuity among modern humans, Cro-Magnons, and Neanderthals in the mitochondrial DNAs sampled. Assuming the sampling and analytical techniques are valid, Neanderthals appear to have made little, if any, genetic contribution to the Cro-Magnon or modern European gene pool.

One could argue that Neanderthal mtDNA lineages are absent from living humans because of random drift or lineage extinction after Neanderthals disappeared. However, ancient Cro-Magnon mtDNA shows no evidence of a historical relationship either. This suggests that Neanderthals were not genetically related to the ancestors of modern humans.

of the ART child having Beckwith-Wiedemann syndrome. Given these data, it seems reasonable to provide such information to prospective parents of an ART child. Each couple would
need to reach a decision based on available science and their own value and belief sets Plant miRNAs are known to downregulate gene expression and some foods are the source of miRNAs that circulate in body fluids of humans Given this information, it has been suggested that as yet poorly understood environmental factors may play a significant role in the regulation of gene function in humans At this point it might be premature to design a dietary regimen based on such a frail understanding of the role of plant miRNAs in humans Special Topic Review Question Answers Since RNA can both serve in information storage and transfer and catalyze reactions, it has been hypothesized that RNA was the precursor to molecular life-like events In addition, RNAs are components of many primitive yet biologically significant reactions DNA methylation provides a defense to the integration of foreign DNA into the bacterial chromosome, whereas CRISPR loci transcribe crRNAs that guide nucleases to invading complementary DNAs in order to destroy them Through a series of transcriptive and Dicer-related activities, siRNAs are formed that are complementary to centromeric DNA A RITS silencing complex forms that leads to methylation, thus triggering heterochromatin formation At this point, the evolution of such a complex process of heterchromatin formation is not well understood Different regions within cells are often associated with specialized functions Some mRNAs are specific for products that are exported while some are destined for intracellular functions To achieve various cellular functions, localization is required Special Topic Review Question Answers Discussion Question Answers In general, periodic methylation occurs at CpG-rich regions and promoter sequences When a gene is imprinted by methylation, it remains transcriptionally silent In a mammalian embryo, imprinting may silence only the paternal set of chromosomes, for example Reversible histone modifications influence the 
structure of chromatin by altering the accessibility of nucleosomes to the transcriptional machinery These chromatin alterations “open” or “close” genes for transcription Imprinting usually involves certain genes, restricted in number, that are altered by passage through meiosis A maternally derived imprint or a paternally derived imprint may occur Imprinted alleles are transcriptionally silent in all cells of the organism, whereas epigenetic modifications (methylation) can be reactivated by environmental signals In addition to functioning in cellular signaling, microRNAs play a significant role in the developing embryo MiRNAs are involved with RNA silencing through RISCs that act as repressors of gene expression They so by making mRNAs less likely to be translated Negative or positive regulation depends on whether the ribosome binding site is masked or available When repression occurs, the RBS is masked by sRNA When positive regulation occurs, sRNA pairing unmasks the RBS In bacteria and Archaea, foreign DNA can be inserted into CRISPR loci in the genome, which brings about transcription of crRNAs that guide nucleases to invading complementary DNAs In addition, foreign DNA can be digested by restriction endonucleases One form of eukaryotic genome protection involves piRNAs that are pivotal in silencing transposons, mobile DNA sequences that change position Associated with Piwi proteins, certain proteins (such as RNA-induced silencing complexes [RISC]) target transposon-derived RNAs and their complementary sequences This process represses transposon transcription by promoting DNA methylation of transposon DNA A broadly functioning protective mechanism involves siRNA in association with RISC and Dicer, an RNase III protein Even though a particular species of mRNA may be fairly uniformly distributed throughout a cell, it does not follow that it is uniformly translated It is likely that different domains reside in cells that contain different translational signals If 
an mRNA finds itself in a particular molecular environment, it may be destined for translation, whereas that same mRNA in another part of a cell may not have the environmental stimulation necessary for translation Discussion Question Answers While data are scant, some studies have shown that children born after in vitro fertilization are at risk for low to very low birth weight that may have resulted from abnormal imprinting There also appears to be an increased risk S OLU TIONS TO S E LE C TE D PROBLE MS AND DIS C U S S IO N Q U ES T I O N S Special Topic Review Question Answers With the development of the polymerase chain reaction, trace samples of DNA can be used, commonly in forensic applications STRs are like VNTRs, but the repeat portion is shorter, between two and nine base pairs, repeated from to 40 times A core set of STR loci, about 13, is most often used in forensic applications Since males typically contain a Y chromosome (exceptions include transgender and mosaic individuals), gender separation of a mixed tissue sample is easily achieved by Y chromosome profiling In addition, STR profiling is possible for over 200 loci; however, because of the relative stability of DNA in the Y chromosome, it is difficult to differentiate between DNA from fathers and sons or male siblings Like Y chromosome DNA, mtDNA is relatively stable because it undergoes very little, if any, recombination Since there is a high copy number of mitochondria in cells, it is especially useful in situations where samples are small, old, or degraded, which is often the case in catastrophes The Combined DNA Index System (CODIS) is a collection of DNA databases and analytical tools of both state and federal governments, maintained by the FBI DNA profiles are collected from convicted offenders, forensic investigations, and in some states, those suspected of crimes as well as from unidentified human remains and missing persons (in cases where DNA is available) 10 The prosecutor’s fallacy 
attempts to equate guilt with a numerical probability produced by a single piece of evidence. A match between a crime-scene sample and a suspect does not by itself mean that the suspect is guilty. Human error, contamination, and evidence tampering all add to the complexity of interpreting DNA profiling data.

Discussion Question Answers

To learn about the laws and regulations in various states, one could navigate to “Welcome to the DNA Laws Database” on the National Conference of State Legislatures website. There, one can select a particular state to view its laws and regulations regarding DNA collection and profiling. In general, most states’ entries describe the following topics: (a) the DNA databases used, (b) methods of DNA collection, (c) post-conviction DNA collection from felons, (d) oversight and advisory committees, and (e) convicted-offender statutes.

Somatic mosaicism and chimerism involve a mixture of cell types, which may arise from a variety of embryonic events, only some of which are understood. Since a single individual may contain a mixed population of cells, a DNA sample taken from one tissue site may not match a DNA sample taken from another. This can produce conflicting results when an individual’s DNA is compared with DNA from a crime scene. Taking DNA samples from several sites on an individual may help resolve such confusion. In addition, in STR DNA profiling, mosaicism may show up at the electrophoresis/analysis stage as additional peaks or peak-height imbalances.

Special Topic Review Question Answers

Herceptin, used in the treatment of breast cancer, targets the human epidermal growth factor receptor 2 (HER-2), which is encoded by a gene located on chromosome 17. Overexpression of this gene occurs in about 25 percent of invasive breast cancer cases. Herceptin is a monoclonal antibody that binds specifically to, and inhibits, the HER-2 receptor.

Cytochrome P450 is composed of a family of enzymes
that are encoded by 57 different genes. Certain variants of cytochrome P450 metabolize drugs slowly and can lead to harmful accumulations of a drug. Other variants cause drugs to be eliminated quickly, which can make a drug ineffective. A pivotal gene, CYP2D6, influences the metabolism of approximately 25 percent of all drugs, while VKORC1 influences the response to warfarin, an anticoagulant drug.

Recently, large-scale genomic sequencing has shown that each tumor is genetically unique. With such information, it is often possible to provide a personalized diagnosis and possibly to apply personalized treatments. One example is the use of Herceptin to treat breast cancer; another is the use of Erbitux and Vectibix to inhibit epidermal growth factor receptors that are commonly expressed in cancer cells.

Using the search function in the PharmGKB database, one can find a number of references that discuss CYP2D6 variants and tamoxifen. For example, according to Hertz et al. (Hertz, D. et al. 2012. The Oncologist 17(5): 2011-0418), tamoxifen efficacy depends on the highly polymorphic cytochrome P450 gene CYP2D6. Depending on a patient’s particular CYP2D6 genotype, tamoxifen treatment outcomes are highly inconsistent. The entire Hertz et al. paper is available through the PharmGKB database and provides a complete and detailed description of the interactions between CYP2D6 variants and tamoxifen.

Discussion Question Answers

Several hurdles must be overcome before personalized medicine can achieve universal use and acceptance. First, it will be necessary to close the gap between data collection and the interpretation of complex interactions. Second, personalized medicine depends on the development of effective therapies that have few side effects and a reasonable cost. Finally, given the complexity of living systems, there will likely be diseases for which therapies are difficult to develop. In addition, it is hoped that incentives will be sufficient for entities to develop therapies
for rare, financially less-rewarding diseases.

At present, genetic discrimination does exist; however, recent health care legislation seeks to minimize such discrimination by medical insurance companies. It remains to be seen whether genetic discrimination in the workplace will continue.

Special Topic Review Question Answers

Genetic engineering allows genetic material to be transferred within and between species and allows the expression levels of genes to be altered.

A transgenic organism is one that has received genetic material from a different species, whereas the term cisgenic is sometimes used when gene transfers occur within a species.

Herbicide-tolerant plants make up approximately 70 percent of all GM plants, and most of these are tolerant to the herbicide glyphosate. Glyphosate inhibits the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), which is present in all plants and is required for the synthesis of the aromatic amino acids phenylalanine, tyrosine, and tryptophan.

The first version of Golden Rice was produced by engineering phytoene synthase from the daffodil plant and carotene desaturase from a bacterium into the rice plant. The resulting rice grains were yellow because they produced beta-carotene. Later versions of Golden Rice replaced the daffodil gene with a similar gene from maize, leading to much higher production of beta-carotene. At present, Golden Rice is being tested in preparation for use in Bangladesh and the Philippines.

The biolistic method achieves DNA transfer by coating heavy-metal particles with the transforming DNA and firing them at high speed into plant cells with a gene gun. The introduced DNA may migrate into the cell nucleus and integrate into a plant chromosome.

10. This GM plant is resistant to glyphosate, a broad-spectrum herbicide, because glyphosate
interferes with the plant’s native 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), which is necessary for the plant to synthesize the aromatic amino acids phenylalanine, tyrosine, and tryptophan, but not with the EPSPS encoded by the transgene. The epsps gene was cloned from Agrobacterium strain CP4, whose version of the enzyme is insensitive to glyphosate, and was introduced into soybeans by biolistic bombardment.

Discussion Question Answers

Many positions have been taken, and bills filed in various states, on the question of GM food labeling. Generally, many people feel that a “right to know” would allow consumers to make educated choices about the food they consume. They would consider it an advantage to be able to judge the safety of a given food if they knew it might contain GM components. Others question the usefulness of a GM label if little information is provided about how the food has been modified. Of what value would it be to know that a food was genetically modified if the science and specifics of the modification were not included? How much background knowledge would a consumer need to interpret such information?
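Returning to the prosecutor’s fallacy (answer 10 of the DNA forensics answers above), a short calculation shows why a very small match probability is not the probability of guilt. The sketch below is illustrative only and rests on assumptions not stated in the answers: it uses the product rule with Hardy-Weinberg genotype frequencies (p² for a homozygote, 2pq for a heterozygote), treats the STR loci as independent, uses four hypothetical loci instead of the roughly 13 core loci, and invents every allele frequency and population size. The helper names (`genotype_frequency`, `random_match_probability`, `posterior_source_probability`) are hypothetical.

```python
"""Sketch: product-rule random match probability (RMP) for an STR profile,
and why a tiny RMP is not the probability of guilt (the prosecutor's fallacy).

All allele frequencies and population sizes here are hypothetical.
"""
import math

# Hypothetical profile: one entry per STR locus. A heterozygous locus lists
# the frequencies of its two alleles; a homozygous locus lists one frequency.
PROFILE: list[tuple[float, ...]] = [
    (0.10, 0.20),  # heterozygote
    (0.05,),       # homozygote
    (0.30, 0.10),  # heterozygote
    (0.20, 0.25),  # heterozygote
]


def genotype_frequency(alleles: tuple[float, ...]) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    if len(alleles) == 1:
        (p,) = alleles
        return p * p
    p, q = alleles
    return 2 * p * q


def random_match_probability(profile: list[tuple[float, ...]]) -> float:
    """Product rule: multiply per-locus genotype frequencies (assumes independent loci)."""
    return math.prod(genotype_frequency(locus) for locus in profile)


def posterior_source_probability(rmp: float, other_possible_sources: int) -> float:
    """P(suspect is the source | match), assuming the suspect and every other
    possible source are equally likely a priori, the true source always matches,
    and each non-source matches independently with probability rmp."""
    return 1.0 / (1.0 + other_possible_sources * rmp)


if __name__ == "__main__":
    rmp = random_match_probability(PROFILE)
    print(f"random match probability: {rmp:.2e}")
    print(f"P(source | match), 10 million others: "
          f"{posterior_source_probability(rmp, 10_000_000):.3f}")
```

With these made-up numbers the random match probability is 6 × 10⁻⁷ (about 1 in 1.7 million), yet if 10 million other people could plausibly have left the sample, the probability that the matching suspect is the source is only 1/7, about 0.14. Equating the first number with guilt is the prosecutor’s fallacy; human error, contamination, or tampering would weaken the evidence further.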
Special Topic Review Question Answers

In ex vivo gene therapy, a potential genetic correction takes place in cells that have been removed from the patient and are then returned to the patient. In vivo gene therapy treats cells of the body by introducing DNA directly into the patient.

In many cases, therapeutic DNA hitches a ride with genetically engineered viruses, such as retrovirus or adenovirus vectors. Nonviral delivery methods may use chemical assistance to cross cell membranes, nanoparticles, or cell fusion with artificial vesicles.

White blood cells, T cells in this case, were used because they are key players in mounting an immune response, which Ashanti was incapable of developing. A normal copy of the ADA gene was engineered into a retroviral vector, which was then used to infect many of her T cells. Those cells that expressed the ADA gene were then injected into Ashanti’s bloodstream, and some of them populated her bone marrow.

At the time of Ashanti’s treatment, targeted gene therapy was not possible, so the ADA gene that integrated into Ashanti’s genome probably did not replace her defective gene.

To some extent, targeted gene therapy is designed to avoid one of the major pitfalls of gene therapy: random DNA integration. In addition, recent research holds promise for approaches that remove or even silence defective genes. DNA editing makes use of nucleases, such as zinc-finger nucleases, to remove defective genes from the genome.

10. One method of gene inhibition uses RNA interference (RNAi), in which double-stranded RNA molecules are delivered into cells and the enzyme Dicer cleaves them into short pieces of RNA (siRNA). siRNA can form a complex with proteins that target complementary mRNA. Another approach to silencing genes uses antisense RNA: an RNA complementary to a target mRNA is introduced, and its pairing with that mRNA blocks translation.

Discussion Question Answers

Generally, gene therapy is an accepted procedure, given appropriate conditions, for the
relief of genetic disease states Since it is a fairly expensive medical approach, considerable debate attends its use It remains to be seen whether insurance companies will embrace what might be considered experimental treatments Use of gene therapy to enhance the competitive status of individuals (genetic enhancement or gene doping) is presently viewed as cheating by most organizations and the public It is unlikely that germ-line therapy will be viewed favorably by the public or scientific communities; however, this and other issues mentioned here will be the subject of considerable future debate Glossary abortive transduction An event in which amniocentesis A procedure in which fluid autotetraploid An autopolyploid condi- transducing DNA fails to be incorporated into the recipient chromosome and fetal cells are withdrawn from the amniotic layer surrounding the fetus; used for genetic testing of the fetus tion composed of four copies of the same genome accession number An identifying number or code assigned to a nucleotide or amino acid sequence for entry and cataloging in a database aneuploidy A condition in which the chro- acentric chromosome Chromosome or chro- annotation Analysis of genomic nucleotide mosome fragment with no centromere mosome number is not an exact multiple of the haploid set auxotroph A mutant microorganism or cell line that requires the addition of a nutritional substance for growth Wild-type strains can synthesize this substance, and not require it added for growth backcross A cross between an F1 heterozy- that bind to DNA and intercalate into the double-stranded structure, producing local disruptions of base pairing These disruptions result in nucleotide additions or deletions in the next round of replication sequence data to identify the protein-coding genes, the nonprotein-coding genes, and the regulatory sequences and function(s) of each gene anticodon In a tRNA molecule, the nucleo- bacteriophage A virus that infects bacteria, tide 
triplet that binds to its complementary codon triplet in an mRNA molecule using it as the host for reproduction (also, phage) acrocentric chromosome Chromosome with antiparallel A term describing molecules in balanced lethals Recessive, nonallelic lethal acridine dyes A class of organic compounds the centromere located very close to one end Human chromosomes 13, 14, 15, 21, and 22 are acrocentric additive variance Genetic variance attrib- parallel alignment but running in opposite directions Most commonly used to describe the opposite orientations of the two strands of a DNA molecule uted to the substitution of one allele for another at a given locus This variance can be used to predict the rate of response to phenotypic selection in quantitative traits antisense oligonucleotide A short, single- allele One of the possible alternative forms sized in vivo or in vitro) with a ribonucleotide sequence that is complementary to part of an mRNA molecule of a gene, often distinguished from other alleles by phenotypic effects allele-specific oligonucleotide (ASO) Synthetic nucleotides, usually 15–20 bp in length, that under carefully controlled conditions will hybridize only to a perfectly matching complementary sequence allopatric speciation Process of speciation associated with geographic isolation allopolyploid Polyploid condition formed by the union of two or more distinct chromosome sets with a subsequent doubling of chromosome number allotetraploid An allopolyploid containing two genomes derived from different species allozyme An allelic form of a protein that can be distinguished from other forms by electrophoresis alternative splicing Generation of different protein molecules from the same premRNA by incorporation of a different set and order of exons into the mRNA product Alu sequence A DNA sequence of approximately 300 bp found interspersed within the genomes of primates that is cleaved by the restriction enzyme Alu I In humans, 300,000–600,000 copies are dispersed 
throughout the genome and constitute some 3–6 percent of the genome See short interspersed elements Ames test A bacterial assay developed by Bruce Ames to detect mutagenic compounds; it assesses reversion to histidine independence in the bacterium Salmonella typhimurium aminoacyl tRNA A covalently linked combination of an amino acid and a tRNA molecule Also referred to as a charged tRNA stranded DNA or RNA molecule complementary to a specific sequence antisense RNA An RNA molecule (synthe- apoptosis A genetically controlled program of cell death, activated as part of normal development or as a result of cell damage Argonaute A family of proteins that are found within the RNA-induced silencing complex (RISC) and have endonuclease activity associated with the destruction of target mRNAs artificial selection See selection ascospore A meiotic spore produced in certain fungi attached-X chromosome Two conjoined X chromosomes that share a single centromere and thus migrate together during cell division attenuator A nucleotide sequence between the gote and one of the P1 parents (or an organism with a genotype identical to one of the parents) genes, each carried on different homologous chromosomes When organisms carrying balanced lethal genes are interbred, only organisms with genotypes identical to the parents (heterozygotes) survive balanced translocation carrier An individual with a chromosomal translocation in which there has been an exchange of genetic information with no associated extra or missing genetic material balancer chromosome A chromosome containing one or more inversions that suppress crossing over with its homolog and which carries a dominant marker that is usually lethal when homozygous Barr body Densely staining DNA-positive mass seen in the somatic nuclei of mammalian females Discovered by Murray Barr, this body represents an inactivated X chromosome base analog A purine or pyrimidine base that differs structurally from one normally used in biological 
systems but whose chemical behavior is the same base substitution A single base change in a DNA molecule that produces a mutation promoter and the structural gene of some bacterial operons that regulates the transit of RNA polymerase, reducing transcription of the neighboring structural gene bidirectional replication A mechanism of autogamy A process of self-fertilization binary switch gene A gene that acts to pro- resulting in homozygosis autonomously replicating sequences (ARS) Origins of replication, about 100 nucleotides in length, found in yeast chromosomes autopolyploidy Polyploid condition resulting from the duplication of one diploid set of chromosomes autoradiography Production of a photographic image by radioactive decay Used to localize radioactively labeled compounds within cells and tissues or to identify radioactive probes in various blotting techniques See Southern blotting autosomes Chromosomes other than the sex chromosomes In humans, there are 22 pairs of autosomes DNA replication in which two replication forks move in opposite directions from a common origin gram a cell to follow one of a number of possible developmental pathways bioinformatics A field that focuses on the design and use of software and computational methods for the storage, analysis, and management of biological information such as nucleotide or amino acid sequences biometry The application of statistics and statistical methods to biological problems biotechnology Commercial and/or industrial processes that utilize biological organisms or products bivalents Synapsed homologous chromosomes in the first prophase of meiosis BLAST (Basic Local Alignment Search Tool) A software application for comparing sequence G-1 G-2 GLOSSARY data (DNA, RNA, protein) to search for sequence similarities broad heritability That proportion of total phenotypic variance in a population that can be attributed to genotypic variance CAAT box A highly conserved DNA sequence found in the untranslated 
promoter region of eukaryotic genes This sequence is recognized by transcription factors cancer stem cells Tumor-forming cells in a cancer that can give rise to all the cell types in a particular form of cancer These cells have the properties of normal stem cells: self-renewal and ability to differentiate into multiple cell types capillary electrophoresis A collection of analytical methods that separates large and small charged molecules in a capillary tube by their size to charge ratio carrier An individual heterozygous for a recessive trait cDNA (complementary DNA) DNA synthesized from an RNA template by the enzyme reverse transcriptase cell cycle The sequence of growth phases of an individual cell; divided into G1 (gap 1), S (DNA synthesis), G2 (gap 2), and M (mitosis) Cells that temporarily or permanently withdraw from the cell cycle are said to enter the G0 stage CEN The DNA region of centromeres critical to their function In yeasts, fragments of chromosomal DNA, about 120 bp in length, that when inserted into plasmids confer the ability to segregate during mitosis centimorgan (cM) A unit of distance between genes on chromosomes representing percent crossing over between two genes Equivalent to map unit (m.u.) 
central dogma The concept that genetic information flow progresses from DNA to RNA to proteins Although exceptions are known, this idea is central to an understanding of gene function centriole A cytoplasmic organelle composed of nine groups of microtubules, generally arranged in triplets Centrioles function in the generation of cilia and flagella and serve as foci for the spindles in cell division centromere The specialized heterochromatic chromosomal region at which sister chromatids remain attached after replication, and the site to which spindle fibers attach to the chromosome during cell division Location of the centromere determines the shape of the chromosome during the anaphase portion of cell division Also known as the primary constriction centrosome Region of the cytoplasm containing a pair of centrioles chaperone A protein that regulates the folding of a polypeptide into a functional threedimensional shape chiasma (pl., chiasmata) The crossed strands of nonsister chromatids seen in diplotene of the first meiotic division Regarded as the cytological evidence for exchange of chromosomal material, or crossing over chi-square (X ) analysis Statistical test cline A gradient of genotype or phenotype dis- to determine whether or not an observed set of data is equivalent to a theoretical expectation clone Identical molecules, cells, or organisms chorionic villus sampling (CVS) A technique of prenatal diagnosis in which chorionic fetal cells are retrieved intravaginally or transabdominally and used to detect cytogenetic and biochemical defects in the embryo chromatid One of the longitudinal subunits of a replicated chromosome chromatin The complex of DNA, RNA, histones, and nonhistone proteins that make up uncoiled chromosomes, characteristic of the eukaryotic interphase nucleus chromatin immunoprecipitation (ChIP) An analytical method used to identify DNAbinding proteins that bind to DNA sequences of interest chromatin remodeling A process in which the structure 
of chromatin is chemically altered by a protein complex, resulting in changes in the transcriptional state of genes within the altered region chromomere A coiled, beadlike region of a chromosome, most easily visualized during cell division chromosomal aberration Any duplication, deletion, or rearrangement of the otherwise diploid chromosomal content of an organism Sometimes referred to as a chromosomal mutation chromosome In prokaryotes, a DNA molecule containing the organism’s genome; in eukaryotes, a DNA molecule complexed with proteins and RNA to form a threadlike structure containing genetic information that is visible during mitosis and meiosis chromosome banding Technique for the differential staining of mitotic chromosomes to produce a characteristic banding pattern chromosome map A diagram showing the location of genes on chromosomes chromosome puff A localized uncoiling and swelling in a polytene chromosome, usually regarded as a sign of active transcription chromosome theory of inheritance The idea put forward independently by Walter Sutton and Theodore Boveri that chromosomes are the carriers of genes and the basis for the Mendelian mechanisms of segregation and independent assortment cis-acting sequence A DNA sequence that regulates the expression of a gene located on the same chromosome This contrasts with a trans-acting element where regulation is under the control of a sequence on the homologous chromosome cis configuration The arrangement of two genes (or two mutant sites within a gene) on the same homolog, such as a a + + cis–trans test A genetic test to determine whether two mutations are located within the same cistron (or gene) tributed over a geographic range derived from a single ancestor by asexual or parasexual methods CODIS (Combined DNA Index System) A standardized set of 13 short tandem repeat (STR) DNA sequences used by law enforcement and government agencies in preparing DNA profiles codominance Condition in which the phenotypic effects 
of a gene’s alleles are fully and simultaneously expressed in the heterozygote codon A triplet of ribonucleotides that specifies a particular amino acid or a start or stop signal in the genetic code coefficient of coincidence A ratio of the observed number of double crossovers divided by the expected number of such crossovers coefficient of inbreeding The probability that two alleles present in a zygote are descended from a common ancestor coefficient of selection ( s) A measurement of the reproductive disadvantage of a given genotype in a population cohesin A protein complex that holds sister chromatids together during mitosis and meiosis and facilitates attachments of spindle fibers to kinetochores colchicine An alkaloid compound that inhibits spindle formation during cell division used during the preparation of karyotypes colinearity The linear relationship between the nucleotide sequence in a gene (or the RNA transcribed from it) and the order of amino acids in the polypeptide chain specified by the gene competence In bacteria, the transient state or condition during which the cell can bind and internalize exogenous DNA molecules, making transformation possible complementarity Chemical affinity between nitrogenous bases of nucleic acid strands as a result of hydrogen bonding Responsible for the base pairing between the strands of the DNA double helix and between DNA and RNA strands complementation test A genetic test to determine whether two mutations occur within the same gene (or cistron) If two mutations are present in a cell at the same time and produce a wild-type phenotype (i.e., they complement each other), they are often nonallelic If a mutant phenotype is produced, the mutations are noncomplementing and are often allelic complete linkage A condition in which two genes are located so close to each other that no recombination occurs between them complex trait A trait whose phenotype is determined by the interaction of multiple genes and environmental 
factors concordance Pairs or groups of individuals with identical phenotypes In twin studies, a condition in which both twins exhibit or fail to exhibit a trait under investigation G LO S SARY G-3 conditional mutation A mutation expressed only under a certain condition; that is, a wild-type phenotype is expressed under certain (permissive) conditions and a mutant phenotype under other (restrictive) conditions conjugation Temporary fusion of two singlecelled organisms for the sexual transfer of genetic material consanguineous Related by a common ancestor within the previous few generations consensus sequence The sequence of nucleotides in DNA or amino acids in proteins most often present in a particular gene or protein under study in a group of organisms contig A continuous DNA sequence reconstructed from overlapping DNA sequences derived by cloning or sequence analysis continuous variation Phenotype variation in which quantitative traits range from one phenotypic extreme to another in an overlapping or continuous fashion copy number variation (CNV) DNA segments larger than kb that are repeated a variable number of times in the genome cosmid A vector designed to allow cloning of large segments of foreign DNA composed of the cos sites of phage l inserted into a plasmid CpG island A short region of regulatory DNA found upstream of genes that contain unmethylated stretches of sequence with a high frequency of C and G nucleotides CRISPR Regions of the prokaryotic genome that contain Clusters of Regularly Interspaced Short Palindromic Repeats Following insertion of foreign DNA into CRISPR loci, transcription from these regions produces CRISPR RNAs, which guide nucleases to invading complementary DNAs to destroy them CRISPR/Cas The adaptive immunity mechanism present in many prokaryotes, which utilizes CRISPR RNAs to guide Cas nucleases to invading complementary DNAs to destroy them The CRISPR/Cas mechanism has also been exploited to introduce specific mutations in many 
types of eukaryotes.
crossing over The exchange of chromosomal material (parts of chromosomal arms) between homologous chromosomes by breakage and reunion. The exchange of material between nonsister chromatids during meiosis is the basis of genetic recombination.
crRNA biogenesis One step in the CRISPR/Cas mechanism in which RNAs are transcribed and processed by Cas proteins.
C value The haploid amount of DNA present in a genome.
C value paradox The apparent paradox that there is no relationship between the size of the genome and the evolutionary complexity of species.
cytogenetics A branch of biology in which the techniques of both cytology and genetics are used in genetic investigations.
cytokinesis The division or separation of the cytoplasm during mitosis or meiosis.
dalton (Da) A unit of mass equal to that of the hydrogen atom, which is 1.67 × 10⁻²⁴ gram. A unit used in designating molecular weights.
degenerate code The representation of a given amino acid by more than one codon.
deletion A chromosomal mutation, also referred to as a deficiency, involving the loss of chromosomal material.
deme A local interbreeding population.
de novo Newly arising; synthesized from less complex precursors.
density gradient centrifugation A method of separating macromolecular mixtures by the use of centrifugal force and solutions of varying density.
deoxyribonuclease (DNase) A class of enzymes that breaks down DNA into oligonucleotide fragments by introducing single-stranded or double-stranded breaks into the double helix.
deoxyribonucleic acid (DNA) A macromolecule usually consisting of nucleotide polymers comprising antiparallel chains in which the sugar residues are deoxyribose and which are held together by hydrogen bonds. The primary carrier of genetic information.
determination Establishment of a specific pattern of gene activity and developmental fate for a given cell, usually prior to any manifestation of the cell’s future phenotype.
dicentric chromosome A chromosome having two centromeres, which can be pulled in opposite directions during anaphase of cell division.
Dicer An enzyme (a ribonuclease) that cleaves double-stranded RNA (dsRNA) and pre-microRNAs (miRNAs) to form small interfering RNA (siRNA) molecules about 20 to 25 nucleotides long that serve as guide molecules for the degradation of mRNA molecules with sequences complementary to the siRNA.
dideoxynucleotide A nucleotide containing a deoxyribose sugar lacking a 3′ hydroxyl group. It stops further chain elongation when incorporated into a growing polynucleotide and is used in the Sanger method of DNA sequencing.
differentiation The complex process of change by which cells and tissues attain their adult structure and functional capacity.
dihybrid cross A genetic cross involving two characters in which the parents possess different forms of each character (e.g., yellow, round × green, wrinkled peas).
diploid (2n) A condition in which each chromosome exists in pairs; having two of each chromosome.
directional selection A selective force that changes the frequency of an allele in a given direction, either toward fixation (frequency of 100%) or toward elimination (frequency of 0%).
discontinuous replication of DNA The synthesis of DNA in discontinuous fragments on the lagging strand during replication. The fragments, known as Okazaki fragments, are subsequently joined by DNA ligase to form a continuous strand.
discontinuous variation Pattern of variation for a trait whose phenotypes fall into two or more distinct classes.
discordance In twin studies, a situation where one twin expresses a trait but the other does not.
disjunction The separation of chromosomes during the anaphase stage of cell division.
disruptive selection Simultaneous selection for phenotypic extremes in a population, usually resulting in the production of two phenotypically discontinuous strains.
dizygotic twins Twins produced from separate fertilization events; two ova fertilized independently. Also known as fraternal twins.
DNA fingerprinting A molecular method for identifying an individual member of a population or species using restriction enzyme digestion followed by Southern blot hybridization with minisatellite probes.
DNA footprinting A technique for identifying a DNA sequence that binds to a particular protein.
DNA microarray An ordered arrangement of DNA sequences or oligonucleotides on a substrate (often glass) used in quantitative assays of DNA–DNA or DNA–RNA binding to measure profiles of gene expression.
DNA profiling A method for identification of individuals that uses variations in the length of short tandem repeating DNA sequences (STRs) that are widely distributed in the genome.
dominant negative mutation A mutation whose gene product acts in opposition to the normal gene product, usually by binding to it to form dimers.
dosage compensation A genetic mechanism that equalizes the levels of expression of genes at loci on the X chromosome.
double crossover Two separate events of chromosome breakage and exchange occurring within the same tetrad during meiosis.
double helix The model for DNA structure proposed by James Watson and Francis Crick, in which two antiparallel hydrogen-bonded polynucleotide chains are wound into a right-handed helical configuration 2 nm in diameter, with 10 base pairs per full turn.
driver mutation A mutation in a cancer cell that contributes to tumor progression.
Drosha A nuclear enzyme with nuclease activity that is involved in the maturation of microRNAs, which removes 5′ and 3′ nonself-complementary regions of a primary miRNA to produce a pre-miRNA.
duplication A chromosomal aberration in which a segment of the chromosome is repeated.
dyad The products of tetrad separation or disjunction at meiotic prophase I. Each dyad consists of two sister chromatids joined at the centromere.
electrophoresis A technique that separates a mixture of molecules by their differential migration through a stationary medium (such as a gel) under the influence of an electrical field.
electroporation A technique that uses an electric pulse to move polar molecules across the plasma membrane into the cell.
ELSI (Ethical, Legal, Social Implications) A program established by the National Human Genome Research Institute in 1990 as part of the Human Genome Project to sponsor research on the ethical, legal, and social implications of genomic research and its impact on individuals and social institutions.
embryonic stem cells (ESC) Cells derived from the inner cell mass of early blastocyst mammalian embryos. These cells are pluripotent, meaning they can differentiate into any of the embryonic or adult cell types characteristic of the organism.
ENCODE (Encyclopedia of DNA Elements) An international effort to identify and analyze all functional DNA elements of the human genome that are involved in the regulation of gene expression.
endogenous siRNAs (endo-siRNAs) Short interfering RNAs that are derived from endogenous sources, such as bidirectional transcription of repetitive sequences (centromeres and transposons), and that are processed by Dicer.
enhancer A DNA sequence that enhances transcription and the expression of structural genes, often acting over a distance of thousands of base pairs located upstream, downstream, or internal to the gene they affect.
epigenesis The idea that an organism or organ arises through the sequential appearance and development of new structures, in contrast to preformationism, which holds that development is the result of the assembly of structures already present in the egg.
epigenetics The study of modifications in an organism’s pattern of gene expression or phenotypic expression that are not attributable to alterations in the nucleotide sequence (mutations) of the organism’s DNA.
epimutation The abnormal repression or activation of a gene caused by errors in epigenetic mechanisms of gene regulation.
epistasis Nonreciprocal interaction between nonallelic genes such that one gene influences or interferes with the expression of another gene, leading to a specific phenotype.
equational division A division stage where the number of centromeres is not reduced by half.
euchromatin Chromatin or chromosomal regions that are lightly staining and are relatively uncoiled during the interphase portion of the cell cycle. Euchromatic regions contain most of the structural genes.
eugenics A movement advocating the improvement of the human species by selective breeding. Positive eugenics refers to the promotion of breeding between people thought to possess favorable genes, and negative eugenics refers to the discouragement of breeding among those thought to have undesirable traits.
euphenics Medical or genetic intervention to reduce the impact of defective genotypes.
euploid Polyploid with a chromosome number that is an exact multiple of a basic chromosome set.
evolution Descent with modification. The emergence of new kinds of plants and animals from preexisting types.
excision repair Removal of damaged DNA segments followed by repair synthesis with the correct nucleotide sequence.
exon The DNA segments of a gene that contain the sequences that, through transcription and translation, are eventually represented in the amino acid sequence of the final polypeptide product.
expressed sequence tag (EST) All or part of the nucleotide sequence of a cDNA clone. ESTs are used as markers in the construction of genetic maps.
expression vector Plasmids or phages carrying promoter regions designed to cause expression of inserted DNA sequences.
expressivity The degree to which a phenotype for a given trait is expressed.
extracellular RNAs (exRNAs) Various types of RNAs (such as mRNAs and microRNAs) that are secreted in association with proteins or in vesicles for protection and that serve to signal other cells.
extranuclear inheritance Transmission of traits by genetic information contained in cytoplasmic organelles such as mitochondria and chloroplasts. Sometimes called extrachromosomal inheritance.
F+ cell A bacterial cell that contains a fertility factor and that acts as a donor in bacterial conjugation.
F− cell A bacterial cell that does not contain a fertility factor and that acts as a recipient in bacterial conjugation.
F factor An episomal plasmid in bacterial cells that confers the ability to act as a donor in conjugation.
F′ factor A fertility factor that contains a portion of the bacterial chromosome.
F pilus On bacterial cells possessing an F factor, a filament-like projection that plays a role in conjugation.
F1 generation First filial generation; the progeny resulting from the first cross in a series.
F2 generation Second filial generation; the progeny resulting from a cross of the F1 generation.
familial trait A trait transmitted through and expressed by members of a family. Often used to describe a trait whose precise mode of inheritance is not clear.
fate map A diagram of an embryo showing the location of cells whose developmental fate is known.
fetal cell sorting A noninvasive method of prenatal diagnosis that recovers and tests fetal cells from the maternal circulation.
filial generations See F1, F2 generations.
fitness A measure of the relative survival and reproductive success of a given individual or genotype.
fluctuation test A statistical test demonstrating that bacterial mutations arise spontaneously, in contrast to being induced by selective agents.
fluorescence in situ hybridization (FISH) A method of in situ hybridization that utilizes probes labeled with a fluorescent tag, causing the site of hybridization to fluoresce when viewed using ultraviolet light.
flush–crash cycle A period of rapid population growth followed by a drastic reduction in population size.
folded-fiber model A model of eukaryotic chromosome organization in which each sister chromatid consists of a single chromatin fiber wound like a tightly coiled skein of yarn.
forensic science The use of laboratory scientific methods to obtain data used in criminal and civil law cases.
forward genetics The classical approach used to identify a gene controlling a phenotypic trait in the absence of knowledge of the gene’s location in the genome or its DNA sequence. An approach contrasted with reverse genetics.
founder effect The establishment of a population by a small number of individuals whose genotypes carry only a fraction of the alleles in the parental population.
fragile site A heritable gap, or nonstaining region, of a chromosome that can be induced to generate chromosome breaks.
frameshift mutation A mutational event leading to the insertion or deletion of one or more base pairs in a gene, shifting the codon reading frame in all codons that follow the mutational site.
functional genomics The study of gene function based on the resulting RNAs or proteins they encode.
G1 checkpoint A point in the G1 phase of the cell cycle when a cell either becomes committed to initiating DNA synthesis and continuing the cycle or withdraws into the G0 resting stage.
G0 A nondividing but metabolically active state that cells may enter from the G1 phase of the cell cycle.
gain-of-function mutation A mutation that produces a phenotype different from that of the normal allele and from any loss-of-function alleles.
gamete A specialized reproductive cell with a haploid number of chromosomes.
gap genes Genes expressed in contiguous domains along the anterior–posterior axis of the Drosophila embryo that regulate the process of segmentation in each domain.
GC box In eukaryotes, a region in a promoter containing a 5′-GGGCGG-3′ sequence, which is a binding site for transcriptional regulatory proteins.
GenBank An international, open-source database of publicly available DNA sequences.
gene The fundamental physical unit of heredity, whose existence can be confirmed by allelic variants and which occupies a specific chromosomal locus. A DNA sequence coding for a single polypeptide.
gene amplification The process by which gene sequences are selected and differentially replicated either extrachromosomally or intrachromosomally.
gene conversion The process of nonreciprocal recombination by which one allele in a heterozygote is converted into the corresponding allele.
gene duplication An event leading to the production of a tandem repeat of a gene sequence during replication.
gene family A number of closely related genes derived from a common ancestral gene by duplication and sequence divergence over evolutionary time.
gene flow The gradual exchange of genes between two populations; brought about by the dispersal of gametes or the migration of individuals.
gene interaction Production of novel phenotypes by the interaction of alleles of different genes.
gene knockout A gene in an organism that is inactivated for the purpose of studying gene function. Sometimes called gene targeting.
gene pool The total of all alleles possessed by the reproductive members of a population.
gene-regulatory networks Genes and DNA sequences that interact with each other and with cell signaling systems to coordinate the expression of gene sets that control the formation of body structures.
gene targeting A transgenic technique used to create and introduce a specifically altered gene into an organism. Gene targeting often involves the induction of a specific mutation in a cloned gene that is then introduced into the genome of a gamete involved in fertilization. The organism produced is bred to produce adults homozygous for the mutation, for example, the creation of a gene knockout.
gene therapy The delivery of therapeutic sequences (DNA or RNA) to treat or correct genetic disease conditions.
genetically modified organism (GMO) A plant or animal whose genome carries a gene transferred from another species by recombinant DNA technology that is expressed to produce a gene product.
genetic anticipation The phenomenon in which the severity of symptoms in genetic disorders increases from generation to generation and the age of onset decreases from generation to generation. It is caused by the expansion of trinucleotide repeats within or near a gene and was first observed in myotonic dystrophy.
genetic background The impact of the collective genome of an organism on the expression of a gene under investigation.
genetic code The deoxynucleotide triplets that encode the 20 amino acids or specify termination of translation.
genetic drift Random variation in allele frequency from generation to generation, most often observed in small populations.
genetic engineering The technique of altering the genetic constitution of cells or individuals by the selective removal, insertion, or modification of individual genes or gene sets.
genetic equilibrium A condition in which allele frequencies in a population are neither increasing nor decreasing.
genetic erosion The loss of genetic diversity from a population or a species.
genetic fine structure analysis Intragenic recombinational analysis that provides intragenic mapping information at the level of individual nucleotides.
genetic load Average number of recessive lethal genes carried in the heterozygous condition by an individual in a population.
genetic polymorphism The stable coexistence of two or more distinct genotypes for a given trait in a population. When the frequencies of two alleles for such a trait are in equilibrium, the condition is called a balanced polymorphism.
genetics The branch of biology concerned with the study of inherited variation. More specifically, the study of the origin, transmission, expression, and evolution of genetic information.
genome The set of hereditary information encoded in the DNA of an organism, including both the protein-coding and non–protein-coding sequences.
genome-wide association studies (GWAS) Analysis of genetic variation across an entire genome, searching for linkage (associations) between variations in DNA sequences and a genome region encoding a specific phenotype.
genomic imprinting The process by which the expression of an allele depends on whether it has been inherited from a male or a female parent. Also referred to as parental imprinting.
genomic library A collection of clones that contains all the DNA sequences of an organism’s genome.
genomics A subdiscipline of the field of genetics generated by the union of classical and molecular biology with the goal of sequencing and understanding genes, gene interaction, genetic elements, as well as the structure and evolution of genomes.
genotype The allelic or genetic constitution of an organism; often, the allelic composition of one or a limited number of genes under investigation.
germ line An embryonic cell lineage that forms the reproductive cells (eggs and sperm).
germ plasm Hereditary material transmitted from generation to generation.
Goldberg–Hogness box A short nucleotide sequence 20–30 bp upstream from the initiation site of eukaryotic genes to which RNA polymerase II binds. The consensus sequence is TATAAAA. Also known as a TATA box.
gynandromorphy An individual composed of cells with both male and female genotypes.
haploid (n) A cell or an organism having one member of each pair of homologous chromosomes. Also referred to as the gametic chromosome number.
haploinsufficiency In a diploid organism, a condition in which an individual possesses only one functional copy of a gene with the other inactivated by mutation. The amount of protein produced by the single copy is insufficient to produce a normal phenotype, thus leading to an abnormal phenotype. In humans, this condition is present in many autosomal dominant disorders.
haplotype A set of alleles from closely linked loci carried by an individual and inherited as a unit.
HapMap Project An international effort by geneticists to identify haplotypes (closely linked genetic markers on a single chromosome) shared by certain individuals as a way of facilitating efforts to identify, map, and isolate genes associated with disease or disease susceptibility.
Hardy–Weinberg law The principle that genotype frequencies will remain in equilibrium in an
infinitely large, randomly mating population in the absence of mutation, migration, and selection.
helix–turn–helix (HTH) motif In DNA-binding proteins, the structure of a region in which a turn of four amino acids holds two α helices at right angles to each other.
hemizygous Having a gene present in a single dose in an otherwise diploid cell. Usually applied to genes on the X chromosome in heterogametic males.
heritability A relative measure of the degree to which observed phenotypic differences for a trait are genetic.
heterochromatin The heavily staining, late-replicating regions of chromosomes that are prematurely condensed in interphase.
heteroduplex A double-stranded nucleic acid molecule in which each polynucleotide chain has a different origin. It may be produced as an intermediate in a recombinational event or by the in vitro reannealing of single-stranded, complementary molecules.
heterogametic sex The sex that produces gametes containing unlike sex chromosomes. In mammals, the male is the heterogametic sex.
heterokaryon A somatic cell containing nuclei from two different sources.
heterozygote An individual with different alleles at one or more loci. Such individuals will produce unlike gametes and therefore will not breed true.
Hfr Strains of bacteria exhibiting a high frequency of recombination. These strains have a chromosomally integrated F factor that is able to mobilize and transfer part of the chromosome to a recipient F− cell.
high-throughput DNA sequencing A collection of DNA sequencing methods that outperform the standard (Sanger) method of DNA sequencing by a factor of 100–1000 and reduce sequencing costs by more than 99 percent. Also called next-generation sequencing.
histone methyltransferase An enzyme that catalyzes the addition of methyl groups to the histone proteins and thus modifies chromatin condensation.
Holliday structure In DNA recombination, an intermediate seen in transmission electron microscope images as an X-shaped structure showing
four single-stranded DNA regions homeobox A sequence of about 180 nucleotides that encodes a sequence of 60 amino acids called a homeodomain, which is part of a DNA-binding protein that acts as a transcription factor homeotic mutation A mutation that causes a tissue normally determined to form a specific organ or body part to alter its pathway of differentiation and form another structure homogametic sex The sex that produces gametes that not differ with respect to sex-chromosome content; in mammals, the female is homogametic homologous chromosomes Chromosomes that synapse or pair during meiosis and that are identical with respect to their genetic loci and centromere placement homozygote An individual with identical alleles for a gene or genes of interest These individuals will produce identical gametes (with respect to the gene or genes in question) and will therefore breed true horizontal gene transfer The nonreproductive transfer of genetic information from an organism to another, across species and higher taxa (even domains) This mode is contrasted with vertical gene transfer, which is the transfer of genetic information from parent to offspring In some species of bacteria and archaea, up to percent of the genome may have originally been acquired through horizontal gene transfer hot spots Genome regions where mutations hybrid vigor The general superiority of a hybrid over a purebred imprinting See genomic imprinting inborn error of metabolism A genetically of about 400 nm, located within the centromere It is the site of microtubule attachment during cell division inbreeding depression A decrease in viabil- Kozak sequence A short nucleotide sequence ity, vigor, or growth in progeny after several generations of inbreeding incomplete dominance Expressing a heterozygous phenotype that is distinct from the phenotype of either homozygous parent Also called partial dominance independent assortment The independent behavior of each pair of homologous chromosomes during 
their segregation in meiosis I The random distribution of maternal and paternal homologs into gametes inducible enzyme system An enzyme system under the control of an inducer, a regulatory molecule that acts to block a repressor and allow transcription initiation codon The nucleotide triplet AUG that in an mRNA molecule codes for incorporation of the amino acid methionine as the first amino acid in a polypeptide chain interference (I) A measure of the degree to which one crossover affects the incidence of another crossover in an adjacent region of the same chromatid Negative interference increases the chance of another crossover; positive interference reduces the probability of a second crossover event interphase In the cell cycle, the interval between divisions sequence complete genomes for an estimated 600 to 1000 microorganisms (bacteria, viruses and yeast) that live on and inside humans hybrid An individual produced by crossing parents from two different genetic strains strand synthesized in a discontinuous fashion, in the direction opposite of the replication fork lampbrush chromosomes Meiotic chromosomes characterized by extended lateral loops Although most intensively studied in amphibians, these structures occur in meiotic cells of organisms ranging from insects to humans lariat structure A structure formed during pre-mRNA processing by formation of a 59 to 39 bond in an introns, leading to removal of that intron from an mRNA molecule leader sequence That portion of an mRNA molecule from the 59 end to the initiating codon, often containing regulatory or ribosome binding sites leading strand During DNA replication, the strand synthesized continuously in the direction of the replication fork lethal gene A gene whose expression results in premature death of the organism at some stage of its life cycle leucine zipper In DNA-binding proteins, a inversion A chromosomal aberration in locus (pl., loci) The site or place on a chromo- which a chromosomal segment has 
been reversed long interspersed elements (LINEs) Long, in vitro Literally, in glass; outside the living organism; occurring in an artificial environment in vivo Literally, in the living; occurring within Human Microbiome Project An effort to lagging strand During DNA replication, the structural motif characterized by a stretch in which every seventh amino acid residue is leucine, with adjacent regions containing positively charged amino acids Leucine zippers on two polypeptides may interact to form a dimer that binds to DNA Human Genome Project (HGP) An interna- An RNA-containing retrovirus associated with the onset and progression of acquired immunodeficiency syndrome (AIDS) adjacent to the initiation codon that is recognized as the translational start site in eukaryotic mRNA coding regions in a gene Introns are transcribed but are spliced out of the RNA product and are not represented in the polypeptide encoded by the gene Also known as an intervening sequence intron Any segment of DNA that lies between in vitro evolution A stepwise process of human immunodeficiency virus (HIV) kinetochore A fibrous structure with a size controlled biochemical disorder; usually an enzyme defect that produces a clinical syndrome are observed with a high frequency These include a predisposition toward singlenucleotide substitutions or unequal crossing over tional effort to determine the sequence of the human genome, to identify all genes in the genome, and to map all genes to specific chromosomes, among other goals to the arrangement of metaphase chromosomes in a sequence according to length and centromere position small changes to a nucleic acid that mimics natural selection, but that occurs outside of living cells the living body of an organism isotopes Alternate forms of atoms with identical chemical properties that have the same atomic number but differ in the number of neutrons (and thus their mass) contained in the nucleus isozyme Any of two or more distinct forms of an 
enzyme with identical or nearly identical chemical properties but differ in some property such as net electrical charge, pH optima, number and type of subunits, or substrate concentration karyokinesis The process of nuclear division karyotype The chromosome complement of a cell or an individual Often used to refer some where a particular gene is located repetitive sequences found interspersed in the genomes of higher organisms long noncoding RNAs (lncRNAs) RNAs that are longer than 200 nucleotides and not encode for polypeptides lncRNAs have various functions including epigenetic modifications of DNA and regulation of the activity of transcription factors long terminal repeat (LTR) A sequence of several hundred base pairs found at both ends of a retroviral DNA loss-of-function mutation Mutations that produce alleles that encode proteins with reduced or no function Lyon hypothesis The proposal describing the random inactivation of the maternal or paternal X chromosome in somatic cells of mammalian females early in development lysis The disintegration of a cell brought about by the rupture of its membrane lysogenic bacterium A bacterial cell carrying the DNA of a temperate bacteriophage integrated into its chromosome G LO S SARY G-7 lysogeny The process by which the DNA of an infecting phage becomes repressed and integrated into the chromosome of the bacterial cell it infects molecular marker in a variety of methods Also called a simple sequence repeat (SSR) minimal medium A medium containing only between two genes, corresponding to a recombination frequency of percent See centimorgan (cM) the essential nutrients needed to support the growth and reproduction of wild-type strains of an organism Usually comprised of inorganic components that include a carbon and nitrogen source maternal effect Phenotypic effects in off- minisatellite Series of short tandem repeat map unit A measure of the genetic distance spring attributable to genetic information transmitted through 
the oocyte derived from the maternal genome.
maternal inheritance The transmission of traits strictly through the maternal parent, usually due to DNA found in the cytoplasmic organelles, the mitochondria, or chloroplasts.
meiosis The process of cell division in gametogenesis or sporogenesis during which the diploid number of chromosomes is reduced to the haploid number.
melting profile (Tm) The temperature at which a population of double-stranded nucleic acid molecules is half-dissociated into single strands.
merozygote A partially diploid bacterial cell containing, in addition to its own chromosome, a chromosome fragment introduced into the cell by transformation, transduction, or conjugation.
messenger RNA (mRNA) An RNA molecule transcribed from DNA and translated into the amino acid sequence of a polypeptide.
metacentric chromosome A chromosome that has a centrally located centromere and therefore chromosome arms of equal lengths.
metafemale In Drosophila, a poorly developed female of low viability with a ratio of X chromosomes to sets of autosomes that exceeds 1.0.
metagenomics The study of DNA recovered from organisms collected from the environment as opposed to those grown as laboratory cultures. Often used for estimating the diversity of organisms in an environmental sample.
metamale In Drosophila, a poorly developed male of low viability with a ratio of X chromosomes to sets of autosomes that is below 0.5.
metastasis The process by which cancer cells spread from the primary tumor and establish malignant tumors in other parts of the body.
methylation Enzymatic transfer of methyl groups from S-adenosylmethionine to biological molecules, including phospholipids, proteins, RNA, and DNA. Methylation of DNA is associated with the regulation of gene expression and with epigenetic phenomena such as imprinting.
microRNA Single-stranded RNA molecules approximately 20–23 nucleotides in length that regulate gene expression by participating in the degradation of mRNA.
microsatellite A short, highly polymorphic DNA sequence of 1–4 base pairs, widely distributed in the genome, that is used as a …
… sequences (STRs) 10–100 nucleotides in length that occur frequently throughout the genome of eukaryotes. Because the number of repeats at each locus is variable, the loci are known as variable number tandem repeats (VNTRs). Used in DNA fingerprinting and DNA profiles.
mismatch repair A form of excision repair of DNA in which the repair mechanism is able to distinguish between the strand with the error and the strand that is correct.
missense mutation A mutation that changes a codon to that of another amino acid and thus results in an amino acid substitution in the translated protein.
mitosis A form of cell division producing two progeny cells identical genetically to the parental cell; that is, the production of two cells from one, each having the same chromosome complement as the parent cell.
model genetic organism An experimental organism conducive to efficiently conducted research whose genetics is intensively studied on the premise that the findings can be applied to other organisms.
molecular clock In evolutionary studies, a method that counts the number of differences in DNA or protein sequences as a way of measuring the time elapsed since two species diverged from a common ancestor.
monohybrid cross A genetic cross involving only one character (e.g., AA × aa).
monophyletic group A taxon (group of organisms) consisting of an ancestor and all its descendants.
monosomic An aneuploid condition in which one member of a chromosome pair is missing; having a chromosome number of 2n − 1.
monozygotic twins Twins produced from a single fertilization event; the first division of the zygote produces two cells, each of which develops into an embryo. Also known as identical twins.
multigene family A set of genes descended from a common ancestral gene, usually by duplication and subsequent sequence divergence. The globin genes are an example of a multigene family.
multiple alleles The presence of three or more alleles of the same gene in a population of organisms.
mutagen Any agent that causes an increase in the spontaneous rate of mutation.
mutation The process that produces an alteration in DNA or chromosome structure; in genes, the source of new alleles.
mutation rate The frequency with which mutations take place at a given locus or in a population.
natural selection Differential reproduction among members of a species owing to variable fitness conferred by genotypic differences.
network map Computer-generated representation of interacting genes, proteins, and other molecules based on experimental data or proposed interactions.
neutral mutation A mutation with no immediate adaptive significance or phenotypic effect.
noncoding RNAs (ncRNAs) RNAs that do not encode polypeptides.
noncrossover gamete A gamete whose chromosomes have undergone no genetic recombination.
nondisjunction A cell division error in which homologous chromosomes or the sister chromatids fail to separate and migrate to opposite poles; responsible for defects such as monosomy and trisomy.
noninvasive prenatal genetic diagnosis (NIPGD) A noninvasive method of fetal genotyping that uses a maternal blood sample to analyze thousands of fetal loci using fetal DNA fragments present in the maternal blood.
nonsense codons The nucleotide triplets (UGA, UAG, and UAA) in an mRNA molecule that signal the termination of translation.
nonsense mutation A mutation that changes a codon specifying an amino acid into a termination codon, leading to premature termination during translation of mRNA.
Northern blotting An analytic technique in which RNA molecules are separated by electrophoresis and transferred by capillary action to a nylon or nitrocellulose membrane. Specific RNA molecules can then be identified by hybridization to a labeled nucleic acid probe.
nuclease An enzyme that breaks bonds in nucleic acid molecules.
nucleoid The DNA-containing region within the cytoplasm in prokaryotic cells.
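The missense mutation, nonsense mutation, and nonsense codons entries describe how a single base substitution can change what a codon specifies. The Python sketch below illustrates that classification under stated assumptions: it uses the standard genetic code written with mRNA bases (U rather than T), considers only a single-base substitution within one codon, and the names `CODON_TABLE` and `classify_substitution` are hypothetical helpers, not part of any library. A change that leaves the amino acid unchanged is labeled "silent", a term these entries do not themselves define.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one-letter amino acid codes for all 64 codons,
# listed with the first base varying slowest in U, C, A, G order.
# "*" marks the three nonsense (stop) codons UAA, UAG, and UGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base substitution at `position` (0, 1, or 2) of an mRNA codon."""
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODON_TABLE or len(new_base) != 1 or new_base not in BASES:
        raise ValueError("expected an mRNA codon and a single base from U, C, A, G")
    if codon[position] == new_base:
        raise ValueError("new_base is identical to the original base")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == "*":
        raise ValueError("original codon is already a stop codon")
    if after == "*":
        return "nonsense"  # amino acid codon becomes a termination codon
    if after != before:
        return "missense"  # one amino acid replaced by another
    return "silent"  # same amino acid despite the base change


print(classify_substitution("GAG", 1, "U"))  # missense (Glu -> Val)
print(classify_substitution("UGG", 2, "A"))  # nonsense (Trp -> UGA stop)
print(classify_substitution("CUU", 2, "C"))  # silent (Leu -> Leu)
```

The first call reproduces the GAG to GUG change (glutamic acid to valine) associated with sickle-cell anemia. Encoding the table as a 64-character string keeps the sketch self-contained without relying on an external sequence library.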
nucleolar organizer region (NOR) A chromosomal region containing the genes for rRNA; most often found in physical association with the nucleolus.
nucleolus The nuclear site of ribosome biosynthesis and assembly; usually associated with or formed in association with the DNA comprising the nucleolar organizer region.
nucleoside In nucleic acid chemical nomenclature, a purine or pyrimidine base covalently linked to a ribose or deoxyribose sugar molecule.
nucleosome In eukaryotes, a nuclear complex consisting of four pairs of histone molecules wrapped by two turns of a DNA molecule. The major structure associated with the organization of chromatin in the nucleus.
nucleotide In nucleic acid chemical nomenclature, a nucleoside covalently linked to one or more phosphate groups. Nucleotides containing a single phosphate linked to the 5′ carbon of the ribose or deoxyribose are the building blocks of nucleic acids.
nucleus The membrane-bound cytoplasmic organelle of eukaryotic cells that contains the chromosomes and nucleolus.
null allele A mutant allele that produces no functional gene product. Usually inherited as a recessive trait.
null hypothesis (H₀) Used in statistical tests, the hypothesis that there is no real difference between the observed and expected datasets. Statistical methods such as chi-square analysis are used to test the probability associated with this hypothesis.
Okazaki fragment The short, discontinuous strands of DNA produced on the lagging strand during DNA synthesis.
oligonucleotide A linear sequence of about 10–20 nucleotides connected by 5′-3′ phosphodiester bonds.
oncogene A gene whose activity promotes uncontrolled proliferation in eukaryotic cells. Usually a mutant gene derived from a proto-oncogene.
Online Mendelian Inheritance in Man (OMIM) A database listing all known genetic disorders and disorders with genetic components. It also contains a listing of all known human genes and links genes to genetic disorders.
open reading frame (ORF) A nucleotide sequence organized as triplets that encodes the amino acid sequence of a polypeptide, including an initiation codon and a termination codon.
operator region In bacterial DNA, a region that interacts with a specific repressor protein to regulate the expression of an adjacent gene or gene set.
operon A genetic unit consisting of one or more structural genes encoding polypeptides, and an adjacent operator gene that regulates the transcriptional activity of the structural gene or genes.
outbreeding depression Reduction in fitness in the offspring produced by mating genetically diverse parents. It is thought to result from a lowered adaptation to local environmental conditions.
overlapping code A hypothetical genetic code in which any given triplet is shared by more than one adjacent codon.
pair-rule genes Genes expressed as stripes around the blastoderm embryo during development of the Drosophila embryo.
paleogenomics The recovery, sequencing, and analysis of genes and genomes from fossils of extinct species.
palindrome In genetics, a double-stranded DNA segment where each strand's base sequence is identical when read 5′ to 3′. For example: 5′-GAATTC-3′ paired with 3′-CTTAAG-5′. Palindromic sequences are noteworthy as recognition and cleavage sites for restriction endonucleases.
paracentric inversion A chromosomal inversion that does not include the region containing the centromere.
pedigree In human genetics, a diagram showing the ancestral relationships and transmission of genetic traits over several generations in a family.
P element In Drosophila, a transposable DNA element responsible for hybrid dysgenesis.
penetrance The frequency, expressed as a percentage, with which individuals of a given genotype manifest at least some degree of a specific mutant phenotype associated with a trait.
pericentric inversion A chromosomal inversion that involves both arms of the chromosome and thus the centromere.
pharmacogenomics The study of how genetic variation influences the action of pharmaceutical drugs in individuals.
phenotype The overt appearance of a genetically controlled trait.
Philadelphia chromosome The product of a reciprocal translocation in humans that contains the short arm of chromosome 9, carrying the C-ABL oncogene, and the long arm of chromosome 22, carrying the BCR gene.
phosphodiester bond In nucleic acids, the system of covalent bonds by which a phosphate group links adjacent nucleotides, extending from the 5′ carbon of one pentose sugar (ribose or deoxyribose) to the 3′ carbon of the pentose sugar in the neighboring nucleotide. Phosphodiester bonds create the backbone of nucleic acid molecules.
photoreactivation repair Light-induced repair of damage caused by exposure to ultraviolet light. Associated with an intracellular enzyme system.
phyletic evolution The gradual transformation of one species into another over time; so-called vertical evolution.
pilus A filamentlike projection from the surface of a bacterial cell. Often associated with cells possessing F factors.
piRNAs PIWI-interacting RNAs. Short dsRNA sequences associated with PIWI proteins. Participate in transcriptional and/or posttranscriptional mechanisms to silence transposons and repetitive sequences in germ cells.
plaque On an otherwise opaque bacterial lawn, a clear area caused by the growth and reproduction of a single bacteriophage.
plasmid An extrachromosomal, circular DNA molecule that replicates independently of the host chromosome.
pleiotropy Condition in which a single mutation causes multiple phenotypic effects.
ploidy A term referring to the basic chromosome set or to multiples of that set.
point mutation A mutation that can be mapped to a single locus. At the molecular level, a mutation that results in the substitution of one nucleotide for another. Also called a gene mutation.
polar body Produced in females at either the first or second meiotic division of gametogenesis, a discarded cell that contains one of the nuclei of the division process, but almost no cytoplasm as a result of an unequal cytokinesis.
polycistronic mRNA A messenger RNA molecule that encodes the amino acid sequence of two or more polypeptide chains in adjacent structural genes.
polygenic inheritance The transmission of a phenotypic trait whose expression depends on the additive effect of a number of genes.
polylinker A segment of DNA that has been engineered to contain multiple sites for restriction enzyme digestion. Polylinkers are usually found in engineered vectors such as plasmids.
polymerase chain reaction (PCR) A method for amplifying DNA segments that depends on repeated cycles of denaturation, primer annealing, and DNA polymerase–directed DNA synthesis.
polymerases Enzymes that catalyze the formation of DNA and RNA from deoxynucleotides and ribonucleotides, respectively.
polymorphism The existence of two or more discontinuous, segregating phenotypes in a population.
polynucleotide A linear sequence of 20 or more nucleotides, joined by 5′-3′ phosphodiester bonds.
polypeptide A molecule composed of amino acids linked together by covalent peptide bonds. This term is used to denote the amino acid chain before it folds into its functional three-dimensional protein configuration.
polyploid A cell or individual having more than two haploid sets of chromosomes.
polysome A structure composed of two or more ribosomes associated with an mRNA and associated tRNAs engaged in translation. Also called a polyribosome.
polytene chromosome Literally, a many-stranded chromosome; one that has undergone numerous rounds of DNA replication without separation of the replicated strands, which remain in exact parallel register. The result is a giant chromosome with aligned chromomeres displaying a characteristic banding pattern, most often studied in Drosophila larval salivary gland cells.
population A local group of actually or potentially interbreeding individuals belonging to the same species.
population bottleneck A drastic reduction in population size and consequent loss of genetic diversity, followed by an increase in population size. The rebuilt population has a gene pool with reduced diversity caused by genetic drift.
positional cloning The identification and subsequent cloning of a gene in the absence of knowledge of its polypeptide product or function. The process uses cosegregation of mutant phenotypes with DNA markers to identify the chromosome containing the gene; the position of the gene is identified by establishing linkage with additional markers.
position effect Change in expression of a gene associated with a change in the gene's location within the genome.
posttranslational modification The processing or modification of the translated polypeptide chain by enzymatic cleavage, or the addition of phosphate groups, carbohydrate chains, or lipids.
posttranscriptional modification Changes made to pre-mRNA molecules during conversion to mature mRNA. These include the addition of a methylated cap at the 5′ end and a poly-A tail at the 3′ end, excision of introns, and exon splicing.
postzygotic isolation mechanism A barrier that prevents or reduces inbreeding by acting after fertilization to produce nonviable, sterile hybrids or hybrids of lowered fitness.
preadaptive mutation A mutational event that later becomes of adaptive significance.
preimplantation genetic diagnosis (PGD) The removal and genetic analysis of unfertilized oocytes, polar bodies, or single cells from an early embryo (3–5 days old).
prezygotic isolation mechanism A barrier that reduces inbreeding by preventing courtship, mating, or fertilization.
Pribnow box In prokaryotic genes, a 6-bp sequence to which the sigma (σ) subunit of RNA polymerase binds, upstream from the beginning of transcription. The consensus sequence for this box is TATAAT.
primary miRNAs (pri-miRNAs) The product of transcription of a microRNA gene, which has a 5′ methylated cap, a 3′ polyadenylated tail, and a hairpin structure due to self-complementary sequences.
primary sex ratio Ratio of males to females at fertilization, often expressed in decimal form (e.g., 1.06).
primer In nucleic acids, a short length of RNA or single-stranded DNA required for initiating synthesis directed by polymerases.
prion An infectious pathogenic agent devoid of nucleic acid and composed of a protein, PrP, with a molecular weight of 27,000–30,000 Da. Prions are known to cause scrapie, a degenerative neurological disease in sheep; bovine spongiform encephalopathy (BSE, or mad cow disease) in cattle; and similar diseases in humans, including kuru and Creutzfeldt–Jakob disease.
proband An individual who is the focus of a genetic study leading to the construction of a pedigree tracking the inheritance of a genetically determined trait of interest. Formerly known as a propositus.
probe A macromolecule such as DNA or RNA that has been labeled and can be detected by an assay such as autoradiography or fluorescence microscopy. Probes are used to identify target molecules, genes, or gene products.
product law In statistics, the principle that the probability of two independent events occurring simultaneously is equal to the product of their individual probabilities.
promoter element An upstream regulatory region of a gene to which RNA polymerase binds prior to the initiation of transcription.
proofreading A molecular mechanism for scanning and correcting errors in replication, transcription, or translation.
prophage A bacteriophage genome integrated into a bacterial chromosome that is replicated along with the bacterial chromosome. Bacterial cells carrying prophages are said to be lysogenic and to be capable of entering the lytic cycle, whereby the phage is replicated.
propositus (female, proposita) See proband.
protein domain Amino acid sequences with specific conformations and functions that are structurally and functionally distinct from other regions on the same protein.
proteome The entire set of proteins expressed by a cell, tissue, or organism at a given time. The study of the proteome is referred to as proteomics.
proto-oncogene A gene that functions to initiate, facilitate, or maintain cell growth and division. Proto-oncogenes can be converted to oncogenes by mutation.
protoplast A bacterial or plant cell with the cell wall removed. Sometimes called a spheroplast.
prototroph A strain (usually of a microorganism) that is capable of growth on a defined, minimal medium. Wild-type strains are usually regarded as prototrophs and contrasted with auxotrophs.
pseudoalleles Genes that behave as alleles to one another by complementation but can be separated from one another by recombination.
pseudoautosomal region A region on the human Y chromosome that is also represented on the X chromosome. Genes found in this region of the Y chromosome have a pattern of inheritance that is indistinguishable from genes on autosomes.
pseudodominance The expression of a recessive allele on one homolog owing to the deletion of the dominant allele on the other homolog.
pseudogene A nonfunctional gene with sequence homology to a known structural gene present elsewhere in the genome. It differs from the functional version by insertions or deletions and by the presence of flanking direct-repeat sequences of 10–20 nucleotides.
punctuated equilibrium A pattern in the fossil record of long periods of species stability, punctuated with brief periods of species divergence.
pyrosequencing A high-throughput method of DNA sequencing that determines the sequence of a single-stranded DNA molecule by synthesis of a complementary strand. During synthesis, the sequence is determined by the chemiluminescent detection of pyrophosphate release that accompanies nucleotide incorporation into a newly synthesized strand of DNA.
quantitative real-time PCR (qPCR) A variation of PCR (polymerase chain reaction) that uses fluorescent probes to quantitate the amount of DNA or RNA product present after each round of amplification.
quantitative trait loci (QTLs) Two or more genes that act on a single polygenic trait in a quantitative way.
quantum speciation Formation of a new species within a single or a few generations by a combination of selection and drift.
quorum sensing A mechanism used to regulate gene expression in bacteria in response to changes in cellular population density.
rad A unit of absorbed dose of radiation with an energy equal to 100 ergs per gram of irradiated tissue.
radioactive isotope An unstable isotope with an altered number of neutrons that emits ionizing radiation during decay as it is transformed to a stable atomic configuration.
random amplified polymorphic DNA (RAPD) A PCR method that uses random primers about 10 nucleotides in length to amplify unknown DNA sequences.
reading frame A linear sequence of codons in a nucleic acid.
reannealing Formation of double-stranded DNA molecules from denatured single strands.
recessive An allele whose potential genetic expression is overridden in the heterozygous condition by a dominant allele.
reciprocal translocation A chromosomal aberration in which nonhomologous chromosomes exchange parts.
recombinant DNA technology A collection of methods used to create DNA molecules by in vitro ligation of DNA from two different organisms, and the replication and recovery of such recombinant DNA molecules.
recombination The process that leads to the formation of new allele combinations on chromosomes.
reductional division The chromosome division that halves the number of centromeres and thus reduces the chromosome number by half in the daughter cells. The first division of meiosis is a reductional division.
rem Radiation equivalent in humans; the dosage of radiation that will cause the same biological effect as one roentgen of X rays.
renaturation The process by which a denatured protein or nucleic acid returns to its normal three-dimensional structure.
repetitive DNA sequence A DNA sequence present in many copies in the haploid genome.
replication fork The Y-shaped region of a chromosome associated with the site of DNA replication.
replicon The unit of DNA replication, beginning with DNA sequences necessary for the initiation of DNA replication. In bacteria, the entire chromosome is a replicon.
replisome The complex of proteins, including DNA polymerase, that assembles at the bacterial replication fork to synthesize DNA.
repressible enzyme system An enzyme or group of enzymes whose synthesis is regulated by the intracellular concentration of certain metabolites.
repressor A protein that binds to a regulatory sequence adjacent to a gene and blocks transcription of the gene.
reproductive isolation Absence of interbreeding between populations, subspecies, or species.
resistance transfer factor (RTF) A component of R plasmids that confers the ability to transfer the R plasmid between bacterial cells by conjugation.
restriction endonuclease A bacterial nuclease that recognizes specific nucleotide sequences in a DNA molecule, often a palindrome, and cleaves or nicks the DNA at those sites.
restriction fragment length polymorphism (RFLP) Variation in the length of DNA fragments generated by restriction endonucleases. These variations are caused by mutations that create or abolish cutting sites for restriction enzymes. RFLPs are inherited in a codominant fashion and are extremely useful as genetic markers.
restriction map A map of restriction enzyme cutting sites in a sequence of DNA.
restriction site A DNA sequence, often palindromic, recognized by a restriction endonuclease. The enzyme binds to the restriction site and cleaves the DNA at that site.
retrotransposon Mobile genetic elements that are major components of many eukaryotic genomes; these elements are copied by means of an RNA intermediate and can be inserted at a distant chromosomal site.
retrovirus A type of virus that uses RNA as its genetic material and employs the enzyme reverse transcriptase during its life cycle.
reverse genetics An experimental approach used to discover gene function after the gene has been identified, cloned, and sequenced.
reversion A mutation that restores the wild-type phenotype.
R factor (R plasmid) A bacterial plasmid that carries antibiotic resistance genes. Most R plasmids have two components: an r-determinant, which carries the antibiotic resistance genes, and the resistance transfer factor (RTF).
Rh factor An antigenic system first described in the rhesus monkey.
ribonucleic acid (RNA) A nucleic acid similar to DNA but characterized by the pentose sugar ribose, the pyrimidine uracil, and the single-stranded nature of the polynucleotide chain. Several forms are recognized, including ribosomal RNA, messenger RNA, transfer RNA, and a variety of small regulatory RNA molecules.
ribonucleoprotein (RNP) particles Complexes of RNA-binding proteins that regulate associated mRNAs.
ribose The five-carbon sugar found in the ribonucleosides and ribonucleotides of RNA.
ribosomal RNA (rRNA) The RNA molecules that are the structural components of the ribosomal subunits. In prokaryotes, these are the 16S, 23S, and 5S molecules; in eukaryotes, they are the 18S, 28S, and 5S molecules.
ribosome A ribonucleoprotein organelle consisting of two subunits, each containing RNA and protein molecules. Ribosomes are the site of translation of mRNA codons into the amino acid sequence of a polypeptide chain.
ribozymes RNAs that catalyze specific biochemical reactions.
RNA-binding proteins (RBPs) Proteins that bind to RNAs and have various activities such as influencing RNA translation, splicing, stability, degradation, and localization.
RNA-directed RNA polymerase (RdRP) An enzyme that uses an RNA molecule as a template to synthesize a complementary RNA strand.
RNA-induced gene silencing A mechanism by which small noncoding RNAs and the RNA-induced silencing complex (RISC) negatively regulate transcription or negatively regulate mRNAs posttranscriptionally.
RNA-induced silencing complex (RISC) A protein complex containing an Argonaute family protein with endonuclease activity. siRNAs and miRNAs guide RISC to complementary mRNAs to cleave them.
RNA-induced transcriptional silencing (RITS) A mechanism by which small noncoding RNAs direct a protein complex to complementary DNA sequences to methylate nearby histones and thus silence transcription from this locus in the genome.
RNA interference (RNAi) Inhibition of gene expression in which a protein complex (RNA-induced silencing complex, or RISC) containing a partially complementary RNA strand binds to an mRNA, leading to degradation or reduced translation of the mRNA.
RNA World Hypothesis A hypothesis describing life initially based on RNA serving informational (genomes) and catalytic (ribozymes) roles that predates current life forms based on DNA genomes and protein catalysts.
Robertsonian translocation A chromosomal aberration created by breaks in the short arms of two acrocentric chromosomes followed by fusion of the long arms of these chromosomes at the centromere. Also called centric fusion.
roentgen (R) A unit of measure of radiation emission, corresponding to the amount that generates 2.083 × 10⁹ ion pairs in one cubic centimeter of air at 0°C and an atmospheric pressure of 760 mm of mercury.
Sanger sequencing DNA sequencing by synthesis of DNA chains that are randomly terminated by incorporation of a nucleotide analog (dideoxynucleotides), followed by sequence determination through analysis of the resulting fragment lengths in each reaction.
satellite DNA DNA that forms a minor band when genomic DNA is centrifuged in a cesium salt gradient. This DNA usually consists of short repetitive sequences.
secondary sex ratio The ratio of males to females at birth, usually expressed in decimal form (e.g., 1.05).
segment polarity genes Genes that regulate the spatial pattern of differentiation within each segment of the developing Drosophila embryo.
segregation The separation of maternal and paternal homologs of each homologous chromosome pair into gametes during meiosis.
selection Changes in the frequency of alleles and genotypes in populations as a result of differential reproduction.
selection coefficient (s) A quantitative measure of the relative fitness of one genotype compared with another. Same as coefficient of selection.
semiconservative replication A mode of DNA replication in which a double-stranded molecule replicates in such a way that the daughter molecules are each composed of one parental (old) and one newly synthesized strand.
semisterility A condition in which a percentage of all zygotes are inviable.
sex chromosome A chromosome, such as the X or Y in humans, that is involved in sex determination.
sexduction Transmission of chromosomal genes from a donor bacterium to a recipient cell by means of the F factor.
sex-influenced inheritance Phenotypic expression conditioned by the sex of the individual. A heterozygote may express one phenotype in one sex and an alternate phenotype in the other sex (e.g., pattern baldness in humans).
sex-limited inheritance A trait that is expressed in only one sex even though the trait may not be X-linked.
Shine–Dalgarno sequence The nucleotides AGGAGG that serve as a ribosome-binding site in the leader sequence of prokaryotic genes. The 16S RNA of the small ribosomal subunit contains a complementary sequence to which the mRNA binds.
short interspersed elements (SINEs) Repetitive sequences found in the genomes of higher organisms. The 300-bp Alu sequence is a SINE.
shotgun cloning The cloning of random fragments of genomic DNA into a vector (a plasmid or phage), usually to produce a library from which clones can be selected for use, as in sequencing.
sibling species Species that are morphologically similar but reproductively isolated from one another.
sigma (σ) factor In RNA polymerase, a polypeptide subunit that recognizes the DNA binding site for the initiation of transcription.
single-nucleotide polymorphism (SNP) A variation in a single nucleotide pair in DNA, usually detected during genomic analysis. Present in at least … percent of a population, a SNP is useful as a genetic marker.
single-stranded binding proteins (SSBs) In DNA replication, proteins that bind to and stabilize the single-stranded regions of DNA that result from the action of unwinding proteins.
siRNAs Small (or short) interfering RNAs. Short 20- to 25-nucleotide-long double-stranded RNA sequences with 3′ nucleotide overhangs, processed by Dicer, that participate in transcriptional and/or posttranscriptional mechanisms of gene regulation.
sister chromatid exchange (SCE) A crossing over event in meiotic or mitotic cells involving the reciprocal exchange of chromosomal material between sister chromatids joined by a common centromere. Such exchanges can be detected cytologically after BrdU incorporation into the replicating chromosomes.
site-directed mutagenesis A process that uses a synthetic oligonucleotide containing a mutant base or sequence as a primer for inducing a mutation at a specific site in a cloned gene.
small noncoding RNAs (sncRNAs) A general term used to describe short RNAs that do not encode polypeptides and associate with the RNA-induced silencing complex (RISC) to regulate transcription or to regulate mRNAs posttranscriptionally.
small nuclear RNA (snRNA) Abundant species of small RNA molecules ranging in size from 90 to 400 nucleotides that, in association with proteins, form RNP particles known as snRNPs or snurps. Located in the nucleoplasm, snRNAs have been implicated in the processing of pre-mRNA and may have a range of cleavage and ligation functions.
solenoid structure A feature of eukaryotic chromatin conformation generated by nucleosome supercoiling.
somatic mutation A nonheritable mutation occurring in a somatic cell.
SOS response The induction of enzymes for repairing damaged DNA in Escherichia coli. The response involves activation of an enzyme that cleaves a repressor, activating a series of genes involved in DNA repair.
Southern blotting A technique developed by Edwin Southern in which DNA fragments produced by restriction enzyme digestion are separated by electrophoresis and transferred by capillary action to a nylon or nitrocellulose membrane. Specific DNA fragments can be identified by hybridization to a complementary radioactively labeled nucleic acid probe using the technique of autoradiography.
spacer DNA DNA sequences found between genes. Usually, these are repetitive DNA segments.
species A group of actually or potentially interbreeding individuals that is reproductively isolated from other such groups.
spectral karyotype A display of all the chromosomes in an organism as a karyotype with each chromosome stained in a different color.
spliceosome The nuclear macromolecule complex within which splicing reactions occur to remove introns from pre-mRNAs.
spontaneous mutation A mutation that arises in the absence of an external force; one that is not induced.
spore A unicellular body or cell encased in a protective coat. Produced by some bacteria, plants, and invertebrates, spores are capable of surviving in unfavorable environmental conditions and give rise to a new individual upon germination. In plants, spores are the haploid products of meiosis.
Src A protein kinase that phosphorylates many target proteins to regulate their activity, including the activity of the RNA-binding protein ZBP1.
sRNA Small noncoding RNAs in prokaryotes that regulate gene expression by binding to mRNAs to influence translation or by binding to proteins and modifying their function.
SRY The sex-determining region of the Y chromosome, found near the chromosome's pseudoautosomal boundary. Accumulated evidence indicates that this gene's product is the testis-determining factor (TDF).
stabilizing selection Preferential reproduction of individuals with genotypes close to the mean for the population. A selective elimination of genotypes at both extremes.
standard deviation (s) A quantitative measure of the amount of variation in a sample of measurements from a population, calculated as the square root of the variance.
stone age genomics Sequence analysis of ancient DNA samples.
strain A group of organisms with common ancestry with physiological or morphological characteristics of interest for genetic analysis or domestication.
STR sequences Short tandem repeats 2–9 base pairs long that are found within minisatellites. These sequences are used to prepare DNA profiles in forensics, paternity identification, and other applications.
structural gene A gene that encodes the amino acid sequence of a polypeptide chain.
sublethal gene A mutation causing lowered viability, with death before maturity in less than 50 percent of the individuals carrying the gene.
submetacentric chromosome A chromosome with the centromere placed so that one arm of the chromosome is slightly longer than the other.
sum law In statistics, the principle that the probability of an outcome that can be achieved by two or more mutually exclusive events is equal to the sum of their individual probabilities.
syndrome A group of characteristics or symptoms associated with a disease or an abnormality. An affected individual may express a number of these characteristics but not necessarily all of them.
synthetic biology A field of research that combines science and engineering to construct novel biological-based systems and/or organisms that do not exist in nature.
systems biology A field that identifies and analyzes gene and protein networks to gain an understanding of intracellular regulation of metabolism, intra- and intercellular communication, and complex interactions within, between, and among cells.
TATA box See Goldberg–Hogness box.
tautomeric shift A reversible isomerization in a molecule, brought about by a shift in the location of a hydrogen atom. In nucleic acids, tautomeric shifts in the bases of nucleotides can cause changes in other bases during replication and can act as a source of mutations.
telocentric chromosome A chromosome in which the centromere is located at its very end.
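The product law and sum law entries are the two rules most often combined when predicting the outcome of a cross. As a minimal sketch, assuming a single gene with two alleles written as one-letter symbols (for example, A and a) and equal transmission of each parental allele, the hypothetical helper `offspring_prob` below multiplies the probabilities of independent gamete events (product law) and adds the probabilities of mutually exclusive allele combinations that yield the same genotype (sum law). Python's standard `fractions.Fraction` keeps the results exact.

```python
from fractions import Fraction


def gamete_probs(genotype):
    """Return (allele, probability) pairs for a diploid genotype such as "Aa".

    Assumes each of the two alleles is transmitted with probability 1/2.
    """
    return [(allele, Fraction(1, 2)) for allele in genotype]


def offspring_prob(parent1, parent2, target):
    """Probability that a cross between parent1 and parent2 yields `target`.

    Genotypes are compared without regard to allele order, so "Aa" matches "aA".
    """
    total = Fraction(0)
    for allele1, p1 in gamete_probs(parent1):
        for allele2, p2 in gamete_probs(parent2):
            if sorted(allele1 + allele2) == sorted(target):
                # Product law: the two gametes are independent events.
                # Sum law: each qualifying combination is mutually exclusive,
                # so its probability is added to the running total.
                total += p1 * p2
    return total


p_aa = offspring_prob("Aa", "Aa", "aa")
p_het = offspring_prob("Aa", "Aa", "Aa")
print(p_aa)         # 1/4
print(p_het)        # 1/2
print(p_aa * p_aa)  # 1/16: two independent offspring both aa (product law)
```

For an Aa × Aa cross, the heterozygote probability is 1/2 because Aa can arise in two mutually exclusive ways (A from the first parent and a from the second, or the reverse), each with probability 1/2 × 1/2 = 1/4.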
suppressor mutation A mutation that acts to completely or partially restore the function lost by a mutation at another site.
sympatric speciation Speciation occurring in populations that inhabit, at least in part, the same geographic range.
synapsis The pairing of homologous chromosomes at meiosis.
synaptonemal complex (SC) A submicroscopic structure consisting of a tripartite nucleoprotein ribbon that forms between the paired homologous chromosomes of the first meiotic division.
telomerase The enzyme that adds short, tandemly repeated DNA sequences to the ends of eukaryotic chromosomes.
telomere The heterochromatic terminal region of a chromosome.
telomere repeat-containing RNA (TERRA) Large noncoding RNA molecules transcribed from telomere repeats that are an integral part of telomeric heterochromatin.
temperate phage A bacteriophage that can become a prophage, integrating its DNA into the chromosome of the host bacterial cell and making the latter lysogenic.
temperature-sensitive mutation A conditional mutation that produces a mutant phenotype at one temperature range and a wild-type phenotype at another.
template The single-stranded DNA or RNA molecule that specifies the sequence of a complementary nucleotide strand synthesized by DNA or RNA polymerase.
testcross A cross between an individual whose genotype at one or more loci may be unknown and an individual who is homozygous recessive for the gene or genes in question.
tetrad The four chromatids that make up paired homologs in the prophase of the first meiotic division. In eukaryotes with a predominant haploid stage (some algae and fungi), a tetrad also denotes the four haploid cells produced by a single meiotic division.
tetrad analysis A method that analyzes gene linkage and recombination in organisms with a predominant haploid phase in their life cycle.
tetranucleotide hypothesis An early theory of DNA structure proposing that the molecule was composed of repeating units, each consisting of the four nucleotides represented by adenine, thymine, cytosine, and guanine.
third-generation sequencing (TGS) Technologies based on high-throughput methods that sequence a single-stranded DNA molecule.
thymine dimer In a polynucleotide strand, a lesion consisting of two adjacent thymine bases that become joined by a covalent bond. Usually caused by exposure to ultraviolet light, this lesion inhibits DNA replication.
totipotent The capacity of a cell or an embryo part to differentiate into all cell types characteristic of an adult. This capacity is usually progressively restricted during development. Used interchangeably with pluripotent.
trait Any detectable phenotypic variation of a particular inherited character.
trans-acting element A gene product (usually a diffusible protein or an RNA molecule) that acts to regulate the expression of a target gene.
trans configuration An arrangement in which two mutant alleles are on opposite homologs, such as a1 + / + a2.
transcription Transfer of genetic information from DNA by the synthesis of a complementary RNA molecule using a DNA template.
transcriptome The set of mRNA molecules present in a cell at any given time.
transdetermination Change in developmental fate of a cell or group of cells.
transduction Virally mediated bacterial recombination. Also used to describe the transfer of eukaryotic genes mediated by a retrovirus.
transfer RNA (tRNA) A small ribonucleic acid molecule that “adapts” a triplet codon to its corresponding amino acid during translation.
transformation Heritable change in a cell or an organism brought about by exogenous DNA. Known to occur naturally and also used in recombinant DNA studies.
transgenic organism An organism whose genome has been modified by the introduction of external DNA sequences into the germ line (sometimes called a knock-in organism).
transition mutation A mutational event in which one purine is replaced by another or one pyrimidine is replaced by another.
translation The derivation of the amino acid sequence of a polypeptide from the base sequence of an mRNA molecule in association with a ribosome and tRNAs.
translocation A chromosomal mutation associated with the reciprocal or nonreciprocal transfer of a chromosomal segment from one chromosome to another. Also denotes the movement of mRNA through the ribosome during translation.
transmission genetics The field of genetics concerned with heredity and the mechanisms by which genes are transferred from parent to offspring.
transposable element A DNA segment that moves to other sites in the genome, essentially independent of sequence homology. Usually, such elements are flanked at each end by short inverted repeats of 20–40 base pairs.
transversion mutation A mutational event in which a purine is replaced by a pyrimidine or a pyrimidine is replaced by a purine.
trinucleotide repeat A tandemly repeated cluster of three nucleotides (such as CTG) within or near a gene. Certain diseases (myotonic dystrophy, Huntington disease) are caused by expansion in copy number of such repeats.
triploidy The condition in which a cell or an organism possesses three haploid sets of chromosomes.
trisomy The condition in which a cell or an organism possesses two copies of each chromosome except for one, which is present in three copies (designated 2n + 1).
tumor-suppressor gene A gene that encodes a product that normally functions to suppress cell division. Mutations in tumor-suppressor genes activate cell division and cause tumor formation.
two-dimensional gel electrophoresis (2DGE) A method for separating polypeptides in two dimensions, first by electrical charge (isoelectric point) and second by size (molecular weight).
unequal crossing over A crossover between two improperly aligned homologs, producing one homolog with three copies of a region and the other with one copy of that region.
variable number tandem repeats (VNTRs) Short, repeated DNA sequences (of 2–20 nucleotides) present as tandem repeats between two restriction enzyme sites. Variation in the number of repeats creates DNA fragments of differing lengths following restriction enzyme digestion. Used in early versions of DNA fingerprinting.
variance (s²) A statistical measure of the variation of values from a central value, calculated as the square of the standard deviation.
variegation Patches of differing phenotypes, such as color, in a tissue.
vector In recombinant DNA, an agent such as a phage or plasmid into which a foreign DNA segment will be inserted and used to transform host cells.
vertical gene transfer The transfer of genetic information from parents to offspring generation after generation.
virulent phage A bacteriophage that infects, replicates within, and lyses bacterial cells, releasing new phage particles.
W, Z chromosomes The sex chromosomes in species where the female is the heterogametic sex (WZ).
Western blotting An analytical technique in which proteins are separated by gel electrophoresis and transferred by capillary action to a nylon membrane or nitrocellulose sheet. A specific protein can be identified through hybridization to a labeled antibody.
wild type The most commonly observed phenotype or genotype, designated as the norm or standard.
wobble hypothesis An idea proposed by Francis Crick, stating that the base at the 5′ end of an anticodon, which pairs with the third base of a codon, can align in several ways, allowing one anticodon to recognize more than one codon in mRNA.
X chromosome The sex chromosome present in species where the female is the homogametic sex (XX).
X chromosome inactivation In mammalian females, the random cessation of transcriptional activity of the maternally or paternally derived X chromosome. This event occurs early in development and is a mechanism of dosage compensation. All descendants of a given cell inactivate the same X chromosome.
XIST A locus in the X-chromosome inactivation center that controls inactivation of the X chromosome in mammalian females.
X-linkage The pattern of inheritance resulting from genes located on the X chromosome.
X-ray crystallography A technique for determining the three-dimensional structure of molecules by analyzing X-ray diffraction patterns produced by bombarding crystals of the molecule under study with X-rays.
YAC Yeast artificial chromosome. A cloning vector constructed using chromosomal components, including telomeres (from a ciliate) and centromeres, origin of replication, and marker genes from yeast. YACs are used to clone long stretches of eukaryotic DNA.
Y chromosome The sex chromosome in species where the male is heterogametic (XY).
Z-DNA An alternative “zig-zag” structure of DNA in which the two antiparallel polynucleotide chains form a left-handed double helix. Implicated in regulation of gene expression.
zinc finger A class of DNA-binding domains seen in proteins. They have a characteristic pattern of cysteine and histidine residues that complex with zinc ions, throwing intermediate amino acid residues into a series of loops or fingers.
zip code A sequence found in some mRNAs that serves as a binding site for RNA-binding proteins that influence mRNA localization within the cell and localized translation.
zip code binding protein (ZBP1) An RNA-binding protein that binds to actin mRNAs and regulates where they are translated within the cell.
zygote The diploid cell produced by the fusion of haploid gametic nuclei.

Credits
PHOTOS
Chapter 1 01-COa, Sinclair Stammers/Science Source; 01-COb, Mark Smith/Science Source; 01-COc, Alberto Salguero; F01-02, Biophoto Associates/Science Source; F01-03, Photo Researchers/Science Source; F01-07, Eye of Science/Science Source; F01-08, Roslin Institute; F01-09a, Pearson Education; F01-09b, Hermann Eisenbeiss/Science Source; F01-10a, Jeremy Burgess/Science Source; F01-10b, David McCarthy/Science Source
Chapter 2 02-CO, Dr. Andrew S. Bajer, University of Oregon; F02-02, CNRI/SPL/Science Source; F02-04, Dr. David Ward, Yale University; F02-12a, Biophoto Associates/Science Source; F02-12b, Andrew Syred/Science Source; F02-12c, Science Source
Chapter 3 03-CO, National Library of Medicine
Chapter 4 04-CO, Juniors Bildarchiv/GmbH/Stock Photo/Alamy; P78, Dr. Ralph Somes; J. James Bitgood, University of Wisconsin, Animal Sciences Department; Dr. Ralph Somes; J. James Bitgood, University of Wisconsin, Animal Sciences Department; P82, Shout It Out Design/Shutterstock; Zuzule/Shutterstock; Julia Remezova/Shutterstock; F04-01a, John Kaprielian/Science Source; F04-03b, Jackson Laboratory; F04-03c, Jackson Laboratory; F04-08, RoJo Images/Shutterstock; F04-12a, Prisma Bildagentur/AG/Stock Photo/Alamy; F04-13, Hans Rienhard/Photoshot Holdings Ltd.; F04-14, Debra P. Hershkowitz/Photoshot Holdings Ltd.; F04-15a, Tanya Wolff; F04-15b, Tanya Wolff; F04-15c, Tanya Wolff; F04-16a, Jane Burton/Photoshot Holdings Ltd.; F04-16b, Dr. William S. Klug; F04-17b, Aida Ricciardiello/Shutterstock; F04-18a, Dr. Ronald A. Butow, Department of Molecular Biology and Oncology, University of Texas Southwestern Medical Center; F04-18b, Dr. Ronald A. Butow, Department of Molecular Biology and Oncology, University of Texas Southwestern Medical Center
Chapter 5 05-CO, Wessex Reg. Genetics Centre/Wellcome Images; P92, Texas A&M University/AP Images; F05-02a, Catherine G. Palmer; F05-02b, Catherine G. Palmer; F05-04a, Michael Abbey/Science Source; F05-04b, Michael Abbey/Science Source; F05-06a, Sari O'Neal/Shutterstock; F05-06b, William S. Klug; F05-08a, Maria Gallegos
Chapter 6 06-CO, Evelin Schrock, Stan du Manoir, Tom Ried/National Institutes of Health; F06-02a, Courtesy of the Greenwood Genetic Center, Greenwood, SC; F06-02b, Kristy-Anne Glubish/Alamy; F06-04a, David D. Weaver/Indiana University; F06-08, Courtesy of National Cotton Council of America; F06-11a, Douglas Chapman, University of Washington Medical Center Pathology; F06-11b, Cri du chat Syndrome Support Group, UK; F06-13, Mary Lilly, The Observatories for the Carnegie Institution for Science; F06-17b, Jorge J. Yunis; F06-18, Professor Christine Harrison, Newcastle University, UK
Chapter 7 07-CO, James Kezer c/o Stanley Sessions; F07-14, Dr. Sheldon Wolff & Judy Bodycote/Laboratory of Radiobiology and Environmental Health, University of California, San Francisco
Chapter 8 08-CO, Dr. L. Caro/Science Source; F08-01, Pearson Education; F08-10a, Gopal Murti/Science Source; F08-11a, M. Wurtz/Science Source; F08-13b, Christine Case
Chapter 9 09-CO, Ken Eward/Photo Researchers; F09-03b, Oliver Meckes/Science Source; F09-10, M.H.F. Wilkins; F09-13, Ventana Medical Systems, Inc.
Chapter 10 10-CO, Gopal Murti/Science Source; F10-14, Dr. Harold Weintraub, Howard Hughes Medical Institute, Fred Hutchinson Cancer Center/“Essential Molecular Biology” 2e, Freifelder & Malacinski, Jones & Bartlett, Fig. 7-24, p. 141
Chapter 11 11-CO, Don W. Fawcett/Science Source; F11-01a, M. Wurtz/Science Source; F11-01b, William S. Klug; F11-02, Biology Pics/Science Source; F11-03, G. Murti/Science Source; F11-04, Don W. Fawcett/Science Source; F11-05, Dr. Richard D. Kolodner, Dana-Farber Cancer Institute; F11-06a, Harald Eggert; F11-06b, The Company of Biologists; F11-07b, John Ellison, Richardson Lab, Integrative Biology, The University of Texas at Austin; F11-08a, Omikron/Science Source; F11-08b, William S. Klug; F11-09, William S. Klug; F11-13, William S. Klug
Chapter 12 12-CO, Oscar L. Miller/Science Source; F12-10a, Bert W. O'Malley, M.D., Baylor College of Medicine
Chapter 13 F13-CO, From Science, Marat M. Yusupov, Gulnara Zh. Yusupova, Albion Baucom, Kate Lieberman, Thomas N. Earnest, J. H. D. Cate, Harry F. Noller. Crystal Structure of the Ribosome at 5.5 Å Resolution. Reprinted with permission from AAAS; F13-09a, Cold Spring Harbor Laboratory; F13-09b, Elena Kiseleva; F13-11, Sebastian Kaulitzki/Shutterstock; F13-15, Kenneth Eward/Photo Researchers
Chapter 14 14-CO, M.G. Neuffer; F14-13, The Xeroderma Pigmentosum Association, otherwise known as The Children of the Moon, organized a trip to Paris on December 17 and 18, 2005, for children with XP. Newscom Photo. Used with permission.
Chapter 15 15-CO, T. Cremer/Dr. I. Solovei/Dr. F. Haberman/Biozentrum (LMU)
Chapter 16 16-CO, SPL/Science Source; F16-01a, Courtesy of Hesed M. Padilla-Nash, Antonio Fargiano, and Thomas Ried. Affiliation is Section of Cancer Genomics, Genetics Branch, Center for Research, National Cancer Institute, National Institutes of Health, Bethesda, MD; F16-01b, Courtesy of Hesed M. Padilla-Nash, Antonio Fargiano, and Thomas Ried. Affiliation is Section of Cancer Genomics, Genetics Branch, Center for Research, National Cancer Institute, National Institutes of Health, Bethesda, MD; F16-02a, Roland Birke/Getty Images; F16-02b, Biophoto/Science Source; F16-02c, Biophoto Associates/Science Source; F16-02d, Biophoto Associates/Science Source
Chapter 17 17-CO, Pascal Goetgheluck/Science Source; F17-03a, Gopal Murti/Science Source; F17-05b, Michael Gabridge/Custom Medical Stock Photo; F17-10a, Proceedings of the National Academy of Sciences; F17-10b, Proceedings of the National Academy of Sciences; F17-11, Gerald B. Downes; F17-14a, Scientist microinjecting cloned DNA into a fertilized egg. Copyright 2014 The Regents of the University of California. Used by permission; F17-14b, Courtesy of Ralph Brinster, University of Pennsylvania School of Veterinary Medicine; F17-15, Eye of Science/Science Source
Chapter 18 F18-11a, Volker Steger/Science Source; F18-11b, Dra Schwartz/Getty Images; F18-12, Swiss Institute of Bioinformatics
Chapter 19 F19-CO, GloFish® Fluorescent Fish; F19-01a, SIU Biomed Com/Custom Medical Stock Photo; F19-02, From Doebley, J. Plant Cell, 2005 Nov; 17(11): 2859–72. Courtesy of John Doebley/University of Wisconsin; F19-09, Affymetrix; F19-10, National Institutes of Health–National Cancer Institute; F19-11 top left, Sebastian Kaulitzki/Shutterstock; F19-11 top center, C.D. Humphrey, T.G. Ksiazek, Centers for Disease Control and Prevention; F19-11 top right, Janice Haney Carr, Centers for Disease Control and Prevention; F19-11 mice, C.D. Humphrey, T.G. Ksiazek, Centers for Disease Control and Prevention; F19-11 gene chips, Affymetrix; F19-11 microarray, M.C. Lorence
Chapter 20 F20-CO, Edward B. Lewis/California Institute of Technology; F20-05, Jim Langeland, Stephen Paddock, and Sean Carroll, University of Wisconsin at Madison; F20-07a, Peter A. Lawrence; F20-07b, Peter A. Lawrence; F20-08, Jim Langeland, Stephen Paddock, and Sean Carroll, University of Wisconsin at Madison; F20-09a, © 2001 The Rockefeller University Press et al. The Journal of Cell Biology Vol. 153: 87–100, doi: 10.1083/jcb.153.1.87; F20-09b, © 2001 The Rockefeller University Press et al. The Journal of Cell Biology Vol. 153: 87–100, doi: 10.1083/jcb.153.1.87; 20-10a, F. Rudolf Turner/Indiana University; F20-10b, F. Rudolf Turner/Indiana University; F20-14, P. Barber, RBP/Custom Medical; F20-15a, Darwin Dale/Science Source; F20-15b, Tanya Wolff; F20-15c, Urs Kloter/Georg Halder; F20-16, Dr. Elliot M. Meyerowitz; 20-17a, Max-Planck-Institut für Entwicklungsbiologie; F20-17b, Max-Planck-Institut für Entwicklungsbiologie; 20-19a, Dr. Elliot M. Meyerowitz; F20-19b, Dr. Elliot M. Meyerowitz; F20-19c, Dr. Elliot M. Meyerowitz; F20-19d, Dr. Elliot M. Meyerowitz
Chapter 21 F21-CO, 2009fotofriends/Shutterstock; F21-08, Rekemp/Getty Images
Chapter 22 F22-CO, Simon Booth/Science Source; 22-01, Pixshots/Shutterstock; 22-06, Michel Samson, Frederick Libert, et al., “Resistance to HIV-1 infection in Caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene.” Reprinted with permission from Nature [vol. 382, 22 August 1996, p. 725, Fig. 3] © 1996 Macmillan Magazines Limited; 22-13, American Journal of Human Genetics, Elsevier; 22-16, Smithsonian Institution Libraries; 22-17a, Niels Poulsen/Alamy; F22-17b, Niklas Lindqvist/National Ciklid Society; F22-20a, Mchugh Tom/Getty Images; F22-20b, Peter Scoones/Getty Images; F22-23, Ria Novosti/Science Source
Special Topic 1 ST1-07, Environmental Health Perspectives
Special Topic 2 ST2-09a, Brosnan and Voinnet, 2011, Current Opinion in Plant Biology (Fig. 1B). Original experiment: Brosnan et al. 2007 PNAS; ST2-09b, Brosnan and Voinnet, 2011, Current Opinion in Plant Biology (Fig. 1B). Original experiment: Brosnan et al. 2007 PNAS
Special Topic 3 ST3-05, Viessmann Manufacturing Company, Inc.
Special Topic 4 ST4-02a, Reproduced with permission of Dako Denmark A/S, a subsidiary of Agilent Technologies, Inc., Santa Clara, California, USA. All rights reserved; ST4-02b, Reproduced with permission of Dako Denmark A/S, a subsidiary of Agilent Technologies, Inc., Santa Clara, California, USA. All rights reserved
Special Topic 5 ST5-01, Pascal Pavani/AFP/Getty Images; ST5-03, International Rice Research Institute; ST5-06a, Helios Gene Gun, Courtesy of Bio-Rad Laboratories, Inc.; ST5-09, Bob Hartzler, Department of Agronomy, Iowa State University; ST5-Box 1, AquaBounty Technologies Inc.; ST5-Box 2, S.A. Ferreira/US Pacific Basin Agricultural Research Center
Special Topic 6 ST6-04a, Van DeSilva; ST6-06, Jennifer Doudna, H. Adam Steinberg, artforscience.com
TEXT
Chapter 1 p. 19, N. Hartsoeker, Essay de dioptrique, Paris, 1694, p. 246. National Library of Medicine © 1964 National Library of Medicine
Chapter 7 p. 140, A History of Genetics, by Alfred H. Sturtevant. New York: Harper & Row, 1965
Chapter 10 p. 213, Geron Corporation
Chapter 12 p. 234, M.W. Nirenberg and H.J. Matthaei (1961) “The Dependence of Cell-Free Protein Synthesis in E. coli upon Naturally Occurring or Synthetic Polyribonucleotides.” Proceedings of the National Academy of Sciences of the USA 47 (10): 1588–160
Chapter 16 p. 326, Data from Vogelstein, B. et al. 2013. Cancer Genome Landscapes. Science 339: 1546–1558
Chapter 18 p. 378, Adapted from Palladino, M.A. Understanding the Human Genome Project, 2nd ed. Benjamin Cummings, 2006
Chapter 19 p. 410, “A genome-wide association study of brain lesion distribution in multiple sclerosis,” Pierre-Antoine Gourraud, Michael Sdika, Pouya Khankhanian, Roland G. Henry, Azadeh Beheshtian, Paul M. Matthews, Stephen L. Hauser, Jorge R. Oksenberg, Daniel Pelletier, and Sergio E. Baranzini. Brain, a Neurology Journal, February 13, 2013. Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved; p. 417, Data from Lee et al. 2001. Infect. and Immunity 69: 5786–5793
Chapter 20 p. 432, Gerhart, J. 1999. 1998 Warkany lecture: Signaling pathways in development. Teratology 60: 226–239
Chapter 21 p. 452, “Mapping Polygenes” by S.D. Tanksley, from ANNUAL REVIEW OF GENETICS, December 1993, Volume 27. Reproduced with permission of Annual Review of Genetics, Volume © 1993 by Annual Reviews, http://www.annualreviews.org
Chapter 22 p. 459, From Noonan, J.P. et al. ‘Sequencing and analysis of Neanderthal genomic DNA.’ SCIENCE 314: 1113–1118
Special Topic 2 p. 492, Based on http://mcb.illinois.edu/faculty/profile/cfanderp; p. 499, Wang et al. The Long Arm of Long Noncoding RNAs: Roles as Sensors Regulating Gene Transcriptional Programs. Cold Spring Harbor Perspectives in Biology 2011 Jan; 3(1):a003756;5. Copyright Cold Spring Harbor Laboratory Press. Reprinted with permission; p. 500, Pearson Education, Inc.
Special Topic 3 p. 505, Reprinted with permission from The Journal of Forensic Sciences, Vol. 48, No. 4, copyright ASTM International, 100 Barr Harbor Drive, West Conshohocken, PA 19428
Special Topic 4 p. 514, “Personalized Medicine Coalition.” The Case for Personalized Medicine: 4th ed., fig. 2. Copyright (c) 2014 Personalized Medicine Coalition. Used by permission; p. 516, “Personalized Medicine Coalition.” The Case for Personalized Medicine: 4th ed., fig. 2. Copyright (c) 2014 Personalized Medicine Coalition. Used by permission; p. 519, Based on a story reported in Kolata, G., In treatment for leukemia, glimpses of the future. New York Times, July 7, 2012
Special Topic 5 p. 524, Information from the International Service for the Acquisition of AgriBiotech Applications, www.isaaa.org

Index
Note: Page numbers followed by f, t, and b indicate figures, tables, and boxes, respectively.
A
A (aminoacyl) site, 260, 262 ABO blood group, 72–73, 75f Absorption spectrum, of UV light, 183, 183f, 281f Ac elements, in maize, 289–290 Accession number, 364 Ac-Ds system, in maize, 289–290 Acentric chromatids, 129 Acetylation, histone, 224 Achondroplasia, 282t, 468 Acquisition, 493 Actin, 270 Action spectrum, of UV light, 183, 183f, 281f Activator mutations, 289–290 Activators, transcriptional, 245, 312, 314 Acute lymphoblastic leukemia, 519b Adaptive immunity, 493 Adaptor hypothesis, 255 Additive alleles, 440, 440f Additive variance, 446 Adduct-forming agents, mutagenic, 280 Adenine, 21, 185, 185f, 187 Adenine methylase, in mismatch repair, 283 Adeno-associated virus, as gene therapy vector, 537 Adenoma, colonic, 326, 326f, 333 Adenomatous polyposis coli (APC) gene, 326, 333 Adenosine diphosphate (ADP), 187 Adenosine triphosphate (ATP), 187 Adenosine-uracil rich elements, 317 Adenovirus vectors, in gene therapy, 537 Adenyl cyclase, 302 Adh locus, in Drosophila melanogaster, 459f A-DNA, 190 ADP (adenosine diphosphate), 187 Adverse drug reactions, pharmacogenomics and, 516–517 Aflatoxin, as carcinogen, 334 Africa, human origin in, 475–477 African-Americans, sickle-cell anemia in, 22, 22f, 23f, 266–267, 266f, 270 prenatal diagnosis of, 404 RFLP analysis of, 404, 404f Agarose gel, in electrophoresis, 192, 192f Age of Genetics, 26–28 Aging cancer and, 325 chromosomal defects and, 105 mitochondrial mutations and, 91–92 telomeres and, 212–213 Y chromosome and, 105 agouti gene, 74, 74f, 487 Agricultural biotechnology, 23–24, 398–400, 451–452 See also Crop plants Agrobacterium tumefaciens–mediated transformation, 528 Albinism founder effect in, 469–470, 470f pedigree analysis for, 62–63 Alignment, DNA-sequence, 363, 363f Alkaptonuria, 264 Alkylating agents, mutagenic, 279, 280f Allele(s), 20, 32, 70–75 additive, 440, 440f definition of, 50, 70 epistatic, 76–80 genes and, 70–75 See also Gene(s) hypostatic, 76 multiple, 72–74 from mutation, 70, 467–468 See also Mutation(s) nonadditive, 440 null, 70 overview of, 70 recessive lethal, 74, 74f
symbols for, 50, 70–71 wild-type, 70 Allele frequency calculation of, 461–464, 462t, 464t genetic drift and, 469–470 Hardy-Weinberg law and, 459–464 migration and, 468, 469f mutations and, 467–468 natural selection and, 465, 466f Allele-specific oligonucleotides, 404–405, 405f Allopolyploidy, 116t, 121, 121f, 122–123 Allosteric molecules, 299 Allotetraploids, 122 Alloway, Lionel J., 179 α helix, 268, 269f α-globin genes, 366–367, 381 Alphoid family, 226 Alternative splicing, 249, 369 in gene regulation, 315–316, 315f, 316f mutations affecting, 316 in sex determination, 110 Altman, Sidney, 491 Alu element, 291–292 Alu family, 227 Alzheimer disease, 270 Ames test, 288, 288f Amino acid(s), 21, 22 See also Protein(s) assignment in genetic code, 236–239, 236t, 237t, 238f structure of, 267–268, 268f Amino acid chains, 267–268 Amino acid sequences, 236–239, 236t, 237f, 237t, 238f phylogenetic trees from, 473–474 Amino group, 267 2-Aminopurine, as mutagen, 279 Aminoacyl (A) site, 260, 262 Aminoacyl tRNA synthetases, 257–258, 258f Amniocentesis, 119, 402–403, 402f Amphidiploidy, 116t, 122–123, 122f Amphilophus citrinellus, 472–473, 473f Amphilophus zalious, 472–473, 473f AmpliChip CYP450 assay, 516 Anaphase in meiosis, 38f, 39, 39f, 40, 40f in mitosis, 33f, 34f, 36 Anderson, Lukis, 511b Androgens, in sex differentiation, 112 Anemia, sickle-cell See Sickle-cell anemia Aneuploidy, 116, 116t, 117–120 definition of, 116 in humans, 116, 116t, 120, 120f monosomy, 116, 116t, 117 trisomy, 116, 116t, 117–120 See also Trisomy Angelman syndrome, 485, 742 Angiogenesis, in cancer, 119 Animal(s) knockout, 24, 355–357 selective breeding of, 458 transgenic (knock-in), 105, 184, 357–358, 399–400, 400f as bioreactors, 396, 400 creation of, 357–358 examples of, 105 as recombinant protein hosts, 396 Annealing, 340, 340f in polymerase chain reaction, 347 Annotation, 364 Antennapedia complex, 426, 426f, 427t Antibiotic resistance, R plasmids and, 168, 289 Anticipation, 132 Anticodon, 235, 255
Anticodon loops, 257 Antigens, blood group, 72–73, 75f Antiparallel strands, 205–206 Antisense oligonucleotides, 250–251, 543 Antisense RNA, 191 Antisense-mediated skipping, 250 Antiterminator hairpins, 306 Antithrombin, recombinant human, 396, 397 AP endonuclease, 285, 285f APC (adenomatous polyposis coli) gene, 326, 333 Apoptosis, 329 X-linked inhibitor of, 409 Aptamer, 306 Apurinic sites, 278, 285 Apyrimidinic sites, 285 Arabidopsis thaliana, 25, 26t development in, 430–432, 431f, 432f extracellular RNA in, 500 genome of, 378, 378t Arber, Werner, 338 Archaea, 30 Argininosuccinic aciduria, 477 Argonaute protein, 495 Aristotle, 18 Aromatase, 112 Arrhythmogenic right ventricular cardiomyopathy, 409 Arrow cichlid, speciation and, 472–473, 473f Artificial selection, 446–448, 458 selective breeding and, 398–399, 399f, 446–448, 453 Ashworth, Dawn, 504b Assortative mating, 470 Astbury, William, 187 Athletes, gene doping in, 546b ATP (adenosine triphosphate), 187 Attenuated vaccines, 397 Attenuation, in operons, 304, 305–306, 306f Attenuators, 304, 305–306, 305f AU rich elements, 317 Autism, copy number variants and, 128 Autonomously replicating sequences, 209 Autopolyploidy, 116t, 121, 121f Autoradiography, 199 Autosomal mapping, 147–149, 148f, 150f Autosomal mutations, 276 Autosomal STR DNA profiling, 505–506, 505f–507f, 509t Autosomes, in sex determination in Drosophila melanogaster, 109–110, 110f Autotetraploids, 121 Autotriploids, 121 Auxotrophs, 160, 161f Avery, Oswald, 21, 178, 179–180 Avery-MacLeod-McCarty experiment, 179–180, 180f Avirulent strains, 178 5-Azacytidine, 310
B
Bacillus thuringiensis (Bt), 525–526 Bacteria See also Prokaryotes archaea, 30 avirulent strains of, 178 cell division in, 30, 30f chromosomes of, 164f, 165, 216, 216f, 218f conjugation in, 161–166, 161f–163f DNA repair in, 283–285 drug-resistant, R plasmids, 168, 289 eubacteria, 30 gene mapping in, 159–174 See also Gene mapping, in bacteria gene regulation in,
297–303 genome of, 377 high-frequency recombination (Hfr), 163–165, 164f, 165f, 167f insulin production in, 395–396 linkage in, 169 lysogenic, 172 as model organisms, 25, 25f See also Escherichia coli notation for, 166n quorum sensing in, 318 recombination in, 160–174 See also Recombination, in bacteria replication in, 201–207, 201f, 208t serotypes of, 178 spontaneous mutations in, 160 transcription in, 241–243, 243f transduction in, 172–173 transfection in, 181–182 transformation in, 168–169, 178–180, 179f, 180f virulent strains of, 178 Bacterial artificial chromosomes, 343 Bacterial colonies, smooth vs. rough, 178 Bacterial cultures, 160, 160f Bacterial operons See Operon(s) Bacterial plasmid vectors, 340–342, 341f Bacteriophage(s), 170–172 chromosomes of, 216, 216f, 218f gene mapping in, 173 See also Gene mapping, in bacteriophages in Hershey-Chase experiment, 180–181, 182f life cycle of, 170, 170f, 180–181, 180f lysogeny in, 171–172 lytic, 171–172 plaque assays for, 170–171, 171f prophages, 171–172 reproduction of, 180–181, 180f, 182f RNA as genetic material in, 184 temperate, 172 T-even, 169, 170f transfection and, 181–182 virulent, 172 Bacteriophage φX174, 216, 217t, 240 Bacteriophage G4, 240 Bacteriophage lambda (λ), 216, 216f, 217t as cloning vector, 342 Bacteriophage MS2, 239 Bacteriophage T2, 180–181, 180f, 216, 217f, 217t Bacteriophage T4, 169–170, 170f Baculoviruses, 396–397 Balancer chromosomes, 129 Baldness, pattern, 85 Banding, chromosome, 224–225 Bar mutation, 126–127, 127f Barr bodies, 106–107, 107f Barr, Murray L., 106 Basal lamina, 332 Base(s), 21, 21f, 185–187, 185f pairing of, 188–189, 188f deamination and, 278 depurination and, 278 errors in, 277 standard vs. anomalous, 277–278, 277f wobble hypothesis for, 238, 238t Base analogs, mutagenic, 279 Base excision repair, 284–285, 285f Base substitution mutations, 274, 274f Basic Local Alignment Search Tool (BLAST), 193, 364–365, 365f, 366, 390 Bateson, William, 79, 263–264, 439 B-DNA, 190 Beadle,
George, 264–265, 385 Beadle-Tatum one-gene:one-enzyme hypothesis, 264–265 Beasley, J.O., 123 Beckwith-Wiedemann syndrome, 450, 484 Bees, colony collapse disorder and, 382–383 Beet, E.A., 266 Behavior, epigenetics and, 488 Behavioral mutations, 275 BER pathway, 285, 285f Berget, Susan, 246 Bertram, Ewart, 106 Bertrand, Kevin, 305 β-globin chain, 267, 267f β-globin gene, 22, 247, 381, 381f, 540 β-pleated sheet, 268, 269f β-subunit sliding clamp, 203, 206–207, 206f, 207f, 208 β-thalassemia, gene therapy for, 541 Beutler, Ernest, 107 bicoid gene, 92 Bidirectional replication, 200–201, 201f Binary switch genes, 429–430, 429f Biochemical mutations, 275 Bioengineering See Genetic engineering Biofactories, 396 Bioinformatics, 24, 152 BLAST Search Tool, 193, 364–365, 365f, 366, 390 databases and See Databases definition of, 364 gene prediction programs in, 365–366 genomics and, 362–363, 364–366 See also Genomics transcriptome analysis and, 383–384 Biolistic method, 528 Biological diversity See Variation Biology synthetic, 401 legal aspects of, 413–415 systems, 388–389, 390f Biopharmaceuticals, 395 Biopharming, 395 Bioreactors, 396, 400 Biotechnology, 23–24 agricultural, 23–24, 23f, 398–400, 451–452 See also Crop plants applications of, 23–24 cloning in See Clones/cloning definition of, 394 in dog breeding, 92–93 ethical issues in, 412–413 genetic engineering and, 394–416 See also Genetic engineering historical perspective on, 23–24 legal issues in, 412–415 in medicine, 24 nucleic acid–based gene silencing in, 250–251 overview of, 23–24 recombinant DNA technology and, 338–359 See also Recombinant DNA technology Biparental inheritance, 32 Bipotential gonads, 104 Birds, feathering in, 85, 85f bithorax complex, 426, 426f, 427t Bivalents, 38, 38f Blackburn, Elizabeth, 210 BLAST Search Tool, 193, 364–365, 365f, 366, 390 Blindness, gene therapy for, 540–541 Blood group antigens ABO, 72–73 Bombay phenotype and, 73, 73f MN, 72 Bloom syndrome, 154 Blue-white screening, 341, 343 Blunt
ends, 339 Bombay phenotype, 73, 73f Bonds hydrogen, in DNA, 189 nucleotide, 186f, 187, 187f peptide, 267–268 phosphodiester, 187, 187f Boveri, Theodor, 20, 57, 136 Bovine spongiform encephalopathy, 270 Boyer, Herbert, 395 Branch diagrams, 53f, 56, 56f BRCA gene, 292, 334–335, 413 BRE element, 311 Breast cancer, 292, 334–335, 413, 514–516 Breeding, selective, 92–93, 398–399, 399f, 446–448, 453, 458 Brenner, Sidney, 232 Brewer, Kennedy, 510b Bridges, Calvin, 74, 109, 126, 142 Broad-sense heritability, 446 Brockdorff, Neil, 109 Bromodeoxyuridine, 154, 279, 280f 5-Bromouracil, as mutagen, 279, 280f Brooks, Levon, 510b Bt crops, 525–526 Buckland, Richard, 504b Buoyant density gradient centrifugation, 198 Burkitt lymphoma, 325
C
CAAT box, 311, 312f Caenorhabditis elegans, 25, 26t development in, 432–435 overview of, 433–434, 433f signaling in, 432–433, 432t, 433f, 434f vulva formation in, 434–435 genome of, 378, 378t sex determination in, 102f, 110–111 Cairns, John, 200–201, 202 Calcitonin/calcitonin gene-related peptide gene, splicing of, 315, 315f Calico cats, 107, 108f cAMP, in catabolite repression, 302 Cancer, 323–335 abnormal cell growth in, 325, 328–329 age and, 325 angiogenesis in, 119 apoptosis and, 329 breast, 292, 334–335, 413, 514–516 Burkitt lymphoma, 325 causative agents in, Ames test for, 288f cell cycle regulation in, 328–329, 334 chromatin remodeling in, 327–328 chronic myelogenous leukemia, 327, 327f clonal expansion in, 326 clonal origin of, 324–325 colorectal See Colorectal cancer copy number variants and, 128 defective DNA repair in, 327 diet and, 334 DNA methylation in, 485–486 in Down syndrome, 118–119 environmental causes of, 333–334 epigenetic defects in, 327–328, 485–486, 485f fragile sites and, 132–133 as genetic disease, 324–326, 518 genomic instability and, 327 inherited susceptibility to, 332–333, 333t loss of heterozygosity in, 333 lung, 128, 132–133, 334, 514b malignant transformation in, 326 metastasis in, 325, 332
methylation in, 485–486, 485f, 485t model organisms for, 25–26, 26t as multistep process, 325–326 mutations in, 325–326, 485–486 mutator phenotype and, 327 nucleic acid–based drugs for, 250–251 oncogenes and, 330, 330t p53 gene in, 331 personalized medicine and in diagnosis, 517–520 in treatment, 513–516, 519b pharmacogenomics and, 514b progression of, 325–326, 326t, 332 proto-oncogenes and, 330–332, 330t radiation and, 286, 334 retinoblastoma, 331–332 stem cells and, 325 telomeres in, 212 treatment of, 486 tumor suppressor genes and, 330–332, 330t, 331f virus-related, 333–334, 333t Cancer Genome Atlas, 372, 518 Capecchi, Mario, 355 Capillary electrophoresis, in STR DNA profiling, 505 Carboxyl group, in amino acids, 267 Carcinogens, 325 Ames test for, 288, 288f environmental, 333–334 Carcinoma, 326 See also Cancer Cardiomyopathy, 409 Carr, David H., 120 Carriers, 84 testing for, 518 Caspases, 329 Catabolite repression, 302, 303f Catabolite-activating protein, 302, 303f Cats, calico, 107, 108f Cattle, transgenic, 400, 400f Cavalli-Sforza, Luca, 162, 163 C-banding, 225 CCR5 gene, HIV infection and, 461–462, 462f, 543 cdc mutations, 37 cDNA libraries, 344–345, 345f Cech, Thomas, 247, 490 Celera Genomics, 368 Cell(s) daughter, 36 differentiated, 328 eukaryotic, 30 information flow in, 177, 177f, 232, 232f, 241 See also Transcription; Translation prokaryotic, 30 structure of, 29–30, 29f Cell coat, 29f, 30 Cell cycle, 33–37, 33f, 328–329, 329f in cancer, 328–329 checkpoints in, 37, 328–329, 334 interphase in, 33–35, 33f, 42–43, 42f meiosis in, 20, 37–40 See also Meiosis mitosis in, 20, 33–37 See also Mitosis regulation of, 37, 328–329 signal transduction in, 328 Cell death, programmed, 329 Cell division See also Cell cycle in bacteria, 30, 30f cytokinesis in, 33 karyokinesis in, 33 in mitosis, 33–37 cell division cycle (cdc) mutations, 37 Cell furrow, 36 Cell growth, regulation of, 328–329 Cell plate, 34f, 36 Cell surface receptors, 29f, 30 Cell theory, 18–19 Cell wall, 
29f, 30 Cellulose, 30 CEN region, 226 centi-Morgan (cM), 142n Central carbon atom, of amino acids, 267 Central dogma, 177 Centrifugation, sedimentation equilibrium, 198 Centrioles, 29f, 30, 34f, 35 Centromeres, 31, 31f, 226–227 CEN region of, 226 definition of, 226 DNA sequences in, 226 in meiosis, 40 in mitosis, 35–36, 35f Centrosomes, 30, 35 Cetuximab (Erbitux), 515, 516t Chain elongation in replication, 201, 202f in transcription, 242–243 Chain-termination sequencing, 352–354, 373f Chambon, Pierre, 247 Chance deviation, 59 Chaperones, 269 Chargaff, Erwin, 178, 187 Charging, of tRNA, 257–258, 258f Chase, Martha, 181 Checkpoints, cell-cycle, 37, 328–329, 334 Chemical mutagens, 279–280, 280f Chemotherapy, 486 Chiasmata, 38f, 39, 140 Chimpanzees, genome of, 378t, 379–380 Chi-square analysis, 59–60, 60t, 61f Chloroplast(s), 30 chromosomes of, 217, 218f inheritance via, 89–90, 90f mutations and, 89–90, 90f Chloroplast DNA (cpDNA), 218, 218f Cholesterol, elevated, 17–18 Cholesterol-lowering agents, development of, 17–18 Chorionic villus sampling, 119, 402–403 Chromatids acentric, 129 dicentric, 129 nonsister, 38f, 39 sister, 31, 35, 35f, 37–39, 38f in meiosis, 37–40, 38f–39f in mitosis, 35, 35f, 154 Chromatin, 29, 29f, 30, 221–224 beads-on-a-string configuration of, 221, 221f closed, 308–309 coiling of, 222f, 223 definition of, 221 euchromatin, 224 heterochromatin, 224 histones and, 221–225, 222f interphase, 42–43, 42f nucleosomal, 221–223, 221f, 222f, 308–309 replication through, 209 structure of, 221–223, 221f, 222f Chromatin assembly factors, 209 Chromatin modification/remodeling, 223–224, 244, 308–310, 309f in cancer, 327–328, 486 epigenetic, 481–482, 483f in gene regulation, 308–310, 314 histone modification in, 308–309, 309f, 328 nucleosomal, 308–309, 309f in RNA-induced gene silencing, 317 Chromomeres, 219 Chromosomal sex determination, 111–112 Chromosomal theory of inheritance, 19, 20, 57, 84, 142 Chromosome(s), 20, 20f, 28, 30, 31 acrocentric, 31 bacterial 
artificial, 343 balancer, 129 breaks in, 123–124, 124f, 131–133, 132f in chloroplasts, 217, 218 circular bacterial, 164f, 165 daughter, 34f, 36 deletions in, 124–125, 124f, 127–128 diploid number of, 20, 31 disjunction of, 36, 39 duplication of, 126–128, 127–128, 127f E coli, mapping of, 164–165, 164f early studies of, 57 electron microscopy of, 42–43, 42f folded-fiber model of, 42f, 43 fragile sites in, 131–133, 132f genes on, 20, 20f, 57–58 haploid number of, 20, 31, 32t harlequin, 154 heteromorphic, 100 homologous, 20, 31, 31f, 32f, 32t, 42, 57–58 lampbrush, 220, 220f in meiosis, 37–40, 38f, 42–43, 42f, 57–58 metacentric, 31 in mitochondria, 217–218 in mitosis, 33–37, 34f, 42–43, 42f nondisjunction of, 39, 103, 116–117, 116f partitioning of, 33–37 See also Mitosis Philadelphia, 327, 327f polyploid, 120–123 polytene, 219–220, 219f sex, 32, 100, 101–111 See also X chromosome; Y chromosome sex-determining, 32 structure of, 31, 32, 32f, 215–229 in viruses, 216, 216f submetacentric, 31 telocentric, 31 telomeres of, 210–213 Ti plasmid, 343–344 yeast artificial, 343 Chromosome banding, 224–225 Chromosome maps See Maps/mapping Chromosome mutations See Mutation(s), chromosome Chromosome puffs, 219 Chromosome territory, 308 Chronic myelogenous leukemia, 327, 327f Church, George, 373 Cichlids, speciation and, 472–473, 473f cis-acting elements, 242, 244 cis-acting sites, 297 in gene regulation, 310–313 Cisgenic organisms, 523 Cistrons, 243 Clark, B., 257 Classical (forward) genetics, 24 Cleaver, James, 286 Cleidocranial dysplasia, 425–426 Clinical trials, for colon cancer, 416 Clonal cells, in cancer, 324–325 Clonal expansion, in cancer, 326 Clones/cloning, 23–24, 338, 339–347 of animals, 23–24, 23f, 108f definition of, 107 Lyon hypothesis and, 107–108 multiple, 341 whole-genome shotgun, 344 Cloning sites, multiple, 341 Cloning vectors, 23, 339, 340–344 bacterial artificial chromosome, 343 bacterial plasmid, 340–342, 341f expression, 343 Closed chromatin, 308–309 Cloverleaf 
model, of tRNA, 256, 257f crRNA (CRISPR-derived RNA), 357, 493–494, 495b, 542–543 Coactivators, 314, 314f Coat color, in mice, 74, 74f, 78–80 epigenetics and, 487–488, 487f Coding dictionary, 238–239, 238f CODIS (Combined DNA Index System), 510–511 Codominance, 72 Codons, 21 initiator, 239 nonsense, 261 termination (stop), 237, 239, 261 Coefficient of coincidence, 150–151 Coefficient of inbreeding, 470, 471f Cohesin, 35, 35f Cohesive ends, 339 Col plasmid, 168 Colchicine, polyploidy and, 121, 122f Colicins, 168 Colinearity, 239 Collagen, 270 Collins, Francis, 367 Colon cancer See Colorectal cancer Colony collapse disorder, 382–383 Color blindness, 84, 84f, 108 Colorectal cancer, 333 clinical trials for, 416 familial adenomatous polyposis and, 333 hereditary nonpolyposis and, 327 model organisms for, 25–26 progression of, 325–326, 326f, 332 Combined DNA Index System (CODIS), 510–511 Comparative genomics, 376–383, 378t applications of, 377 chimpanzee genome, 378t, 379–380 definition of, 377 dog genome, 378t, 379 macaque genome, 378t, 380 multigene families and, 381 Neanderthal genome, 380, 475–476 prokaryotic, vs eukaryotic, 377–378 Rhesus monkey genome, 378t, 380 sea urchin genome, 378–379, 378t Compensation loop, 124, 125f Competence, 168 Complementarity, 189 Complementary DNA libraries, 344–345, 345f Complementary genes, 76, 79 Complementation, 80–82, 286 Complementation analysis, 80–82 Complementation groups, 81–82 Complex (multifactorial) traits, 439 heritability of, 444–448 See also Heritability Computerization See Bioinformatics Concordant twins, 448 Concurrent replication, 206–207 Conditional knockout, 357 Conditional mutations, 87, 87f, 208, 275 Confidentiality in genetic testing, 412–413 genomics and, 412–413, 415–416 Congenital disorders See Genetic disorders Conjugation, 161–166 F′ merozygotes and, 166, 167f in F+ × F− matings, 161–166, 162f, 163f, 164f Lederberg-Tatum experiments on, 161–162 Consanguineous relatives, 62, 264 Consensus sequences, 
209, 242 Conservative replication, 197 Constitutive enzymes, 297 Constitutive mutants, 299 Contigs (contiguous fragments), 363, 363f Continuous replication, 205–206 Continuous variation, additive alleles and, 440 Cooperative binding, 303 Copia elements, 290–291, 291f Copy number variations, 127–128, 228–229, 370 in twins, 449 Core enzymes, 203 Core promoters, 244, 310, 311f transcription factors and, 313–315 Core sequences, 263 Corepressors, 304 Corey, Robert, 268 Correlation coefficient, 443 Correns, Carl, 19, 57, 89–90 Cosmic rays, as mutagens, 281, 281f Cotranscriptional splicing, 245f Cotransduction, 173 Coumadin (warfarin), 516–517 Covariance, 443 Cows, transgenic, 400, 400f cpDNA (chloroplast DNA), 218, 218f CpG islands, 224, 481, 481f Creighton, Harriet, 153 Creutzfeldt-Jakob disease, 270 Cri du chat syndrome, 125, 125f Crick, Francis, 21, 26, 184, 185, 196, 197, 233, 255, 262, 490 Criminality, 47,XYY karyotype and, 103 CRISPR/Cas technology, 357, 493–494, 495b in gene therapy, 543–544 Crisscross pattern of inheritance, 83–84 Crizotinib, 514b Croce, Carlo, 132–133 Crop plants genetically modified See Genetically modified crops selective breeding of, 23, 398–399, 399f, 451–452, 453 Crosses, 48–56 dihybrid, 52–55, 53f, 54f modified, 75–80 monohybrid, 48–52, 49f, 51f, 52f reciprocal, 49 testcrosses, 52, 52f trihybrid, 55–56, 55f, 57f Crossing over, 37, 38f, 40, 42, 136, 153–154 definition of, 140 duplications and, 126–128, 127f early studies of, 140–142 interference and, 150–151 inversions and, 128–129, 129f linkage with, 137–138, 137f See also Linkage mapping and, 140–152 See also Maps/mapping Crossover(s) double, 143–144, 143f frequency of, 140–142 multiple, 143–144 single, 142–143, 142f, 143f, 146 Crossover gametes, 137, 139f CYP2D6 gene, drug metabolism and, 516 crRNA biogenesis, 493 Cryo-EM, 262 CT/CGRP gene, splicing of, 315, 315f Cultures, bacterial, 160, 160f CURLY LEAF gene, 431 Cyclic adenosine monophosphate (cAMP), in catabolite repression, 302 
Cyclin-dependent kinases, 329 Cyclins, 37, 329, 329f Cystic fibrosis, 152, 282t heterozygote frequency calculation for, 464 Cytokinesis, 33, 36 Cytoplasm, 30 Cytosine, 21, 185, 185f, 187 methylation of, 224 D Danio rerio, 25, 26t Darnell, James, 245 Darwin, Charles, 19, 457, 464–465 Databases See also Bioinformatics BLAST search tool for, 193, 364–365, 365f, 366, 390 CODIS, 510–511 Database of Genomic Variants, 228–229 DNA profile, 510–511 GenBank, 193 gene-expression, 383 Map Viewer, 370 for mapping, 152, 155 MaTCH, 449 Online Mendelian Inheritance in Man, 64–65, 282 PharmGKB, 517b REBASE, 359 Daughter cells, 36 Daughter chromosomes, 34f, 36 Davis, Bernard, 162 Davis U-tube, 162, 162f Dawson, Henry, 179 De Vries, Hugo, 19, 57 Deafness, 76 Deamination, 278, 278f Decitabine (Vidaza), 486 Deficiencies, 124–125, 124f Degrees of freedom, 60 Deletion editing, 249 Deletion loop, 124, 125f Deletions, 124–125, 124f–126f Della Zuana, Pascal, 508b DeLucia, Paula, 202 Denaturation, in polymerase chain reaction, 347 Denisovan fossils, 476 Deoxyribonuclease, 180 Deoxyribonucleic acid See DNA Deoxyribose, 185 Depurination, 278 Designer babies, 412–413, 415 DeSilva, Ashanti, 538–539 Determination, in development, 419 Development, 419–436 in Arabidopsis thaliana, 430–432, 431f, 432f homeotic genes in, 430–432, 430t, 431f binary switch genes in, 429–430, 429f in Caenorhabditis elegans, 432–435 overview of, 433–434, 433f signaling in, 432–433, 432t, 433f vulva formation in, 434–435 definition of, 420 determination in, 419 differentiation in, 419, 420 in Drosophila melanogaster, 421–428 embryogenesis in, 422–423, 422f homeotic mutations in, 426–428, 426f Hox genes in, 426–427, 426f, 427t, 428f, 431–432 overview of, 421, 421f, 422f segment formation in, 424–425, 425f, 426f specification in, 426–428 gene-regulatory networks in, 429–430, 429f human Hox genes in, 427–428, 429f segmentation genes in, 425–426, 426f specification in, 419, 426–428 variable gene activity hypothesis for, 420 
Developmental biology, evolutionary, 435 Developmental genetics, 419–436 model organisms in, 421 Developmental mechanisms, evolutionary conservation of, 420–421 Diamond-Blackfan syndrome, 271 Dicentric bridges, 129 Dicentric chromatids, 129 Dicer, 495 Dictionary, coding, 238–239, 238f Dideoxynucleotide chain-termination sequencing, 352–354, 373f Dideoxynucleotides, 352–354, 353f Diet cancer and, 334 epigenetics and, 488 Differentiated cells, 328 Differentiation, in development, 419 Dihybrid crosses, 52–55, 53f, 54f modified, 75–80 Dipeptides, 268 Diplococcus pneumoniae, transformation in, 178–180, 178t Diploid number, 20, 31, 57, 116t Diploid organisms homologous chromosomes in, 31, 31f, 32f, 32t, 42 sexual reproduction in, 42 Direct terminal repeats, 290–291 Directional selection, 466, 466f Direct-to-consumer genetic testing, 413 Discontinuous replication, 205–206 Discordant twins, 448 Diseases See Genetic disorders Disjunction, 36, 39 Dispersed promoters, 310, 310f Dispersive replication, 197–198 Disruptive selection, 467, 467f Dissociation mutations, 289–290 Distribution, normal, 442, 442f Dizygotic twins, 448, 454 See also Twin studies in pedigrees, 62, 62f DMPK gene, 316 DNA alternative forms of, 190 Alu family, 227 analytic techniques for, 191–192, 349–351 electrophoresis, 192, 192f melting profile, 191, 191f molecular hybridization, 191 nucleic acid blotting, 350–352 restriction mapping, 349–350 bacterial, 216, 217t centromeric, 226–227 chloroplast, 217, 218, 218f coiling/supercoiling of, 204, 221, 222 compaction of, 223 complementary, 344–345, 345f denaturing/renaturing of, 191 distribution of, 183, 183t double-stranded breaks in, 210 repair of, 286–287 early studies of, 20–21 as genetic material, 178–184 Alloway’s experiment and, 179–180 Avery-MacLeod-McCarty experiment and, 179–180, 180f Dawson’s experiment and, 179–180, 180f direct evidence for, 183–184 Griffith’s experiment and, 178–179 Hershey-Chase experiment and, 180–181, 182f indirect 
evidence for, 183–184 transfection experiments and, 181–182 information expression by, 177 information storage in, 177 L1, 227 major/minor groove in, 188, 188f mitochondrial, 217–218, 218f, 240 in DNA profiling, 507 hypervariable segments I and II in, 507 mutations in See Mutation(s) naked, in gene therapy, 538 noncoding, 490 long, 482–483 small, 317, 492 organization in chromosomes, 221–225 packaging of, 221–223, 222f recombinant, 338 See also Recombinant DNA technology repetitive, 225–228, 378 ribosomal, 126, 256 satellite, 225–226, 225f single-crystal X-ray analysis of, 190 as source of variation, 177 structure of, 21, 21f, 184–189 See also Double helix Watson-Crick model of, 188–189, 188f synthesis of, 196–213 See also Replication telomeric, 211–212 transformation studies of, 178–180 type A, 190 type B, 190 type P, 190 type Z, 190 X-ray diffraction analysis of, 186f, 187–188 DNA cloning vectors See Cloning vectors DNA fingerprinting, 227, 503–504 See also DNA profiling DNA forensics See DNA profiling DNA glycosylase, 285, 285f DNA gyrase, 204–205, 207, 207f DNA helicase, 154 DNA libraries, 344–347, 345f, 346f complementary, 344–345, 345f definition of, 344 genomic, 344 DNA ligase, 206, 340 DNA markers, in mapping, 152 DNA methylation in cancer, 485–486, 485f epigenetic, 481, 481f, 483f in gene regulation, 309–310 in genomic imprinting, 89, 483 in mismatch repair, 283 DNA microarrays, 383–384, 384f, 405–407, 406f See also Microarrays DNA polymerase(s) in bacteria, 201–207, 202f, 203t, 206f, 207f in eukaryotes, 209 in polymerase chain reaction, 347–349 in proofreading, 207, 283 vs RNA polymerase, 241 DNA polymerase I, 201–202, 202f, 203t DNA polymerase II, 203, 203t DNA polymerase III, 203–204, 203t, 205, 205f, 206, 206f, 207, 207f in mismatch repair, 283 processivity of, 204, 206, 209 in proofreading, 283 DNA polymerase III holoenzyme, 203–204, 203t, 204f, 206 DNA polymerase IV, 203 DNA polymerase V, 203 DNA probes, 191 allele-specific oligonucleotides, 404–407, 
405f in library screening, 345–347, 346f DNA profiling, 503–512 autosomal STR, 505–506, 505f–507f, 509t in criminal identification, 503, 504b, 510b databases for, 510–511 definition of, 503 DNA fingerprinting, 227, 503–504 ethical issues in, 511–512 historical perspective on, 504b limitations of, 510, 511–512 mitochondrial, 507 product rule in, 509 profile interpretation in, 508–511 profile probability and, 509 profile uniqueness and, 509–510 prosecutor’s fallacy and, 510 single-nucleotide polymorphism, 507–508, 508f technical issues in, 511 in wildlife forensics, 508b Y chromosome, 506 DNA recombination See Recombination DNA repair, 202, 282–288 base excision, 284–285, 285f in cancer, 327 double-stranded break, 287 in eukaryotes, 286–287 homologous recombination, 287 mismatch, 283 nonhomologous end joining in, 287 nucleotide excision, 283, 284–285, 285f p53 protein in, 331 photoreactivation, 284, 284f postreplication, 283, 284f SOS, 283 DNA replicase, 184 DNA replication See Replication DNA sequences interspecies similarities in, 370, 376–381, 381–383, 386 See also Comparative genomics phylogenetic trees for, 474 repetitive, 225–228, 225f, 378 See also Repetitive DNA DNA sequencing annotation in, 364, 365–366, 365f databases for, 364–365, 365f See also Databases in dogs, 93 in functional genomics, 366–367 metagenomic, 381–383 of microbial communities, 381–383 next-generation, 354 Sanger, 352–354, 373f true single-molecule sequencing, 519b whole-genome shotgun, 362–383 DNA template, for transcription, 242, 243f DNA topoisomerases, 204–205 DNA typing See DNA profiling DNA viruses, cancer and, 333, 333t DnaA, 204 DnaB, 204 DNA-based vaccines, 398 DNA-binding proteins, 216 DnaC, 204 DNA-sequence alignment, 363 Dog(s) genome of, 93, 378t, 379 progressive retinal atrophy in, 92–93 selective breeding of, 458 Dog Genome Project, 93 Dolly (cloned sheep), 23–24, 23f Domains, protein, 270–271 structural analysis of, 367 Dominance, 71, 72f codominance and, 72 incomplete 
(partial), 71, 72f pedigrees and, 62–64, 63f Dominance variance, 446 Dominant alleles, lethal, 74–75 Dominant epistasis, 79 Dominant lethal alleles, 74–75 Dominant mutations, 275 gain-of-function, 70, 275 negative, 70, 275 Dominant traits, 48, 49–50 pedigrees and, 62–64 Doping, gene, 546b Dosage compensation, 106–109 Barr bodies and, 106–107, 108f Lyon hypothesis and, 107–108, 108f Double crossovers, 143–144, 143f Double helix, 21, 21f, 188–189, 188f antiparallel strands in, 205–206 left-handed, 190 unwinding of, 204–205 The Double Helix (Watson), 184 doublesex gene, 110 Double-stranded breaks, 210 repair of, 286–287 Down, Langdon, 117 Down syndrome, 117–120, 118f, 119f familial, 130–131, 130f paternal age effect and, 105 Down syndrome critical region, 118–119 DPE sequence motif, 311 Driver mutations, in cancer, 326 Drosha, 496 Drosophila melanogaster Adh locus in, 458, 459f Bar mutation in, 126–127, 127f development in, 421–428 embryonic, 92, 422–423, 422f gene-regulatory networks in, 429–430, 429f homeotic mutations in, 426–428, 426f Hox genes in, 426–427, 426f, 427t, 428f, 431–432 overview of, 421, 421f, 422f segment formation in, 424–425, 425f, 426f specification in, 426–428 eye color in, 82–83, 83f eye development in, 429–430, 429f eyeless mutation in, 86, 86f genome of, 378 as model organism, 25, 25f, 26t mutations in, 20, 20f sex determination in, 109–110, 110f three-point mapping in, 144–147, 145f transposable elements in, 290–291, 291f white locus in, 73–74 wingless mutation in, 81–82, 81f Drug development, 17–18 of antisense therapeutics, 250–251 biotechnology in, 24, 396–397, 400 genomics in, 372, 513–518 See also Pharmacogenomics rational drug design and, 411 Drug therapy adverse reactions in, 516–517 genomics in, 513–518 See also Pharmacogenomics Drug-resistant bacteria, R plasmids and, 168, 289 Ds elements, in maize, 289–290 DSCR1 gene, in Down syndrome, 118–119 Duchenne muscular dystrophy, 84, 87, 250–251, 291 Duplications, 124f, 126–128, 
127f copy number variants and, 127–128 definition of, 126 in evolution, 127 gene families and, 127 in gene redundancy, 126 variation from, 127, 127f Dwarfism, achondroplastic, 282t, 292, 468 dystrophin gene, 250, 291 E E (exit) site, 260, 262 Ebola vaccine, 398 E-cadherin glycoprotein, 332 EcoRI, 339, 340f Edgar, Robert, 170 Edwards, Robert, 27 Edwards syndrome, 120 Electron microscopy of chromosomes, 42–43, 42f time-resolved single particle, 262 of transcription, 250 Electrophoresis, 192, 192f Electroporation, 341 Elongation complex, 314 ELSI Program, 367–368, 412 Embryo selection, 412–413, 415 Embryogenesis, in Drosophila melanogaster, 92, 422–423, 422f Embryonic stem cells, 435–436 cancer and, 325 in knockout technology, 356–357 Encyclopedia of DNA Elements (ENCODE) Project, 374 Endogenous siRNA, 496 Endonucleases in mismatch repair, 283 restriction, 23 Endoplasmic reticulum, 29f, 30 Enhancement gene therapy, 545–546 Enhanceosomes, 314, 314f Enhancers, 244–245, 311, 314, 314f Environmental carcinogens, 333–334 Environmental genomics, 381–383 Environmental risks, of genetically modified organisms, 531–533 Environmental variation, 445 Enzymes, 22, 270 constitutive, 297 core, 203 holoenzymes, 203 inducible, 297 one-gene:one-enzyme hypothesis and, 265 photoreactivation, 284, 284f restriction, 339–340 Ephrussi, Boris, 90, 264 Epidermal growth factor receptor, 128, 515 Epidermolysis bullosa, gene therapy for, 543 Epigenesis, 18, 76 Epigenetic traits, 480 Epigenetics, 89, 109, 480–489 behavior and, 488 cancer and, 327–328, 485–486, 485f, 485t chromatin modification and, 481–482, 483f definition of, 224, 327, 480 in development, 483–485, 484f early childhood experiences and, 488 environmental factors in, 486–488 genome alteration and, 480–483 histone modification and, 481–482, 483f historical perspective on, 481, 481b imprinting and, 483–485, 484f mechanisms of, 480–483 methylation and, 481, 481f, 483f prenatal environment and, 488 stress and, 488 twins and, 450 
variation and, 480, 481f Epigenome, 480–483, 481f, 488 Epimutations, 484 Episomes, 172 Epistasis, 76–80 dominant, 79 recessive, 79 Erbitux (cetuximab), 515, 516t Erlotinib (Tarceva), 516t Escherichia coli See also Bacteria DNA repair in, 283–285 gene regulation in, 297–303 genome of, 377, 378, 378t as model organism, 25, 25f, 26t cell division in, 29f, 30 circular chromosome of, 164f, 165 gene mapping for, 164–165, 164f replication in, 201–207, 208t transcription in, 241–243, 243f Estrogens, in sex differentiation, 112 Ethical issues DNA profiling, 511–512 ELSI Program and, 367–368, 412 embryo selection, 412–413 gene therapy, 545–546 genetic testing, 412–413, 415 genome sequencing, 367–368, 414–416 newborn screening, 414 patents, 415 personalized medicine, 520–521 prenatal diagnosis, 412–413, 415 stem cells, 435–436 synthetic biology, 415 whole-genome sequencing, 414–415 Ethical, Legal, and Social Implications (ELSI) Program, 367–368, 412–413, 414–415 6-Ethylguanine, as mutagen, 279, 280f Eubacteria, 30 Euchromatic regions, of Y chromosome, 104 Euchromatin, 224 Eugenics, 415 Eukaryotes, 30 cells of, 30 chromatin in, 221–225, 244 DNA repair in, 286–287 gene mapping in, 140–155 gene regulation in, 307–317 genome of, 225–229 See also Genome, eukaryotic recombination in, 160 replication in, 199–200, 208–212 Euploidy, 116, 116t Evans, Martin, 355 Evolution Darwin’s theory of, 19, 457, 464–465 founder effect in, 469 gene duplication in, 127 genetic drift and, 469 human, 475–477 inversions in, 129 macroevolution and, 458 microevolution and, 458 migration and, 468, 469f of multigene families, 381, 381f mutations and, 467–468 natural selection in, 19, 457, 465–467 See also Natural selection neo-Darwinism and, 457 nonrandom mating and, 470–471 out-of-Africa hypothesis and, 476–477 phylogenies and, 473–474 rate of, 474–475, 475f RNA World and, 491 speciation in, 457–458, 471–473 transposable elements in, 292 in vitro, 491–492 Evolution by Gene Duplication (Ohno), 127 
Evolutionary developmental biology, 420 Evolutionary genetics, 457–458 Excision repair base, 284–285, 285f nucleotide, 284–286, 285f Exit (E) site, 260, 262 Exome sequencing, 373–374, 409 Exons, 246, 366 sequencing of, 373–374 Exon-skipping therapy, 250 Exonucleases, in mismatch repair, 283 Expansion sequences, 263 Expressed sequence tags, 383 Expression microarrays, 405–407, 406f, 408f Expression platform, 306 Expression quantitative trait loci, 452–453 Expression vectors, 343 Expressivity, 86 Extension, in polymerase chain reaction, 347 Extracellular matrix, 332 Extracellular RNA (exRNA), in signaling, 500b Extranuclear inheritance, 89–92 Eye color, in Drosophila melanogaster, 82–83, 83f eyeless mutation, 86, 86f F F factor, 162–166, 163f, 164f as plasmid, 162, 167–168 F′ merozygotes, 166, 167f F pilus, 162 F+ × F− matings, conjugation in, 161–166, 163f, 164f F1 generation, 48 F2 (9:3:3:1) dihybrid ratio, 55 modification of, 75–76, 75f F2 generation, 48 Fabbri, Muller, 133 Familial adenomatous polyposis, 333 Familial hypercholesterolemia, 282t pedigree for, 63–64 Family Traits Inheritance Calculator, 415 Fecal microbial transplantation, 391 Females See under Sex Fertility factor See F factor Fiers, Walter, 239 Finch, John T., 222 Fink, Gerald, 122 Finnegan, David, 290 Fire, Andrew, 494 First filial (F1) generation, 48 First polar body, 40–41, 41f Fischer, Emil, 267 Fischer, Niels, 262 Fish genetically modified, 525b, 533–534 transgenic, 400 FISH (fluorescent in situ hybridization), 192, 192f, 350–352 Fitness, natural selection and, 465–466 5′ cap, 245, 245f Flavell, Richard, 246 Flemming, Walter, 57 Floral meristem, 430 Flowering plants See also Arabidopsis thaliana; Plants chloroplast-based inheritance in, 89–90, 90f development in, 430–432, 430t, 431f, 432f Fluorescent in situ hybridization (FISH), 192, 192f, 350–352 fmet (formylmethionine), 259 in transcription, 239 Focused promoters, 310, 310f, 311f Folded-fiber model, 42f, 43 Fomivirsen 
(Vitravene), 250 Food genetically modified See Genetically modified crops nutritionally enhanced, 399 Food and Drug Administration (FDA), 395 Forensic science See DNA profiling Forked-line method, 53f, 56, 56f Formylmethionine (fmet), 259 in transcription, 239 45,X karyotype, in Turner syndrome, 102, 102f, 107, 107f 45,X/46,XX karyotype, 103 45,X/46,XY karyotype, 103 47,XXX syndrome, 103 47,XXY karyotype, in Klinefelter syndrome, 102f, 103, 107, 107f 47,XYY condition, 103 48,XXXX karyotype, 103 48,XXXY karyotype, 103 48,XXYY karyotype, 103 49,XXXXX karyotype, 103 49,XXXXY karyotype, 103 49,XXXYY karyotype, 103 Forward genetics, 24 Fossils Denisovan, 476 human, 476–477 Founder effect, 469, 470f Fragile sites, 131–133, 132f Fragile-X syndrome, 132, 132f Frameshift mutations, 233, 233f, 274, 274f Franklin, Rosalind, 188, 189 Fraternal twins, 448, 454 See also Twin studies in pedigrees, 62, 62f Free radicals, as mutagens, 281 Fruit flies See Drosophila melanogaster Functional genomics, 366–367 Fungi, meiosis in, 42 Fusion proteins, 396 G G0 phase, 33f, 35 G1 cyclins, 122 G1 phase, 33–35, 33f G1/S checkpoint, 328–329 G2 phase, 33–35, 33f G2/M checkpoint, 328–329 G6PD deficiency, Lyon hypothesis and, 107–108 Gain-of-function mutations, 70, 275 β-Galactosidase in cloning, 341–342 in lac operon, 298, 298f Gall, Joe, 210, 226 Gametes, 28 chromosome number in, 57–58 crossover, 137 formation of, 57–58 inversion during, 128–129, 129f noncrossover, 137 parental, 137, 140 reciprocal classes of, 146 recombinant, 137, 140 unbalanced, 130, 130f Gametogenesis, 40–42, 41f Gametophyte stage, 42 Gamma rays, as mutagens, 281, 281f Gap genes, 424, 424f, 424t Gap phases, 33–35, 33f Gardasil, 398 Garrod, Archibald, 263–264 G-banding, 225 GC box, 311 Gel electrophoresis, two-dimensional, 385–386, 386f Gelsinger, Jesse, 539–540, 546 GenBank, 193, 364 Gender See under Sex Gene(s) See also specific genes alleles and, 70–75 See also Allele(s) binary switch, 429–430, 429f on chromosomes, 20, 
20f, 57–58 comparative size of, 247t complementary, 76 definition of, 50 functional assignment of, 369, 369f globin, 366–367, 381, 381f homeotic, 426–432 See also Hox genes homologous, identification of, 366–367 intervening sequences in, 246, 246f, 247 jumping, 288–292 See also Transposable elements linked, 75–76, 136–138, 137f, 139f, 169 See also Linkage marker, 529 maternal-effect, 422–423, 423f in multigene families and superfamilies, 381 number in genome, 369, 377, 378t one-gene:one-enzyme hypothesis and, 265 orthologs, 366 overlapping, 240 paralogs, 367 patented, 413–414 polygenes, 438 pseudogenes and, 228 repressor, 299 segment polarity, 424–425, 424t, 425f, 426f segmentation, 423–426, 424t split, 244, 246 structural, 298–299 symbols for, 50, 70–71 unit factors as, 49, 50, 57, 264 zygotic, 422–426, 423f, 424t Gene chips, 383–384, 384f, 405–407, 406f Gene density, 369, 377, 378t Gene doping, 546b Gene duplications, 126–128, 127f copy number variants and, 127–128 Gene editing, 357 CRISPR/Cas technology for, 357, 493–494, 495b, 542–543 in gene therapy, 357, 541–544 interference in, 493–494 Gene expression, 21f, 22 DNA methylation and, 89, 309–310 dosage compensation and, 106–109 epigenetics and, 89 expressivity and, 86 extranuclear inheritance and, 89–92 gene silencing and, 88–89 genetic anticipation and, 88 genetic background and, 86 genomic (parental) imprinting and, 88–89, 108–109, 483–485 global analysis of, 383–384 maternal effect and, 92 onset of, 87–88 penetrance and, 86 phenotypic, 86–88 position effects in, 86, 224 in prokaryotes, repressors in, 299, 302, 304 regulation of See Gene regulation sex-linked/sex-influenced inheritance and, 84–85 single-gene, multiple effects of, 82 splicing in, 249, 315–316, 315f, 316f See also Splicing temperature effects in, 87, 87f transcriptome in, 383–384 X-linked inheritance and, 82–84, 83f, 84f Gene families, 127 Gene flow, from genetically modified crops, 533 Gene function cell structure and, 29–30 prediction by 
sequence analysis, 366–367 Gene gun, 528 Gene inactivation See also Gene silencing in dosage compensation, 106–109 mechanism of, 108–109 Gene interaction complementary, 76, 79 definition of, 76 epigenesis and, 76 epistasis in, 76–80 phenotypes and, 76–78 novel, 80 Gene knockout, 24, 354–357 conditional, 357 Gene locus, 32, 57 Gene mapping See also Maps/mapping in bacteria, 159–174 Escherichia coli, 163–165 interrupted mating technique for, 163–165, 164f transformation in, 168–169 in bacteriophages, 173 in eukaryotes, 140–155 accuracy of, 151 autosomal, 147–149, 148f bioinformatics and, 152 DNA markers in, 152 gene distance in, 140–142, 151 of gene sequences, 146–147 interference in, 151 linkage, 140–152 microsatellites in, 152 reciprocal classes in, 146 restriction fragment length polymorphisms in, 152 sequence maps in, 152 single crossovers and, 142–143, 142f, 143f, 146 single-nucleotide polymorphisms in, 152 steps in, 147–149 three-point, in Drosophila melanogaster, 144–147, 145f for genetic diseases, 370, 371f Internet resources for, 152, 155 Gene mutations See Mutation(s), gene Gene pills, 538 Gene pool, 458 Gene prediction programs, 365–366 Gene redundancy, 126 Gene regulation, 296–319 DNA methylation in, 309–310 in eukaryotes, 307–317 activators in, 312, 314 alternative splicing in, 315–316, 315f, 316f See also Splicing chromatin modification in, 308–310, 314 See also Chromatin modification/remodeling cis-acting sites in, 310–313 enhancers in, 311 histone modification in, 308–309, 309f mRNA stability and, 316–317 overview of, 307–308, 307f posttranscriptional, 315–317 posttranslational, 317 promoters in, 244, 310–311, 310f, 311f repressors in, 312, 314 RNA-induced gene silencing in, 317 silencers in, 244, 312, 314 transcriptional, 307–315 translational, 317 in prokaryotes, 297–303 catabolite repression in, 302–303, 303f corepressors in, 304 inducible, 297, 299, 299f lac operon in, 298–303 See also lac operon mutations in, 
299 negative, 297, 299–302 positive, 297, 302–303, 303f repressible, 297, 302–303 riboswitches in, 306–307, 307f RNA secondary structures in, 305–307, 306f structural genes in, 298–299 transcription attenuation in, 304, 305–306 trp operon in, 304, 305f RNA interference in, 317 Gene sequences, mapping of, 146–147 See also DNA sequencing; Gene mapping Gene silencing in dosage compensation, 106–109 in gene therapy, 544–545 genomic imprinting and, 88–89, 108, 483–485 mechanism of, 108–109 nucleic acid-based drugs in, 250–251 RNA-induced, 317, 482, 494, 497–498 Gene targeting See Gene editing Gene therapy, 411, 535–546 approval for, 542b conditions used for, 535 CRISPR/Cas technology in, 543–544 definition of, 535 enhancement, 545–546 ethical concerns in, 545–546 future of, 545–546 gene delivery methods in, 535–538 gene pills, 538 nonviral, 538 viral vectors, 536–538, 540–541 gene editing in, 357, 541–544 gene silencing in, 544–545 for hemophilia, 541 for HIV infection, 542–543 for lipoprotein lipase deficiency, 542b for metachromatic leukodystrophy, 541–542 for retinal blindness, 540–541 setbacks in, 539–541 for severe combined immunodeficiency, 538–539, 540 somatic, 545 successful trials of initial, 538–539 recent, 541–542 targeted approaches for, 542–545 transcription activator–like effector nucleases in, 357, 542 translational medicine and, 535 for Wiskott-Aldrich syndrome, 541–542 zinc-finger nucleases in, 357, 541–542 Gene transfer horizontal, 161 vertical, 161 Gene-expression analysis DNA microarrays in, 383–384, 384f transcriptome, 383 Gene-expression databases, 383 Gene-expression microarrays, 405–407, 406f, 408f Genentech, 395 GenePeeks, 415 Gene-protein correlation, 384–385 General transcription factors, 245, 313–315, 314f Gene-regulatory networks, 429–430, 429f Genetic anticipation, 88, 132 Genetic background, phenotypic expression and, 86 Genetic bottlenecks, 469 Genetic code, 22, 232–240 amino acid assignment in, 236–239, 236t, 237f, 237t, 238f coding 
dictionary for, 238–239, 238f confirmation of, 239 deciphering of by Grunberg-Manago and Ochoa, 233 by Nirenberg and Matthaei, 233 polynucleotide phosphorylase in, 233, 233f repeating RNA copolymers in, 236–237 RNA heteropolymers in, 234, 235f RNA homopolymers in, 234 triplet binding assay in, 234–236, 236f, 236t in vitro protein-synthesizing system in, 233 degeneracy of, 232, 236, 238 early studies of, 232–233 exceptions to, 239–240, 240t frameshift mutations and, 233, 233f general features of, 232 nonoverlapping, 232 ordered, 239 triplet nature of, 232–233, 233f unambiguous nature of, 232, 236 universality of, 232, 239–240 wobble hypothesis and, 238 Genetic counseling, 119–120 Genetic disorders achondroplasia, 282t, 292, 468 age at onset of, 87–88 albinism, 62–63, 469–470, 470f alkaptonuria, 264 Angelman syndrome, 88, 485 argininosuccinic aciduria, 477 arrhythmogenic right ventricular cardiomyopathy, 409 Beckwith-Wiedemann syndrome, 450, 484 bioengineered therapeutics for, 395–398, 396t Bloom syndrome, 154 cancer, 324–326 See also Cancer carriers of, 84 testing for, 518 cleidocranial dysplasia, 425–426, 426f copy number variants and, 128 cri du chat syndrome, 125, 125f cystic fibrosis, 152, 282t, 464 diagnosis of, 24, 402–408 See also Genetic testing personalized medicine and, 517–520 Diamond-Blackfan syndrome, 271 Down syndrome, 117–120, 130–131 Edwards syndrome, 120 epidermolysis bullosa, 543 epigenetics and, 484–485 expression quantitative trait loci and, 452–453 familial hypercholesterolemia, 63–64, 282t fragile-X syndrome, 132, 132f G6PD deficiency, 107–108 gene mapping for, 152, 370, 371f See also Gene mapping gene therapy for, 411 genetic anticipation in, 88 genetic engineering and, 402–408 See also Medical applications, of genetic engineering and genomics genome sequencing for, 370, 408–409 genomic imprinting in, 88–89, 108, 484–485, 484f, 484t hemophilia, 291, 541 Huntington disease, 63, 74–75, 88, 282t inborn errors of metabolism, 263–264 
interrelationships of, 388–389, 390f Klinefelter syndrome, 102–103, 102f, 107 Leber congenital amaurosis, 540–541 Lesch-Nyhan syndrome, 87 lipoprotein lipase deficiency, 542b Marfan syndrome, 82, 275, 282t metachromatic leukodystrophy, 541 microbiome and, 391 model organisms for, 25–26, 26t muscular dystrophy, 84, 87, 88, 250–251, 291, 319 myoclonic epilepsy and ragged-red fiber disease, 91 myotonic dystrophy, 88, 316 newborn screening for, 414 Noonan syndrome, 412 ornithine transcarbamylase deficiency, 539–540 Patau syndrome, 120, 120f paternal age effect and, 105 pattern baldness, 85 porphyria variegata, 82 Prader-Willi syndrome, 88, 485 preconception testing for, 415 Proteus syndrome, 409 Rubinstein-Taybi syndrome, 486 severe combined immunodeficiency, 538–539, 540 sickle-cell anemia, 22, 266–267, 266f, 270, 404 single-gene, 281–282, 282t systems biology model of, 388–389, 390f Tay-Sachs disease, 69, 71, 87 Turner syndrome, 102–103, 102f, 107 in vitro fertilization and, 484–485 Wiskott-Aldrich syndrome, 541 xeroderma pigmentosum, 286, 286f, 327 Genetic diversity See Variation I N D EX I-9 Genetic drift, 469–470, 470f Genetic engineering, 394–416 See also Biotechnology; Recombinant DNA technology definition of, 394 ethical aspects of, 412–415 genetically modified organisms and, 395–400, 522–533 See also under Transgenic medical applications of, 402–408 Genetic information flow, 177, 232, 232f, 241 Genetic Information Nondiscrimination Act, 413 Genetic linkage See Linkage Genetic material definition of, 176 DNA as, 178–184 essential characteristics of, 177 protein as, 177–178 RNA as, 184 Genetic recombination See Recombination Genetic testing, 24, 402–408, 518 allele-specific oligonucleotides in, 404–405, 405f carrier, 518 direct-to-consumer, 413 DNA microarrays in, 405–407 ethical aspects of, 367–368, 412–413, 415–416 gene-expression microarrays in, 405–407, 406f, 408f preconception, 415 predictive, 518 preimplantation, 404–405, 412, 518 prenatal, 119–120, 
402–403, 402f, 518 privacy issues and, 412–413, 415–416 RFLP analysis in, 403–404, 404f Genetic Testing Registry, 413 Genetic variation See Variation Genetically modified crops, 23, 398–399, 399f, 400, 451–452, 453, 523–534 approved, 524t controversial aspects of, 531–533 creation of, 528–531 Agrobacterium tumefaciens–mediated transformation in, 528 biolistic method of, 528 selectable markers in, 528–529 definition of, 523 development of, 523 environmental effects of, 531–533 future of, 533–534 gene flow from, 533 herbicide-resistant, 23, 399, 523–524 insect-resistant, 525–526 outcrossing from, 533 overview of, 523–524 papaya, 526b quantitative trait loci in, 451–452, 452f rice, 453, 526–527, 527–528, 528–529 safety of, 531 soybeans, 529 tomatoes, 523, 524 types of, 524t Genetically modified organisms (GMOs), 395–400, 396t, 523 animals, 399–400, 400f, 524 definition of, 394 overview of, 394–395 plants, 23, 398–399, 399f, 400f, 451–452, 453 See also Genetically modified crops Genetically modified plants Bt, 525–526 creation of, 528–531 crop See Genetically modified crops quantitative trait loci in, 451–452 vaccines from, 398 Genetics classical (forward), 24 definition of, 19 developmental, 419–436 evolutionary, 457–458 historical perspective on, 26–27, 26f Mendelian, 47–65 Nobel Prizes for, 27 population, 457–464 reverse, 24 societal impact of, 27 terminology of, 50 timeline for, 26f transmission, 47 Genic balance theory, 110 Genital ridges, 104 Genome(s) copy number variations in, 228–229, 370 definition of, 23, 31, 361 duplication of, 127 epigenetic alterations to, 480–483 eukaryotic, 93, 225–229 gene density in, 377, 378t introns in, 377–378 repetitive sequences in, 378 vs prokaryotic, 377–378, 378t gene density in, 369, 377, 378t haploid, 373 human functional categories for, 369, 369f major features of, 368–369, 368t sequencing of See Human Genome Project interspecies similarities in, 369, 376–383 See also Comparative genomics of model organisms, 378, 378t 
noncoding regions of, 228, 490 physical maps of, 152 prokaryotic basic features of, 377 gene density in, 377 vs eukaryotic, 377–378, 378t reference, 368 sequencing of See Genome sequencing synthetic, 401, 401f Genome 10K plan, 376 Genome editing, CRISPR/Cas, 357, 493–494, 495b, 542–543 Genome, protein set of, 384–387 See also Proteomics Genome scanning, 407 Genome sequencing, 24, 152, 367–370, 369f See also DNA sequencing for dogs, 93 ethical issues in, 367–368, 412, 414–416 exome, 409 high throughput, 364 for humans See Human Genome Project individual, 408–409 medical applications of, 408–409 for nonhuman organisms, 374–376, 378–381 See also Comparative genomics personalized, 372–373, 518–520, 519b privacy issues and, 412–413, 415–416 single-cell, 409 whole-genome (shotgun), 362–363, 363f whole-genome amplification and, 409 Genome-wide association studies, 409–411 Genomic imprinting, 88–89, 108, 483–485 epigenetics and, 483–485, 484f genetic disorders and, 88–89, 484–485 Genomic libraries, 344 Genomic variation, 370, 459 Genomics, 24, 347 bioinformatics in, 363, 364–366 comparative, 376–383, 378t definition of, 361 DNA sequencing in See DNA sequencing environmental, 381–383 functional, 366–367 Genome 10K plan and, 376 genome sequencing in See Genome sequencing Human Genome Project and See Human Genome Project Human Microbiome Project and, 374–375 medical applications of, 24, 370, 402–408, 513–521 metagenomics, 381–383 overview of, 361–362 paleogenomics, 475–476 personalized medicine and, 372–373, 411, 513–521 See also Personalized medicine privacy issues and, 412–413, 415–416 Stone Age, 376, 380–381, 475–476 structural, 362 techniques in, 361–362 transcriptome analysis and, 383–384 Genomics era, 347 Genotype(s), 20 definition of, 50 expressivity of, 86 penetrance of, 86 phenotypes and, 22, 438–439 See also Heritability Genotype frequency, Hardy-Weinberg law and, 459–464 Genotype-by-environment interaction variance, 445 Genotypic ratios, 71 Genotypic sex 
determination, 111–112 Genotypic variation, 445 Genotyping microarrays, 407–408 George III (King of England), 82 Germ cells, mutations in, 276 German, James, 154 Germ-line therapy, 545 Germ-line transformation, 291 Germ-line transpositions, 292 Gilbert, Walter, 302, 491 Gleevec (imatinib), 411, 516t Global analysis of gene expression, 383–384 Global Ocean Sampling Expedition, 382, 382f Globin genes, 366–367, 381, 381f GloFish, 400 Glucose-6-phosphate dehydrogenase (G6PD) deficiency, Lyon hypothesis and, 107–108 Glutamate receptor channels (GluR), 249 Glycocalyx, 29f, 30 Glycomics, 372 Glyphosate (Roundup), 524–525, 529, 533 GMOs See Genetically modified organisms (GMOs) Goldberg-Hogness box, 244 Golden Rice, 453, 526–527, 529–530 Gonadal ridges, 104 Gonads, bipotential, 104 Gossypium, 123, 123f G-quartets, 210 Gratuitous inducers, 299 Greece, Ancient, 18 Green Revolution, 453 Greider, Carol, 211 Griffith, Frederick, 178–179 gRNA (guide RNA), 249 Growth hormone, genetically engineered, 396 Grunberg-Manago, Marianne, 233 GTP (guanosine triphosphate), 187 GTP-dependent release factors, 261 Guanine, 21, 185, 185f, 187 Guanosine triphosphate (GTP), 187 Guide RNA (gRNA), 249 Gurdon, John, 27 Gut microbiome, inflammatory bowel disease and, 391 Guthrie, Arlo, 74 Guthrie, Woody, 74 H H substance, 73 Haemophilus influenzae, whole-genome sequencing of, 363, 368 Hairpins, 243 terminator/antiterminator, 306 Half-life, of mRNA, 316–317 Haploid genome, 373 Haploid number, 20, 31, 32t Haploinsufficiency, 117, 275 Haplotypes, 403 Hardy-Weinberg equilibrium, testing for, 462–463 Hardy-Weinberg law, 459–464 in allele frequency calculation, 462–464, 462t, 464t application to humans, 461–464, 462f, 462t calculating allele frequencies and, 460 calculating genotype frequencies and, 459–460 fitness and, 465–466 genetic drift and, 470–471 in heterozygote frequency calculation, 464 migration and, 468 mutation and, 467–468 natural selection and, 464–465, 465–467, 466f predictions 
of, 460–461 speciation and, 473 underlying assumptions for, 460–461 Harlequin chromosomes, 154, 154f Hartwell, Lee, 37 Harvey, William, 18 Hayes, William, 162–163 HbA, 266 HbS, 266 Heat-shock proteins, 269 Helicases, 204, 207, 207f Helix alpha, 268, 269f double, 21, 21f, 188–189, 188f antiparallel strands in, 205–206 left-handed, 189 unwinding of, 204 Hemizygosity, 83 Hemoglobin, 270, 381, 381f globin genes and, 366–367, 381, 381f one-gene:one-polypeptide hypothesis and, 264–265 sickle-cell, 22, 23f, 266–267, 266f Hemophilia, 291 gene therapy for, 541 Henking, H., 101 Hepatitis B vaccine, 397 HER-2, in breast cancer, 514–515, 515f Herceptin (trastuzumab), 514–515, 515f, 516t Hereditary deafness, 76 Hereditary nonpolyposis colorectal cancer, 327 Heredity See also Inheritance organelle, 89 Heritability, 444–448 broad-sense, 446 definition of, 444 narrow-sense, 446, 448t realized, 447 Heritability estimates, 444–448 twin studies and, 448–450 Heritability studies, limitations of, 448 Hershey, Alfred, 181 Hershey-Chase experiment, 180–181, 182f Heterochromatic regions, of Y chromosome, 104 Heterochromatin, 86, 224 Heterochromosomes, 101 Heteroduplexes, 169, 246, 246f Heterogametic sex, 101 Heterogeneous nuclear ribonucleoprotein particles (hnRNPs), 244 Heterogeneous nuclear RNA (hnRNA), 244, 245 Heterogeneous traits, 76 Heterokaryons, 286 Heteromorphic chromosomes, 100 See also Sex chromosomes Heteroplasmy, 89 Heterozygosity, loss of, in cancer, 333 Heterozygote(s), 50 inversion, 128, 129f Heterozygote frequency, calculation of, 464 Heterozygous, 50 Hexosaminidase A (Hex-A), in Tay-Sachs disease, 69, 71 High-frequency recombination (Hfr) bacteria, 163–165, 164f, 165f, 167f Highly repetitive DNA, 226 High-throughput sequencing, 364 Hippocrates, 18 Histone(s), 221, 270 acetylation of, 224 definition of, 221 methylation of, 224 modification of in cancer, 328, 486 in chromatin remodeling, 308–309, 309f, 328 epigenetic changes 
and, 481–482 See also Epigenetics in nucleosomes, 221–223, 222f phosphorylation of, 224 Histone acetyltransferase (HAT), 224, 309 Histone code, 482 Histone deacetylase inhibitors, for cancer, 486 Histone methyltransferase, 497 Histone tails, 223 Histone-like nucleoid structuring proteins, 216 HIV infection gene therapy for, 542–543 resistance to, 461–462, 462f, 462t vaccine for, 398 hMTIIA gene, 312–313, 313f hnRNA (heterogeneous nuclear RNA), 244, 245 hnRNPs (heterogeneous nuclear ribonucleoprotein particles), 244 H-NS proteins, 216 Hogness, David, 290 Holley, Robert, 256 Holliday, Robin, 481b Holoenzymes in DNA replication, 203–204, 206 in transcription, 241, 243f Homeobox, 426–427 Homeodomains, 427 Homeotic mutations, 426–428, 426f Homeotic selector genes See Hox genes Homo erectus, 476 Homo heidelbergensis, 476–477 Homo neanderthalensis divergence from modern humans, 475–476 genome of, 376, 380–381 Homo sapiens See also under Human evolution of, 475–476 Homogametic sex, 101, 276 Homologous chromosomes, 20, 31, 31f, 32f, 32t, 42, 57–58 Homologous genes, identification of, 366–367 Homologous recombination repair, 287, 287f Homozygote, 50 Homozygous, 50 Homunculus, 18, 19f Honeybees, colony collapse disorder and, 382–383 Horizontal gene transfer, 161 Howard-Flanders, Paul, 285 Howeler, C.J., 88 Hox genes, 423 in Drosophila melanogaster, 426–427, 426f, 427t, 428f, 431–432 in humans, 427–428, 429f in plants, 430–432, 431f, 432f HU proteins, 216 Huebner, Kay, 132–133 Hughes, Walter, 199 Human Epigenome Project, 372, 489 Human evolution, 475–476 Human genome functional categories for, 369 major features of, 368–369, 368t, 369f sequencing of See Human Genome Project Human Genome Nomenclature Committee, 364 Human Genome Project, 24, 152, 367–370 applications of, 370–376, 518–520 ELSI Program and, 367–368, 412, 414–415 future directions for, 370–376 mapping and, 152 “omics” era and, 370–372 origins of, 367–368 sequencing of nonhuman organisms in, 374–376, 378–381 Human 
immunodeficiency virus infection gene therapy for, 542–543 resistance to, 461–462, 462f, 462t vaccine for, 398 Human metallothionein IIA gene, 312–313, 313f Human Microbiome Project, 374–375, 382–383 Human migration, out-of-Africa hypothesis and, 476–477 Human papillomavirus vaccine, 398 Human Proteome Project, 385 Humulin, 395 Hunchback gene, 424 Hunt, Tim, 37 Huntington disease, 88, 270, 282t pedigree for, 63 Hybrid dysgenesis, 291 Hybridization molecular, 191, 226, 246 in polymerase chain reaction, 347 in probe processing, 345 Hydrogen bonds, in DNA, 189 Hydrophilic bases, 189 Hydrophobic bases, 189 Hypercholesterolemia, familial, 282t pedigree for, 63–64 Hyperchromic shift, 191 Hyperlipidemia, drug therapy for, 17–18 Hypervariable segment I and II, in mtDNA profiling, 507 Hypoallergenic milk, 400 Hypostatic alleles, 76 I Identical twins, 448 See also Twin studies copy number variations in, 449 in pedigrees, 62, 62f Identity values, 364 Ideograms, 370, 371f Imatinib (Gleevec), 411, 516t Immunity adaptive, 493 innate, 492–493 RNA-guided, 492–494 Immunization See Vaccines Immunoglobulins, 270 Imprinting, genomic, 88–89, 108, 483–485 epigenetics and, 483–485, 484f genetic disorders and, 88–89, 484–485 In situ hybridization fluorescent, 192 molecular, 192, 226 In vitro evolution, 491 In vitro fertilization ethical aspects of, 415 genetic testing and, 402–403, 415 imprinting and, 484–485 In vitro protein-synthesizing system, in genetic code cracking, 233 Inactivated vaccines, 397 Inborn errors of metabolism, 263–264 See also Genetic disorders Inbreeding, 470–471 Incomplete dominance, 71, 72f Indels, 379 Independent assortment, 53–55, 57, 58, 137, 137f Induced mutations, 276 Induced pluripotent stem cells, 435 Inducers, 297 gratuitous, 299, 299f Inducible enzymes, 297 Inflammatory bowel disease, gut microbiome and, 391 Information flow, cellular, 177, 177f, 231, 232f, 241 See also Transcription; Translation Information technology See Bioinformatics 
Ingram, Vernon, 266 Inheritance biparental, 32 chromosomal theory of, 19, 20, 57, 84, 142 codominant, 72 crisscross pattern of, 83–84 dominant, 62–64, 63f, 71, 72f extranuclear, 89–92 heritability and, 444–448 quantitative, 438 recessive, 49–50, 62–64, 63f sex-influenced, 84–85 sex-limited, 84–85 Initiation complex, 259 Initiation factors, in bacterial translation, 259 Initiator codons, 239 Innate immunity, 492–493 Inr element, 311 Insect-resistant crops, 525–526 Insertion sequences, 289, 289f Insertion/deletion editing, 249 Insulin, recombinant human, 395–396 Intellectual property, 413–414, 415 Interactive variance, 446 Interactomes, 389 Intercalary deletions, 124, 125f Intercalating agents, mutagenic, 279, 280f Interchromosomal domains, 308 Interference crossing over and, 150–151 See also Gene editing in gene editing, 493–494 RNA See RNA interference (RNAi) International Cancer Genome Consortium, 489 International HapMap Project, 372 International Human Epigenome Consortium, 489 Internet resources databases See Databases for gene mapping, 155 PubMed, 43 Webcutter, 358 Interphase, 33–35, 33f, 34f electron microscopy of, 42–43, 42f Interrupted mating technique, 163–165, 164f Intersex, 102, 110 Intervening sequences, 246, 246f, 247 Introns, 218, 246f, 247, 247t, 377–378 genomic number and size of, 377–378 splicing of, 247–248, 248f See also Splicing Inversion(s), 124f, 128–129 definition of, 128 evolutionary advantages of, 129 during gamete formation, 128 paracentric, 128, 128f pericentric, 128 Inversion heterozygotes, 128, 129f Inversion loops, 128, 129f Inverted terminal repeats, 289 Ionizing radiation cancer and, 334 as mutagen, 281 IS elements, 289 Isoaccepting tRNA, 259 Isoagglutinogens, 73 Isoelectric focusing, 385 Isopropylthiogalactoside, 299, 299f J J Craig Venter Institute (JCVI), 382, 401, 415 Jackson, Christine, 510b Jacob, François, 163–165, 232, 241, 297, 299–302 Jacobs, Patricia, 103 Janssens, F.A., 140 Jeffreys, Alec, 503, 504b Johnson, Justin 
Albert, 510b Johnson, Rebecca, 508b Jumping genes, 288–292 See also Transposable elements K Karyokinesis, 33 Karyotypes, 20, 20f, 31 in Klinefelter syndrome, 102–103, 102f spectral, 351–352 in Turner syndrome, 102–103, 102f Kazazian, Haig, 291 Keratin, 270 Khorana, Gobind, 236–237 Kinases, 37 in cancer, 327 in chromatin remodeling, 224 cyclin-dependent, 329 Kinetochore, 35–36, 35f, 226 Kinetochore microtubules, 36 Klinefelter syndrome, 102–103, 102f, 116 Barr bodies in, 107, 107f Klug, Aaron, 222, 257 knirps gene, 424 Knockout, 24, 354–357 conditional, 357 Kornberg, Arthur, 201 Kornberg, Roger, 222 Kozak sequences, 263 Kras gene, 326 Kreitman, Martin, 458 Krüppel gene, 424 Kumra, Raveesh, 511b Kynamro (mipomersen), 250 L L1 family, 227 La Apoyo (Nicaragua), 472, 473f lac genes, constitutive mutants and, 299 lac operon, 298–303 components of, 299, 300f genetic proof of, 299–301 as inducible system, 297–302 operator region of, 299 operon model and, 299–302 regulation of negative, 299–302 positive, 302–303 structural genes in, 298–299, 298f lac repressor, 299 isolation of, 302 Lactose metabolism, 297–303 regulation of, 297–303 See also lac operon lacZ gene, in recombinant human insulin production, 396 Lagging strands, 205f, 206, 206f, 207, 207f Lambda (λ) phage, 216, 216f, 217t as cloning vector, 342 Lampbrush chromosomes, 220, 220f Landsteiner, Karl, 72 Lariats, 248f, 249 Laws of probability, 58–59 Leader sequence, in trp operon, 304 Leading strands, 205f, 206, 206f, 207, 207f Leaf variegation, chloroplast-based inheritance in, 89–90, 90f Leber congenital amaurosis, gene therapy for, 540–541 Leder, Philip, 234, 246 Lederberg, Joshua, 161–162, 172, 297, 298 Lederberg-Tatum experiment, 161–162, 161f Lederberg-Zinder experiment, 172–173 Legal issues DNA profiling, 503–512 genetic testing, 412–415 intellectual property and patents, 413–414, 415 LeJeune, Jérôme, 125 Lentivirus vectors, in gene therapy, 537 Lesch-Nyhan syndrome, 87 Lethal alleles dominant, 74–75 recessive, 
74, 74f Lethal mutations, 275 Leukemia acute lymphoblastic, 519b chronic myelogenous, 327 Levan, Albert, 102 Levene, Phoebus, 178, 187 Lewis, Edward B., 81, 92, 422 Ley, Timothy, 519b Libraries, DNA, 344–347, 345f, 346f Light, ultraviolet absorption spectrum of, 183, 183f, 281f action spectrum of, 183, 183f, 281f mutations from, 183, 280, 281f, 284, 284f Lincoln, Abraham, 82 LINEs (long interspersed elements), 227, 291 Linkage, 75–76, 136–138 in bacteria, 169 complete, 137, 139f with crossing over, 137–138, 137f transformation and, 169 without crossing over, 137, 137f Linkage groups, 138, 179 Linkage maps, 140–152 See also Maps/mapping Linkage ratio, 138, 139f Lipid-lowering agents, development of, 17–18 Lipoprotein lipase deficiency, gene therapy for, 542b Livestock, transgenic, 23–24, 23f Locus, 32, 57 Long interspersed elements (LINEs), 227, 291 Long noncoding RNA, 482–483, 498–499 Loss of heterozygosity, in cancer, 333 Loss-of-function mutations, 70, 208, 274 Lung cancer, 334, 514b copy number variants and, 128 fragile sites in, 132–133 Lwoff, André, 297 Lygaeus turcicus sex chromosomes in, 101 sex determination in, 101 Lymphoma, Burkitt, 325 Lyon hypothesis, 107–108, 108f Lyon, Mary, 107 Lysogeny, 171–172 Lytic phages, 171–172 M M checkpoint, 328–329 Macaque monkeys, genome of, 378t, 380 MacLeod, Colin, 21, 178, 179–180 Macroevolution, 458 Mad cow disease, 270 MADS-box proteins, 431 Maize selective breeding of, 398–399, 399f transposable elements in, 289–290, 290f Male pattern baldness, 85 Males See under Sex Male-specific region of the Y, 104, 104f Malignant transformation, 326 Mammoths, genome of, 376 Mann, Lynda, 504b Map unit (mu), 141–142 Map Viewer, 370 Maps/mapping bioinformatics and, 152 gene See Gene mapping Internet resources for, 152, 155 network, 389, 390f physical, 152 quantitative trait loci, 450–453, 451f, 452t restriction, 349–350, 358–359 sequence, 152 Marfan syndrome, 82, 275, 282t Marker genes, 529 Mass spectrometry, 386–387, 
388f, 389f Mass-to-charge (m/z) ratio, 386 Mastitis-resistant cows, 400 MaTCH database, 449 Maternal effect, 92 Drosophila melanogaster embryonic development and, 92 Maternal parent, 57 Maternal-effect genes, 422–423, 423f MaterniT21 test, 403 Mating negative assortive, 470 nonrandom, allele frequency and, 470–471 positive assortive, 470 Matthaei, J Heinrich, 233 McCarty, Maclyn, 21, 178, 179–180 McClintock, Barbara, 153, 289–290 McClung, Clarence, 101 Mdm2, 317 Mean, 442 Mean value, calculation of, 443–444 Medical applications of genetic engineering and genomics, 24, 370, 402–408, 513–521 See also Personalized medicine allele-specific oligonucleotides, 404–405, 405f microarrays, 407–408 prenatal diagnosis, 402–403 RFLP analysis, 403–404, 404f of genome-wide association studies, 409–411 of Human Genome Project, 370 of model organisms, 25–26, 26t Medicine See also Genetic disorders personalized, 513–521 translational, 17–18, 534 Meiosis, 20, 37–40 chromosome behavior in, 37–40, 38f, 42–43, 42f, 57–58 crossing over in, 37, 38f, 40, 42 electron microscopy of, 42–43, 42f first division in, 37–40, 38f in fungi, 42 gametogenesis in, 40–42 in males vs females, 40–42, 41f oogenesis in, 40, 41f in plants, 42 second division in, 38f–39f, 40 in sexual reproduction, 42 spermatogenesis in, 40–42, 41f vs mitosis, 28 Mello, Craig, 494 Melting profile, 191, 191f Melting temperature, 191 Mendel, Gregor, 19, 20, 26, 47f, 48, 457 experiments of, 48, 49f Mendelian genetics, 47–65 chromosomal theory of inheritance and, 57 crosses in, 48–56 See also Crosses experimental basis of, 48–50, 49f independent assortment in, 53–55, 55–56 notation in, 50 postulates of, 49–50, 53–55, 57 Punnett squares and, 50–51, 51f rediscovery of, 57–58, 457 terminology of, 50 trihybrid crosses in, 55–56, 55f, 57f vs extranuclear inheritance, 89–92 Mendelian ratios 9:3:3:1 dihybrid, 55, 75–76, 75f genotypic, 71 modification of, 69–93 dihybrid, 75–76, 75f phenotypic, 71 Mendel’s Principles of Heredity 
(Bateson), 264 Merozygotes, 166, 167f, 300 Meselson, Matthew, 198–199 Meselson-Stahl experiment, 198–199, 198f, 199f Messenger RNA See mRNA (messenger RNA) Meta-analysis of Twin Correlations and Heritability (MaTCH), 449 Metabolomics, 372 Metachromatic leukodystrophy, gene therapy for, 541–542 Metafemales, 110 Metagenomics, 372, 381–383 Metalloproteinases, 332 Metamales, 110 Metaphase in meiosis, 38f, 39, 39f, 40 in mitosis, 33f, 34f, 35 Metastasis, 325, 332 Methionine, in translation, 239 Methylation cytosine, 224 DNA in cancer, 485–486, 485f, 485t epigenetic, 481, 483f in gene regulation, 309–310 in genomic imprinting, 89, 483 in mismatch repair, 283 epigenetic changes and, 481, 481f See also Epigenetics histone, 224 7-Methylguanosine, 245, 263 Methyltransferases, in chromatin remodeling, 224 Mice coat color in, 74, 74f, 78–80 epigenetics and, 487–488, 487f genome of, 378, 378t knockout, 24, 354–357 as model organisms, 25 recessive lethal alleles in, 74, 74f segmentation genes in, 425–426, 426f transgenic, 105 Microarrays DNA, 383–384, 384f, 405–407, 406f gene-expression, 405–408, 406f, 408f genotyping, 405–408 for pathogens, 407–408 protein, 387 Microbial communities, DNA sequencing for, 381–383 Microbiome, inflammatory bowel disease and, 391 Microevolution, 458 MicroRNA (miRNA), 191, 494, 496–497 epigenetic modifications and, 482, 488 primary, 496 Microsatellites, 152, 227 in DNA fingerprinting, 503–504, 505f–507f, 509t Microscopy, electron of chromosomes, 42–43 time-resolved single particle, 262 for transcription, 250 Microtubules, 36 Midas cichlid, speciation and, 472–473, 473f Middle lamella, 36 Middle repetitive DNA, 227–228 Miescher, Friedrich, 177 Migration early human, 476–477 variation from, 468, 469f Milk, hypoallergenic, 400 Minimal medium, 160, 161f Minisatellites, 227 Mintz, Beatrice, 184 Mipomersen (Kynamro), 250 miRNA (microRNA), 191, 494, 496–497 epigenetic modifications and, 482, 488 primary, 496 Mismatch repair, 283 Missense mutations, 274 
disease-causing, 282 Mitchell, Hershel, 90 Mitchell, Mary B., 90 Mitochondria, 29f, 30 chromosomes of, 217–218, 218f Mitochondrial DNA (mtDNA), 217–218, 218f, 240 hypervariable segments I and II in, 507 Mitochondrial DNA (mtDNA) profiling, 507, 508b Mitochondrial mutations, 90–92 aging and, 91–92 in human disease, 91–92 in myoclonic epilepsy and ragged-red fiber disease, 91 in yeast, 90 Mitosis, 20, 33–37, 328–329 in anaphase, 33f, 34f, 36 chromosome behavior in, 33–37, 34f, 42–43, 42f electron microscopy of, 42–43, 42f interphase and, 33–35, 33f, 34f metaphase, 33f, 34f, 35 prometaphase, 33f, 34f, 35 prophase, 33f, 34f, 35–36 telophase, 33f, 34f, 36 vs meiosis, 28 MN blood group, 72 Mobile controlling elements, 290 Model organisms, 25–26 See also specific organisms in developmental genetics, 421 genomes of, 378, 378t medical applications of, 25–26, 26t types of, 25 Moderately repetitive DNA, 227 Molecular chaperones, 270 Molecular clocks, 474–475, 475f Molecular hybridization, 191, 226, 246 Monod, Jacques, 232, 241, 297, 299–302 Monogenic diseases, 281–282, 282t Monohybrid crosses, 48–52, 49f, 51f, 52f Monosomes, 255 Monosomy, 116, 116f, 116t, 117 partial, 125 Monozygotic twins, 448 See also Twin studies copy number variations in, 449 in pedigrees, 62, 62f Moore, Keith, 106 Morgan, Thomas H., 27, 74, 82, 126, 140 Mosaics, 103, 107, 108f, 373–374, 449 Motifs, 367 Mouse See Mice mRNA (messenger RNA), 22, 190–191, 232, 499–502 See also RNA comparative size of, 247t half-life of, 316–317 monocistronic, 243 polycistronic, 243 posttranscriptional regulation of, 499–502 pre-mRNA, 244, 248f, 249 splicing of See Splicing stability of, regulation of, 316–317 steady-state level of, 316 translation of See Translation MSH genes, 327 mtDNA (mitochondrial DNA), 217–218, 218f, 240 hypervariable segments I and II in, 507 mtDNA (mitochondrial DNA) profiling, 507, 508b MTE sequence motif, 311 Muller, Herman J., 126, 276 Müller-Hill, Benno, 302 Mullis, Kary, 347 
Multifactorial traits, 439 heritability of, 444–448 See also Heritability Multigene families, 381 evolution and function of, 381, 381f Multiple alleles, 72–74 Multiple cloning site, 341 Multiple-gene hypothesis, 439, 439f Multiple-strand exchanges, 150, 151f Mus musculus See Mice Muscular dystrophy, 291, 319 Duchenne, 84, 87 myotonic, 88 Mut genes, in mismatch repair, 283 Mutagens, 279–281 Ames test for, 288, 288f chemical, 279–280, 280f definition of, 279 radiation, 281, 281f Mutants, constitutive, 299 Mutation(s), 22 apoptosis and, 329 in bacteria, 160 behavioral, 275 biochemical, 275 in cancer, 325–326, 485–486 in cell cycle, 37 chloroplast, 89–90, 90f chromosome, 115–133 aneuploidy, 116, 116t, 117 copy number variants and, 127–128 definition of, 115 deletions, 124–125, 124f–126f duplications, 126–128, 127f euploidy, 116, 116t inversions, 128–129 monosomy, 116f, 116t, 117 nondisjunction, 39, 103, 116–117, 116f polyploidy, 116, 117t, 120–123 translocations, 124f, 130–131, 130f trisomy, 116, 117–120, 117t, 118f–120f variations in composition and arrangement, 123–131, 124f variations in number, 116–123, 117t complementation analysis of, 80–82 conditional, 87, 87f, 208, 275 definition of, 20 disease-causing, 281–282 See also Genetic disorders dominant gain-of-function, 275 dominant negative, 275 driver, in cancer, 326 drug metabolism and, 516–517 expressivity of, 86 gene, 273–292 autosomal, 276 base substitution, 274, 274f from chemicals, 279–280, 280f classification of, 274–276, 274f from deamination, 278, 278f definition of, 273 from depurination, 278 frameshift, 233, 233f, 274, 274f gain-of-function, 70, 275 induced, 276, 279–281 from ionizing radiation, 281 loss-of-function, 70, 208, 274 missense, 274, 282 neutral, 70, 275 nonsense, 239, 274, 282 null, 274 nutritional, 275 from oxidative damage, 278–279 point, 274, 274f from replication errors and slippage, 277 silent, 274 somatic, 275–276 spontaneous, 276, 277–279 from tautomeric shifts, 277–278, 277f, 278f 
from UV light, 183, 280, 281f, 284, 284f visible, 275 X-linked, 276 Y-linked, 276 homeotic, 426–428 lethal, 74–75, 74f, 275 mitochondrial, 90–92 aging and, 91–92 in human disease, 91–92 in myoclonic epilepsy and ragged-red fiber disease, 91 in yeast, 90 ordered genetic code and, 239 passenger, in cancer, 326 paternal age effect, 105 penetrance of, 86 reduction of, 329 regulatory, 275, 299 repair of, 277–289 See also DNA repair temperature-sensitive, 87, 87f, 208, 275 transposable elements and, 291–292 Mutation hot spots, 276 Mutation rate, 276, 325, 467–468 Mutator phenotype, 327 Mycoplasma genitalium genome, synthetic, 401, 401f Myoclonic epilepsy and ragged-red fiber disease, 91 Myoglobin, 269f, 270, 381, 381f Myosin, 270 Myotonia, 316 Myotonic dystrophy, 88, 316 m/z ratio, 386 N Naked DNA, in gene therapy, 538 Narrow-sense heritability, 446, 448t Nathans, Daniel, 338 National Center for Biotechnology Information (NCBI), 154, 364, 412 National Institute of General Medical Sciences (NIGMS), 385 National Institutes of Health (NIH), 385, 412, 489 Natural selection, 19, 457 allele frequency and, 465–467, 466f definition of, 465 detection of, 464–465 directional, 466, 466f disruptive, 467, 467f fitness and, 465–466 principles of, 464–465 stabilizing, 467, 467f types of, 467 Navajo, albinism in, 469–470, 470f Neanderthals divergence from modern humans, 475–476 genome of, 376, 380–381 Neel, James, 266 Negative assortive mating, 470 Negative mutations, dominant, 70, 275 Neisseria meningitidis, gene expression profile for, 408, 408f Nematode worm See Caenorhabditis elegans Neo-Darwinism, 457 Neonatal screening, 414 NER pathway, 285, 285f Network maps, 389, 390f Neurospora crassa one-gene:one-enzyme hypothesis and, 264–265 poky mutations in, 90 Neutral mutations, 70, 275 Newborn screening, 414 Next-generation sequencing technologies, 354, 459 9:3:3:1 dihybrid ratio, 55 modification of, 75–76, 75f 9mers, 204 Nirenberg, Marshall, 233–235 Nilsson-Ehle, Hermann, 439–440 
Nitrogenous bases See Base(s) Nitrosamines, as carcinogens, 334 Nobel Prizes, 27 Noller, Harry, 262 Nonadditive alleles, 440 Noncoding RNA, 228, 490 long, 482–483, 498–499 small, 317, 492, 494–498 Noncrossover gametes, 137 Nondisjunction, 39, 103, 116–117, 116f Nonhomologous end joining, 287 Noninvasive prenatal genetic diagnosis, 119–120 Nonrandom mating, 470–471 Nonrecombining region of the Y, 104 Nonsense codons, 261 Nonsense mutations, 239, 274 disease-causing, 282 Nonsister chromatids, 39 Noonan syndrome, 412 Normal distribution, 442, 442f Northern blotting, 350 Notation, for genes, 50 Notch signal system, 432–433, 432t, 433f Nuclear relocation model, 314 Nucleic acid(s), 21 See also DNA; RNA denaturing/renaturing of, 191 Nucleic acid–based drugs, gene silencing by, 250 Nuclein, 177 Nucleoids, 30, 216, 218f Nucleolar organizer region, 126 Nucleolus, 29f, 30 Nucleolus organizer region, 30 Nucleoside diphosphates, 186, 186f Nucleoside monophosphates, 186, 186f Nucleoside triphosphates, 186–187, 186f Nucleosides, structure of, 185f, 186–187, 186f Nucleosomal chromatin, 221–223, 221f, 222f modification of, 221–223, 221f, 222f, 308–309, 309f Nucleosome(s) definition of, 221 structure of, 221–223, 221f, 222f Nucleosome core particles, 222, 222f Nucleotide(s), 21 bonds in, 186, 186f, 187, 187f early studies of, 177–178 structure of, 185f, 186, 186f Nucleotide excision repair, 284–286, 285f in xeroderma pigmentosum, 286, 286f Nucleus, cell, 30 Null alleles, 70 Null hypothesis (H0), 59, 61f Null mutations, 274 Nurse, Paul, 37 Nüsslein-Volhard, Christiane, 92, 422 Nutrigenomics, 372, 407 Nutrition cancer and, 334 epigenetics and, 488 Nutritional mutations, 275 Nutritionally enhanced foods, 399 O Ochoa, Severo, 233 Ohno, Susumo, 106, 127 Okazaki, Reiji, 206 Okazaki, Tuneko, 206 Okazaki fragments, 206, 207, 207f, 209 Oligonucleotides, 187 allele-specific, 404–405, 405f antisense, 250–251 Olins, Ada, 221 Olins, Donald, 221 O’Malley, Bert, 247 Omics era, 
370–372 Omics profiling, 521b On the Origin of Species (Darwin), 457 Oncogenes, 330, 330t Oncotype DX, 515–516 One-factor (monohybrid) crosses, 48–52, 49f, 51f, 52f One-gene:one-enzyme hypothesis, 264–265 One-gene:one-polypeptide chain hypothesis, 264–265 One-gene:one-protein hypothesis, 264–265 Online Mendelian Inheritance in Man database, 64–65, 282 Online resources See Internet resources Oocytes, 40, 41f Oogenesis, 40, 41f Oogonium, 40 Ootids, 41, 41f Open reading frames, 366 Operator region, 299 Operon(s), 299–302, 377 lac, 298–303 transcription attenuation in, 305–306 trp, 304 Ordered genetic code, 239 ORFX gene, 452 Organelle heredity, 89 Organelles, 29f, 30 Orgel, Leslie, 490 OriC, 204 Origin of replication, 200 Ornithine transcarbamylase deficiency, 539–540 Orthologs, 366 Outcrossing, from genetically modified crops, 533 Out-of-Africa hypothesis, 476–477 Ova, formation of, 40, 41f Ovalbumin gene, 246f, 247 Ovaries, development of, 104 Overlapping genes, 240 Oxidants, reactive, mutations from, 278–279 P p (proband), 62 p arm, 30 P elements, 291, 291f P (peptidyl) site, 259–260, 262 P1 generation, 48 p53 protein, 317, 331 in cancer, 331 p53 tumor suppressor gene, 331 Pääbo, Svante, 380 Pace, Norman, 184 Pair-rule genes, 424, 424t, 425f Paleogenomics, 475–476 Palindromes, 339 Panitumumab (Vectibix), 515 Papaya, genetically modified, 526b Paracentric inversions, 128, 128f Paralogs, 367 Pardue, Lou, 226 Parental gametes, 137, 140 Parental (P1) generation, 48 Parental (genomic) imprinting, 88–89, 108, 483–485 epigenetics and, 483–485, 484f genetic disorders and, 88–89, 484–485 Parkinson disease, 270 Partial digests, 362–363 Partial dominance, 71, 72f Partial monosomy, 125 Passenger mutations, in cancer, 326 Pasteur, Louis, 19 Patau syndrome, 120 Patents, 413–414, 415 Paternal age effects, 105 Paternal parent, 57 Pathogens, microarrays for, 407–408 Pattern baldness, 85 Pauling, Linus, 187, 190, 266, 268 PCGEM1, 499 PCR machines, 347 See also Polymerase chain 
reaction (PCR) PCSK9, cholesterol-lowering drugs and, 17–18 P-DNA, 190 Pedigrees, 62–64 analysis of, 62–64, 63f construction of, 62, 62f Penetrance, 86 Penny, Graeme, 109 Penta-X syndrome, 103 Pentose sugar, in nucleotide, 185 Peptide bonds, 267–268 Peptidyl (P) site, 259–260, 262 Peptidyl transferase, 259–260 Pericentric inversions, 128 Permease, in lac operon, 298, 298f Personal Genome Project, 373 Personal genomics, 372–374 exome sequencing and, 373–374 genome sequencing and, 373 Personalized medicine, 373, 513–521 diagnosis and, 517–520 ethical aspects of, 520–521 genome sequencing and, 518–520, 519b genomics and, 411 omics profiling and, 521b pharmacogenomics and, 513–518 social impact of, 520–521 technical issues in, 520–521 Pest-resistant crops, 23, 399 petite mutations, 90 Phages See Bacteriophage(s) Pharmaceuticals See also under Drug bioreactors for, 396, 400 Pharmacogenomics, 372, 411, 513–518 See also Drug development adverse drug reactions and, 516–517 database for, 517b definition of, 513 rational drug design and, 411 Pharmacogenomics Knowledge Base (PharmGKB), 517b PharmGKB database, 517b Phenotypes, 20 definition of, 50 disease, 22 expression of, 86–88 See also Gene expression gene interaction and, 76–78 genotypes and, 22, 438–439 See also Heritability novel, 80 reciprocal classes of, 146 wild-type, 144 Phenotypic ratios, 71 Phenotypic variation, 444–446 See also Variation components of, 445–446 epigenetics and, 480, 481f genotype-by-environment interaction variance and, 445 heritability and, 444–448 Phenylketonuria (PKU), 380 φX174 bacteriophage, 216, 217t, 240 Philadelphia chromosome, 327, 327f Phosphate group, in nucleotides, 185 Phosphodiester bond, 187, 187f Phosphomannose isomerase, marker, 529 Phosphorylation, histone, 224 Photoreactivation DNA repair, 284, 284f Photoreactivation enzyme, 284, 284f Phylogenetic trees, 473–474, 473f Phylogeny, evolutionary history and, 473–474 Physical maps, 152 Pieau, Claude, 112 Pitchfork, Colin, 504b 
Piwi-interacting RNA (piRNA), 494 Plants chloroplast-based inheritance in, 89–90, 90f development in, 430–432 extracellular RNA in, 500 genetically modified See Genetically modified crops as model organisms See Arabidopsis thaliana Plaque assays, 170–171, 171f Plasma membrane, 29, 29f Plasmid(s), 162, 167–168 in cloning, 340–342, 341f Col, 168 definition of, 167 F factor, 162, 167–168 R, 166–167, 168f, 289 Ti, 343–344 genetically modified plants and, 528 structure of, 528, 528f Pleiotropy, 82 Pluripotent stem cells, 435 Point mutations, 274, 274f poky mutations, in Neurospora crassa, 90 Pol α, 209 Pol δ, 209 Pol ε, 209 Pol γ, 209 polA1 mutation, 202, 208 Polar bodies, 40–41, 41f Poly-A tail, 245, 245f, 263 Polyacrylamide gel, in electrophoresis, 192, 192f Polyadenylation, in translation, 263 Polycomb gene, 431 Polygenes, 438 calculation of, 441 Polygenic diseases, 281 Polygenic traits See Quantitative trait(s) Polymerase chain reaction (PCR), 347–349, 348f applications of, 349 in DNA profiling, 505–506, 505f–507f limitations of, 348–349 quantitative real-time, 349 reverse transcription, 349 Polymerase switching, 209 Polynucleotide phosphorylase, in genetic code cracking, 233, 233f Polynucleotides, 187 Polyoma virus, 216 Polypeptides, definition of, 267 Polyploidy, 116, 117t, 120–123 Polyps, colonic, 326, 326f, 333 Polyribosomes (polysomes), 261–262, 261f Polytene chromosomes, 219–220, 219f Population, definition of, 458 Population genetics, 457–464 artificial selection and, 458 genetic drift and, 469–470, 470f Hardy-Weinberg law and, 459–464 inbreeding and, 470–471 migration and, 468, 469f mutation and, 467–468 natural selection and, 465–467, 466f–467f nonrandom mating and, 470–471 overview of, 457–458 speciation and, 457–458, 471–473 Population size, small, inbreeding and, 470–471 Porphyria variegata, 82 Position effects, 86, 224 Positive assortative mating, 470 Postreplication repair, 283, 284f Posttranscriptional modification, 244, 249, 256, 
315–317 Posttranslational gene regulation, 317 Postzygotic isolating mechanisms, 472 Prader-Willi syndrome, 94, 485 Preconception testing, 415 Preformation, 18 Preimplantation genetic diagnosis, 404–405, 518 ethical aspects of, 412 Preinitiation complex, 313 formation of, 245, 314f Pre-miRNA, 496 Pre-mRNA, 244 splicing of, 248f, 249 Prenatal diagnosis of genetic disorders, 119–120, 402–403, 518 ethical issues in, 412–413, 415 of sickle-cell anemia, 404 Prezygotic isolating mechanisms, 472 Pribnow box, 242 Primary miRNA, 496 Primary oocytes, 40, 41f Primary sex ratio, 105–106 Primary spermatocytes, 40, 41f Primary structures, 268, 269f Primase, 205, 211 Primates, nonhuman, genomes of, 378t, 379–380 Primer(s) in polymerase chain reaction, 347 in replication, 203, 204, 205, 205f, 207, 207f Prion diseases, 270 Privacy issues, genomics and, 412–413, 415–416 PRNCR1, 499 Probability laws, 53, 58–59 Probability values, 60–66, 61f Proband, 62 Probes, 191 allele-specific oligonucleotides, 404–405, 405f in library screening, 345–347, 346f microarray, 405–407 Processivity, 204, 206, 209 Product law, 53, 58, 144 Product rule, 509 Profile probability, 509, 509t Progressive retinal atrophy, 92–93 Project Jim, 373 Prokaryotes See also Bacteria cell division in, 30, 30f gene regulation in, 297–303 genome of, 377 RNA-guided viral defenses in, 492–494 Prometaphase, 33f, 34f, 35 Promoter(s) core, 244, 310, 311f transcription factors and, 313–315 eukaryotic, 244, 310–311, 310f, 311f prokaryotic, 242, 243f Promoter elements, 310, 310f, 311f proximal, 244, 310 Proofreading, 207, 283 Prophages, 171–172 Prophase in meiosis, 37–39, 38f, 39, 40 in mitosis, 33f, 34f, 35–36 Prosecutor’s fallacy, 510 Proteasomes, 270 Protein(s), 22 amino acids in, 22 See also Amino acid(s) cellular complement of, 384–387 See also Proteomics chaperone, 270 contractile, 270 definition of, 267 enzyme, 22 functions of, 22, 270–271 prediction by sequence analysis, 366–367 fusion, 396 as genetic material, 177–178 
heat-shock, 269 misfolded, 269–270 phenotypes and, 22 ribosomal, 255–256 structural, 270 structure of, 22, 267–270, 269f primary, 268, 269f quaternary, 268 secondary, 268, 269f tertiary, 268, 269f, 270 synthesis of, 21–22, 22 See also Translation transport, 270 types of, 22 vs polypeptides, 267 Protein domains, 270–271 structural analysis of, 367 Protein folding, 269–270 Protein kinases See Kinases Protein microarrays, 387 Protein quantitative trait loci, 452 Protein Structure Initiative, 385 Protein-coding genes, number in genome, 369 Protein-gene correlation, 384–385 Protenor, sex determination in, 101 Proteome, 316 definition of, 385 size of, 385 Proteomics, 24, 372, 384–387 definition of, 385 gene-protein correlation and, 384–385 isoelectric focusing in, 385 mass spectrometry in, 386–387, 389f, 398f two-dimensional gel electrophoresis in, 385–386, 386f Proteus syndrome, 409 Proto-oncogenes, 70, 330–332, 330t ras, 330 Protoplasts, 181 Prototrophs, 160, 161f Proximal-promoter elements, 244, 310 Pseudoagouti coat color, 487, 487f Pseudoautosomal regions, 104, 104f Pseudogenes, 228, 379 PubMed, 43 Puffs, 219 Punnett, Reginald, 50, 79 Punnett squares, 50–51, 51f Purines, 185, 185f Pyrimidine dimers, in UV-induced mutagenesis, 280 Pyrimidines, 185, 185f Q q arm, 30 QTL mapping, 450–453, 451f, 452t Quantitative inheritance, 438 multifactorial, 439 multiple-gene hypothesis for, 439, 439f Quantitative real-time polymerase chain reaction, 349 Quantitative trait(s) analysis of, 443–444 heritability of, 444–448 See also Heritability inheritance of See Quantitative inheritance polygenes of, 441 statistical analysis of, 441–444 Quantitative trait loci, 450–453 expression, 452 mapping of, 451–453, 451f, 452t protein, 452 Quaternary structures, 268, 269f Quorum sensing, 318 R R group, in amino acids, 267 R plasmids (factors), 168, 168f, 289 Radiation cancer and, 286, 334 ionizing as carcinogen, 334 as mutagen, 281, 281f ultraviolet absorption spectrum of, 183, 183f, 281f 
action spectrum of, 183, 183f, 281f mutations from, 183, 280, 281f, 284, 284f Radical group, in amino acids, 267 Ramakrishnan, Venkatraman, 262 Random match probability, 509, 509t ras gene family, in cancer, 330 ras proto-oncogene, 330 Rational drug design, 411 RB1 tumor suppressor gene, 331–332, 331f R-determinants, 168 rDNA (ribosomal DNA), 126, 256 Reactive oxidants, mutations from, 278–279 Reading frames, 233 open, 366 Realized heritability, 447 REBASE, 359 Rec proteins, 166–167 Receptors, 29f, 30 Recessive dystrophic epidermolysis bullosa, gene therapy for, 543 Recessive epistasis, 79 Recessive lethal alleles, 74, 74f Recessive mutations, 275 Recessiveness, 49–50 pedigrees and, 62–64, 63f Reciprocal classes, 146 Reciprocal crosses, 49 Reciprocal translocations, 129–130, 130f Recognition sequences, 339, 339f Recombinant DNA technology, 23–24, 338–359 See also Biotechnology bioinformatics and, 24 cloning in, 340–344 concerns about, 359 DNA as genetic material and, 183–184 DNA libraries in, 344–347, 345f, 346f DNA sequencing in, 192, 192f, 352–354 fluorescent in situ hybridization in, 192, 192f, 350–352 gene editing in, 357 gene knockout in, 24, 354–357 gene-targeting methods in, 355 genetic engineering and, 394–416 See also Genetic engineering historical perspective on, 23–24 next-generation sequencing technology and, 459 nucleic acid blotting in, 350–352 polymerase chain reaction in, 347–349 proteomics and, 24 restriction enzymes in, 339–340 restriction mapping in, 349–350 transgenic organisms in, 354–358 variation and, 458–459 Recombinant gametes, 137, 140 Recombinant human antithrombin, 396 Recombinant human insulin, 395–396 Recombinase, 292 Recombination, 136, 160–174 in bacteria, 160–174 conjugation in, 161–166 F factor in, 162–166, 163f, 164f, 167–168 high-frequency, 163–165, 164f, 165f, 167f Rec proteins in, 166–167 single-strand displacement in, 167 transduction in, 172–173 transformation in, 168–169 
crossing over in, 140 See also Crosses; Crossing over definition of, 160 in eukaryotes, 160 Recruitment model, 314 Red-green color blindness, 108 Reference genomes, 368 Regulatory mutations, 275 Repetitive DNA, 225–228, 225f categories of, 225, 225f genome size and, 378 highly repetitive, 226 middle, 227 satellite DNA and, 225–226 Repetitive transposable sequences, 227 Replication, 196–213 accuracy of, 202, 207 autoradiographic analysis of, 199–200, 200f in bacteria, 201–207, 201f, 208t chain elongation in, 201, 202f chromatin in, 209 coherent model of, 207–208, 207f concurrent, 206–207, 206f conservative, 197 continuous, 205–206 direction of, 191f, 200–201, 202, 202f discontinuous, 205–206 dispersive, 197–198 DNA gyrase in, 204–205, 207, 207f DNA polymerases in See also DNA polymerase(s) in bacteria, 201–207, 202f, 203t, 206f, 207f in eukaryotes, 209 errors in correction of, 207 mutations from, 277–279 in eukaryotes, 199–200, 208–212 helicases in, 204, 207, 207f helix unwinding in, 204–205 initiation of in bacteria, 205 in eukaryotes, 208–209 during interphase, 33–35, 33f leading and lagging strands in, 205f, 206, 206f, 207, 207f Meselson-Stahl experiment and, 198–199, 198f, 199f Okazaki fragments in in bacteria, 206, 207, 207f in eukaryotes, 209 polymerase switching in, 209 primers in, 203, 204, 205, 205f, 207, 207f processivity in, 206 proofreading in, 207, 283 regulation of in bacteria, 207 in eukaryotes, 208–209 semiconservative, 189, 197–201, 197f semidiscontinuous, 206n single-stranded binding proteins in, 204, 207, 207f sliding DNA clamp in, 203, 206–207, 206f, 207f, 208 steps in, 204, 207–208, 207f Taylor-Woods-Hughes experiment and, 199–200, 200f at telomeres, 210–213, 210f, 211f units of, 200–201 Replication bubbles, 208–209 Replication forks in bacteria, 200, 204, 205 in eukaryotes, 208–209, 209f from replication bubbles, 208–209 Replication origins, 200 in bacteria, 204 in eukaryotes, 208–209 Replication slippage, 277 Replicon, 200 Replisomes, 205 
Repression, catabolite, 302–303 Repressor(s), 245, 312, 313–314, 314 allosteric, 299 corepressors and, 299 lac, 299–302, 300f, 303f trp, 304, 305f Repressor genes, 299 Repressor molecules, 299 Reproduction, sexual, meiosis in, 42 Reproductive isolation, 472 Reptiles, sex determination in, 111–112, 112f Resistance transfer factor, 168 Restriction endonucleases, 23 Restriction Enzyme Database, 359 Restriction enzymes, 339–340 Restriction fragment length polymorphisms (RFLPs), 152, 403–404, 404f Restriction mapping, 152, 349–350, 358–359 Restriction sites, 339, 339f Retinal blindness, gene therapy for, 540–541 Retinoblastoma, 331–332 Retrotransposons, 227 Retroviral vectors, in gene therapy, 536 Retroviruses, 184 cancer and, 333t Reverse genetics, 24 Reverse transcriptase, 184, 227 in cDNA library construction, 345f Reverse transcription, 184, 211 Reverse transcription polymerase chain reaction, 349 RFLP analysis, 403–404, 404f Rhesus monkeys, genome of, 378t, 380 Rhizobium radiobacter, 343 ρ (rho) termination factor, 243 Ribonuclease, 180 Ribonucleic acid See RNA Ribonucleoprotein, 211, 502 Ribose, 185f Ribosomal DNA (rDNA), 126, 256 Ribosomal proteins, 255–256 Ribosomal RNA (rRNA), 126, 190, 255–256, 255f Ribosomes, 22, 29f, 30, 190 A site on, 259, 262 E site on, 259, 262 P site on, 259–260, 262 prokaryotic, 261–262, 261f structure of, 261–262, 261f vs eukaryotic, 255–256, 255f structure of, 255–256 in translation, 255–256, 255f Riboswitches, 306–307, 307f Ribozymes, 247, 260, 491–492 genetic engineering of, 491–492, 491f origin of life and, 490–491 structure of, 491f Rice genetically modified, 453, 526–527, 529–530 selective breeding of, 453, 527–528 Rich, Alexander, 190, 257 Richmond, Timothy, 223 Rituximab, 516t RNA, 21–22 analytic techniques for, 191–192 antisense, 191 catalytic activity of, 490–492 CRISPR-derived, 357, 493–494, 495b, 542–543 denaturing/renaturing of, 191 diversity of, 490 electrophoresis of, 192 emerging roles of, 490–502 extracellular, in 
signaling, 500 as genetic material, 184 heterogeneous nuclear, 244, 245 messenger See mRNA (messenger RNA) micro, 191, 494, 496–497 epigenetic modifications and, 482, 488 primary, 496 noncoding, 490 long, 482–483, 498–499 small, 317, 492, 494–498 origins of life and, 490–491 posttranscriptional modification of, 244, 249, 256 posttranslational modification of, 317 ribosomal, 126, 190, 255–256, 255f sense, 250 small interfering, 191, 494–496, 543 small nuclear, 191, 490 splicing of See Splicing structure of, 185, 185f, 190, 490 synthesis of See Transcription telomerase, 191 transfer See tRNA (transfer RNA) RNA editing, 249 RNA heteropolymers, in genetic code cracking, 234, 235f RNA homopolymers, in genetic code cracking, 234 RNA interference (RNAi) in gene regulation, 317 siRNA and, 494–496 therapeutic applications of, 250, 317, 543–544 RNA polymerase, 241–243 RNA-directed, 497 vs DNA polymerase, 241 RNA polymerase II, 244 transcription factors and, 313–315 RNA primers, in replication, 204, 205, 205f, 207, 207f RNA sequencing, 383 RNA transcripts, posttranscriptional modification of, 245–246, 249, 256 RNA World hypothesis, 491 RNA-binding proteins, 501 RNA-directed RNA polymerase, 497 RNA-guided viral defenses, in prokaryotes, 492–494 RNAi See RNA interference (RNAi) RNA-induced gene silencing, 317, 482, 494, 497–498 RNA-induced silencing complex (RISC), 482, 543 RNA-induced transcriptional silencing (RITS) complex, 482, 497–498 Roadmap Epigenomics Project, 489 Roberts, J., 257 Roberts, Richard, 246 Robertsonian translocation, 130–131 Rough colonies, 178 Rough endoplasmic reticulum, 29f, 30 Roundup, 524–525, 529, 533 RPE65 gene, 541 rRNA (ribosomal RNA), 126, 190, 255–256, 255f Rubin, Gerald, 290 Rubinstein-Taybi syndrome, 486 runt gene, 425–426 Russell, Liane, 107 S S phase, 33–35, 33f Saccharomyces cerevisiae, 25, 25f, 26t genome of, 378, 378t petite mutations in, 90 Salmon, genetically modified, 525b, 533–534 Sanger sequencing, 352–354, 373f SARS 
(severe acute respiratory syndrome), genotyping of, 407, 408f Satellite DNA, 225–226, 225f Schleiden, Matthias, 18 Schwann, Theodor, 18 Scrapie, 270 Screening blue-white, 341, 343f for breast cancer, 334–335 neonatal, 414 SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis), 385–386 Sea urchin, genome of, 378–379, 378t Second filial (F2) generation, 48 Second polar body, 41, 41f Secondary oocytes, 41, 41f Secondary sex ratio, 105–106 Secondary spermatocytes, 40, 41f Secondary structures, 268, 269f Sedimentation equilibrium centrifugation, 198 Segment polarity genes, 424–425, 424f, 424t, 425f, 425t, 426f Segmental deletions, 125 Segmentation genes, 423–428, 424t in Drosophila melanogaster, 424–425, 424t, 425f in humans, 425–426 in mice, 425–426, 426f Segregation, 50, 57 Selectable marker genes, 340 Selection differential, 447 Selection response, 447 Selective breeding, 92–93, 398–399, 399f, 446–448, 453, 458 Selfing, 48 Self-splicing, 247–248, 248f Semiconservative replication, 189, 197–201, 197f Semidiscontinuous replication, 206n Semisterility, 130 Sense RNA, 250 Separase, 35 Sequence maps, 152 Sequencing methods DNA See DNA sequencing RNA, 383 Serial dilution technique, 160, 160f Serotypes, 178 Severe acute respiratory syndrome (SARS), genotyping of, 407, 408f Severe combined immunodeficiency, gene therapy for, 538–539, 540 Sex chromatin bodies, 106–107, 107f Sex chromosomes, 32, 100, 101–111 early studies of, 101 sex determination and, 101–105 Sex determination, 100–112 See also Sexual differentiation in Caenorhabditis elegans, 110–111 chromosomal, 111–112 in Drosophila melanogaster, 109–110, 110f genotypic, 111–112 overview of, 100–101 in reptiles, 111–112, 112f sex chromosomes and, 101–105 steroids in, 112 temperature-dependent, 111–112, 112f XX/XO (Protenor) mode of, 101 XX/XY (Lygaeus) mode of, 101 ZZ/ZW mode of, 101 Sex differentiation, steroids in, 112 Sex pilus, 162 Sex ratios, 105–106 Sex selection, 415 Sex-determining 
chromosomes, 32 See also X chromosome; Y chromosome Sex-determining region Y, 104, 104f, 105 Sex-influenced inheritance, 84–85 Sex-lethal gene, 110 Sex-limited inheritance, 84–85 Sexual differentiation, 100 See also Sex determination in Caenorhabditis elegans, 110–111, 111f in humans, 103–104 Sexual reproduction, meiosis in, 42 Sharp, Philip, 246 Sheep, cloned, 23–24, 23f Shine-Dalgarno sequences, 259, 263 Short (small) interfering RNA (siRNA), 191, 494–496, 543 Short interspersed elements (SINEs), 227, 291–292 Short tandem repeats (STRs), 227 in DNA profiling, 505–506, 505f–507f, 509t Shotgun sequencing, 362–363, 363f medical applications of, 408–409 Shugoshin, 35, 35f Sibs, in pedigrees, 62 Sibship line, 62 Sickle-cell anemia, 22, 22f, 23f, 266–267, 266f, 270 prenatal diagnosis of, 404 RFLP analysis of, 404, 404f Sickle-cell trait, 266 σ (sigma) subunit, in transcription, 241–242, 243f Signal transduction in Caenorhabditis elegans development, 432–433, 432t, 433f, 434f in cell cycle, 328 extracellular RNA in, 500b Silencers, 244–245, 311, 314 Silent mutations, 274 Similarity scores, 364 SINEs (short interspersed elements), 227, 291–292 Single crossovers, 142–143, 142f, 143f, 146 Single crystal X-ray analysis, 190 Single-cell genome sequencing, 409 Single-gene disorders, 281–282, 282t Single-nucleotide polymorphisms (SNPs), 367, 370 detection of, 404 in DNA profiling, 507–508, 508f in mapping, 152 Single-strand displacement, 167 Single-stranded binding proteins, 204, 207, 207f siRNA (small interfering RNA), 191, 494–496, 543 Sister chromatids, 31 in meiosis, 37–40, 38f–39f in mitosis, 35, 35f, 154, 154f Sliding DNA clamp, 203, 204f, 206–207, 206f, 207, 207f Sliding DNA clamp loader, 203, 204f Small (short) interfering RNA (siRNA), 191, 494–496, 544 Small noncoding RNA (sncRNA), 317 in eukaryotes, 494–498 in prokaryotes, 492 Small nuclear RNA (snRNA), 191, 490 Smith, Courtney, 510b Smith, Hamilton, 338 Smithies, Oliver, 355 Smoking, cancer and, 334 Smooth colonies, 
178 Smooth endoplasmic reticulum, 29f, 30 Snapping shrimp, speciation and, 472, 472f sncRNA (small noncoding RNA), 317, 492, 494–498 snRNA (small nuclear RNA), 191, 490 Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), 385–386 Solenoids, 222f, 223 Somatic cell hybridization, 286 Somatic gene therapy, 545 Somatic mosaicism, 449 Somatic mutations, 275–276 Sorcerer II Global Ocean Sampling Expedition, 382, 382f SOS repair system, 284 Sotir, Beverly, 514b Southern blotting, 350, 351f Soybeans, genetically modified, 529 Speciation, 457–458, 471–473 barriers to, 472 changes leading to, 472, 472f phylogenetic trees and, 473–474 rate of, 472–473 Species, definition of, 471 Specification, in development, 419, 426–428 Spectral karyotypes, 351–352 Sperm, 40, 41f Spermatids, 40, 41f Spermatocytes, 40, 41f Spermatogenesis, 40–42, 41f Spermatogonium, 40, 41f Spermatozoa, 40, 41f Spermiogenesis, 40 Spheroplasts, 181 Spiegelman, Sol, 184 Spindle fibers, 30, 35, 35f, 36 Spliceopathies, 316 Spliceosomes, 248–249, 248f Splicing, 247–249 alternative, 249, 369 in gene regulation, 315–316, 315f, 316f mutations affecting, 316 in sex determination, 110 cotranscriptional, 245f definition of, 110 self-splicing in, 247–248, 248f in sex determination, 110 transesterification reactions in, 247, 248f Splicing mutations, disease-causing, 282 Split genes, 244, 246 Spontaneous mutations, 276 Spores, 28 Sporophyte stage, 42 Sports, gene doping in, 546b Stabilizing selection, 467, 467f Stahl, Franklin, 198–199 Standard deviation, 442–443, 442t Standard error of the mean, 443 Statistical analysis, 441–444 Stem cell(s), 435–436 cancer and, 325 in knockout technology, 356–357 Stem cell hypothesis, for cancer, 325 Stern, Curt, 153 Steroids, in sex differentiation, 112 Stone Age genomics, 376, 380–381, 475–476 Stop (termination) codons, 237, 239, 261 STR DNA profiling, 505–506, 509t Streptococcus pneumoniae, transformation in, 178–180, 178t Stress, epigenetic changes and, 
488 Structural genes, in lac operon, 298–299, 298f Structural genomics, 362 Sturtevant, Alfred H., 110, 126, 140–142 Substitution editing, 249 Subunit vaccines, 397 Sum law, 58 Sunitinib (Sutent), 519b Supercoiling, 204 Superfamilies, 381 evolution and function of, 381f Sutton, Walter, 20, 57, 136 SV40 virus, 240 Svedberg coefficient, 190, 255 Swanson, Robert, 395 SWI/SNF complex, 309, 309f Sxl gene, 110 Synapsis, 38 translocations and, 129–130 Synpolydactyly, 428, 429f Synthetic biology, 402 legal aspects of, 413–414, 415 Synthetic genomes, 401, 401f Systems biology, 388–389, 390f T t1/2 (half-life), of mRNA, 316–317 TAFs (TBP associated factors), 313 TALENs (transcription activator-like effector nucleases), in gene therapy, 543 Tanksley, Steven, 452 Taq polymerase, 347 Tarceva (erlotinib), 516t TATA box, 242, 244, 311, 312f TATA-binding protein, 313 Tatum, Edward, 161–162, 264–265, 385 Tautomer, 277 Tautomeric shifts, 277–278, 277f, 278f Taylor, J. Herbert, 199 Taylor-Woods-Hughes experiment, 199–200, 200f Tay-Sachs disease, 69, 71, 87 TBP (TATA-binding protein), 313 TBP associated factors (TAFs), 313 T-DNA (transfer DNA), in genetically modified plants, 528–529 Telomerase, 211–212, 211f Telomerase RNA, 191 Telomere(s) aging and, 212–213 definition of, 210 replication at, 210–212, 210f, 211f structure of, 210 Telophase in meiosis, 38f, 39f, 40 in mitosis, 33f, 34f, 36 Temperate phages, 172 Temperature melting, 191 in phenotypic expression, 87, 87f Temperature-dependent sex determination, 111–112, 112f Temperature-sensitive mutations, 87, 87f, 208, 275 Terminal deletions, 124, 125f Termination codons, 237, 239, 261 Terminator hairpins, 306 Tertiary structures, 268, 269f, 270 Testcrosses, one-character, 52, 52f Testis, development of, 104–105 Testis-determining factor, 105 Tetrads, 38, 38f Tetrahymena, telomerase in, 211, 212 Tetranucleotide hypothesis, 178, 187 Tetraploidy, 116, 117t, 121 Tetra-X syndrome, 103 T-even phages, 169, 170f TFIIA, 245 TFIIB, 245 TFIID, 245 
Thalassemia, gene therapy for, 541 Thermocyclers, 347 13mers, 204 3′ poly-A tail, 245, 245f, 263 Three-factor (trihybrid) crosses, 55–56, 55f, 57f Thymine, 185, 185f, 187 Ti plasmid, 343–344 genetically modified plants and, 528 structure of, 528, 528f Tjio, Joe Hin, 102 Time-resolved single particle cryo-electron microscopy (cryo-EM), 262 Tissue inhibitors of metalloproteinases (TIMPs), 332 T-loops, 210 Tn elements, 289 Tobacco mosaic virus, 184 Tomatoes genetically modified, 523, 524 selective breeding of, 451–452 Topoisomerases, DNA, 204–205 Totipotent stem cells, 435 Toxicogenomics, 372 Traits complex, 439 definition of, 48 dominant, 49–50, 62–64, 63f, 71, 72, 72f epigenetic, 480 heterogeneous, 76 in Mendel’s experiments, 48, 49–50, 49f multifactorial, 439 polygenic, 438 recessive, 49–50, 62–64, 63f X-linked, 82–84, 83f, 84f Transacetylase, in lac operon, 298, 298f trans-acting factors, 242, 244 trans-acting molecules, 297 Transcription, 21, 21f, 177, 241–250 alpha subunit in, 242, 243f attenuation of, 304, 305–306 chain elongation in, 242–243 consensus sequences in, 242 definition of, 241 early studies of, 241 electron microscopy in, 250 in eukaryotes, 243–249 activators in, 312, 314 enhancers in, 311 initiation of, 244–245, 313–314, 314f pre-initiation complex in, 313, 314f promoters in, 244, 310–311, 310f, 311f regulation of, 312–317 See also Gene regulation, in eukaryotes repressors in, 312, 314 silencers in, 311 transcription factors in, 312–315 introns in, 246f, 247, 247t nuclear relocation model for, 314 partner strand in, 242, 243f pre-initiation complex in, 313, 314f in prokaryotes, 241–243, 242–243, 243f initiation of, 239, 242 promoters in, 242, 243f regulation of, 297–303 See also Gene regulation, in prokaryotes repressors in, 245, 299–302, 304 recruitment model for, 314 reverse, 184, 211 ribonucleotide addition in, 241, 243f RNA polymerases in in eukaryotes, 244 in prokaryotes, 241–243 sigma subunit in, 241–242, 243f start site in, 242 template 
binding in, 242, 243f template strand in, 242, 243f termination of, 239, 243, 304–306 attenuation and, 304, 305–306 transcript processing in, 244 Transcription activator-like effector nucleases, in gene therapy, 543 Transcription factors, 105, 244, 245, 270 general, 245, 313–315, 314f p53 as, 317 Transcription factory, 308 Transcriptional activators, 245, 312, 314 Transcriptional repressors, 245, 299–302, 304, 312, 313–314 See also Repressor(s) Transcriptional silencing, RNA-induced, 317, 482, 497–498 Transcriptome analysis, 383–384 of pathogens, 408 Transcriptomics, 372, 383–384 Transduction, 172–173 cotransduction and, 173 in Lederberg-Zinder experiment, 172–173, 172f steps in, 172, 172f Transesterification reactions, in splicing, 247, 248f Transfection, 181–182 Transfer DNA (T-DNA), in genetically modified plants, 528–529 Transfer RNA See tRNA (transfer RNA) Transformation in Alloway’s experiment, 179–180 in Avery-MacLeod-McCarty experiment, 179–180, 180f in bacteria, 168–169, 178–180, 180f, 341–342 in Dawson’s experiment, 179–180, 180 definition of, 179 in Griffith’s experiment, 178–179 transformer gene, 110 Transforming principle, 179 Transgenic animals, 23–24, 105, 184, 357–358, 399–400, 400f, 522 as bioreactors, 396, 397, 400 creation of, 357–358 examples of, 105 as food, 525, 533–534 mice, 105 as recombinant protein hosts, 396 Transgenic plants, 23, 398–399, 399f, 453, 522–533 See also Genetically modified crops quantitative trait loci in, 451–452, 452f vaccines from, 398 Transitions, 274 Translation, 21f, 22, 177, 254–264 chain elongation in in eukaryotes, 263 in prokaryotes, 258t, 259–260, 260f definition of, 255 in eukaryotes, 263 initiation of, 239 in eukaryotes, 263 in prokaryotes, 258t, 259, 259f mRNA stability and, 316–317 polyribosomes in, 261–262, 261f in prokaryotes, 258–262 vs in eukaryotes, 263 protein factors in, 258t, 259 regulation of, 317 ribosomes in, 255–256, 255f, 261–262, 261f termination of, 239 in eukaryotes, 263 in 
prokaryotes, 258t, 261 triplet code in, 232–240 See also Genetic code Translational medicine, 17–18, 535 See also Gene therapy Translocation, in translation, 260 Translocations, chromosomal, 124f, 129–131, 130f in familial Down syndrome, 130–131, 130f Robertsonian, 130–131 Transmission genetics, 47 See also Mendelian genetics Transplantation, fecal microbial, 391 Transport proteins, 270 Transposable elements, 288–292 Ac-Ds system, 289–290, 290f bacterial, 289 Copia, 290–291, 291f in Drosophila melanogaster, 290–291, 291f in evolution, 292 in humans, 227, 291–292 IS, 289, 289f long interspersed elements (LINEs), 227, 291 mutations and, 291–292 P, 291, 291f research applications of, 292 short interspersed elements (SINEs), 227, 291–292 Tn, 289 Transposable sequences, repetitive, 227 Transposase, 289 Transpositions See also Transposable elements germ-line, 292 Transposons See Transposable elements Transversions, 274 Trastuzumab (Herceptin), 514–515, 515f, 516t Trihybrid crosses, 55–56, 55f, 57f Trinucleotide repeats, 132 in single-gene disorders, 282 Tripeptides, 268 Triplet binding assay, 234–236, 236f, 236t Triploidy, 116, 117t Triplo-X syndrome, 103 Trisomy, 116, 117–120, 117t, 118f–120f in Edwards syndrome, 120 in Patau syndrome, 120, 120f Trisomy 21 (Down syndrome), 117–120, 118f, 119f familial, 130–131, 130f paternal age effect and, 105 Triticale, 123 Triticum, 123 tRNA (transfer RNA), 22, 190, 255 charging, 257–258, 258f cloverleaf model of, 256, 257f functions of, 255 isoaccepting, 258 splicing of, 247 structure of, 256–257 in translation, 256–257, 257f, 258f trp operon, 304 in attenuation, 304 components of, 304, 305f leader sequence in, 304 True single-molecule sequencing, 519b Tryptophan synthase, 304 Tschermak, Erich, 19, 57 Tubulin, 270 Tumor(s) benign, 325 malignant, 325 See also Cancer Tumor suppressor genes, 330–332, 330t, 331f Tumorigenesis, 325 Turner syndrome, 102–103, 102f, 116 Barr bodies in, 107, 107f 23andMe, 415 Twin(s) concordant, 448 
discordant, 448 dizygotic (fraternal), 448, 454 epigenetic changes in, 450 monozygotic (identical), 448 copy number variations in, 449 Twin studies, 448–450 large-scale analysis of, 449 limitations of, 449–450 Two-dimensional gel electrophoresis (2DGE), 385–386, 386f, 388f Two-factor (dihybrid) crosses, 52–55, 53f, 54f modified, 75–76 Tyrosine, 21 U Ubiquitin, 270, 317 Ultraviolet light absorption spectrum of, 183, 183f, 281f action spectrum of, 183, 183f, 281f mutations from, 183, 280, 281f repair of, 284, 284f Unidirectional replication, 200–201 Unit factors, 49, 50, 57, 264 U.S. Food and Drug Administration (FDA), 395 Unscheduled DNA synthesis, 286 Uracil, 185, 185f V Vaccines attenuated, 397 DNA-based, 398 Ebola, 398 hepatitis B, 397 HIV, 398 human papillomavirus, 398 inactivated, 397 subunit, 397 from transgenic plants, 398 Variable gene activity hypothesis, 420 Variable number tandem repeats, 227 in DNA fingerprinting, 503–504, 505f–507f Variance, 442, 442f additive, 446 dominance, 446 environmental, 445 genotype-by-environment interaction, 445 genotypic, 445 interactive, 446 phenotypic, 444–446 Variation continuous, additive alleles and, 440 copy number, 127–128, 228–229, 370 in twins, 449 detection by artificial selection, 458 discontinuous, 77 environmental, 445 epigenetics and, 480, 481f founder effect and, 469, 470f gene duplication and, 127 genetic drift and, 469–470 genomic, 370, 459 genotypic, 445 Hardy-Weinberg law and, 459–464 heritability and, 444–448 from independent assortment, 58 from meiosis, 42 from migration, 468, 469f from mutation, 467–468 See also Mutation(s) phenotypic, 444–446 from protein structure, 267–270 single-nucleotide polymorphisms and, 370 sources of, 177, 370, 458 Vascular endothelial growth factor, 119 Vectibix (panitumumab), 515 Vectors, 23 cloning, 23, 339, 340–344 bacterial artificial chromosome, 343 bacterial plasmid, 340–342, 341f expression, 343 viral, in gene therapy, 536–538, 540–541 Venter, J. Craig, 363, 368, 373, 382, 
401, 415 Vertical gene transfer, 161 Vicia faba, 199, 200f Vidaza (azacitidine), 486 Virulent phages, 172 Virulent strains, 178 Viruses bacterial See Bacteriophage(s) cancer and, 333, 333t chromosomes of, 216, 216f as gene therapy vectors, 536–538, 540–541 RNA as genetic material in, 184 RNA-guided defenses against, in prokaryotes, 492–494 Visible mutations, 275 Vitamin A deficiency, Golden Rice and, 527–528 Vitravene (fomivirsen), 250 VNTR-based DNA fingerprinting, 503–504, 505f–507f Volker, Nicholas, 409 W Waddington, C.H., 481b Wallace, Alfred Russel, 19, 57, 457, 464 Wang, Andrew, 190 Warfarin (Coumadin), 516–517 Wartman, Lukas, 519b Watson, James, 21, 26, 184, 185, 196, 197 genome sequencing for, 373 Human Genome Project and, 367 Watson-Crick model, 188–189, 188f Webcutter, 358 Weiss, Samuel, 241 Western blotting, 350 white locus, 73–74 Whole-genome amplification, 409 Whole-genome sequencing, 362–363, 363f ethical aspects of, 414–415 medical applications of, 408–409 See also Genome sequencing Whole-genome shotgun cloning, 344 Whole-genome transcriptome analysis, of pathogens, 408 Wieschaus, Eric, 92, 422 Wildlife smuggling, 508b Wild-type alleles, 70 Wild-type phenotype, 144 Wilkins, Maurice, 21, 26 Wilson, Edmund B., 101 Wiskott-Aldrich syndrome, gene therapy for, 541–542 Wobble hypothesis, 238, 262 Woese, Carl, 490 Wollman, Élie, 163–165 Wood, William, 170 Woods, Philip, 199 Woolly mammoths, genome of, 376 Wright, Sewall, 470 X X chromosome, 32 dosage compensation and, 106–109 early studies of, 101 inactivation of, 107–109, 481 in sex determination, 101, 101f in Drosophila melanogaster, 109–110, 110f X inactivation center (Xic), 109 X rays cancer and, 334 as mutagens, 281, 281f Xenopus laevis, rDNA in, 126 Xeroderma pigmentosum, 286, 286f, 327 Xic (X inactivation center), 109 X-inactive specific transcript (XIST), 109 Xist gene, 109 X-linkage, 82–84 definition of, 82 dosage compensation and, 106–109 in Drosophila melanogaster, 82–83, 83f in 
humans, 84, 84f X-linked inhibitor of apoptosis, 409 X-linked mutations, 276 X-ray diffraction analysis, 186f, 187–188 XX/XO (Protenor) mode, of sex determination, 101 XX/XY (Lygaeus) mode, of sex determination, 101 Y Y chromosome, 32, 101 early studies of, 101 euchromatic regions of, 104 heterochromatic regions of, 104 in Klinefelter syndrome, 102–103, 102f in male development, 104–105 male-specific region of, 104, 104f pseudoautosomal regions of, 104, 104f in sex determination, 101–105, 101f sex-determining region of, 104, 104f, 105 in Turner syndrome, 102–103, 102f Y chromosome STR profiling, 506 Yamanaka, Shinya, 27 Yanofsky, Charles, 305 Yeast See also Saccharomyces cerevisiae autonomously replicating sequences in, 209 mitochondrial mutations in, 90 as recombinant protein host, 396 Yeast artificial chromosomes, 343 Y-linked mutations, 276 Young, Michael, 290 Yule, G Udny, 439 Z Z-DNA, 190 Zea mays selective breeding of, 398–399, 399f transposable elements in, 289–290, 290f Zinc-finger nucleases, in gene therapy, 542–543 Zinder, Norton, 172–173 Zip code, 501 Zip code banding protein 17, 501 ZNF9 gene, 316 Zolinza, 486 Zygote, 33 Zygotic genes, 422–426, 423f, 424t ZZ/ZW mode, of sex determination, 101 EVOLVING CONCEPT OF A GENE The Evolving Concept of the Gene is a new feature, integrated in key chapters, that highlights how scientists’ understanding of the gene has changed over time By underscoring how the conceptualization of the gene has evolved, our goal is to help students appreciate the process of discovery that has led to an ever more sophisticated understanding of hereditary information CHAPTER pg 58 Based on the pioneering work of Gregor Mendel, we can view the gene as a heritable unit factor that determines the expression of an observable trait, or phenotype CHAPTER pg 74 Based on the work of many geneticists following the rediscovery of Mendel’s work in the very early part of the twentieth century, the chromosome theory of inheritance was put forward, 
which hypothesized that chromosomes are the carriers of genes and that meiosis is the physical basis of Mendel’s postulates. In the ensuing 40 years, the concept of a gene evolved to reflect that this hereditary unit can exist in multiple forms, or alleles, each of which can affect the phenotype in different ways, leading to incomplete dominance, codominance, and even lethality. It became clear that the process of mutation was the source of new alleles.

CHAPTER pg 152
Based on the gene-mapping studies in Drosophila and many other organisms from the 1920s through the mid-1950s, geneticists regarded genes as hereditary units organized in a specific sequence along chromosomes, between which recombination could occur. Genes were thus viewed as indivisible “beads on a string.”

CHAPTER pg 190
Based on the model of DNA put forward by Watson and Crick in 1953, the gene was viewed for the first time in molecular terms, as a sequence of nucleotides in a DNA helix that encodes genetic information.

CHAPTER 18 pg 374
Based on the work of the ENCODE project, we now know that DNA sequences previously thought of as “junk DNA,” which do not encode proteins, are nonetheless often transcribed into what we call noncoding RNA (ncRNA). Because the functions of some of these RNAs are now being determined, we must consider whether the concept of the gene should be expanded to include DNA sequences that encode ncRNAs. At this writing, there is no consensus, but it is important for you to be aware of these current findings.

CHAPTER 15 pg 304
The groundbreaking work of Jacob, Monod, and Lwoff in the early 1960s, which established the operon model for the regulation of gene expression in bacteria, expanded the concept of the gene to include noncoding regulatory sequences that are present upstream (5′) of the coding region. In bacterial operons, the transcription of several contiguous structural genes whose products are involved in the same biochemical pathway is regulated by a single set of
regulatory sequences.

CHAPTER 13 pg 267
In the 1940s, a time when the molecular nature of the gene had yet to be defined, the groundbreaking work of Beadle and Tatum provided the first experimental evidence concerning the products of genes, leading to their “one-gene:one-enzyme” hypothesis. This idea received further support and was later modified to indicate that one gene specifies one polypeptide chain.

CHAPTER 12 pg 249
The elucidation of the genetic code in the 1960s supported the concept that the gene is composed of a linear series of triplet nucleotides encoding the amino acid sequence of a protein. While this is indeed the case in prokaryotes and viruses, in 1977 it became apparent that in eukaryotes, the gene is divided into coding sequences, called exons, which are interrupted by noncoding sequences, called introns (intervening sequences), which must be spliced out during production of the mature mRNA.

A Selection of Nobel Prizes Awarded for Research in Genetics or Genetics-Related Areas

Year | Recipients | Nobel Prize* | Discovery/Research Topic
2012 | J. B. Gurdon, S. Yamanaka | P/M | Differentiated cells can be reprogrammed to become pluripotent
2009 | E. H. Blackburn, C. W. Greider, J. W. Szostak | P/M | The nature and replication of the DNA of telomeres, and the discovery of the telomere-replenishing ribonucleoprotein enzyme telomerase
2008 | O. Shimomura, M. Chalfie, R. Tsien | C | Discovery and development of the green fluorescent protein (GFP) technology as a tool for genetic research
2007 | M. R. Capecchi, M. J. Evans, O. Smithies | P/M | Gene-targeting technology essential to the creation of knockout mice serving as animal models of human disease
2006 | R. Kornberg | C | Molecular basis of eukaryotic transcription
2006 | A. Z. Fire, C. C. Mello | P/M | Gene silencing using RNA interference (RNAi)
2002 | S. Brenner, H. R. Horvitz, J. E. Sulston | P/M | Genetic regulation of organ development and programmed cell death (apoptosis)
2001 | L. Hartwell, T. Hunt, P. Nurse | P/M | Genes and regulatory molecules controlling the cell cycle
1997 | S. Prusiner | P/M | Prions, a new biological principle of infection
1995 | E. B. Lewis, C. Nüsslein-Volhard, E. Wieschaus | P/M | Genetic control of early development in Drosophila
1993 | R. Roberts, P. Sharp | P/M | RNA processing of split genes
1993 | K. Mullis, M. Smith | C | Development of polymerase chain reaction (PCR) and site-directed mutagenesis (SDM)
1989 | J. M. Bishop, H. E. Varmus | P/M | Role of retroviruses and oncogenes in cancer
1989 | T. R. Cech, S. Altman | C | Catalytic properties of RNA
1987 | S. Tonegawa | P/M | Genetic basis of antibody diversity
1983 | B. McClintock | P/M | Mobile genetic elements in maize
1982 | A. Klug | C | Crystalline structure analysis of significant complexes, including tRNA and nucleosomes
1980 | P. Berg, W. Gilbert, F. Sanger | C | Development of recombinant DNA and DNA sequencing technology
1978 | W. Arber, D. Nathans, H. O. Smith | P/M | Recombinant DNA technology using restriction endonuclease technology
1976 | B. S. Blumberg, D. C. Gajdusek | P/M | Elucidation of the human prion-based diseases, kuru and Creutzfeldt-Jakob disease
1975 | D. Baltimore, R. Dulbecco, H. Temin | P/M | Molecular genetics of tumor viruses
1970 | N. Borlaug | PP | Genetic improvement of Mexican wheat
1969 | M. Delbrück, A. D. Hershey, S. E. Luria | P/M | Replication mechanisms and genetic structure of bacteriophages
1968 | H. G. Khorana, M. W. Nirenberg | P/M | Deciphering the genetic code
1968 | R. W. Holley | P/M | Structure and nucleotide sequence of transfer RNA
1965 | F. Jacob, A. M. Lwoff, J. L. Monod | P/M | Genetic regulation of enzyme synthesis in bacteria
1962 | F. H. C. Crick, J. D. Watson, M. H. F. Wilkins | P/M | Double helical model of DNA
1959 | A. Kornberg, S. Ochoa | P/M | Biological synthesis of DNA and RNA
1958 | G. W. Beadle, E. L. Tatum | P/M | Genetic control of biochemical processes
1958 | J. Lederberg | P/M | Genetic recombination in bacteria
1954 | L. Pauling | C | Alpha helical structure of proteins
1946 | H. J. Muller | P/M | X-ray induction of mutations in Drosophila
1933 | T. H. Morgan | P/M | Chromosomal theory of inheritance

*C = Chemistry; P/M = Physiology or Medicine; PP = Peace Prize ...
understanding of the organization of the human genome as well as the effects of mutations on coding and noncoding regions of genes, and the effects of mutations on development.

14.2 Spontaneous... oxidant damage

14 2 One of the most famous cases of an X-linked recessive mutation in humans is that of hemophilia found in the descendants of Britain’s Queen Victoria. The pedigree of the royal family...

380 of FGFR2 gene
Tyrosine to STOP codon at position 21 13 of fibrillin-1 gene
Various short insertions throughout the LDLR gene
Three-base-pair deletion of phenylalanine codon at position 508 of