Mapping the Human Genome


26.1 Mapping the Human Genome

How to Map a Genome

What does it mean to “map” a genome? For that matter, what exactly is a genetic “map”? Many people tend to think of mapping the genome of a given organism as being like reading a novel: you start at the first page and continue until you reach the end.

Applying this approach to a genome, you would start at one end of a chromosome and, proceeding base by base, record each nucleotide until you reached the end. Although this would result in an enormous amount of sequence data, it would tell you nothing about what the nucleotides may signify or where things are located with respect to each other. In reality, a genetic map is more like a map you get when you visit a large amusement park. You may see the scary mansion in one corner of the map, the death-defying roller coaster in another, the kiddie-ride area in another, and so on, with all the paths that lead to them shown. The typical map of this type is made up of landmarks and where they are located with respect to each other. A genetic map is no different, with one huge exception: we don’t know exactly what many of the landmarks represent. For example, one genetic landmark (or marker) might represent a gene for a known trait, or might be a specific pattern of repeating nucleotides. So, in effect, a genomic map is a physical representation of all the landmarks in a genome and where they are with respect to one another.

CHAPTER GOALS

1. What is the working draft of the human genome and the circumstances of its creation?

THE GOAL: Be able to describe the genome-mapping projects and the major accomplishments of their working drafts. (D.)

2. What are the various segments along the length of the DNA in a chromosome?

THE GOAL: Be able to describe the nature of telomeres, centromeres, exons and genes, and noncoding DNA. (D.)

3. What are mutations?

THE GOAL: Be able to define mutations, identify what can cause them, and also identify their possible results. (B, C, D.)

4. What are polymorphisms and single-nucleotide polymorphisms (SNPs), and how can identifying them be useful?

THE GOAL: Be able to define polymorphisms and SNPs, and explain the significance of knowing the locations of SNPs. (A, B.)

5. What is recombinant DNA?

THE GOAL: Be able to define recombinant DNA and explain how it is used for production of proteins by bacteria. (B, C.)

6. What does the future hold for uses of genomic information?

THE GOAL: Be able to provide an overview of the current and possible future applications of the human genome map. (C, D.)

CONCEPTS TO REVIEW

A. Structure, Synthesis, and Function of DNA (Sections 25.2–25.3)

B. Base Pairing and Heredity (Sections 25.4–25.5)

C. Replication of DNA (Section 25.6)

D. Transcription, Translation, and the Genetic Code (Sections 25.8–25.10)

▲A sample of DNA ready for analysis.

Mapping the genes on a eukaryotic chromosome is no easy feat. When you consider that the nucleotides that code for proteins (the exons) are interrupted by noncoding nucleotides (the introns) (see Section 25.8), it should be clear that mapping is a challenge even for an organism whose genome contains only a few dozen genes. These challenges are greatly magnified for the human genome, which contains between 20,000 and 25,000 genes! Another challenge to consider is that there is neither spacing between “words” in the genetic code, nor any “punctuation.” Using the English language as an analogy, try to find a meaningful phrase in this:

sfdggmaddrydkdkdkrrrsjfljhadxccctmctmaqqqoumlittgklejagkjghjoailambrsslj

The phrase is “mary had a little lamb”:

sfdggMAddRYdkdkdkrrrsjfljHADxccctmctmAqqqoumLITTgkLEjagkjghjoaiLAMBrsslj

Now consider how hard finding meaning would be if the phrase you were looking for was in an unfamiliar language! It has been estimated that the string of C’s, G’s, T’s, and A’s that make up the human genome would fill 75,490 pages of standard-size type in a newspaper like the New York Times.
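The letter puzzle above can be checked mechanically. The following is a minimal sketch, assuming the hidden phrase is an ordered (not necessarily contiguous) run of letters inside the jumble; the function name `find_hidden_phrase` is made up for illustration, and the approach is only an analogy for gene finding, not an actual genomics method.

```python
from typing import List, Optional


def find_hidden_phrase(text: str, phrase: str) -> Optional[List[int]]:
    """Return the positions in `text` of the letters of `phrase`, in order.

    Spaces in `phrase` are ignored. The scan is greedy and left to right,
    so each letter is matched at its earliest possible position.
    Returns None if the letters cannot all be found in order.
    """
    positions = []
    start = 0
    for letter in phrase.replace(" ", ""):
        index = text.find(letter, start)
        if index == -1:
            return None
        positions.append(index)
        start = index + 1
    return positions


jumble = "sfdggmaddrydkdkdkrrrsjfljhadxccctmctmaqqqoumlittgklejagkjghjoailambrsslj"
positions = find_hidden_phrase(jumble, "mary had a little lamb")

if positions is not None:
    # Capitalize the matched letters so the phrase stands out from the noise.
    marked = set(positions)
    print("".join(ch.upper() if i in marked else ch for i, ch in enumerate(jumble)))
```

The search succeeds only because the phrase is known in advance. Without that knowledge, many other letter runs in the jumble would match some word or other, which is the point of the analogy.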

Two organizations led the effort to map the human genome: the Human Genome Project (a collection of 20 groups at not-for-profit institutes and universities) and Celera Genomics (a commercial biotechnology company). These two groups used different approaches to taking DNA apart, analyzing its base sequences, and reassembling the information. The Human Genome Project created a series of maps of finer and finer resolution (think of a satellite map program such as Google Earth, where you can progress from a satellite photo of the United States to a map of your state to a map of the city where you live to the street you live on and, ultimately, to a picture of the house you live in). Celera followed a seemingly random approach in which they fragmented DNA and then relied on instrumental and computer-driven techniques to establish the sequence (think of breaking a piece of glass into thousands of shards and then piecing them back together). It was believed that data obtained via the combination of these two approaches would speed up the enormous task of sequencing the human genome.

In 2001 the stunning announcement was made that 90% of the human genome sequence had been mapped in 15 months instead of the originally anticipated four years. By October 2004, an analysis of the Human Genome Project reported that 99% of the gene-containing parts of the genome were sequenced and declared to be 99.999% accurate. Additionally, the mapped sequence reportedly correctly identifies almost all known genes (99.74% of them, to be exact). At a practical level, this “gold-standard” sequence data allows researchers to rely on highly accurate sequence information, priming new biomedical research.

The strategy utilized by the Human Genome Project for generating the complete map is shown in Figure 26.1. Pictured at the top is a type of chromosome drawing, known as an ideogram (pronounced id-ee-uh-gram), for human chromosome 21. The light- and dark-blue shadings represent the location of banding visible in electron micrographs. Chromosome 21 is the smallest human chromosome, with 37 million base pairs (abbreviated 37 Mb) and was the second chromosome to be mapped (chromosome 22 was the first).

In the first step, a genetic map was generated. The genetic map showed the physical location of markers, identifiable DNA sequences (some within genes, some within noncoding DNA) that were known to be inherited. The markers were an average of 1 million nucleotides apart. This is known as a genetic map because the order and locations of the markers are established by genetic studies of inheritance in related individuals.

The next map, the physical map, refines the distance between markers to about 100,000 base pairs. The physical map includes markers identified by a variety of experimental methods, most notably the use of restriction enzymes (discussed in Section 26.4).
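To get a feel for what these resolutions mean for chromosome 21, the short calculation below estimates how many markers each map would carry. It is a rough sketch that assumes markers are evenly spaced at the average distances quoted in the text, which real maps are not; the function name is hypothetical.

```python
CHROMOSOME_21_BP = 37_000_000  # 37 Mb, as given in the text

# Average marker spacing for each map level, as given in the text.
MAP_SPACING_BP = {
    "genetic map": 1_000_000,
    "physical map": 100_000,
}


def approximate_marker_count(chromosome_bp: int, spacing_bp: int) -> int:
    """Estimate the marker count, assuming perfectly even spacing."""
    return chromosome_bp // spacing_bp


for name, spacing in MAP_SPACING_BP.items():
    count = approximate_marker_count(CHROMOSOME_21_BP, spacing)
    print(f"{name}: about {count} markers, one every {spacing:,} base pairs")
```

Under these assumptions the genetic map of chromosome 21 carries a few dozen markers and the physical map a few hundred, a tenfold gain in resolution.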


To proceed to a map of finer resolution, a chromosome was cut into large segments and multiple copies of the segments were produced. The segment copies are called clones, a term that refers to identical copies of organisms, cells, or in this case, DNA segments. The overlapping clones, which covered the entire length of the chromosome, were arranged in order to produce the next level of map (see Figure 26.1).

In the next step, each clone was cut into 500 base-pair fragments, and the identity and order of bases in each fragment were determined. A variety of sequencing methods are available, but a common one involves copying the 500 base-pair fragment many times to generate a nested set of DNA molecules (a molecule that is one nucleotide long, another that is two nucleotides long, another that is three nucleotides long . . . up to 500 nucleotides long). Each molecule in the nested set ends in a C, G, T, or A, and each of these end nucleotides is fluorescently labeled in a different color. The fluorescently labeled molecules are then separated by size using gel electrophoresis, and the order of colors seen on the gel represents the order of nucleotides in the original 500 base-pair fragment. In the final step, all the different 500 base-pair sequences are assembled into a completed nucleotide map of the chromosome.
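The logic of reading a sequence from a nested set can be illustrated with a small simulation. This is a simplified sketch, not a model of the chemistry: it ignores strand complementarity, the color assigned to each base is an arbitrary assumption (the text says only that the four colors differ), and all function names are hypothetical.

```python
import random
from typing import List

# Assumed color assignment, chosen only for illustration.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}


def make_nested_set(fragment: str) -> List[str]:
    """Molecules 1, 2, 3, ... nucleotides long, all starting at the same end."""
    return [fragment[:length] for length in range(1, len(fragment) + 1)]


def run_gel(molecules: List[str]) -> List[str]:
    """Separate molecules by size (shortest first) and report each end-label color."""
    by_size = sorted(molecules, key=len)
    return [DYE_COLOR[molecule[-1]] for molecule in by_size]


def read_sequence(colors: List[str]) -> str:
    """Translate the order of colors on the gel back into nucleotides."""
    return "".join(COLOR_TO_BASE[color] for color in colors)


fragment = "ATGCCCGATTGCAT"  # the example sequence from Figure 26.1
molecules = make_nested_set(fragment)
random.shuffle(molecules)  # the molecules start out mixed together
print(read_sequence(run_gel(molecules)))
```

Because every length from one nucleotide up to the full fragment is present exactly once, sorting by size and reading only the end labels is enough to recover the whole sequence.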

◀ Figure 26.1

Human Genome Project mapping strategy. The levels shown are chromosome 21 (37 Mb); the genetic map (1 Mb resolution), with markers spaced about 1,000,000 base pairs apart; the physical map (100 kb resolution), with markers spaced about 100,000 base pairs apart; a set of overlapping ordered clones covering 100 kb; and the nucleotide sequence (for example, ATGCCCGATTGCAT). Each overlapping clone will be sequenced, and the sequences assembled into the entire genomic sequence of 3.2 × 10⁹ nucleotides, 37 Mb of which will be from chromosome 21.

In Chapter 18 we saw how electrophoresis is used as a technique to separate proteins by charge or size (Chemistry in Action: Protein Analysis by Electrophoresis). Gel electrophoresis is also routinely used with DNA, to separate DNA molecules by size.

◀ A researcher loads samples into a genome sequencer. The results obtained will appear on the monitor in the colored boxes shown.

Clones Identical copies of organisms, cells, or DNA segments from a single ancestor.

CHEMISTRY IN ACTION

One Genome To Represent Us All?

One might wonder whose genome provided the standard against which those of all other human beings will be evaluated. What individual symbolizes all of us? A star athlete? A brilliant scientist? A truly average person?

It does not take more than a moment’s thought to realize that using the DNA of a single individual to represent the entire human genome is a bad idea. What if the person chosen had a genetic aberration? How does a project of this importance deal with ethnic differences? Since no two individuals, other than identical twins, have exactly the same base sequences in their genomes, some sort of normalized average is needed.

To avoid these problems, the path chosen by both genome mapping groups was to employ DNA from a group of anonymous individuals. In the Human Genome Project, researchers collected blood (female) or sperm (male) samples from a large number of donors of diverse backgrounds. After removing all identifying labels, only a randomly selected few of the many collected samples were used for sequencing, so that neither donors nor scientists could know the origins of the DNA being sequenced. The ultimate map comes from a composite of these random samples.

The Celera project relied on anonymous donors as well. DNA from five individuals (the leader of the Celera team, Craig Venter, has since acknowledged being one of these individuals) was collected, mixed, and processed for sequencing. The anonymous donors were of European, African, American (North, Central, and South), and Asian ancestry. As a result, one of the most frequently asked questions about the human genome, “Whose DNA was sequenced?” can never truly be answered because the DNA sequenced is a composite of the DNA of many anonymous individuals. The question of whether this one genomic map is indeed randomized enough to allow its general use is still debatable, but it does provide an excellent place from which to start.

See Chemistry in Action Problem 26.58 at the end of the chapter.

▲ From the four nucleotides that compose DNA comes the incredibly diverse population that makes up the human race.

The approach taken by Celera Genomics was much bolder. In what has come to be known as their “shotgun approach,” Celera broke the human genome into fragments without identifying the origin of any given fragment. The fragments were copied many times to generate many clones of each area of the genome; ultimately they were cut into 500-base-long pieces and modified with fluorescently labeled bases that could be sequenced by high-speed machines. The resulting sequences were reassembled by identifying overlapping ends. At Celera, this monumental reassembly task was carried out using the world’s largest non-governmental supercomputing center.
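Reassembly by overlapping ends can be sketched in a few lines. The code below uses a simple greedy strategy (repeatedly merge the two pieces that share the longest overlap), which is an assumption made for illustration and not a description of Celera's actual software; the function names and the `min_overlap` parameter are hypothetical. Greedy merging also breaks down on repeated sequences, one reason real assembly is far harder than this toy case.

```python
from typing import List


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Merge reads pairwise, always taking the longest available overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_length, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_length:
                    best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # no remaining pair overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads, given in scrambled order with no record of origin.
reads = ["GATTGCAT", "ATGCCCG", "CCCGATT"]
print(greedy_assemble(reads))
```

If no pair of pieces overlaps by at least `min_overlap` bases, the function returns several separate stretches rather than one sequence, which is why the fragments must be sampled many times over.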

PROBLEM 26.1

Decode the following sequence of letters to find an English phrase made entirely out of three-letter words. (Hint: First look for a word you recognize, then work forward and backward from there.)

uouothedtttrrfatnaedigopredsldjflsjfxxratponxbvateugfaqqthenqeutbadpagfratmeabrrx
