Fragile X Mutations Affect Boys and Their Grandfathers
14.4 DNA Profiling Uses Hardy-Weinberg Assumptions
Hardy-Weinberg equilibrium is useful and interesting in a theoretical sense for understanding the conditions necessary for evolution to occur. These calculations are also the foundation of DNA profiling. Use of Hardy-Weinberg equilibrium in DNA profiling is based on the fact that the equation applies to parts of the genome that do not affect the phenotype and are therefore not subject to natural selection. Short repeated sequences that are not part of a protein-encoding gene fall into this category. Variability in such sequences can be used to identify
CF affects 1 in 2,000 Caucasian newborns. Therefore, the homozygous recessive frequency (cc, if c represents the disease-causing allele) is 1/2,000, or 0.0005 in the population. This equals q². The square root of q² is about 0.022, which equals the frequency of the c allele. If q equals 0.022, then p, or 1 − q, equals 0.978. Carrier frequency is equal to 2pq, which equals (2)(0.978)(0.022), or 0.043, about 1 in 23. Figure 14.5 summarizes these calculations.
Since there is no CF in the woman’s family, her risk of having an affected child, based on population statistics, is low.
The chance of each potential parent being a carrier is about 4.3 percent, or 1 in 23. The chance that both are carriers is 1/23 multiplied by 1/23, or 1 in 529, because the probability that two independent events will occur equals the product of the probability that each event will happen alone. However, if they are both carriers, each of their children would face a 1 in 4 chance of inheriting the illness, based on Mendel’s first law of gene segregation. Therefore, the risk that these two unrelated Caucasian individuals with no family history of CF will have an affected child is 1/4 × 1/23 × 1/23, or 1 in 2,116.
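The arithmetic in this example can be retraced in a few lines of Python. This is only a sketch under stated assumptions: the function name `carrier_frequency` is made up for illustration, the population is assumed to be in Hardy-Weinberg equilibrium, and q is rounded to three decimal places to mirror the rounding used in the text.

```python
import math

def carrier_frequency(incidence):
    """Return (q, p, carrier frequency) for an autosomal recessive disorder.

    Assumes Hardy-Weinberg equilibrium, so incidence = q**2.
    q is rounded to three decimals to mirror the worked example.
    """
    q = round(math.sqrt(incidence), 3)   # frequency of the disease-causing allele
    p = 1 - q                            # frequency of the wild type allele
    return q, p, 2 * p * q               # 2pq = carrier (heterozygote) frequency

q, p, carriers = carrier_frequency(1 / 2000)

# The text rounds the carrier frequency to "1 in N" before multiplying.
carrier_odds = round(1 / carriers)
# Risk of an affected child = 1/4 x 1/N x 1/N, kept here as its denominator.
couple_risk_denominator = 4 * carrier_odds * carrier_odds

print(q, round(p, 3), round(carriers, 3))
print(f"carrier: 1 in {carrier_odds}; affected child: 1 in {couple_risk_denominator:,}")
```

Rounding at the same points as the text matters here: carrying full precision through every step would give a slightly different final denominator.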
For X-linked traits, different predictions of allele frequencies apply to males and females. For a female, who can be homozygous recessive, homozygous dominant, or a heterozygote, the standard Hardy-Weinberg equation of p² + 2pq + q² = 1 applies. However, in males, the allele frequency is the phenotypic frequency, because a male who inherits an X-linked recessive allele exhibits it in his phenotype.
The incidence of X-linked hemophilia A (see figure 6.8), for example, is 1 in 10,000 male (X^h Y) births. Therefore, q (the frequency of the h allele) equals 0.0001. Using the formula p + q = 1, the frequency of the wild type allele is 0.9999. The incidence of carriers (X^H X^h), who are all female, equals 2pq, or (2)(0.0001)(0.9999), which equals about 0.0002, or 0.02 percent, which is about 1 in 5,000. The incidence of a female having hemophilia A (X^h X^h) is q², or (0.0001)², or about 1 in 100 million. Figure 14.6 summarizes these calculations.
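The X-linked case can be sketched the same way. The function name `x_linked_frequencies` is hypothetical, and the calculation assumes Hardy-Weinberg equilibrium; the one change from the autosomal case is that q is read directly from the incidence among males instead of being derived from a square root.

```python
def x_linked_frequencies(male_incidence):
    """Return (q, p, carrier females, affected females) for an X-linked recessive trait.

    Assumes Hardy-Weinberg equilibrium. Males have a single X chromosome,
    so the incidence among males equals the allele frequency q directly.
    """
    q = male_incidence
    p = 1 - q
    carrier_females = 2 * p * q      # heterozygous females
    affected_females = q ** 2        # homozygous recessive females
    return q, p, carrier_females, affected_females

q, p, carriers, affected = x_linked_frequencies(1 / 10_000)

# Round the "1 in N" figures to the nearest hundred for readability.
print(f"carrier females: about 1 in {int(round(1 / carriers, -2)):,}")
print(f"affected females: about 1 in {int(round(1 / affected, -2)):,}")
```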
Neat allele frequencies such as 0.6 and 0.4, or 0.7 and 0.3, are unusual. In actuality, single-gene disorders are very rare, and so the q component of the Hardy-Weinberg equation contributes little. Because this means that the value of p approaches 1, the carrier frequency, 2pq, is very close to 2q. Thus, the
Figure 14.5 Calculating the carrier frequency given population incidence: Autosomal recessive.
Cystic Fibrosis
incidence (autosomal recessive class) = 1/2,000 = 0.0005
q² = 0.0005
q = √0.0005 = 0.022
p = 1 − q = 1 − 0.022 = 0.978
carrier frequency = 2pq = (2)(0.978)(0.022) = 0.043 = 1/23
Figure 14.6 Calculating the carrier frequency given population incidence: X-linked recessive.
Hemophilia A
incidence = 1/10,000 male births = 0.0001
q = 0.0001
p = 1 − q = 1 − 0.0001 = 0.9999
carrier frequency (females) = 2pq = (2)(0.9999)(0.0001) ≈ 0.0002 = about 1/5,000
affected females = q² = (0.0001)(0.0001) = 1/100 million
Table 14.4 Characteristics of Repeats Used in DNA Profiling
Type                     Repeat Length   Distribution   Example     Fragment Sizes
VNTRs (minisatellites)   10–80 bases     not uniform    TTCGGGTTG   50–1,500 bases
STRs (microsatellites)   2–10 bases      more uniform   ACTT        50–500 bases
individuals if the frequencies are known in particular populations.
Recall from chapter 12 that repeated sequences are scattered throughout the genome. Copy number variants (the number of copies of a particular repeat) can be followed, as alleles, to identify an individual. The person is classified as a heterozygote or a homozygote based on the number of copies of the same repeat at the same chromosomal locus on the two homologs. A homozygote has the same number of repeats on both homologs, such as individual 2 in figure 14.7. A heterozygote has two different repeat sizes, such as the other two individuals in the figure. The copy numbers are distributed in the next generation according to Mendel’s law of segregation. A child of individual 1 and individual 2 in figure 14.7, for example, could have either of the two possible combinations of the parental copy numbers, one from each parent: 2 and 3, or 4 and 3.
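The possible copy-number combinations in a child can be enumerated mechanically. The following is a minimal sketch, with a made-up function name, that treats each parent as a pair of repeat counts and takes one count from each. The copy numbers used (2 and 4 for individual 1, 3 and 3 for individual 2) are the ones shown for these individuals in figure 14.8.

```python
from itertools import product

def offspring_genotypes(parent_a, parent_b):
    """List the distinct repeat-number pairs a child could inherit.

    Each parent is a pair of copy numbers (one per homolog). The child
    receives one copy number from each parent (Mendel's law of segregation).
    """
    combos = {tuple(sorted(pair)) for pair in product(parent_a, parent_b)}
    return sorted(combos)

individual_1 = (2, 4)   # heterozygote: 2 and 4 repeats
individual_2 = (3, 3)   # homozygote: 3 repeats on both homologs

print(offspring_genotypes(individual_1, individual_2))
```

Because individual 2 is a homozygote, the four parental pairings collapse to two distinct genotypes.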
DNA profiling was pioneered by detecting copy number variants of very short repeats and using them to identify or distinguish individuals. In general, the technique calculates the probability that certain combinations of repeat numbers will be in two DNA sources by chance. For example, if a DNA profile of skin cells taken from under the fingernails of an assault victim matches the profile from a suspect’s hair, and the likelihood is very low that those two samples would match by chance, that is strong evidence of guilt rather than a coincidental similarity. DNA evidence is more often valuable in excluding a suspect, and should be considered along with other types of evidence.
Although obtaining a DNA profile is a molecular technique, interpreting it requires statistical analysis of population data. Two types of repeats are used in forensics and in identifying victims of disasters: variable number of tandem repeats (VNTRs), and short tandem repeats (STRs). Table 14.4 compares them.
DNA Profiling Began with Forensics
Sir Alec Jeffreys at Leicester University in the United Kingdom invented DNA profiling (then called DNA fingerprinting) in the 1980s. He detected differences in numbers of VNTRs among individuals by cutting DNA with restriction enzymes. These
Figure 14.7 DNA profiling detects differing numbers of repeats at specific chromosomal loci. Individuals 1 and 3 are heterozygotes for the number of copies of a 5-base sequence at a particular chromosomal locus. Individual 2 is a homozygote, with the same number of repeats on the two copies of the chromosome. (Repeat number is considered an allele.)
Individual 1: GCATC GCATC (2 repeats) and GCATC GCATC GCATC GCATC (4 repeats)
Individual 2: GCATC GCATC GCATC (3 repeats) on both homologs
Individual 3: GCATC GCATC GCATC GCATC (4 repeats) and GCATC GCATC GCATC GCATC GCATC GCATC (6 repeats)
enzymes naturally protect bacteria by cutting foreign DNA, such as DNA from viruses, at specific short sequences. They are used as “molecular scissors” in biotechnology, as discussed in chapter 19. Jeffreys measured DNA fragments using a technique called agarose gel electrophoresis, described in Reading 14.1.
The different-sized fragments that result from “digesting” DNA with these enzymes are called restriction fragment length polymorphisms (RFLPs, pronounced “riflips”).
In the technique that Jeffreys used, DNA pieces migrate through a jellylike material (agarose or the more discriminating polyacrylamide) when an electrical field is applied. A positive electrode is placed at one end of the gel strip, and a negative electrode at the other. The DNA pieces, carrying negative charges because of their phosphate groups, move toward the positive pole. The pieces migrate according to size, with the shorter pieces moving faster and thus traveling farther in a given time. The pattern that forms when the different-sized fragments stop moving, with the shorter fragments closer to the positive pole and the longer ones farther away, creates a distinctive DNA pattern, or profile, that looks like a strip of black smears. An individual who is heterozygous for a repeat copy number variant will have two bands for that locus, as shown in figure 14.8 for Individuals 1 and 3 from figure 14.7. A locus for which an individual is homozygous has only one corresponding band (Individual 2), because both DNA pieces are the same size.
Jeffreys’ first cases proved that a boy was the son of a British citizen so that he could enter the country, and freed a man jailed for raping two schoolgirls. Then in 1988, Jeffreys’ approach matched DNA profiles from suspect Tommie Lee Andrews’ blood cells to sperm cells left on his victim in a notorious rape case. Jeffreys also used DNA profiling to demonstrate that Dolly, the Scottish sheep, was truly a clone of the 6-year-old ewe that donated her nucleus (figure 14.9).
DNA can be obtained from any cell with a nucleus. Common sources include cells in hair, blood, skin, secretions, or the inside of the cheek. DNA sequences other than VNTRs are used when sample DNA is scarce. STRs are used when DNA is fragmented, such as in evidence from terrorist attacks and natural disasters. Their smaller size makes them more likely to persist in degraded DNA. STRs are amplified using the polymerase chain reaction (see chapter 19).
If DNA is extremely damaged, such that even STRs are obliterated,
DNA profiling is a standard and powerful tool in forensic investigations, agriculture, paternity testing, and historical investigations. Until 1986, it was unheard of outside of scientific circles. A dramatic rape case changed that.
Tommie Lee Andrews watched his victims months before he attacked so that he knew when they would be home alone. On a balmy Sunday night in May 1986, Andrews awaited Nancy Hodge, a young computer operator at Disney World in Orlando, Florida. The burly man surprised her when she was in her bathroom removing her contact lenses. He covered her face, then raped and brutalized her repeatedly.
Andrews was very careful not to leave fingerprints, threads, hairs, or any other indication that he had ever been in Hodge’s home. But he left DNA. Thanks to a clear-thinking crime victim and scientifically savvy lawyers, Andrews was soon at the center of a trial that would judge the technology that helped to convict him.
After the attack, Hodge went to the hospital, where she provided a vaginal secretion sample containing sperm. Two district attorneys who had read about DNA testing sent some of the sperm to a biotechnology company that extracted DNA and cut it with restriction enzymes. The sperm’s DNA pieces were then mixed with labeled DNA probes that bound to complementary sequences.
The same extracting, cutting, and probing of DNA was done on white blood cells from Hodge and Andrews, who had been held as a suspect in several assaults. When the radioactive DNA pieces from each sample, which were the sequences where the probes had bound, were separated and displayed by size, the resulting pattern of bands—the DNA profile—matched exactly for the sperm sample and Andrews’ blood, differing from Hodge’s DNA (figure 1).
Andrews’ allele frequencies were compared to those for a representative African American population. At his first trial in November 1987, the judge, perhaps fearful that too much technical information would overwhelm the jury, did not allow the prosecution to cite population-based statistics. Without the appropriate allele frequencies, DNA profiling was just a comparison of smeary lines on test papers to see whether the patterns of DNA pieces in the forensic sperm sample looked like those for Andrews’ white blood cells. Although population-based statistics indicated that the possibility that Andrews’ DNA would match the evidence by chance was 1 in 10 billion, the prosecution could not mention this. After a mistrial was declared, the prosecution cited the precedent of using population statistics to derive databases on standard blood types. So when Andrews stood trial just 3 months later for raping a different woman, the judge permitted population analysis. Andrews was convicted.
Although DNA profiling is widely held to be an extremely accurate method to match a suspect to a sample, setbacks still happen. In June 2009, the Supreme Court ruled against prisoner William Osborne's repeated requests for further DNA testing, in a case very similar to the one described in the chapter opener. The crime was committed in Alaska, one of a very few states not to allow DNA testing of prisoners.
In these states, prisoners must take their cases to the national level. However, Peter Neufeld, co-director of the Innocence Project, said that the ruling would have limited impact on the availability of DNA profiling because most states, and many prosecutors, allow requested DNA testing.
Reading 14.1
DNA Profiling: Molecular Genetics Meets Population Genetics
[Figure 1 labels: suspect’s blood; white blood cell; chromosomes; “snipped” DNA strands; electrophoresis sheet; lanes for victim, suspect, and rapist’s sperm; numbered steps 1–10.]
Figure 1 DNA profiling. A blood sample (1) is collected from the suspect. White blood cells are separated and burst open (2), releasing DNA (3). Restriction enzymes snip the strands into fragments (4), and electrophoresis aligns them by size in a groove on a sheet of gel (5). The resulting pattern of DNA fragments is transferred to a nylon sheet (6). It is then exposed to radioactively tagged probes (7) that bind the DNA areas used to establish identity. When the nylon sheet is placed against a piece of X-ray film (8) and processed, black bands appear where the probes bound (9). This pattern of bands is a DNA profile (10). It may be compared to the victim’s DNA pattern, the rapist’s DNA obtained from sperm cells, and other biological evidence. Today fluorescent labels are used.
mitochondrial DNA (mtDNA) is often used instead, particularly two regions of repeats that are highly variable in populations. Because a single cell can yield hundreds or thousands of copies of the mitochondrial genome, even vanishingly small forensic samples can yield this DNA.
MtDNA analysis was critical in analyzing evidence from the September 11 terrorist attacks, most of which was extremely degraded. A more bizarre application was the case of the “voodoo child.” The evidence was a boy’s torso found floating in the River Thames in east London. The name reflects the contents of the stomach, which suggested he had been the victim of a ritualistic killing. When the DNA profile of nuclear DNA from the torso did not match that of missing English children, investigators widened the search by using a global mtDNA database. This search led to the boy’s homeland, southwestern Nigeria. He had been kidnapped, enslaved, and beheaded. Several suspects were arrested, thanks to tracking the torso to Africa.
Commercially available software enables researchers to integrate different types of DNA profiling data. For forensic applications, the FBI’s Combined DNA Index System (CODIS) shares DNA profiles electronically among local, state, and federal crime laboratories. More than 3 million DNA profiles are stored, and searching CODIS for DNA profiles has led to more than 22,000 “cold hits,” in which a suspect is identified from DNA alone. CODIS uses the thirteen STRs shown in figure 14.10. The probability that any two individuals have the same thirteen STR markers by chance is 1 in 250 trillion. Therefore, identity at all thirteen sites is a virtual match, but just one mismatch disproves identity.
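The all-or-nothing logic of the last sentence can be illustrated with a short Python sketch. Everything in it is hypothetical: the function name, the locus names, and the repeat numbers are invented for illustration, only three loci are shown rather than thirteen, and nothing here represents how CODIS software is actually implemented.

```python
def profiles_match(profile_a, profile_b):
    """Return True only if every locus has the same pair of repeat numbers."""
    if profile_a.keys() != profile_b.keys():
        raise ValueError("profiles must cover the same loci")
    # Sort each pair so that (7, 9) and (9, 7) count as the same genotype.
    return all(sorted(profile_a[locus]) == sorted(profile_b[locus])
               for locus in profile_a)

# Hypothetical three-locus profiles (an actual comparison uses thirteen STRs).
evidence  = {"locus_1": (7, 9), "locus_2": (12, 12), "locus_3": (15, 18)}
suspect_a = {"locus_1": (9, 7), "locus_2": (12, 12), "locus_3": (15, 18)}
suspect_b = {"locus_1": (7, 9), "locus_2": (12, 13), "locus_3": (15, 18)}

print(profiles_match(evidence, suspect_a))  # same alleles at every locus
print(profiles_match(evidence, suspect_b))  # a single mismatch excludes this suspect
```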
Using Population Statistics to Interpret DNA Profiles
In forensics in general, the more clues, the better. Therefore, the power of DNA profiling is greatly expanded by tracking repeats on several chromosomes. The numbers of copies of a repeat are assigned probabilities (likelihood of being
Figure 14.8 DNA profiles. DNA fragments that include differing numbers of copies of the same repeat migrate at different speeds and stop moving at different points on a strip of polyacrylamide gel. These gels correspond to the individuals represented in figure 14.7.
Actual DNA profiles typically scan up to 25 repeats on different chromosomes.
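[Gel strips, each running from the negative electrode (−) to the positive electrode (+):]
Individual 1: bands at 4 repeats and 2 repeats
Individual 2: one band at 3 repeats
Individual 3: bands at 6 repeats and 4 repeats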
Figure 14.9 Comparing DNA profiles. These DNA profiles compare the DNA of Dolly the cloned sheep (lane D), fresh donor udder tissue (U), and cultured donor udder tissue (C). The other twelve lanes represent other sheep. The match between Dolly and the two versions of her nucleus donor is obvious.
[Lanes, left to right: 1, 2, U, C, D, 3–12. Size scale (kb): 12, 10, 8, 6, 4.]
Table 14.5 shows an example of multiplying frequencies of different repeat numbers. The result is the probability that this particular combination of repeat sizes would occur in a particular population. Logic then enters the equation. If the combination is very rare in the population the suspect comes from, and if it is found both in the suspect’s DNA and in crime scene evidence, such as a rape victim’s body or the stolen property in table 14.5, the suspect’s guilt appears highly likely.
Figure 14.11 summarizes the procedure.
For the sequences used in DNA profiling, Hardy-Weinberg equilibrium is assumed. When it doesn’t apply, problems can arise. For example, the requirement of random mating for Hardy-Weinberg equilibrium wouldn’t be met in a community with a few very large families where
present) based on their observed frequencies in a particular population. Considering repeats on different chromosomes makes it possible to use the product rule to calculate the probabilities of particular combinations of repeat numbers occurring in a population, based on Mendel’s law of independent assortment.
The Hardy-Weinberg equation and the product rule are used to derive the statistics that back up a DNA profile. First, the pattern of fragments indicates whether an individual is a homozygote or a heterozygote for each repeat, because a homozygote only has one band representing that locus. Genotype frequencies are then calculated using parts of the Hardy-Weinberg equation. That is, p² and q² denote each of the two homozygotes for a two-allele repeat, and 2pq represents the heterozygote. Then the frequencies are multiplied.
[Chromosomes shown: 1–22, X, and Y.]
Figure 14.10 DNA profiling. A minimum of thirteen sites in the genome are compared to rule out suspects in crimes. The green bands indicate the thirteen original CODIS sites. More are increasingly being used.
Table 14.5 Multiplied Frequencies of Different Repeat Numbers
The Case: A famous painting has been stolen from a gallery. The thief planned the crime carefully, but as she was removing the painting from its display, she sneezed. She averted her face, but a few tiny droplets hit the wall. Detectives obtained a DNA profile using six repeat alleles, from different chromosomes, for DNA in nose lining cells in the droplets. Then they compared the profile to those compiled for eight people in the vicinity, all women, who had been identified by hidden camera. (Assume the suspects are in the same ethnic group.) Most of the samples matched at two to four sites, but one matched at all six. She was the crook. Notice how the probability of guilt increases with the number of matches. Matching for the very rare allele #3 is particularly telling.
Allele  Repeat                       Frequency  Cumulative Multiplied Frequencies
1       ACT on chromosome 4          1/60
2       GGC on chromosome 17         1/24       1/60 × 1/24 = 1/1,440
3       AAGCTA on chromosome 14      1/1,200    1/1,440 × 1/1,200 = 1/1,728,000
4       GGTCTA on chromosome 6       1/11       1/1,728,000 × 1/11 = 1/19,008,000
5       ATACGAGG on chromosome 9     1/40       1/19,008,000 × 1/40 = 1/760,320,000
6       GTA on chromosome 5          1/310      1/760,320,000 × 1/310 = 1/235,699,200,000
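The cumulative column of Table 14.5 is a running product, which can be checked with exact fractions. This sketch uses Python's standard `fractions` module; the variable names are chosen for illustration, and the frequencies are copied from the table.

```python
from fractions import Fraction

# Allele frequencies from Table 14.5, in table order.
frequencies = [Fraction(1, 60), Fraction(1, 24), Fraction(1, 1200),
               Fraction(1, 11), Fraction(1, 40), Fraction(1, 310)]

cumulative = []
running = Fraction(1)
for freq in frequencies:
    running *= freq          # product rule: multiply independent frequencies
    cumulative.append(running)

for allele_number, value in enumerate(cumulative, start=1):
    print(f"after allele {allele_number}: 1/{value.denominator:,}")
```

Exact fractions avoid the rounding that floating-point numbers would introduce at these very small probabilities.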
[Figure 14.11 flow: DNA is collected from evidence at the crime scene (blood, skin under the victim’s fingernail). Selected DNA sequences are cut, labeled, and probed (or PCR is used). Five specific DNA sequences from different chromosomes are labeled and separated by size. Lanes for blood (victim), blood (suspect), skin (evidence), and blood (evidence) are compared for a visual match, and then the genotype frequencies are multiplied.]
Genotype frequencies for the five DNA sequences, calculated from allele frequencies in the suspect’s population:
2pq = (2)(0.60)(0.30) = 0.36
2pq = (2)(0.50)(0.30) = 0.30
2pq = (2)(0.15)(0.80) = 0.24
p² = (0.20)² = 0.04
2pq = (2)(0.80)(0.18) = 0.29
0.36 × 0.30 × 0.24 × 0.04 × 0.29 ≈ 0.00030 ≈ 1/3,326
Conclusion: The probability that another person in the suspect’s population group has the same pattern of these alleles is approximately 1 in 3,326.
Figure 14.11 To solve a crime. A man was found brutally murdered, with bits of skin and blood beneath a fingernail. The bits were sent to a forensics lab as evidence, where the patterns of five DNA sequences were compared to patterns in blood from the victim as well as blood from a man being held as a suspect. The pattern for the crime scene evidence matched that for the suspect visually, but that wasn’t sufficient. Allele frequencies from the man’s ethnic group were used in the Hardy-Weinberg equation, yielding the probability that his DNA matched that of the skin and blood under the murdered man’s fingernail by chance.
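The calculation in figure 14.11 combines the two tools named in the text: the Hardy-Weinberg terms give a genotype frequency for each DNA sequence, and the product rule multiplies them. The sketch below retraces that procedure with the allele frequencies from the figure. The helper names are made up for illustration, and each genotype frequency is rounded to two decimal places on the assumption that this matches the rounding used in the figure.

```python
from math import prod

def heterozygote(p, q):
    """Genotype frequency of a heterozygote: 2pq."""
    return 2 * p * q

def homozygote(p):
    """Genotype frequency of a homozygote: p squared."""
    return p ** 2

# One entry per DNA sequence, rounded to two decimals (assumed figure rounding).
genotype_frequencies = [
    round(heterozygote(0.60, 0.30), 2),
    round(heterozygote(0.50, 0.30), 2),
    round(heterozygote(0.15, 0.80), 2),
    round(homozygote(0.20), 2),          # the one locus with a single band
    round(heterozygote(0.80, 0.18), 2),
]

# Product rule: the loci are on different chromosomes, so multiply.
match_probability = prod(genotype_frequencies)

print(genotype_frequencies)
print(f"{match_probability:.5f}, or about 1 in {round(1 / match_probability):,}")
```

The product comes to roughly 0.0003. The result is only as good as the allele frequencies supplied, which is why the choice of reference population matters.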