Applied Probability - Lange K.

DOCUMENT INFORMATION

Basic information

Format
Number of pages	379
File size	1.91 MB

Content

Applied Probability
Kenneth Lange
Springer

Springer Texts in Statistics
Advisors: George Casella, Stephen Fienberg, Ingram Olkin

Springer
New York, Berlin, Heidelberg, Hong Kong, London, Milan, Paris, Tokyo

Springer Texts in Statistics
Alfred: Elements of Statistics for the Life and Social Sciences
Berger: An Introduction to Probability and Stochastic Processes
Bilodeau and Brenner: Theory of Multivariate Statistics
Blom: Probability and Statistics: Theory and Applications
Brockwell and Davis: Introduction to Time Series and Forecasting, Second Edition
Chow and Teicher: Probability Theory: Independence, Interchangeability, Martingales, Third Edition
Christensen: Advanced Linear Modeling: Multivariate, Time Series, and Spatial Data; Nonparametric Regression and Response Surface Maximization, Second Edition
Christensen: Log-Linear Models and Logistic Regression, Second Edition
Christensen: Plane Answers to Complex Questions: The Theory of Linear Models, Third Edition
Creighton: A First Course in Probability Models and Statistical Inference
Davis: Statistical Methods for the Analysis of Repeated Measurements
Dean and Voss: Design and Analysis of Experiments
du Toit, Steyn, and Stumpf: Graphical Exploratory Data Analysis
Durrett: Essentials of Stochastic Processes
Edwards: Introduction to Graphical Modelling, Second Edition
Finkelstein and Levin: Statistics for Lawyers
Flury: A First Course in Multivariate Statistics
Jobson: Applied Multivariate Data Analysis, Volume I: Regression and Experimental Design
Jobson: Applied Multivariate Data Analysis, Volume II: Categorical and Multivariate Methods
Kalbfleisch: Probability and Statistical Inference, Volume I: Probability, Second Edition
Kalbfleisch: Probability and Statistical Inference, Volume II: Statistical Inference, Second Edition
Karr: Probability
Keyfitz: Applied Mathematical Demography, Second Edition
Kiefer: Introduction to Statistical Inference
Kokoska and Nevison: Statistical Tables and Formulae
Kulkarni: Modeling, Analysis, Design, and Control of Stochastic Systems
Lange: Applied Probability
Lehmann: Elements of Large-Sample Theory
Lehmann: Testing Statistical Hypotheses, Second Edition
Lehmann and Casella: Theory of Point Estimation, Second Edition
Lindman: Analysis of Variance in Experimental Design
Lindsey: Applying Generalized Linear Models
(continued after index)

Kenneth Lange
Department of Biomathematics
UCLA School of Medicine
Los Angeles, CA 90095-1766
USA
klange@ucla.edu

Editorial Board
George Casella, Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA
Stephen Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
Ingram Olkin, Department of Statistics, Stanford University, Stanford, CA 94305, USA

Library of Congress Cataloging-in-Publication Data
Lange, Kenneth.
Applied probability / Kenneth Lange.
p. cm. (Springer texts in statistics)
Includes bibliographical references and index.
ISBN 0-387-00425-4 (alk. paper)
1. Probabilities. 2. Stochastic processes. I. Title. II. Series.
QA273.U6&1 2003
519.2-dc21
2003042436

ISBN 0-387-00425-4
© 2003 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed in the United States of America. Printed on acid-free paper.
9 8 7 6 5 4 3 2 1 SPIN lawma
Typesetting: Pages created by the author using a Springer TeX macro package.
www.springer-ny.com
Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH

Preface
Despite the fears of university mathematics departments, mathematics education is growing rather than declining. But the truth of the matter is that the increases are occurring outside departments of mathematics. Engineers, computer scientists, physicists, chemists, economists, statisticians, biologists, and even philosophers teach and learn a great deal of mathematics. The teaching is not always terribly rigorous, but it tends to be better motivated and better adapted to the needs of students.
In my own experience teaching students of biostatistics and mathematical biology, I attempt to convey both the beauty and utility of probability. This is a tall order, partially because probability theory has its own vocabulary and habits of thought. The axiomatic presentation of advanced probability typically proceeds via measure theory.
This approach has the advantage of rigor, but it inevitably misses most of the interesting applications, and many applied scientists rebel against the onslaught of technicalities. In the current book, I endeavor to achieve a balance between theory and applications in a rather short compass. While the combination of brevity and balance sacrifices many of the proofs of a rigorous course, it is still consistent with supplying students with many of the relevant theoretical tools. In my opinion, it is better to present the mathematical facts without proof rather than omit them altogether.
In the preface to his lovely recent textbook [153], David Williams writes, “Probability and Statistics used to be married; then they separated, then they got divorced; now they hardly see each other.” Although this split is doubtless irreversible, at least we ought to be concerned with properly bringing up their children, applied probability and computational statistics. If we fail, then science as a whole will suffer. You see before you my attempt to give applied probability the attention it deserves. My other recent book [95] covers computational statistics and aspects of computational probability glossed over here.
This graduate-level textbook presupposes knowledge of multivariate calculus, linear algebra, and ordinary differential equations. In probability theory, students should be comfortable with elementary combinatorics, generating functions, probability densities and distributions, expectations, and conditioning arguments. My intended audience includes graduate students in applied mathematics, biostatistics, computational biology, computer science, physics, and statistics. Because of the diversity of needs, instructors are encouraged to exercise their own judgment in deciding what chapters and topics to cover.
Chapter 1 reviews elementary probability while striving to give a brief survey of relevant results from measure theory.
Poorly prepared students should supplement this material with outside reading. Well-prepared students can skim Chapter 1 until they reach the less well-known material of the final two sections. Section 1.8 develops properties of the multivariate normal distribution of special interest to students in biostatistics and statistics. This material is applied to optimization theory in Section 3.3 and to diffusion processes in Chapter 11.
We get down to serious business in Chapter 2, which is an extended essay on calculating expectations. Students often complain that probability is nothing more than a bag of tricks. For better or worse, they are confronted here with some of those tricks. Readers may want to skip the final two sections of the chapter on surface area distributions on a first pass through the book.
Chapter 3 touches on advanced topics from convexity, inequalities, and optimization. Besides the obvious applications to computational statistics, part of the motivation for this material is its applicability in calculating bounds on probabilities and moments.
Combinatorics has the odd reputation of being difficult in spite of relying on elementary methods. Chapters 4 and 5 are my stab at making the subject accessible and interesting. There is no doubt in my mind of combinatorics' practical importance. More and more we live in a world dominated by discrete bits of information. The stress on algorithms in Chapter 5 is intended to appeal to computer scientists.
Chapters 6 through 11 cover core material on stochastic processes that I have taught to students in mathematical biology over a span of many years. If supplemented with appropriate sections from Chapters 1 and 2, there is sufficient material here for a traditional semester-long course in stochastic processes. Although my examples are weighted toward biology, particularly genetics, I have tried to achieve variety. The fortunes of this book doubtless will hinge on how compelling readers find these examples.
You can leaf through the Table of Contents to get a better idea of the topics covered in these chapters.
In the final two chapters on Poisson approximation and number theory, the applications of probability to other branches of mathematics come to the fore. These chapters are hardly in the mainstream of stochastic processes and are meant for independent reading as much as for classroom presentation.
All chapters come with exercises. These are not graded by difficulty, but hints are provided for some of the more difficult ones. My own practice is to require one problem for each hour and a half of lecture. Students are allowed to choose among the problems within each chapter and are graded on the best of the solutions they present. This strategy provides incentive for the students to attempt more than the minimum number of problems.
I would like to thank my former and current UCLA and University of Michigan students for their help in debugging this text. In retrospect, there were far more contributing students than I can possibly credit. At the risk of offending the many, let me single out Brian Dolan, Ruzong Fan, David Hunter, Wei-hsun Liao, Ben Redelings, Eric Schadt, Marc Suchard, Janet Sinsheimer, and Andy Ming-Ham Yip. I also thank John Kimmel of Springer-Verlag for his editorial assistance.
Finally, I dedicate this book to my mother, Alma Lange, on the occasion of her 80th birthday. Thanks, Mom, for your cheerfulness and generosity in raising me. You were, and always will be, an inspiration to the whole family.

Preface to the First Edition
When I was a postdoctoral fellow at UCLA more than two decades ago, I learned genetic modeling from the delightful texts of Elandt-Johnson [2] and Cavalli-Sforza and Bodmer [1]. In teaching my own genetics course over the past few years, first at UCLA and later at the University of Michigan, I longed for an updated version of these books. Neither appeared and I was left to my own devices.
As my hastily assembled notes gradually acquired more polish, it occurred to me that they might fill a useful niche.
Research in mathematical and statistical genetics has been proceeding at such a breathless pace that the best minds in the field would rather create new theories than take time to codify the old. It is also far more profitable to write another grant proposal. Needless to say, this state of affairs is not ideal for students, who are forced to learn by wading unguided into the confusing swamp of the current scientific literature.
Having set the stage for nobly rescuing a generation of students, let me inject a note of honesty. This book is not the monumental synthesis of population genetics and genetic epidemiology achieved by Cavalli-Sforza and Bodmer. It is also not the sustained integration of statistics and genetics achieved by Elandt-Johnson. It is not even a compendium of recommendations for carrying out a genetic study, useful as that may be. My goal is different and more modest. I simply wish to equip students already sophisticated in mathematics and statistics to engage in genetic modeling. These are the individuals capable of creating new models and methods for analyzing genetic data. No amount of expertise in genetics can overcome mathematical and statistical deficits. Conversely, no mathematician or statistician ignorant of the basic principles of genetics can ever hope to identify worthy problems. Collaborations between geneticists on one side and mathematicians and statisticians on the other can work, but it takes patience and a willingness to learn a foreign vocabulary.
So what are my expectations of readers and students? This is a hard question to answer, in part because the level of the mathematics required builds as the book progresses. At a minimum, readers should be familiar with notions of theoretical statistics such as likelihood and Bayes’ theorem. Calculus and linear algebra are used throughout.
The last few chapters make fairly heavy demands on skills in theoretical probability and combinatorics. For a few subjects such as continuous time Markov chains and Poisson approximation, I sketch enough of the theory to make the exposition of applications self-contained. Exposure to interesting applications should whet students’ appetites for self-study of the underlying mathematics. Everything considered, I recommend that instructors cover the chapters in the order indicated and determine the speed of the course by the mathematical sophistication of the students. There is more than ample material here for a full semester, so it is pointless to rush through basic theory if students encounter difficulty early on. Later chapters can be covered at the discretion of the instructor.
The matter of biological requirements is also problematic. Neither the brief review of population genetics in Chapter 1 nor the primer of molecular genetics in Appendix A is a substitute for a rigorous course in modern genetics. Although many of my classroom students have had little prior exposure to genetics, I have always insisted that those intending to do research fill in the gaps in their knowledge. Students in the mathematical sciences occasionally complain to me that learning genetics is hopeless because the field is in such rapid flux. While I am sympathetic to the difficult intellectual hurdles ahead of them, this attitude is a prescription for failure. Although genetics lacks the theoretical coherence of mathematics, there are fundamental principles and crucial facts that will never change. My advice is to follow your curiosity and learn as much genetics as you can. In scientific research chance always favors the well prepared.
The incredible flowering of mathematical and statistical genetics over the past two decades makes it impossible to summarize the field in one book.
I am acutely aware of my failings in this regard, and it pains me to exclude most of the history of the subject and to leave unmentioned so many important ideas. I apologize to my colleagues. My own work receives too much attention; my only excuse is that I understand it best. Fortunately, the recent book of Michael Waterman delves into many of the important topics in molecular genetics missing here [4].
I have many people to thank for helping me in this endeavor. Carol Newton nurtured my early career in mathematical biology and encouraged me to write a book in the first place. Daniel Weeks and Eric Sobel deserve special credit for their many helpful suggestions for improving the text. My genetics colleagues David Burke, Richard Gatti, and Miriam Meisler read and corrected my first draft of Appendix A. David Cox, Richard Gatti, and James Lake kindly contributed data. Janet Sinsheimer and Hongyu Zhao provided numerical examples for Chapters 10 and 12, respectively. Many students at UCLA and Michigan checked the problems and proofread the text. Let me single out Ruzong Fan, Ethan Lange, Laura Lazzeroni, Eric Schadt, Janet Sinsheimer, Heather Stringham, and Wynn Walker for their diligence. David Hunter kindly prepared the index. Doubtless a few errors remain, and I would be grateful to readers for their corrections. Finally, I thank my wife, Genie, to whom I dedicate this book, for her patience and love.
[...]
... Hardy-Weinberg model for an autosomal locus the genotype frequencies for the two sexes differ. What is the ultimate frequency of a given allele? How long does it take genotype frequencies to stabilize at their Hardy-Weinberg values?
3. Consider an autosomal locus with $m$ alleles in Hardy-Weinberg equilibrium. If allele $A_i$ has frequency $p_i$, then show that a random non-inbred person is heterozygous with probability ...
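The excerpt of Problem 3 above breaks off before its formula. As a hedged illustration only, the sketch below assumes the standard Hardy-Weinberg identity that a random non-inbred person is heterozygous with probability $1 - \sum_{i=1}^{m} p_i^2$, since homozygotes for allele $A_i$ occur with frequency $p_i^2$. The function name `heterozygosity` and the example frequencies are hypothetical and do not come from the book.

```python
def heterozygosity(freqs, tol=1e-9):
    """Heterozygote probability at a locus in Hardy-Weinberg equilibrium.

    Assumes the standard identity 1 - sum(p_i^2); `freqs` holds the
    allele frequencies p_1, ..., p_m (hypothetical input, not book data).
    """
    if any(p < 0.0 for p in freqs):
        raise ValueError("allele frequencies must be nonnegative")
    if abs(sum(freqs) - 1.0) > tol:
        raise ValueError("allele frequencies must sum to 1")
    # Homozygotes A_i/A_i have frequency p_i^2; everyone else is heterozygous.
    return 1.0 - sum(p * p for p in freqs)


if __name__ == "__main__":
    # Compare an uneven three-allele locus with the uniform case.
    print(heterozygosity([0.7, 0.2, 0.1]))
    print(heterozygosity([1.0 / 3] * 3))
```

Trying a few frequency vectors with a fixed number of alleles is one informal way to explore how the probability varies with the allele frequencies before attempting a proof.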
... identification, Chapter 3 a section on Bayesian estimation of haplotype frequencies, Chapter 4 a section on case-control association studies, Chapter 7 new material on the gamete competition model, Chapter 8 three sections on QTL mapping and factor analysis, Chapter 9 three sections on the Lander-Green-Kruglyak algorithm and its applications, Chapter 10 three sections on codon and rate variation models, and ...
... Lander-Green-Kruglyak Algorithm
9.12 Genotyping Errors
9.13 Marker Sharing Statistics
9.14 Problems
9.15 References
10 Molecular Phylogeny
10.1 Introduction
10.2 Evolutionary Trees
10.3 Maximum Parsimony
10.4 Review of Continuous-Time ...
... painted white. Instead of inheriting an all-black or an all-white representative of a given pair, a gamete inherits a chromosome that alternates between black and white. The points of exchange are termed crossovers. Any given gamete will have just a few randomly positioned crossovers per chromosome. The recombination fraction between two loci on the same chromosome is the probability that ...
... random gametes in imitation of the Hardy-Weinberg law. For example, the genotype of person 2 in Figure 1.1 has population frequency $(p_O p_{A_2})^2$, being the union of two $OA_2$ haplotypes. Exceptions to the rule of linkage equilibrium often occur for tightly linked loci.
1.3 Hardy-Weinberg Equilibrium
Let us now consider a formal mathematical model for the establishment of Hardy-Weinberg equilibrium. This model relies ...
... union of gametes argument generalizes easily to more than two alleles. Hardy-Weinberg equilibrium is a bit more subtle for X-linked loci. Consider a locus on the X chromosome and any allele at that locus. At generation $n$ let the frequency of the given allele in females be $q_n$ and in males be $r_n$. Under our stated assumptions for Hardy-Weinberg equilibrium, one can show that $q_n$ and $r_n$ converge quickly to the ...
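The X-linked excerpt above is cut off before it states the limit. The sketch below is an illustration under an assumption the excerpt does not spell out: the usual random-mating recurrence, in which a daughter receives one X chromosome from each parent, so $q_n = (q_{n-1} + r_{n-1})/2$, and a son receives his X chromosome from his mother, so $r_n = q_{n-1}$. Under that recurrence the quantity $2q_n + r_n$ is conserved, which gives the limit $(2q_0 + r_0)/3$. The function names and starting frequencies are hypothetical.

```python
def x_linked_frequencies(q0, r0, generations):
    """Iterate female (q) and male (r) allele frequencies at an X-linked locus.

    Assumed recurrence (standard random mating, not quoted from the excerpt):
        q_n = (q_{n-1} + r_{n-1}) / 2   # daughters: one X from each parent
        r_n = q_{n-1}                   # sons: X from the mother only
    Returns the list [(q_0, r_0), (q_1, r_1), ...].
    """
    history = [(q0, r0)]
    q, r = q0, r0
    for _ in range(generations):
        q, r = 0.5 * (q + r), q
        history.append((q, r))
    return history


def x_linked_limit(q0, r0):
    """Common limit of q_n and r_n, using the invariant 2*q_n + r_n."""
    return (2.0 * q0 + r0) / 3.0


if __name__ == "__main__":
    # Hypothetical starting frequencies that differ between the sexes.
    for n, (q, r) in enumerate(x_linked_frequencies(0.8, 0.2, 10)):
        print(n, q, r)
    print("limit:", x_linked_limit(0.8, 0.2))
```

Under this recurrence the gap $q_n - r_n$ is multiplied by $-1/2$ each generation, so the two frequencies oscillate around the limit and close in on it geometrically.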
... discussed in Chapter 11. Further free software for genetic analysis is listed in the recent book by Ott and Terwilliger [3].
0.1 References
[1] Cavalli-Sforza LL, Bodmer WF (1971) The Genetics of Human Populations. Freeman, San Francisco
[2] Elandt-Johnson RC (1971) Probability Models and Statistical Methods in Genetics. Wiley, New York
[3] Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage. Johns ...
... is the maximum of this probability, and for what allele frequencies is this maximum attained?
4. In forensic applications of genetics, loci with high exclusion probabilities are typed. For a codominant locus with $n$ alleles, show that the probability of two random people having different genotypes is
$e = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i p_j (1 - 2 p_i p_j) + \sum_{i=1}^{n} p_i^2 (1 - p_i^2)$
under Hardy-Weinberg equilibrium [8] ...
... $C_k$ and denote their population frequencies by $p_i$, $q_j$, and $r_k$. Let $\theta_{AB}$ be the probability of recombination between loci A and B but not between B and C. Define $\theta_{BC}$ similarly. Let $\theta_{AC}$ be the probability of simultaneous recombination between loci A and B and between loci B and C. Finally, adopt the usual conditions for Hardy-Weinberg and linkage equilibrium. (a) Show that the gamete frequency $P_n(A_i B_j C_k$ ...
... the second edition. I would particularly like to single out Jason Aten, Lara Bauman, Michael Boehnke, Ruzong Fan, Steve Horvath, David Hunter, Ethan Lange, Benjamin Redelings, Eric Schadt, Janet Sinsheimer, Heather Stringham, and my wife, Genie. As a one-time editor, Genie will particularly appreciate that a comma now appears in my dedication between “wife” and “Genie,” thereby removing any suspicion ...

Date posted: 31/03/2014, 16:23
