
Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics


DOCUMENT INFORMATION

Basic information

Title: Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics
Author: Anirban DasGupta
Series editors: G. Casella, S. Fienberg, I. Olkin
Institution: Purdue University
Field: Statistics
Type: Book
Year: 2011
City: New York
Format
Pages: 803
Size: 3.93 MB

Contents

  • 1.1 Experiments and Sample Spaces (22)
  • 1.2 Conditional Probability and Independence (26)
  • 1.3 Integer-Valued and Discrete Random Variables (29)
    • 1.3.1 CDF and Independence (30)
    • 1.3.2 Expectation and Moments (34)
  • 1.4 Inequalities (40)
  • 1.5 Generating and Moment-Generating Functions (43)
  • 1.6 Applications of Generating Functions to a Pattern Problem (47)
  • 1.7 Standard Discrete Distributions (49)
  • 1.8 Poisson Approximation to Binomial (55)
  • 1.9 Continuous Random Variables (57)
  • 1.10 Functions of a Continuous Random Variable (63)
    • 1.10.1 Expectation and Moments (66)
    • 1.10.2 Moments and the Tail of a CDF (70)
  • 1.11 Moment-Generating Function and Fundamental Inequalities (72)
    • 1.11.1 Inversion of an MGF and Post’s Formula (74)
  • 1.12 Some Special Continuous Distributions (75)
  • 1.13 Normal Distribution and Confidence Interval for a Mean (82)
  • 1.14 Stein’s Lemma (87)
  • 1.15 Chernoff’s Variance Inequality (89)
  • 1.16 Various Characterizations of Normal Distributions (90)
  • 1.17 Normal Approximations and Central Limit Theorem (92)
    • 1.17.1 Binomial Confidence Interval (95)
    • 1.17.2 Error of the CLT (97)
  • 1.18 Normal Approximation to Poisson and Gamma (100)
    • 1.18.1 Confidence Intervals (101)
  • 1.19 Convergence of Densities and Edgeworth Expansions (103)
  • 2.1 Bivariate Joint Distributions and Expectations of Functions (116)
  • 2.2 Conditional Distributions and Conditional Expectations (121)
    • 2.2.1 Examples on Conditional Distributions (122)
  • 2.3 Using Conditioning to Evaluate Mean and Variance (125)
  • 2.4 Covariance and Correlation (128)
  • 2.5 Multivariate Case (132)
    • 2.5.1 Joint MGF (133)
    • 2.5.2 Multinomial Distribution (135)
  • 2.6 The Poissonization Technique (137)
  • 3.1 Joint Density Function and Its Role (144)
  • 3.2 Expectation of Functions (153)
  • 3.3 Bivariate Normal (157)
  • 3.4 Conditional Densities and Expectations (161)
    • 3.4.1 Examples on Conditional Densities and Expectations (163)
  • 3.5 Posterior Densities, Likelihood Functions, and Bayes Estimates (168)
  • 3.6 Maximum Likelihood Estimates (173)
  • 3.7 Bivariate Normal Conditional Distributions (175)
  • 3.8 Useful Formulas and Characterizations for Bivariate Normal (176)
    • 3.8.1 Computing Bivariate Normal Probabilities (178)
  • 3.9 Conditional Expectation Given a Set and Borel’s Paradox (179)
  • 4.1 Convolutions and Examples (188)
  • 4.2 Products and Quotients and the t- and F-Distributions (193)
  • 4.3 Transformations (198)
  • 4.4 Applications of Jacobian Formula (199)
  • 4.5 Polar Coordinates in Two Dimensions (201)
  • 4.6 n-Dimensional Polar and Helmert’s Transformation (203)
    • 4.6.1 Efficient Spherical Calculations with Polar (203)
    • 4.6.2 Independence of Mean and Variance (206)
    • 4.6.3 The t Confidence Interval (208)
  • 4.7 The Dirichlet Distribution (209)
    • 4.7.1 Picking a Point from the Surface of a Sphere (212)
    • 4.7.2 Poincaré’s Lemma (212)
  • 4.8 Ten Important High-Dimensional Formulas (212)
  • 5.1 Definition and Some Basic Properties (220)
  • 5.2 Conditional Distributions (223)
  • 5.3 Exchangeable Normal Variables (226)
  • 5.4 Sampling Distributions Useful in Statistics (228)
    • 5.4.1 Wishart Expectation Identities (229)
    • 5.4.3 Distribution of Correlation Coefficient (233)
  • 5.5 Noncentral Distributions (234)
  • 5.6 Some Important Inequalities for Easy Reference (235)
  • 6.1 Basic Distribution Theory (242)
  • 6.2 More Advanced Distribution Theory (246)
  • 6.3 Quantile Transformation and Existence of Moments (250)
  • 6.4 Spacings (254)
    • 6.4.1 Exponential Spacings and Rényi’s Representation (254)
    • 6.4.2 Uniform Spacings (255)
  • 6.5 Conditional Distributions and Markov Property (256)
  • 6.6 Some Applications (259)
    • 6.6.1 Records (259)
    • 6.6.2 The Empirical CDF (262)
  • 6.7 Distribution of the Multinomial Maximum (264)
  • 7.1 Some Basic Notation and Convergence Concepts (271)
  • 7.2 Laws of Large Numbers (275)
  • 7.3 Convergence Preservation (280)
  • 7.4 Convergence in Distribution (283)
  • 7.5 Preservation of Convergence and Statistical Applications (288)
    • 7.5.1 Slutsky’s Theorem (289)
    • 7.5.2 Delta Theorem (290)
    • 7.5.3 Variance Stabilizing Transformations (293)
  • 7.6 Convergence of Moments (295)
    • 7.6.1 Uniform Integrability (296)
    • 7.6.2 The Moment Problem and Convergence in Distribution (298)
    • 7.6.3 Approximation of Moments (299)
  • 7.7 Convergence of Densities and Scheffé’s Theorem (303)
  • 8.1 Characteristic Functions of Standard Distributions (315)
  • 8.2 Inversion and Uniqueness (319)
  • 8.3 Taylor Expansions, Differentiability, and Moments (323)
  • 8.4 Continuity Theorems (324)
  • 8.5 Proof of the CLT and the WLLN (326)
  • 8.6 Producing Characteristic Functions (327)
  • 8.7 Error of the Central Limit Theorem (329)
  • 8.8 Lindeberg–Feller Theorem for General Independent Case (332)
  • 8.9 Infinite Divisibility and Stable Laws (336)
  • 8.10 Some Useful Inequalities (338)
  • 9.1 Central-Order Statistics (344)
    • 9.1.1 Single-Order Statistic (344)
    • 9.1.2 Two Statistical Applications (346)
    • 9.1.3 Several Order Statistics (347)
  • 9.2 Extremes (349)
    • 9.2.1 Easily Applicable Limit Theorems (349)
    • 9.2.2 The Convergence of Types Theorem (353)
  • 9.3 Fisher–Tippett Family and Putting it Together (354)
  • 10.1 Notation and Basic Definitions (361)
  • 10.2 Examples and Various Applications as a Model (361)
  • 10.3 Chapman–Kolmogorov Equation (366)
  • 10.4 Communicating Classes (370)
  • 10.5 Gambler’s Ruin (373)
  • 10.6 First Passage, Recurrence, and Transience (375)
  • 10.7 Long Run Evolution and Stationary Distributions (380)
  • 11.1 Random Walk on the Cubic Lattice (396)
    • 11.1.1 Some Distribution Theory (399)
    • 11.1.2 Recurrence and Transience (223)
    • 11.1.3 Pólya’s Formula for the Return Probability (403)
  • 11.2 First Passage Time and Arc Sine Law (404)
  • 11.3 The Local Time (408)
  • 11.4 Practically Useful Generalizations (410)
  • 11.5 Wald’s Identity (411)
  • 11.6 Fate of a Random Walk (413)
  • 11.7 Chung–Fuchs Theorem (415)
  • 11.8 Six Important Inequalities (417)
  • 12.1 Preview of Connections to the Random Walk (423)
  • 12.2 Basic Definitions (424)
    • 12.2.1 Condition for a Gaussian Process to be Markov (427)
    • 12.2.2 Explicit Construction of Brownian Motion (428)
  • 12.3 Basic Distributional Properties (429)
    • 12.3.1 Reflection Principle and Extremes (431)
    • 12.3.2 Path Properties and Behavior Near Zero and Infinity (433)
    • 12.3.3 Fractal Nature of Level Sets (436)
  • 12.4 The Dirichlet Problem and Boundary Crossing Probabilities (437)
    • 12.4.1 Recurrence and Transience (439)
  • 12.5 The Local Time of Brownian Motion (440)
  • 12.6 Invariance Principle and Statistical Applications (442)
  • 12.7 Strong Invariance Principle and the KMT Theorem (446)
  • 12.8 Brownian Motion with Drift and Ornstein–Uhlenbeck Process (448)
    • 12.8.1 Negative Drift and Density of Maximum (448)
    • 12.8.2 Transition Density and the Heat Equation (449)
    • 12.8.3 The Ornstein–Uhlenbeck Process (450)
  • 13.1 Notation (459)
  • 13.2 Defining a Homogeneous Poisson Process (460)
  • 13.3 Important Properties and Uses as a Statistical Model (461)
  • 13.4 Linear Poisson Process and Brownian Motion: A Connection (469)
  • 13.5 Higher-Dimensional Poisson Point Processes (471)
    • 13.5.1 The Mapping Theorem (473)
  • 13.6 One-Dimensional Nonhomogeneous Processes (474)
  • 13.7 Campbell’s Theorem and Shot Noise (477)
    • 13.7.1 Poisson Process and Stable Laws (479)
  • 14.1 Illustrative Examples and Applications in Statistics (484)
  • 14.2 Stopping Times and Optional Stopping (489)
    • 14.2.1 Stopping Times (490)
    • 14.2.2 Optional Stopping (491)
    • 14.2.3 Sufficient Conditions for Optional Stopping Theorem (493)
    • 14.2.4 Applications of Optional Stopping (495)
  • 14.3 Martingale and Concentration Inequalities (498)
    • 14.3.1 Maximal Inequality (498)
    • 14.3.2 Inequalities of Burkholder, Davis, and Gundy (501)
    • 14.3.3 Inequalities of Hoeffding and Azuma (504)
    • 14.3.4 Inequalities of McDiarmid and Devroye (506)
    • 14.3.5 The Upcrossing Inequality (509)
  • 14.4 Convergence of Martingales (511)
    • 14.4.1 The Basic Convergence Theorem (511)
    • 14.4.2 Convergence in L¹ and L² (514)
  • 14.5 Reverse Martingales and Proof of SLLN (515)
  • 14.6 Martingale Central Limit Theorem (518)
  • 15.1 Standard Probability Metrics Useful in Statistics (526)
  • 15.2 Basic Properties of the Metrics (529)
  • 15.3 Metric Inequalities (536)
  • 15.4 Differential Metrics for Parametric Families (540)
    • 15.4.1 Fisher Information and Differential Metrics (541)
    • 15.4.2 Rao’s Geodesic Distances on Distributions (543)
  • 16.1 Basic Notation and Definitions (548)
  • 16.2 Classic Asymptotic Properties of the Empirical Process (550)
    • 16.2.1 Invariance Principle and Statistical Applications (552)
    • 16.2.2 Weighted Empirical Process (555)
    • 16.2.3 The Quantile Process (557)
    • 16.2.4 Strong Approximations of the Empirical Process (558)
  • 16.3 Vapnik–Chervonenkis Theory (559)
    • 16.3.1 Basic Theory (559)
    • 16.3.2 Concrete Examples (561)
  • 16.4 CLTs for Empirical Measures and Applications (564)
    • 16.4.1 Notation and Formulation (564)
    • 16.4.2 Entropy Bounds and Specific CLTs (565)
    • 16.4.3 Concrete Examples (568)
  • 16.5 Maximal Inequalities and Symmetrization (568)
  • 16.6 Connection to the Poisson Process (572)
  • 17.1 Large Deviations for Sample Means (581)
    • 17.1.1 The Cramér–Chernoff Theorem in R (581)
    • 17.1.2 Properties of the Rate Function (585)
    • 17.1.3 Cramér’s Theorem for General Sets (587)
  • 17.3 The t-Statistic (591)
  • 17.4 Lipschitz Functions and Talagrand’s Inequality (593)
  • 17.5 Large Deviations in Continuous Time (595)
    • 17.5.1 Continuity of a Gaussian Process (597)
    • 17.5.2 Metric Entropy of T and Tail of the Supremum (598)
  • 18.1 One-Parameter Exponential Family (604)
    • 18.1.1 Definition and First Examples (605)
  • 18.2 The Canonical Form and Basic Properties (610)
    • 18.2.1 Convexity Properties (611)
    • 18.2.2 Moments and Moment Generating Function (612)
    • 18.2.3 Closure Properties (615)
  • 18.3 Multiparameter Exponential Family (617)
  • 18.4 Sufficiency and Completeness (621)
    • 18.4.1 Neyman–Fisher Factorization and Basu’s Theorem (83)
    • 18.4.2 Applications of Basu’s Theorem to Probability (625)
  • 18.5 Curved Exponential Family (628)
  • 19.1 The Ordinary Monte Carlo (636)
    • 19.1.1 Basic Theory and Examples (636)
    • 19.1.2 Monte Carlo P-Values (643)
    • 19.1.3 Rao–Blackwellization (644)
  • 19.2 Textbook Simulation Techniques (645)
    • 19.2.1 Quantile Transformation and Accept–Reject (645)
    • 19.2.2 Importance Sampling and Its Asymptotic Properties (650)
    • 19.2.3 Optimal Importance Sampling Distribution (654)
    • 19.2.4 Algorithms for Simulating from Common Distributions (655)
  • 19.3 Markov Chain Monte Carlo (658)
    • 19.3.1 Reversible Markov Chains (660)
    • 19.3.2 Metropolis Algorithms (663)
  • 19.4 The Gibbs Sampler (666)
  • 19.5 Convergence of MCMC and Bounds on Errors (672)
    • 19.5.1 Spectral Bounds (674)
    • 19.5.2 Dobrushin’s Inequality and Diaconis–Fill– (678)
    • 19.5.3 Drift and Minorization Methods (680)
  • 19.6 MCMC on General Spaces (683)
    • 19.6.1 General Theory and Metropolis Schemes (683)
    • 19.6.2 Convergence (102)
    • 19.6.3 Convergence of the Gibbs Sampler (691)
  • 19.7 Practical Convergence Diagnostics (694)
  • 20.1 The Bootstrap (710)
    • 20.1.1 Consistency of the Bootstrap (713)
    • 20.1.2 Further Examples (705)
    • 20.1.3 Higher-Order Accuracy of the Bootstrap (720)
    • 20.1.4 Bootstrap for Dependent Data (722)
  • 20.2 The EM Algorithm (725)
    • 20.2.1 The Algorithm and Examples (727)
    • 20.2.2 Monotone Ascent and Convergence of EM (83)
    • 20.2.3 Modifications of EM (735)
  • 20.3 Kernels and Classification (736)
    • 20.3.1 Smoothing by Kernels (736)
    • 20.3.2 Some Common Kernels in Use (738)
    • 20.3.3 Kernel Density Estimation (253)
    • 20.3.4 Kernels for Statistical Classification (745)
    • 20.3.5 Mercer’s Theorem and Feature Maps (753)
  • A.1 Glossary of Symbols (768)
  • A.2 Moments and MGFs of Common Distributions (772)
  • A.3 Normal Table (776)

Content

This is the companion second volume to my undergraduate text Fundamentals of Probability: A First Course. The purpose of my writing this book is to give graduate students, instructors, and researchers in statistics, mathematics, and computer science a lucidly written, unique text at the confluence of probability, advanced stochastic processes, statistics, and key tools for machine learning. Numerous topics in probability and stochastic processes of current importance in statistics and machine learning, widely scattered across many specialized books, are brought together here in one place. This is done with an extensive bibliography for each topic, and numerous worked-out examples and exercises. Probability, with all its models, techniques, and its poignant beauty, is an incredibly powerful tool for anyone who deals with data or randomness. The content and the style of this book reflect that philosophy; I emphasize lucidity, a wide background, and the far-reaching applicability of probability in science. The book starts with a self-contained and fairly complete review of basic probability, and then traverses its way through the classics to advanced modern topics and tools, including a substantial amount of statistics itself. Because of its nearly encyclopaedic coverage, it can serve as a graduate text for a year-long probability sequence, for focused short courses on selected topics, for self-study, and as a nearly unique reference for research in statistics, probability, and computer science. It provides an extensive treatment of most of the standard topics in a graduate probability sequence, and integrates them with the basic theory and many examples of several core statistical topics, as well as with some tools of major importance in machine learning.
This is done with unusually detailed bibliographies for the reader who wants to dig deeper into a particular topic, and with a huge repertoire of worked-out examples and exercises. The total number of worked-out examples in this book is 423, and the total number of exercises is 808. An instructor can rotate the exercises between semesters and use them for setting exams, and a student can use them for additional exam preparation and self-study. I believe that the book is unique in its range, unification, bibliographic detail, and its collection of problems and examples.

Experiments and Sample Spaces

Probability theory begins with the concept of a sample space, which encompasses all possible outcomes of a specific experiment. For instance, when a coin is tossed twice and the results of each toss are documented, the sample space consists of all potential outcomes of this coin-tossing scenario.

A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics, Springer Texts in Statistics, DOI 10.1007/978-1-4419-9634-3_1, © Springer Science+Business Media, LLC 2011

The sample points are HH, HT, TH, TT, with H denoting the occurrence of heads and T denoting the occurrence of tails. We call

Ω = {HH, HT, TH, TT}

the sample space of the experiment.

A sample space refers to a general set, which can be either finite or infinite. An illustrative example of an infinite sample space is the process of tossing a coin repeatedly until heads appears for the first time, with the number of trials recorded. In this scenario, the sample space is a countably infinite set.

Sample spaces can also be uncountably infinite, such as when selecting a random number from the interval [0, 1], where the sample space is Ω = [0, 1]. In this scenario, the set is uncountably infinite, and individual elements of the sample space are denoted by ω. The initial focus is on defining events and clarifying the concept of the probability of an event.

In probability theory, an event is any subset of a sample space, which can include the empty set and the entire sample space itself. Events can consist of a single sample point, known as a singleton set. To assign probabilities to these events in a logically consistent manner, it is essential to restrict attention to a suitably interconnected collection of subsets, referred to as a σ-field. In most practical scenarios, including those with infinite sample spaces, the events of interest will belong to an appropriate σ-field. Therefore, we will treat events simply as subsets of the sample space without further emphasis on the σ-field concept.

Here is a definition of what counts as a legitimate probability on events.

Definition 1.2. Given a sample space Ω, a probability or a probability measure on Ω is a function P on subsets of Ω such that

(a) P(A) ≥ 0 for every A ⊆ Ω;

(b) P(Ω) = 1;

(c) given disjoint subsets A1, A2, … of Ω, P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Countable additivity, referred to as property (c), is an assumption rather than a provable statement. Our experience suggests that treating this assumption as valid yields useful and credible solutions to various problems, making it a reasonable basis for analysis.

While there is broad consensus among probabilists that countable additivity is a natural assumption, this book does not enter that debate. It is useful to note that finite additivity is implied by countable additivity: given a finite number of disjoint subsets A1, A2, …, Am of Ω, P(∪_{i=1}^m A_i) = Σ_{i=1}^m P(A_i). Also, the last two conditions in the definition of a probability measure imply that P(∅), the probability of the empty set or the null event, is zero.

In probability notation, it is more precise to denote the probability of a singleton set {ω} as P({ω}). However, to simplify expressions and reduce clutter, we often use the more convenient notation P(ω).

One pleasant consequence of the axiom of countable additivity is the following basic result. We do not prove it here as it is a simple result; see DasGupta (2010) for a proof.

Theorem 1.1. Let A1 ⊇ A2 ⊇ A3 ⊇ ⋯ be an infinite family of subsets of a sample space Ω such that An ↓ A. Then P(An) → P(A) as n → ∞.

Next, the concept of equally likely sample points is a very fundamental one.

Definition 1.3. Let Ω be a finite sample space consisting of N sample points. We say that the sample points are equally likely if P(ω) = 1/N for each sample point ω.

An immediate consequence, due to the additivity axiom, is the following useful formula.

Proposition. Let Ω be a finite sample space consisting of N equally likely sample points. Let A be any event and suppose A contains n distinct sample points. Then

P(A) = n/N = (number of sample points favorable to A)/(total number of sample points).

Let us see some examples.

In a scenario where five pairs of shoes are stored in a closet and four individual shoes are randomly selected, we want the probability that the chosen shoes contain at least one complete pair.

The total number of sample points is C(10, 4) = 210, and random selection ensures that each sample point has an equal likelihood of being chosen. To obtain at least one complete pair, we either select two complete pairs, or one complete pair along with two nonmatching shoes. Two complete pairs can be chosen in C(5, 2) = 10 ways. Exactly one complete pair can be chosen in C(5, 1) × C(4, 2) × 2 × 2 = 120 ways: the C(5, 1) term is for choosing the pair that is complete; the C(4, 2) term is for choosing the two incomplete pairs; and from each incomplete pair one shoe, either the left or the right, is chosen in 2 ways. The resulting probability is (10 + 120)/210 = 13/21 ≈ .62.
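The counting above is small enough to verify exhaustively. The sketch below (the encoding of shoes as (pair, side) tuples is our own device) enumerates all C(10, 4) draws and counts those containing a complete pair:

```python
# Brute-force check of the shoe example: 5 pairs (10 shoes), draw 4 at random.
# A complete pair means both sides of some pair index appear among the 4 shoes.
from fractions import Fraction
from itertools import combinations

shoes = [(pair, side) for pair in range(5) for side in ("L", "R")]

favorable = 0
total = 0
for chosen in combinations(shoes, 4):
    total += 1
    pairs_seen = [pair for pair, _ in chosen]
    if any(pairs_seen.count(p) == 2 for p in range(5)):
        favorable += 1

prob = Fraction(favorable, total)
print(total, favorable, prob)   # 210 130 13/21
```

The enumeration confirms the two counting terms: 10 draws with two complete pairs plus 120 draws with exactly one, out of 210.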

In five-card poker, players are dealt five random cards from a standard 52-card deck. The game features various hands with different levels of rarity; here we consider the probabilities of two particular hands, two pairs and a flush.

A two pairs hand consists of two cards of one rank, two cards of another rank, and a fifth card of a third rank. A flush is a hand containing five cards of the same suit, provided that the cards do not form a sequence.

To find P(B), where B denotes a flush, note that there are 10 ways to select 5 cards from a suit such that the cards are in a sequence, namely {A, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, …, {10, J, Q, K, A}, and so P(B) = 4 × (C(13, 5) − 10)/C(52, 5) ≈ .00197.

These are basic examples of counting arguments that are useful whenever there is a finite sample space and we assume that all sample points are equally likely.

A major result in combinatorial probability is the inclusion–exclusion formula, which says the following.

Theorem 1.2. Let A1, A2, …, An be n general events. Let

S_k = Σ_{1 ≤ i_1 < ⋯ < i_k ≤ n} P(A_{i_1} ∩ ⋯ ∩ A_{i_k}), 1 ≤ k ≤ n.

Then P(∪_{i=1}^n A_i) = S_1 − S_2 + S_3 − ⋯ + (−1)^{n+1} S_n.

In a Bridge game, we analyze the probability that a specific player, North, has a void in at least one suit. To do this, we label the suits as 1, 2, 3, and 4, which allows a systematic evaluation of North's hand. Define

Ai = {North's hand is void in suit i}, i = 1, 2, 3, 4.

Then, by the inclusion–exclusion formula,

P(North's hand is void in at least one suit) = 4 × C(39, 13)/C(52, 13) − 6 × C(26, 13)/C(52, 13) + 4 × C(13, 13)/C(52, 13) ≈ .051,

which is small, but not very small.
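By the symmetry of the four suits, each j-fold intersection of the events Ai has probability C(52 − 13j, 13)/C(52, 13), so the inclusion–exclusion sum can be evaluated directly. A minimal sketch (the function name is ours):

```python
# Inclusion-exclusion for P(North is void in at least one suit) in Bridge.
# By suit symmetry, any intersection of j of the events A_i has probability
# C(52 - 13j, 13) / C(52, 13).
from math import comb

def prob_void_some_suit():
    total = comb(52, 13)
    p = 0.0
    for j in range(1, 5):                      # j = 4 gives comb(0, 13) = 0
        term = comb(4, j) * comb(52 - 13 * j, 13) / total
        p += term if j % 2 == 1 else -term     # alternating signs
    return p

print(round(prob_void_some_suit(), 3))   # 0.051
```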

Conditional Probability and Independence

The inclusion–exclusion formula can be challenging to apply exactly, because the quantities S_j become difficult to calculate for large j. However, it provides valuable bounds on the probability of the union of n general events, in both directions.

Theorem 1.3 (Bonferroni Bounds). Given n events A1, A2, …, An, let p_n = P(∪_{i=1}^n A_i). Then

p_n ≤ S_1;  p_n ≥ S_1 − S_2;  p_n ≤ S_1 − S_2 + S_3;  ⋯.
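The alternating behavior of these bounds can be checked numerically on the Bridge void-in-a-suit events, assuming the suit-symmetry formula C(52 − 13j, 13)/C(52, 13) for each j-fold intersection:

```python
# Bonferroni bounds on p_n = P(union of the four "void in suit i" events):
# the partial sums S1, S1 - S2, S1 - S2 + S3 alternately over- and
# underestimate the exact inclusion-exclusion value.
from math import comb

total = comb(52, 13)
S = [comb(4, j) * comb(52 - 13 * j, 13) / total for j in range(1, 5)]
p_exact = S[0] - S[1] + S[2] - S[3]

assert S[0] >= p_exact                    # first upper bound
assert S[0] - S[1] <= p_exact             # lower bound
assert S[0] - S[1] + S[2] >= p_exact      # tighter upper bound
print(p_exact)
```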

Conditional probability and independence are essential concepts in probability and statistics. Conditional probabilities involve adjusting beliefs based on new information, while independence signifies that the new information does not affect existing beliefs. Furthermore, assuming independence can greatly streamline the development, mathematical analysis, and validation of various statistical tools and procedures.

Definition 1.4. Let A, B be general events with respect to a sample space Ω, and suppose P(A) > 0. The conditional probability of B given A is defined as

P(B | A) = P(A ∩ B)/P(A).

Some immediate consequences of the definition of a conditional probability are the following.

Theorem 1.4. (a) (Multiplicative Formula) For any two events A, B such that P(A) > 0, P(A ∩ B) = P(A)P(B | A).

(b) For any two events A, B such that 0 < P(A) < 1, one has P(B) = P(B | A)P(A) + P(B | A^c)P(A^c).

(c) (Total Probability Formula) If A1, A2, …, Ak form a partition of the sample space Ω (i.e., A_i ∩ A_j = ∅ for all i ≠ j, and ∪_{i=1}^k A_i = Ω), and if 0 < P(A_i) < 1 for each i, then P(B) = Σ_{i=1}^k P(B | A_i)P(A_i).

(A worked example applying this formula to a turn-based game computes a winning probability greater than .5 for the player who moves first; we see that there is an advantage in starting first.)

Definition 1.5. A collection of events A1, A2, …, An is said to be mutually independent (or just independent) if for each k, 1 ≤ k ≤ n, and any k of the events A_{i_1}, …, A_{i_k},

P(A_{i_1} ∩ ⋯ ∩ A_{i_k}) = P(A_{i_1}) ⋯ P(A_{i_k}).

They are called pairwise independent if this property holds for k = 2.
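The gap between pairwise and mutual independence is real. A classic illustration (our example, not from this excerpt): toss a fair coin twice and let A = {first toss heads}, B = {second toss heads}, C = {both tosses agree}. The sketch below enumerates the four equally likely outcomes:

```python
# Pairwise independence does not imply mutual independence.
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))          # 4 equally likely points

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == "H"                         # first toss heads
B = lambda w: w[1] == "H"                         # second toss heads
C = lambda w: w[0] == w[1]                        # tosses agree

# Every pair satisfies the product rule ...
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)

# ... but P(A and B and C) = 1/4, while P(A)P(B)P(C) = 1/8.
assert prob(lambda w: A(w) and B(w) and C(w)) != prob(A) * prob(B) * prob(C)
print("pairwise but not mutually independent")
```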

Many individuals purchase lottery tickets hoping for good fortune, but statistically this practice usually results in financial loss. For instance, in a weekly state lottery where five numbers are drawn from a pool of 00 to 49 without replacement, the odds of winning with a single ticket are exceedingly low.

Even if an individual purchases a lottery ticket every week for 40 years, the probability of winning at least once is approximately 1.14 in 7,524,000, which highlights the extremely low odds of winning. This calculation rests on the assumption that the weekly lotteries are independent, a premise crucial to the accuracy of the estimate.

Conditional probabilities P(A|B) and P(B|A) are often confused. For instance, in a group of lung cancer patients with a high percentage of smokers, we can only conclude that P(B|A) is large; that is, many lung cancer patients are smokers. This does not by itself imply that smoking increases the likelihood of lung cancer; that is, P(A|B) cannot be inferred to be large. To calculate P(A|B) when P(B|A) is known, Bayes' theorem provides a straightforward formula.

Theorem 1.5. Let {A1, A2, …, Am} be a partition of a sample space Ω. Let B be some fixed event. Then

P(A_j | B) = P(B | A_j)P(A_j) / Σ_{i=1}^m P(B | A_i)P(A_i).

In a multiple-choice exam with five alternatives per question, a student selects one option as the answer. The student knows the correct answer with probability .7, and with probability .3 guesses at random among the five alternatives. Given that a question was answered correctly, we want the probability that the student actually knew the correct answer. Let

A = {the student knew the correct answer};

B = {the student answered the question correctly}.

We want to compute P(A|B). By Bayes' theorem,

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)] = (1)(.7) / [(1)(.7) + (.2)(.3)] ≈ .921.

Before the student answered the question, the probability that she knew the correct answer was 70%. After she answered correctly, the posterior probability that she knew the answer rises to 92.1%. This is the essence of Bayes' theorem: prior beliefs are updated to reflect new evidence.
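The posterior in this example is a one-line computation; a minimal sketch (variable names ours):

```python
# Bayes' theorem for the multiple-choice example: the student knows the
# answer with prior probability 0.7; otherwise she guesses among 5
# alternatives, so P(correct | guessing) = 1/5.
p_know = 0.7
p_correct_given_know = 1.0
p_correct_given_guess = 1 / 5

numerator = p_correct_given_know * p_know
denominator = numerator + p_correct_given_guess * (1 - p_know)
posterior = numerator / denominator            # P(knew | answered correctly)

print(round(posterior, 3))   # 0.921
```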

Integer-Valued and Discrete Random Variables

CDF and Independence

A cumulative distribution function (CDF) is a central concept in probability theory, giving the probability that a random variable X is less than or equal to a specified value x. This definition applies to all types of random variables, not just discrete ones, which underlies its fundamental role in describing probability distributions.

Definition 1.8. The cumulative distribution function of a random variable X is the function F(x) = P(X ≤ x), x ∈ R.

Definition 1.9. Let X have the CDF F(x). Any number m such that P(X ≤ m) ≥ .5, and also P(X ≥ m) ≥ .5, is called a median of F, or equivalently, a median of X.

Remark. The median of a random variable need not be unique. A simple way to characterize all the medians of a distribution is available.

Proposition. Let X be a random variable with the CDF F(x). Let m0 be the first x such that F(x) ≥ .5, and let m1 be the last x such that P(X ≥ x) ≥ .5. Then a number m is a median of X if and only if m ∈ [m0, m1].

The cumulative distribution function of any random variable satisfies certain specific properties, and conversely any function satisfying these properties is a valid CDF; that is, it is the CDF of a suitably defined random variable. The essential properties of CDFs are recorded in the following results.

Theorem 1.6. A function F(x) is the CDF of some real-valued random variable X if and only if it satisfies all of the following properties.

(a) F(x) → 0 as x → −∞.

(b) F(x) → 1 as x → +∞.

(c) Given any real number a, F(x) ↓ F(a) as x ↓ a.

(d) Given any two real numbers x, y with x < y, F(x) ≤ F(y).

Right continuity, or continuity from the right, is a key property of cumulative distribution functions. While a CDF must be right continuous, it need not be continuous from the left. This is particularly evident for discrete random variables, whose CDFs jump at specific values; at these jump points the CDF fails to be left continuous.

Proposition. Let F(x) be the CDF of some random variable X. Then, for any x,

P(X = x) = F(x) − lim_{y↑x} F(y) = F(x) − F(x−),

including those points x for which P(X = x) = 0.

Example 1.8 (Bridge). Consider the random variable

X = number of aces in North's hand in a Bridge game.

Clearly, X can take any of the values x = 0, 1, 2, 3, 4. If X = x, then the other 13 − x cards in North's hand must be non-ace cards. Thus, the pmf of X is

p(x) = C(4, x) C(48, 13 − x)/C(52, 13), x = 0, 1, 2, 3, 4.

In decimals, the pmf of X is:

x      0     1     2     3     4
p(x)  .304  .439  .213  .041  .003

The CDF of X is a jump function, taking jumps at the values 0, 1, 2, 3, 4, namely the possible values of X.
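This is the hypergeometric pmf p(x) = C(4, x)C(48, 13 − x)/C(52, 13), which can be tabulated directly; a minimal sketch:

```python
# Hypergeometric pmf of X = number of aces in North's 13-card Bridge hand.
from math import comb

def p(x):
    return comb(4, x) * comb(48, 13 - x) / comb(52, 13)

pmf = {x: round(p(x), 3) for x in range(5)}
print(pmf)   # {0: 0.304, 1: 0.439, 2: 0.213, 3: 0.041, 4: 0.003}

# sanity check: the probabilities sum to 1 (Vandermonde's identity)
assert abs(sum(p(x) for x in range(5)) - 1) < 1e-12
```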

Example 1.9 (Indicator Variables). Consider the experiment of rolling a fair die twice, and define a random variable Y as follows:

Y = 1 if the sum of the two rolls, X, is an even number;

Y = 0 if the sum of the two rolls, X, is an odd number.

If we let A be the event that X is an even number, then Y = 1 if A happens, and Y = 0 if A does not happen. Such random variables are called indicator random variables and are immensely useful in mathematical calculations in many complex situations.


Definition 1.10. Let A be any event in a sample space Ω. The indicator random variable for A is defined as

I_A = 1 if A happens;

I_A = 0 if A does not happen.

Thus, the distribution of an indicator variable is simply P(I_A = 1) = P(A); P(I_A = 0) = 1 − P(A).

An indicator variable is also called a Bernoulli variable with parameter p, where p is just P(A). We later show examples of uses of indicator variables in the calculation of expectations.
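For the die-rolling example above, the indicator of A = {sum of two rolls is even} can be checked by enumerating all 36 equally likely outcomes; a sketch:

```python
# Indicator variable for A = {sum of two die rolls is even}: enumerating
# all 36 equally likely outcomes shows P(I_A = 1) = P(A) = 1/2.
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))
indicator = [1 if (a + b) % 2 == 0 else 0 for a, b in rolls]

p_A = Fraction(sum(indicator), len(rolls))    # mean of I_A equals P(A)
print(p_A)   # 1/2
```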

In various applications, one needs the distribution of a function g(X) of a basic random variable X. For discrete variables, the distribution of such a function is straightforward to determine.

Proposition (Function of a Random Variable). Let X be a discrete random variable and Y = g(X) a real-valued function of X. Then P(Y = y) = Σ_{x: g(x) = y} p(x).

Example 1.10. Suppose X has a pmf of the form p(x) = c q(x) on the values x = 0, ±1, ±2, ±3, where c is a normalizing constant; the constant c must first be explicitly evaluated by directly summing the values of the pmf.

Suppose we want to find the distribution of two functions of X, Y = g(X) and Z = h(X). Here g(X) is a one-to-one function of X, while h(X) is not. The possible values of Y are 0, ±1, ±8, and ±27, so g(X) = X³. For instance, P(Y = 0) = P(X = 0), and P(Y = 1) = P(X = 1); in general, for y = 0, ±1, ±8, ±27, P(Y = y) = P(X = y^{1/3}).

However, Z = h(X) is not a one-to-one function of X, so several values of X map to the same value of Z. For example, P(Z = 0) = P(X = −2) + P(X = 0) + P(X = 2). The pmf of Z = h(X) is supported on the values z = −1, 0, 1.

In probability theory, the independence of random variables is a fundamental concept that applies to both finite and infinite collections. For an infinite collection, independence means that every finite subcollection of the random variables is independent. The formal definition for a finite collection of random variables is the following.

Definition 1.11. Let X1, X2, …, Xk be k ≥ 2 discrete random variables defined on the same sample space Ω. We say that X1, X2, …, Xk are independent if P(X1 = x1, X2 = x2, …, Xk = xk) = P(X1 = x1) P(X2 = x2) ⋯ P(Xk = xk) for all choices of x1, x2, …, xk.

According to the definition of independence of random variables, if X1 and X2 are independent, then any function of X1 and any function of X2 are also independent. This principle extends to functions of disjoint blocks of independent random variables, as follows.

Theorem 1.7. Let X1, X2, …, Xk be k ≥ 2 discrete random variables, and suppose they are independent. Let U = f(X1, X2, …, Xi) be some function of X1, X2, …, Xi, and let V = g(X_{i+1}, …, X_k) be some function of X_{i+1}, …, X_k. Then U and V are independent.

This result is true for any types of random variables X1, X2, …, Xk, not just discrete ones.

A common notation of wide use in probability and statistics is now introduced.

When random variables X1, X2, …, Xk are independent and share the same cumulative distribution function F, they are said to be independent and identically distributed, abbreviated iid: each variable is independent of the others and distributed according to the same CDF F.

In the experiment of tossing a fair coin four times, let X1 represent the number of heads in the first two tosses and X2 denote the number of heads in the last two tosses It is evident that X1 and X2 are independent, as the outcomes of the last two tosses do not influence the results of the first two This independence can be mathematically confirmed by applying the formal definition of independence in probability theory.

In the experiment of choosing 13 cards at random from a standard 52-card deck, let X1 be the number of aces and X2 the number of clubs in the hand. Then X1 and X2 are not independent: P(X1 = 4, X2 = 0) = 0, because a hand containing all four aces must contain the ace of clubs, yet P(X1 = 4) > 0 and P(X2 = 0) > 0, so the joint probability cannot equal the product of the marginal probabilities.
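The coin-tossing example lends itself to a brute-force verification: with only 16 equally likely outcomes, the joint pmf of (X1, X2) can be enumerated and compared against the product of the marginals. A minimal sketch (an illustration, not from the book):

```python
from itertools import product
from collections import Counter

# All 16 equally likely outcomes of four fair coin tosses (1 = head).
outcomes = list(product([0, 1], repeat=4))
n = len(outcomes)

# X1 = heads in the first two tosses, X2 = heads in the last two.
joint = Counter((sum(o[:2]), sum(o[2:])) for o in outcomes)
marg1 = Counter(sum(o[:2]) for o in outcomes)
marg2 = Counter(sum(o[2:]) for o in outcomes)

# Independence: the joint pmf factors as the product of the marginals
# at every pair of values (a, b).
independent = all(
    joint[(a, b)] / n == (marg1[a] / n) * (marg2[b] / n)
    for a in marg1 for b in marg2
)
print(independent)  # True
```

Running the same enumeration over 13-card hands would confirm the negative conclusion of the card example, although the outcome space there is far too large to list naively.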

1.3 Integer-Valued and Discrete Random Variables 13

1.3.2 Expectation and Moments

A random variable can assume various values at different times, prompting interest in its average value However, calculating a simple average of all possible values can be misleading, as some values may have negligible probabilities The mean value, also known as the expected value, is a more accurate representation, as it is a weighted average that considers the significance of each value based on its probability.

Definition 1.12. Let X be a discrete random variable with values x1, x2, … and pmf p(x). We say that the expected value of X exists if Σ_i |x_i| p(x_i) < ∞, in which case the expected value is defined as

E(X) = Σ_i x_i p(x_i).

For notational convenience, we simply write Σ_x x p(x) instead of Σ_i x_i p(x_i). The expected value is also known as the expectation or the mean of X.

If the set of possible values of X is infinite, then the infinite sum Σ_x x p(x) can take different values on rearranging the terms of the infinite series unless Σ_x |x| p(x) < ∞; this is why absolute convergence is demanded in the definition of the expected value.

Sometimes computing the tail probability P(X > x) is logically more straightforward than directly calculating P(X = x). Here is the expectation formula based on the tail CDF.

Theorem 1.9 (Tailsum Formula). Let X take the values 0, 1, 2, …. Then

E(X) = Σ_{n=0}^∞ P(X > n).
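As a quick numerical sanity check (an illustration, not from the book), the tailsum formula can be verified for the face of a fair die using exact rational arithmetic: the direct sum Σ_x x P(X = x) and the tail sum Σ_n P(X > n) must agree.

```python
from fractions import Fraction

# X = face of a fair die: values 1..6 with probability 1/6 each
# (the value 0 simply has probability zero, so the theorem applies).
p = {x: Fraction(1, 6) for x in range(1, 7)}

direct = sum(x * px for x, px in p.items())           # sum of x * P(X = x)
tail = sum(sum(px for x, px in p.items() if x > n)    # sum over n of P(X > n)
           for n in range(6))                         # P(X > n) = 0 for n >= 6
print(direct, tail)  # 7/2 7/2
```

Using Fraction keeps both sums exact, so the agreement is an identity rather than a floating-point coincidence.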

In a family planning scenario where a couple aims to have at least one child of each sex, the expected number of children they will have can be analyzed using probability Let X represent the childbirth at which they first have one child of each sex Assuming the probability of having a boy during any childbirth is p, and considering that all births are independent events, we can derive the expected number of children based on these probabilities.

P(X > n) = P(the first n children are all boys or all girls) = p^n + (1 − p)^n, for n ≥ 1.

Therefore, E(X) = 2 + Σ_{n=2}^∞ [p^n + (1 − p)^n] = 2 + p²/(1 − p) + (1 − p)²/p = 1/(p(1 − p)) − 1. If boys and girls are equally likely on any childbirth (p = 1/2), then this says that a couple waiting to have a child of each sex can expect to have three children.
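A short simulation (an illustration, not part of the book) agrees with the closed form E(X) = 1/(p(1 − p)) − 1:

```python
import random

random.seed(0)

def children_until_both_sexes(p):
    """Simulate births (boy with probability p) until both sexes appear."""
    boys = girls = n = 0
    while boys == 0 or girls == 0:
        n += 1
        if random.random() < p:
            boys += 1
        else:
            girls += 1
    return n

p, trials = 0.5, 200_000
est = sum(children_until_both_sexes(p) for _ in range(trials)) / trials
exact = 1 / (p * (1 - p)) - 1  # the closed form derived above; equals 3 here
print(round(est, 3), exact)
```

Changing p shows the asymmetric cases too: the more lopsided the birth probabilities, the longer the expected wait.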


The expected value helps to determine the typical outcome of a random variable, but different distributions can yield the same expected value For instance, two stocks may have identical average returns, yet one could be significantly riskier due to greater variability in its returns Consequently, risk-averse investors often favor stocks with lower variability While there are various measures of risk, such as the mean absolute deviation or the probability of exceeding a certain threshold, the standard deviation remains the most widely used measure of variability in random variables.

Definition 1.14. Let a random variable X have a finite mean μ. The variance of X is defined as

σ² = E[(X − μ)²],

and the standard deviation of X is defined as σ = √(E[(X − μ)²]).
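For a concrete toy pmf (numbers chosen only for illustration, not from the book), the defining formula and the equivalent shortcut E(X²) − μ² give the same variance:

```python
# A toy pmf, chosen only for illustration.
p = {0: 0.2, 1: 0.5, 2: 0.3}

mu = sum(x * px for x, px in p.items())                     # mean
var_def = sum((x - mu) ** 2 * px for x, px in p.items())    # E[(X - mu)^2]
var_alt = sum(x * x * px for x, px in p.items()) - mu ** 2  # E(X^2) - mu^2
sd = var_def ** 0.5                                         # standard deviation

print(mu, var_def, var_alt)  # mu = 1.1, both variances ~= 0.49
```

The shortcut form is usually the easier one to compute by hand, while the defining form makes it evident that the variance is nonnegative.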

1.4 Inequalities

the corresponding series, with terms involving (x + 1)(x + 2) in the denominator, is not finitely summable, a fact from calculus. Because E(X²) is infinite but E(X) is finite, σ² = E(X²) − [E(X)]² must also be infinite.

If a collection of random variables is independent, then just like the expectation, the variance also adds up Precisely, one has the following very useful fact.

Theorem 1.10. Let X1, X2, …, Xn be n independent random variables. Then,

Var(X1 + X2 + ⋯ + Xn) = Var(X1) + Var(X2) + ⋯ + Var(Xn).
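Theorem 1.10 can be checked exactly for two independent fair dice by building the pmf of their sum from the product of the marginals (an illustrative sketch using exact rationals):

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}

def var(p):
    """Variance of a discrete pmf given as a dict value -> probability."""
    mu = sum(x * px for x, px in p.items())
    return sum((x - mu) ** 2 * px for x, px in p.items())

# pmf of the sum of two independent dice: multiply the marginal
# probabilities and group the outcomes by their total.
p_sum = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    p_sum[x + y] = p_sum.get(x + y, 0) + px * py

print(var(p_sum), var(die) + var(die))  # 35/6 35/6
```

The variance of one die is 35/12, and the directly computed variance of the sum is exactly twice that, as the theorem asserts.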

An important corollary of this result is the following variance formula for the mean, X̄ = (X1 + ⋯ + Xn)/n, of n independent and identically distributed random variables.

Corollary 1.1. Let X1, X2, …, Xn be independent random variables with a common mean μ and a common variance σ². Then Var(X̄) = σ²/n. Combined with Chebyshev's inequality, this gives the weak law of large numbers: for any ε > 0, P(|X̄ − μ| > ε) → 0 as n → ∞.

There is a stronger version of the weak law of large numbers, which says that, in fact, with certainty, X̄ will converge to μ as n → ∞. The precise mathematical statement is that

P(X̄ → μ as n → ∞) = 1.

This is the strong law of large numbers; it requires only that E|X| be finite, but its proof needs more advanced tools and is taken up later in the book. Next, under further restrictions on the distribution of the random variable X, there are inequalities that can give tighter bounds than Chebyshev's or Markov's inequality. We present three such alternative inequalities below.
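The law of large numbers is easy to watch in action. The following sketch (illustrative, with an arbitrary seed) tracks the running mean of iid Bernoulli(0.3) indicators:

```python
import random

random.seed(1)

# Running mean of iid Bernoulli(0.3) indicators; by the law of large
# numbers the running mean should settle near the true mean 0.3 as n grows.
p, total, snapshots = 0.3, 0, {}
for n in range(1, 100_001):
    total += random.random() < p  # True/False counts as 1/0
    if n in (100, 10_000, 100_000):
        snapshots[n] = total / n

print(snapshots)
```

The early snapshot at n = 100 can still wander noticeably; by n = 100,000 the running mean is pinned close to 0.3, in line with the σ/√n shrinkage of Var(X̄).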

Theorem 1.13. (a) (Cantelli's Inequality) Suppose E(X) = μ and Var(X) = σ², assumed to be finite. Then, for any t ≥ 0,

P(X − μ ≥ t) ≤ σ²/(σ² + t²).

(b) (Paley–Zygmund Inequality) Suppose X takes only nonnegative values, with E(X) = μ and Var(X) = σ², assumed to be finite. Then, for 0 < c < 1,

P(X > cμ) ≥ (1 − c)²μ²/(σ² + μ²).

(c) (Alon–Spencer Inequality) Suppose X takes only nonnegative integer values, with E(X) = μ and Var(X) = σ², assumed to be finite. Then,

P(X = 0) ≤ σ²/μ².

These inequalities may be seen in Rao (1973), Paley and Zygmund (1932), and Alon and Spencer (2000, p. 58), respectively.
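To see how a sharper bound compares with the basic Chebyshev-type bound σ²/t² for the one-sided tail P(X − μ ≥ t), here is a small numeric comparison on a fair die (an illustration, not from the book); Cantelli's bound σ²/(σ² + t²) is never the worse of the two:

```python
# Exact one-sided tail of a fair die versus two classical upper bounds:
#   Chebyshev-type: P(X - mu >= t) <= s2 / t^2
#   Cantelli:       P(X - mu >= t) <= s2 / (s2 + t^2)
p = {x: 1 / 6 for x in range(1, 7)}
mu = sum(x * px for x, px in p.items())              # 3.5
s2 = sum((x - mu) ** 2 * px for x, px in p.items())  # 35/12

rows = []
for t in (1.0, 2.0, 2.5):
    exact = sum(px for x, px in p.items() if x - mu >= t)
    rows.append((t, exact, s2 / t ** 2, s2 / (s2 + t ** 2)))
    print(rows[-1])  # (t, exact tail, Chebyshev bound, Cantelli bound)
```

For small t the Chebyshev-type bound can exceed 1 and is useless, while Cantelli's bound always stays below 1; both, of course, sit above the exact tail probability.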

Probability inequalities form a vast and varied area, primarily because of their utility in producing approximate answers when exact ones are difficult or impossible to obtain. Various inequalities are presented and illustrated throughout this book. The next theorem gives some fundamental inequalities based on moments.

Theorem 1.14. (a) (Cauchy–Schwarz Inequality) Let X, Y be two random variables such that E(X²) and E(Y²) are finite. Then,

[E(XY)]² ≤ E(X²) E(Y²).

(b) (Hölder's Inequality) Let X, Y be two random variables, and let 1 < p < ∞ and q = p/(p − 1), so that 1/p + 1/q = 1. If E|X|^p and E|Y|^q are finite, then

E|XY| ≤ [E|X|^p]^{1/p} [E|Y|^q]^{1/q}.
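A toy numeric check of the Cauchy–Schwarz inequality on four equally likely (x, y) pairs (the numbers are arbitrary, chosen for illustration):

```python
# Check [E(XY)]^2 <= E(X^2) E(Y^2) on a uniform joint pmf over four pairs.
pairs = [(1, 2), (-2, 1), (3, -1), (0, 4)]
n = len(pairs)

e_xy = sum(x * y for x, y in pairs) / n  # E(XY)
e_x2 = sum(x * x for x, _ in pairs) / n  # E(X^2)
e_y2 = sum(y * y for _, y in pairs) / n  # E(Y^2)

print(e_xy ** 2, e_x2 * e_y2)  # the left side is the smaller of the two
```

Equality would hold only if Y were a (deterministic) linear multiple of X, which is visibly not the case for these pairs.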
