Probability,
Random Processes,
and Ergodic Properties
November 3, 2001
Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
Stanford University
© 1987 by Springer Verlag, 2001 revision by Robert M. Gray.
This book is affectionately dedicated to
Elizabeth Dubois Jordan Gray
and to the memory of
R. Adm. Augustine Heard Gray, U.S.N.
1888-1981
Sara Jean Dubois
and
William “Billy” Gray
1750-1825
Preface
History and Goals
This book has been written for several reasons, not all of which are academic. This material was
for many years the first half of a book in progress on information and ergodic theory. The intent
was and is to provide a reasonably self-contained advanced treatment of measure theory, probability
theory, and the theory of discrete time random processes with an emphasis on general alphabets
and on ergodic and stationary properties of random processes that might be neither ergodic nor
stationary. The intended audience was mathematically inclined engineering graduate students and
visiting scholars who had not had formal courses in measure theoretic probability. Much of the
material is familiar stuff for mathematicians, but many of the topics and results have not previously
appeared in books.
The original project grew too large and the first part contained much that would likely bore
mathematicians and discourage them from the second part. Hence I finally followed a suggestion
to separate the material and split the project in two. The original justification for the present
manuscript was the pragmatic one that it would be a shame to waste all the effort thus far expended.
A more idealistic motivation was that the presentation had merit as filling a unique, albeit small,
hole in the literature. Personal experience indicates that the intended audience rarely has the time to
take a complete course in measure and probability theory in a mathematics or statistics department,
at least not before they need some of the material in their research. In addition, many of the existing
mathematical texts on the subject are hard for this audience to follow, and the emphasis is not well
matched to engineering applications. A notable exception is Ash’s excellent text [1], which was
likely influenced by his original training as an electrical engineer. Still, even that text devotes little
effort to ergodic theorems, perhaps the most fundamentally important family of results for applying
probability theory to real problems. In addition, there are many other special topics that are given
little space (or none at all) in most texts on advanced probability and random processes. Examples
of topics developed in more depth here than in most existing texts are the following:
Random processes with standard alphabets We develop the theory of standard spaces as a
model of quite general process alphabets. Although not as general (or abstract) as often
considered by probability theorists, standard spaces have useful structural properties that
simplify the proofs of some general results and yield additional results that may not hold
in the more general abstract case. Examples of results holding for standard alphabets that
have not been proved in the general abstract case are the Kolmogorov extension theorem, the
ergodic decomposition, and the existence of regular conditional probabilities. In fact, Blackwell
[6] introduced the notion of a Lusin space, a structure closely related to a standard space, in
order to avoid known examples of probability spaces where the Kolmogorov extension theorem
does not hold and regular conditional probabilities do not exist. Standard spaces include the
common models of finite alphabets (digital processes) and real alphabets as well as more general
complete separable metric spaces (Polish spaces). Thus they include many function spaces,
Euclidean vector spaces, two-dimensional image intensity rasters, etc. The basic theory of
standard Borel spaces may be found in the elegant text of Parthasarathy [55], and treatments
of standard spaces and the related Lusin and Suslin spaces may be found in Christensen [10],
Schwartz [62], Bourbaki [7], and Cohn [12]. We here provide a different and more coding
oriented development of the basic results and attempt to separate clearly the properties of
standard spaces, which are useful and easy to manipulate, from the demonstrations that certain
spaces are standard, which are more complicated and can be skipped. Thus, unlike in the
traditional treatments, we define and study standard spaces first from a purely probability
theory point of view and postpone the topological metric space considerations until later.
Nonstationary and nonergodic processes We develop the theory of asymptotically mean stationary
processes and the ergodic decomposition in order to model many physical processes
better than can traditional stationary and ergodic processes. Both topics are virtually absent
in all books on random processes, yet they are fundamental to understanding the limiting
behavior of nonergodic and nonstationary processes. Both topics are considered in Krengel’s
excellent book on ergodic theorems [41], but the treatment here is more detailed and in greater
depth. We consider both the common two-sided processes, which are considered to have been
producing outputs forever, and the more difficult one-sided processes, which better model
processes that are “turned on” at some specific time and which exhibit transient behavior.
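The flavor of asymptotic mean stationarity can be suggested with a toy numerical sketch. The construction below is hypothetical and not taken from the text: a one-sided process whose mean carries a decaying transient is not stationary, yet the arithmetic averages of its means still converge.

```python
import numpy as np

# Illustrative sketch (not an example from the text): a one-sided process
# "turned on" at time 0 whose mean has a decaying transient,
# E[X_n] = (1/2)^n.  The process is not stationary, since the mean depends
# on n, but it is asymptotically mean stationary: the Cesaro (arithmetic)
# averages of the means converge as n grows.
N = 10_000
means = 0.5 ** np.arange(N)                   # E[X_n] for n = 0, 1, ..., N-1
cesaro = np.cumsum(means) / np.arange(1, N + 1)

# The averaged mean settles at the stationary-mean value, here 0.
print(cesaro[-1])
```

The transient dominates early averages (the first is 1.0) but washes out in the limit, which is the behavior asymptotic mean stationarity is designed to capture.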
Ergodic properties and theorems We develop the notion of time averages along with that of
probabilistic averages to emphasize their similarity and to demonstrate many of the implications
of the existence of limiting sample averages. We prove the ergodic theorem
for the general case of asymptotically mean stationary processes. In fact, it is shown that
asymptotic mean stationarity is both sufficient and necessary for the classical pointwise or
almost everywhere ergodic theorem to hold. We also prove the subadditive ergodic theorem
of Kingman [39], which is useful for studying the limiting behavior of certain measurements
on random processes that are not simple arithmetic averages. The proofs are based on recent
simple proofs of the ergodic theorem developed by Ornstein and Weiss [52], Katznelson
and Weiss [38], Jones [37], and Shields [64]. These proofs use coding arguments reminiscent
of information and communication theory rather than the traditional (and somewhat tricky)
maximal ergodic theorem. We consider the interrelations of stationary and ergodic properties
of processes that are stationary or ergodic with respect to block shifts, that is, processes
that produce stationary or ergodic vectors rather than scalars — a topic largely developed by
Nedoma [49] which plays an important role in the general versions of Shannon channel and
source coding theorems.
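The pointwise ergodic theorem can be illustrated numerically in its simplest setting, an i.i.d. (hence stationary and ergodic) binary process, where the time averages converge to the probabilistic average. This is a hypothetical sketch; the seed and sample size are arbitrary choices.

```python
import numpy as np

# Illustrative sketch of the pointwise ergodic theorem in the simplest
# case: an i.i.d. (hence stationary and ergodic) fair binary process.
# The running time averages (1/n) * sum_{k<n} X_k converge almost surely
# to the probabilistic average E[X] = 0.5.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=100_000)

n = np.arange(1, x.size + 1)
time_averages = np.cumsum(x) / n          # running sample averages

print(abs(time_averages[-1] - 0.5))       # small: time average near E[X]
```

Of course, the interest of the general theorem lies in exactly the cases this toy avoids: dependent, merely asymptotically mean stationary processes, where the limit exists but need not equal a single expectation.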
Process distance measures We develop measures of a “distance” between random processes.
Such results quantify how “close” one process is to another and are useful for considering spaces
of random processes. These in turn provide the means of proving the ergodic decomposition
of certain functionals of random processes and of characterizing how close or different the long
term behavior of distinct random processes can be expected to be.
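A crude warm-up for the idea of a distance between processes is to compare two finite-alphabet processes through the total variation distance between their first-order marginals. This is far weaker than the process metrics developed in the book, and the function below is a hypothetical illustration, not an object defined in the text.

```python
import numpy as np

# Total variation distance between two probability mass functions on a
# common finite alphabet -- here used as a (very weak) proxy for a
# distance between the processes having these first-order marginals.
def marginal_variation_distance(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Two hypothetical binary processes with marginals (0.5, 0.5) and (0.7, 0.3).
print(marginal_variation_distance([0.5, 0.5], [0.7, 0.3]))  # → 0.2
```

Two processes can have identical marginals yet wildly different long term behavior, which is why the book develops genuinely process-level distances rather than marginal comparisons like this one.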
Having described the topics treated here that are lacking in most texts, we admit to the omission
of many topics usually contained in advanced texts on random processes or second books on random
processes for engineers. The most obvious omission is that of continuous time random processes. A
variety of excuses explain this: The advent of digital systems and sampled-data systems has made
discrete time processes at least as important as continuous time processes in modeling real
world phenomena. The shift in emphasis from continuous time to discrete time in texts on electrical
engineering systems can be verified by simply perusing modern texts. The theory of continuous time
processes is inherently more difficult than that of discrete time processes. It is harder to construct
the models precisely and much harder to demonstrate the existence of measurements on the models,
e.g., it is usually harder to prove that limiting integrals exist than limiting sums. One can approach
continuous time models via discrete time models by letting the outputs be pieces of waveforms.
Thus, in a sense, discrete time systems can be used as a building block for continuous time systems.
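The "pieces of waveforms" idea can be sketched concretely: a sampled one-second waveform is cut into consecutive fixed-length blocks, so each output of the discrete-time process is a vector (a piece of the waveform) rather than a scalar. The sample rate and block length below are arbitrary illustrative choices.

```python
import numpy as np

# Sketch of modeling a continuous-time waveform as a discrete-time process
# whose outputs are waveform pieces: chunk a sampled signal into
# consecutive length-L blocks.
fs, L = 1000, 100                        # sample rate and block length (arbitrary)
t = np.arange(fs) / fs                   # one second of sample times
w = np.sin(2 * np.pi * 5 * t)            # a 5 Hz sinusoid
segments = w.reshape(-1, L)              # 10 outputs, each a length-100 piece

print(segments.shape)                    # → (10, 100)
```

Each row of `segments` plays the role of one output X_n of the discrete-time process, so results for vector-valued discrete-time processes can be brought to bear on the underlying waveform.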
Another topic clearly absent is that of spectral theory and its applications to estimation and
prediction. This omission is a matter of taste and there are many books on the subject.
A further topic not given the traditional emphasis is the detailed theory of the most popular
particular examples of random processes: Gaussian and Poisson processes. The emphasis of this
book is on general properties of random processes rather than the specific properties of special cases.
The final noticeably absent topic is martingale theory. Martingales are only briefly discussed in
the treatment of conditional expectation. My excuse is again that of personal taste. In addition,
this powerful theory is simply not required in the intended sequel to this book on information and
ergodic theory.
The book’s original goal of providing the needed machinery for a book on information and
ergodic theory remains. That book will rest heavily on this book and will only quote the needed
material, freeing it to focus on the information measures and their ergodic theorems and on source
and channel coding theorems. In hindsight, this manuscript also serves an alternative purpose. I
have been approached by engineering students who have taken a master’s level course in random
processes using my book with Lee Davisson [24] and who are interested in exploring more deeply
the underlying mathematics that is often referred to, but rarely exposed. This manuscript provides
such a sequel and fills in many details only hinted at in the lower level text.
As a final, and perhaps less idealistic, goal, I intended in this book to provide a catalogue of
many results that I have needed in my own research, together with proofs that I could follow.
This is one goal wherein I can judge the success; I often find myself consulting these notes to find the
conditions for some convergence result or the reasons for some required assumption or the generality
of the existence of some limit. If the manuscript provides similar service for others, it will have
succeeded in a more global sense.
Assumed Background
The book is aimed at graduate engineers and hence does not assume even an undergraduate math-
ematical background in functional analysis or measure theory. Hence topics from these areas are
developed from scratch, although the developments and discussions often diverge from traditional
treatments in mathematics texts. Some mathematical sophistication is assumed for the frequent
manipulation of deltas and epsilons, and hence some background in elementary real analysis or a
strong calculus knowledge is required.
Acknowledgments
The research in information theory that yielded many of the results and some of the new proofs for
old results in this book was supported by the National Science Foundation. Portions of the research
and much of the early writing were supported by a fellowship from the John Simon Guggenheim
Memorial Foundation.
The book benefited greatly from comments from numerous students and colleagues through many
years: most notably Paul Shields, Lee Davisson, John Kieffer, Dave Neuhoff, Don Ornstein, Bob
Fontana, Jim Dunham, Farivar Saadat, Mari Ostendorf, Michael Sabin, Paul Algoet, Wu Chou, Phil
Chou, and Tom Lookabaugh. They should not be blamed, however, for any mistakes I have made
in implementing their suggestions.
I would also like to acknowledge my debt to Al Drake for introducing me to elementary probability
theory and to Tom Pitcher for introducing me to measure theory. Both are extraordinary teachers.
Finally, I would like to apologize to Lolly, Tim, and Lori for all the time I did not spend with
them while writing this book.
The New Millennium Edition
After a decade and a half I am finally converting the ancient troff to LaTeX in order to post a
corrected and revised version of the book on the Web. I have received a few requests to do so
since the book went out of print, but the electronic manuscript was lost years ago during my many
migrations among computer systems and my less than thorough backup precautions. During summer
2001 a thorough search for something else in my Stanford office led to the discovery of an old data
cassette, with a promising inscription. Thanks to assistance from computer wizards Charlie Orgish
and Pat Burke, prehistoric equipment was found to read the cassette and the original troff files for
the book were read and converted into LaTeX with some assistance from Kamal Al-Yahya’s and
Christian Engel’s tr2latex program. I am still in the process of fixing conversion errors and slowly
making long planned improvements.