Probability,
Random Processes,
and Ergodic Properties
November 3, 2001
Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
Stanford University
© 1987 by Springer Verlag, 2001 revision by Robert M. Gray.
This book is affectionately dedicated to
Elizabeth Dubois Jordan Gray
and to the memory of
R. Adm. Augustine Heard Gray, U.S.N.
1888-1981
Sara Jean Dubois
and
William “Billy” Gray
1750-1825
Preface
History and Goals
This book has been written for several reasons, not all of which are academic. This material was
for many years the first half of a book in progress on information and ergodic theory. The intent
was and is to provide a reasonably self-contained advanced treatment of measure theory, probability
theory, and the theory of discrete time random processes with an emphasis on general alphabets
and on ergodic and stationary properties of random processes that might be neither ergodic nor
stationary. The intended audience was mathematically inclined engineering graduate students and
visiting scholars who had not had formal courses in measure theoretic probability. Much of the
material is familiar stuff for mathematicians, but many of the topics and results have not previously
appeared in books.
The original project grew too large and the first part contained much that would likely bore
mathematicians and discourage them from the second part. Hence I finally followed a suggestion
to separate the material and split the project in two. The original justification for the present
manuscript was the pragmatic one that it would be a shame to waste all the effort thus far expended.
A more idealistic motivation was that the presentation had merit as filling a unique, albeit small,
hole in the literature. Personal experience indicates that the intended audience rarely has the time to
take a complete course in measure and probability theory in a mathematics or statistics department,
at least not before they need some of the material in their research. In addition, many of the existing
mathematical texts on the subject are hard for this audience to follow, and the emphasis is not well
matched to engineering applications. A notable exception is Ash’s excellent text [1], which was
likely influenced by his original training as an electrical engineer. Still, even that text devotes little
effort to ergodic theorems, perhaps the most fundamentally important family of results for applying
probability theory to real problems. In addition, there are many other special topics that are given
little space (or none at all) in most texts on advanced probability and random processes. Examples
of topics developed in more depth here than in most existing texts are the following:
Random processes with standard alphabets We develop the theory of standard spaces as a
model of quite general process alphabets. Although not as general (or abstract) as often
considered by probability theorists, standard spaces have useful structural properties that
simplify the proofs of some general results and yield additional results that may not hold
in the more general abstract case. Examples of results holding for standard alphabets that
have not been proved in the general abstract case are the Kolmogorov extension theorem, the
ergodic decomposition, and the existence of regular conditional probabilities. In fact, Blackwell
[6] introduced the notion of a Lusin space, a structure closely related to a standard space, in
order to avoid known examples of probability spaces where the Kolmogorov extension theorem
does not hold and regular conditional probabilities do not exist. Standard spaces include the
common models of finite alphabets (digital processes) and real alphabets as well as more general
complete separable metric spaces (Polish spaces). Thus they include many function spaces,
Euclidean vector spaces, two-dimensional image intensity rasters, etc. The basic theory of
standard Borel spaces may be found in the elegant text of Parthasarathy [55], and treatments
of standard spaces and the related Lusin and Suslin spaces may be found in Christensen [10],
Schwartz [62], Bourbaki [7], and Cohn [12]. We here provide a different and more coding
oriented development of the basic results and attempt to separate clearly the properties of
standard spaces, which are useful and easy to manipulate, from the demonstrations that certain
spaces are standard, which are more complicated and can be skipped. Thus, unlike in the
traditional treatments, we define and study standard spaces first from a purely probability
theory point of view and postpone the topological metric space considerations until later.
Nonstationary and nonergodic processes We develop the theory of asymptotically mean stationary
processes and the ergodic decomposition in order to model many physical processes
better than can traditional stationary and ergodic processes. Both topics are virtually absent
in all books on random processes, yet they are fundamental to understanding the limiting
behavior of nonergodic and nonstationary processes. Both topics are considered in Krengel’s
excellent book on ergodic theorems [41], but the treatment here is more detailed and in greater
depth. We consider both the common two-sided processes, which are considered to have been
producing outputs forever, and the more difficult one-sided processes, which better model
processes that are “turned on” at some specific time and which exhibit transient behavior.
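The flavor of asymptotic mean stationarity can be suggested with a toy numerical sketch. The construction below is hypothetical and not taken from the text: a one-sided process whose mean carries a decaying transient is not stationary, yet the arithmetic averages of its means still converge.

```python
import numpy as np

# Illustrative sketch (not an example from the text): a one-sided process
# "turned on" at time 0 whose mean has a decaying transient,
# E[X_n] = (1/2)^n.  The process is not stationary, since the mean depends
# on n, but it is asymptotically mean stationary: the Cesaro (arithmetic)
# averages of the means converge as n grows.
N = 10_000
means = 0.5 ** np.arange(N)                   # E[X_n] for n = 0, 1, ..., N-1
cesaro = np.cumsum(means) / np.arange(1, N + 1)

# The averaged mean settles at the stationary-mean value, here 0.
print(cesaro[-1])
```

The transient dominates early averages (the first is 1.0) but washes out in the limit, which is the behavior asymptotic mean stationarity is designed to capture.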
Ergodic properties and theorems We develop the notion of time averages along with that of
probabilistic averages to emphasize their similarity and to demonstrate many of the implications
of the existence of limiting sample averages. We prove the ergodic theorem
for the general case of asymptotically mean stationary processes. In fact, it is shown that
asymptotic mean stationarity is both sufficient and necessary for the classical pointwise or
almost everywhere ergodic theorem to hold. We also prove the subadditive ergodic theorem
of Kingman [39], which is useful for studying the limiting behavior of certain measurements
on random processes that are not simple arithmetic averages. The proofs are based on recent
simple proofs of the ergodic theorem developed by Ornstein and Weiss [52], Katznelson
and Weiss [38], Jones [37], and Shields [64]. These proofs use coding arguments reminiscent
of information and communication theory rather than the traditional (and somewhat tricky)
maximal ergodic theorem. We consider the interrelations of stationary and ergodic properties
of processes that are stationary or ergodic with respect to block shifts, that is, processes
that produce stationary or ergodic vectors rather than scalars — a topic largely developed by
Nedoma [49] which plays an important role in the general versions of Shannon channel and
source coding theorems.
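The pointwise ergodic theorem can be illustrated numerically in its simplest setting, an i.i.d. (hence stationary and ergodic) binary process, where the time averages converge to the probabilistic average. This is a hypothetical sketch; the seed and sample size are arbitrary choices.

```python
import numpy as np

# Illustrative sketch of the pointwise ergodic theorem in the simplest
# case: an i.i.d. (hence stationary and ergodic) fair binary process.
# The running time averages (1/n) * sum_{k<n} X_k converge almost surely
# to the probabilistic average E[X] = 0.5.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=100_000)

n = np.arange(1, x.size + 1)
time_averages = np.cumsum(x) / n          # running sample averages

print(abs(time_averages[-1] - 0.5))       # small: time average near E[X]
```

Of course, the interest of the general theorem lies in exactly the cases this toy avoids: dependent, merely asymptotically mean stationary processes, where the limit exists but need not equal a single expectation.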
Process distance measures We develop measures of a “distance” between random processes.
Such results quantify how “close” one process is to another and are useful for considering spaces
of random processes. These in turn provide the means of proving the ergodic decomposition
of certain functionals of random processes and of characterizing how close or different the long
term behavior of distinct random processes can be expected to be.
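A crude warm-up for the idea of a distance between processes is to compare two finite-alphabet processes through the total variation distance between their first-order marginals. This is far weaker than the process metrics developed in the book, and the function below is a hypothetical illustration, not an object defined in the text.

```python
import numpy as np

# Total variation distance between two probability mass functions on a
# common finite alphabet -- here used as a (very weak) proxy for a
# distance between the processes having these first-order marginals.
def marginal_variation_distance(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Two hypothetical binary processes with marginals (0.5, 0.5) and (0.7, 0.3).
print(marginal_variation_distance([0.5, 0.5], [0.7, 0.3]))  # → 0.2
```

Two processes can have identical marginals yet wildly different long term behavior, which is why the book develops genuinely process-level distances rather than marginal comparisons like this one.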
Having described the topics treated here that are lacking in most texts, we admit to the omission
of many topics usually contained in advanced texts on random processes or second books on random
processes for engineers. The most obvious omission is that of continuous time random processes. A
variety of excuses explain this: The advent of digital systems and sampled-data systems has made
discrete time processes at least as important as continuous time processes in modeling real
world phenomena. The shift in emphasis from continuous time to discrete time in texts on electrical
engineering systems can be verified by simply perusing modern texts. The theory of continuous time
processes is inherently more difficult than that of discrete time processes. It is harder to construct
the models precisely and much harder to demonstrate the existence of measurements on the models,
e.g., it is usually harder to prove that limiting integrals exist than limiting sums. One can approach
continuous time models via discrete time models by letting the outputs be pieces of waveforms.
Thus, in a sense, discrete time systems can be used as a building block for continuous time systems.
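The "pieces of waveforms" idea can be sketched concretely: a sampled one-second waveform is cut into consecutive fixed-length blocks, so each output of the discrete-time process is a vector (a piece of the waveform) rather than a scalar. The sample rate and block length below are arbitrary illustrative choices.

```python
import numpy as np

# Sketch of modeling a continuous-time waveform as a discrete-time process
# whose outputs are waveform pieces: chunk a sampled signal into
# consecutive length-L blocks.
fs, L = 1000, 100                        # sample rate and block length (arbitrary)
t = np.arange(fs) / fs                   # one second of sample times
w = np.sin(2 * np.pi * 5 * t)            # a 5 Hz sinusoid
segments = w.reshape(-1, L)              # 10 outputs, each a length-100 piece

print(segments.shape)                    # → (10, 100)
```

Each row of `segments` plays the role of one output X_n of the discrete-time process, so results for vector-valued discrete-time processes can be brought to bear on the underlying waveform.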
Another topic clearly absent is that of spectral theory and its applications to estimation and
prediction. This omission is a matter of taste and there are many books on the subject.
A further topic not given the traditional emphasis is the detailed theory of the most popular
particular examples of random processes: Gaussian and Poisson processes. The emphasis of this
book is on general properties of random processes rather than the specific properties of special cases.
The final noticeably absent topic is martingale theory. Martingales are only briefly discussed in
the treatment of conditional expectation. My excuse is again that of personal taste. In addition,
this powerful theory is simply not required in the intended sequel to this book on information and
ergodic theory.
The book’s original goal of providing the needed machinery for a book on information and
ergodic theory remains. That book will rest heavily on this book and will only quote the needed
material, freeing it to focus on the information measures and their ergodic theorems and on source
and channel coding theorems. In hindsight, this manuscript also serves an alternative purpose. I
have been approached by engineering students who have taken a master’s level course in random
processes using my book with Lee Davisson [24] and who are interested in exploring more deeply
the underlying mathematics that is often referred to, but rarely exposed. This manuscript provides
such a sequel and fills in many details only hinted at in the lower level text.
As a final, and perhaps less idealistic, goal, I intended in this book to provide a catalogue of
many results that I have needed in my own research, together with proofs that I could follow.
This is one goal wherein I can judge the success; I often find myself consulting these notes to find the
conditions for some convergence result or the reasons for some required assumption or the generality
of the existence of some limit. If the manuscript provides similar service for others, it will have
succeeded in a more global sense.
Assumed Background
The book is aimed at graduate engineers and hence does not assume even an undergraduate math-
ematical background in functional analysis or measure theory. Hence topics from these areas are
developed from scratch, although the developments and discussions often diverge from traditional
treatments in mathematics texts. Some mathematical sophistication is assumed for the frequent
manipulation of deltas and epsilons, and hence some background in elementary real analysis or a
strong calculus knowledge is required.
Acknowledgments
The research in information theory that yielded many of the results and some of the new proofs for
old results in this book was supported by the National Science Foundation. Portions of the research
and much of the early writing were supported by a fellowship from the John Simon Guggenheim
Memorial Foundation.
The book benefited greatly from comments from numerous students and colleagues through many
years: most notably Paul Shields, Lee Davisson, John Kieffer, Dave Neuhoff, Don Ornstein, Bob
Fontana, Jim Dunham, Farivar Saadat, Mari Ostendorf, Michael Sabin, Paul Algoet, Wu Chou, Phil
Chou, and Tom Lookabaugh. They should not be blamed, however, for any mistakes I have made
in implementing their suggestions.
I would also like to acknowledge my debt to Al Drake for introducing me to elementary probability
theory and to Tom Pitcher for introducing me to measure theory. Both are extraordinary teachers.
Finally, I would like to apologize to Lolly, Tim, and Lori for all the time I did not spend with
them while writing this book.
The New Millennium Edition
After a decade and a half I am finally converting the ancient troff to LaTeX in order to post a
corrected and revised version of the book on the Web. I have received a few requests to do so
since the book went out of print, but the electronic manuscript was lost years ago during my many
migrations among computer systems and my less than thorough backup precautions. During summer
2001 a thorough search for something else in my Stanford office led to the discovery of an old data
cassette, with a promising inscription. Thanks to assistance from computer wizards Charlie Orgish
and Pat Burke, prehistoric equipment was found to read the cassette and the original troff files for
the book were read and converted into LaTeX with some assistance from Kamal Al-Yahya’s and
Christian Engel’s tr2latex program. I am still in the process of fixing conversion errors and slowly
making long planned improvements.