CONVERGENCE RATE IN THE CENTRAL LIMIT THEOREM
FOR THE CURIE-WEISS-POTTS MODEL

HAN HAN
(HT080869E)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF MATHEMATICS
NATIONAL UNIVERSITY OF SINGAPORE
Acknowledgements
First and foremost, it is my great honor to work under Assistant Professor Sun Rongfeng, for he has been more than just a supervisor to me, but also a supportive friend; never in my life have I met another person who is so knowledgeable and yet so humble. Apart from the inspiring ideas and endless support that Prof. Sun has given me, I would like to express my sincere thanks and heartfelt appreciation for his patient and selfless sharing of his knowledge of probability theory and statistical mechanics, which has tremendously enlightened me. I would also like to thank him for entertaining all my impromptu visits to his office for consultation.

Many thanks to all the professors in the Mathematics department who have taught me. Special thanks to Professors Yu Shih-Hsien and Xu Xingwang for patiently answering my questions when I attended their classes.

I would also like to take this opportunity to thank the administrative staff of the Department of Mathematics for all their kindness in offering administrative assistance to me throughout my master's study at NUS. Special mention goes to Ms. Shanthi D/O D Devadas, Mdm. Tay Lee Lang and Mdm. Lum Yi Lei for always entertaining my requests with a smile.

Last but not least, to my family and my classmates, Wang Xiaoyan, Huang Xiaofeng and Hou Likun: thanks for all the laughter and support you have given me throughout my master's study. It will be a memorable chapter of my life.
Han Han
Summer 2010
Contents

Acknowledgements
Summary
1 Introduction
2 The Curie-Weiss-Potts Model
  2.1 The Curie-Weiss-Potts Model
  2.2 The Phase Transition
3 Stein's Method and Its Application
  3.1 The Stein Operator
  3.2 The Stein Equation
  3.3 An Approximation Theorem
  3.4 An Application of Stein's Method
4 Main Results
Bibliography
Summary
There is a long tradition of considering mean-field models in statistical mechanics. The Curie-Weiss-Potts model is famous because it exhibits, explicitly, a number of properties of real substances, such as multiple phases and metastable states. The aim of this thesis is to prove Berry-Esseen bounds for the sums of the random variables occurring in a statistical mechanical model called the Curie-Weiss-Potts model, or mean-field Potts model. To this end, we will apply Stein's method using exchangeable pairs.

More precisely, the aim is to calculate the convergence rate in the central limit theorem for the Curie-Weiss-Potts model. In Chapter 1, we give an introduction to this problem. In Chapter 2, we introduce the Curie-Weiss-Potts model, together with the Ising model and the Curie-Weiss model, and then give some results about the phase transition of the Curie-Weiss-Potts model. In Chapter 3, we state Stein's method, then give the Stein operator and an approximation theorem; in Section 3.4 we give an application of Stein's method. In Chapter 4, we state the main result of this thesis and prove it.
Chapter 1
Introduction
There is a long tradition of considering mean-field models in statistical mechanics. The Curie-Weiss-Potts model is famous because it exhibits, explicitly, a number of properties of real substances, such as multiple phases and metastable states. The aim of this thesis is to prove Berry-Esseen bounds for the sums of the random variables occurring in a statistical mechanical model called the Curie-Weiss-Potts model, or mean-field Potts model. To this end, we will apply Stein's method using exchangeable pairs.
In statistical mechanics, the Potts model, a generalization of the Ising model (1925), is a model of interacting spins on a crystalline lattice, so we first introduce the Ising model. The Ising model is defined on a discrete collection of variables called spins, which can take on the value 1 or −1. The spins S_i interact in pairs, with an energy that has one value when the two spins are the same, and a second value when the two spins are different. The energy of the Ising model is defined to be
\[ E = -\sum_{i \neq j} J_{ij} S_i S_j, \tag{1.1} \]
where the sum counts each pair of spins only once (this condition, which is often left out according to a different convention, introduces a factor 1/2). Notice that the product of spins is either 1 if the two spins are the same, or −1 if they are different. J_{ij} is called the coupling between the spins S_i and S_j. Magnetic interactions seek to align spins relative to one another. Spins become effectively "randomized" when thermal fluctuation dominates the spin-spin interaction.
For each pair, if
Jij > 0, the interaction is called ferromagnetic;
Jij < 0, the interaction is called antiferromagnetic;
Jij = 0, the spins are noninteracting.
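To make the energy (1.1) concrete, here is a minimal sketch (ours, not part of the thesis) that evaluates E for a spin configuration and an arbitrary symmetric coupling matrix; the names ising_energy, J and spins are illustrative.

```python
import numpy as np

def ising_energy(spins, J):
    """Evaluate (1.1): E = -sum over pairs i < j of J_ij * S_i * S_j."""
    # spins @ J @ spins counts every pair twice, hence the factor 1/2
    return -0.5 * spins @ J @ spins

rng = np.random.default_rng(0)
n = 10
J = np.triu(rng.normal(size=(n, n)), k=1)
J = J + J.T                      # symmetric coupling, zero diagonal
spins = rng.choice([-1, 1], size=n)
print(ising_energy(spins, J))
```

With all J_ij > 0 (ferromagnetic), the energy is minimized by aligned configurations, matching the classification above.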
The Potts model is named after Renfrey B. Potts who described the model near the
end of his 1952 Ph.D. thesis. The model was related to the "planar Potts" or "clock
model", which was suggested to him by his advisor Cyril Domb. It is sometimes known
as the Ashkin-Teller model (after Julius Ashkin and Edward Teller), as they considered
a four component version in 1943.
The Potts model consists of spins that are placed on a lattice; the lattice is usually
taken to be a two-dimensional rectangular Euclidean lattice, but is often generalized to
other dimensions or other lattices. Domb originally suggested that each spin takes one
of q possible values on the unit circle, at angles
\[ \theta_n = \frac{2\pi n}{q}, \qquad 1 \le n \le q, \tag{1.2} \]
and the interaction Hamiltonian be given by
\[ H_c = -J_c \sum_{(i,j)} \cos(\theta_{s_i} - \theta_{s_j}) \tag{1.3} \]
with the sum running over the nearest neighbor pairs (i, j) on the lattice. The site colors
si take on values ranging from 1, · · · , q. Here, Jc is the coupling constant, determining
the interaction strength. This model is now known as the vector Potts model or the
clock model. Potts provided a solution for two dimensions, for q = 2, 3 and 4. In the
limit as q approaches infinity, this becomes the so-called XY model.
What is now known as the standard Potts model was suggested by Potts in the course of the solution above, and uses a simpler Hamiltonian:
\[ H_p = -J_p \sum_{(i,j)} \delta(s_i, s_j), \tag{1.4} \]
where δ(s_i, s_j) is the Kronecker delta, which equals one whenever s_i = s_j and zero otherwise.
The q = 2 standard Potts model is equivalent to the 2D Ising model and the 2-state
vector Potts model, with Jp = −2Jc . The q = 3 standard Potts model is equivalent to
the three-state vector Potts model, with Jp = −3Jc /2.
A common generalization is to introduce an external "magnetic field" term h, moving the parameters inside the sums and allowing them to vary across the model:
\[ \beta H_g = -\beta \sum_{(i,j)} J_{ij}\,\delta(s_i, s_j) - \sum_i h_i s_i, \tag{1.5} \]
where β = 1/kT is the inverse temperature, k the Boltzmann constant and T the temperature. The summation may run over more distant neighbors on the lattice, or may in fact have infinite range.
Chapter 2
The Curie-Weiss-Potts Model

2.1 The Curie-Weiss-Potts Model
Now we introduce the Curie-Weiss-Potts model [7]. Section I Part C of Wu [14] introduces an approximation to the Potts model, obtained by replacing the nearest neighbor
interaction by a mean interaction averaged over all the sites in the model, and we call
this approximation the Curie-Weiss-Potts model. Pearce and Griffiths [10] and Kesten
and Schonmann [9] discuss two ways in which the Curie-Weiss-Potts model approximates
the nearest neighbor Potts model.
The Curie-Weiss-Potts model generalizes the Curie-Weiss model, which is a well
known mean-field approximation to the Ising model [5]. One reason for the interest in
the Curie-Weiss-Potts model is its more intricate phase transition structure; namely, a
first-order phase transition at the critical inverse temperature compared to a second-order
phase transition for the Curie-Weiss model, which we will discuss soon.
The Curie-Weiss model and the Curie-Weiss-Potts model are both defined by sequences of finite-volume Gibbs states {Pn,β , n = 1, 2, · · · }. They are probability distributions, depending on a positive parameter β, of n spin random variables that for the
first model may occupy one of two different states and for the second model may occupy
one of q different states, where q ∈ {3, 4, · · · } is fixed. The parameter β is the inverse
temperature. For β large, the spin random variables are strongly dependent while for
β small they are weakly dependent. This change in the dependence structure manifests
itself in the phase transition for each model, which may be seen probabilistically by
considering law of large numbers-type results.
For the Curie-Weiss model, there exists a critical value of β, denoted by β_c. For 0 < β < β_c, the sample mean of the spin random variables, n^{-1}S_n, satisfies the law of large numbers
\[ P_{n,\beta}\{ n^{-1}S_n \in dx \} \Rightarrow \delta_0(dx) \quad \text{as } n \to \infty. \tag{2.1} \]
However, for β > β_c, the law of large numbers breaks down and is replaced by the limit
\[ P_{n,\beta}\{ n^{-1}S_n \in dx \} \Rightarrow \Big( \tfrac12\delta_{m(\beta)} + \tfrac12\delta_{-m(\beta)} \Big)(dx) \quad \text{as } n \to \infty, \tag{2.2} \]
where m(β) is a positive quantity. The second-order phase transition for the model corresponds to the fact that
\[ \lim_{\beta\to\beta_c^+} m(\beta) = 0, \qquad \lim_{\beta\to\beta_c^+} m'(\beta) = \infty. \tag{2.3} \]
At β = β_c, the limit (2.1) holds.
For the Curie-Weiss-Potts model, there also exists a critical inverse temperature β_c. For 0 < β < β_c, the empirical vector of the spin random variables L_n, counting the number of spins of each type, satisfies the law of large numbers
\[ P_{n,\beta}\Big\{ \frac{L_n}{n} \in d\nu \Big\} \Rightarrow \delta_{\nu^0}(d\nu) \quad \text{as } n \to \infty, \tag{2.4} \]
where ν^0 denotes the constant probability vector (q^{-1}, q^{-1}, · · · , q^{-1}) ∈ R^q. As in the Curie-Weiss model, for β > β_c the law of large numbers breaks down. It is replaced by the limit
\[ P_{n,\beta}\Big\{ \frac{L_n}{n} \in d\nu \Big\} \Rightarrow \frac{1}{q}\sum_{i=1}^{q}\delta_{\nu^i(\beta)}(d\nu), \tag{2.5} \]
where {ν^i(β), i = 1, 2, · · · , q} are q distinct probability vectors in R^q, all distinct from ν^0. However, in contrast to the Curie-Weiss model, the Curie-Weiss-Potts model exhibits a first-order phase transition at β = β_c, which corresponds to the fact that for i = 1, 2, · · · , q,
\[ \lim_{\beta\to\beta_c^+}\nu^i(\beta) \neq \nu^0. \tag{2.6} \]
At β = β_c, (2.4) and (2.5) are replaced by the limit
\[ P_{n,\beta_c}\Big\{ \frac{L_n}{n} \in d\nu \Big\} \Rightarrow \lambda_0\,\delta_{\nu^0}(d\nu) + \lambda\sum_{i=1}^{q}\delta_{\nu^i(\beta_c)}(d\nu), \tag{2.7} \]
where λ_0 > 0, λ > 0, λ_0 + qλ = 1, and ν^i(β_c) = lim_{β→β_c^+} ν^i(β).
The three models, Curie-Weiss-Potts, Curie-Weiss, and Ising, represent three levels
of difficulty. Their large deviation behaviors may be analyzed in terms of the three
respective levels of large deviations for i.i.d. random variables; namely, the sample mean,
the empirical vector, and the empirical field. These and related issues are discussed in
[6].
2.2 The Phase Transition

Now we state some known results about the Curie-Weiss-Potts model. Let q ≥ 3 be a fixed integer and let {θ^i, i = 1, 2, · · · , q} be q different vectors in R^q. Let Σ denote the set {e_1, e_2, · · · , e_q}, where e_i ∈ Z^q, i = 1, 2, · · · , q, is the vector with the ith entry 1 and the other entries 0. Let Ω_n, n ∈ N, denote the set of sequences {ω : ω = (ω_1, ω_2, · · · , ω_n), each ω_i ∈ Σ}. The Curie-Weiss-Potts model is defined by the sequence of probability measures on Ω_n,
\[ P_{n,\beta}\{d\omega\} = \frac{1}{Z_n(\beta)}\exp[-\beta H_n(\omega)]\prod_{j=1}^{n}\rho(d\omega_j). \tag{2.8} \]
In this formula, β is a positive parameter, the inverse temperature,
\[ H_n(\omega) = -\frac{1}{2n}\sum_{i,j=1}^{n}\delta(\omega_i, \omega_j) = -\frac{1}{2n}\sum_{i,j=1}^{n}\langle \omega_i, \omega_j \rangle, \tag{2.9} \]
where δ(·, ·) denotes the Kronecker delta, ρ is the uniform distribution on Σ with
\[ \rho(d\omega_j) = \frac{1}{q}\sum_{i=1}^{q}\delta_{\theta^i}(d\omega_j), \tag{2.10} \]
and Z_n(β) is the normalization
\[ Z_n(\beta) = \int_{\Omega_n}\exp[-\beta H_n(\omega)]\prod_{j=1}^{n}\rho(d\omega_j). \tag{2.11} \]
For q = 2, if we let Σ = {1, −1}, then θ^1 = −1, θ^2 = 1 yield a model that is equivalent to the Curie-Weiss model.
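Since Ω_n is finite, the Gibbs state (2.8) can be computed exactly for tiny n by enumeration. The sketch below (ours, not part of the thesis) does this for the illustrative values q = 3, n = 6, β = 0.5, and prints Z_n(β) together with the induced distribution of the type counts.

```python
import itertools
from collections import Counter
import numpy as np

q, n, beta = 3, 6, 0.5

Z = 0.0
law = Counter()                       # weight of each vector of type counts
for omega in itertools.product(range(q), repeat=n):
    counts = np.bincount(omega, minlength=q)
    H = -(counts @ counts) / (2 * n)  # H_n = -(1/2n) sum_{i,j} delta(w_i, w_j)
    w = np.exp(-beta * H) * q ** (-n) # rho assigns mass q^{-n} to each omega
    Z += w
    law[tuple(counts)] += w

print("Z_n(beta) =", Z)
for c, w in sorted(law.items()):
    print(c, w / Z)
```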
With respect to P_{n,β}, let the empirical vector L_n(ω) = (L_{n,1}(ω), L_{n,2}(ω), · · · , L_{n,q}(ω)) be defined by
\[ L_{n,i}(\omega) = \frac{1}{n}\sum_{j=1}^{n}\delta(\omega_j, \theta^i), \qquad i = 1, 2, \cdots, q. \tag{2.12} \]
L_n(ω) takes values in the set of probability vectors
\[ M = \Big\{ \nu \in R^q : \nu = (\nu_1, \nu_2, \cdots, \nu_q),\ \text{each } \nu_i \ge 0,\ \sum_{i=1}^{q}\nu_i = 1 \Big\}. \]
A key to the analysis of the Curie-Weiss-Potts model is the fact that
\[ H_n(\omega) = -\frac{n}{2}\,\langle L_n(\omega), L_n(\omega) \rangle, \tag{2.13} \]
where ⟨·, ·⟩ denotes the R^q inner product.
The specific Gibbs free energy for the model is the quantity ψ(β) defined by the limit
\[ -\beta\psi(\beta) = \lim_{n\to\infty}\frac{1}{n}\log Z_n(\beta). \tag{2.14} \]
Now we do some large deviation analysis to derive the free energy. See [5] for details.
Definition 2.2.1. A rate function I is a lower semi-continuous mapping I : Ω → [0, ∞] such that the level set Ψ_α := {x : I(x) ≤ α} is a closed subset of Ω for every α.

Definition 2.2.2. Suppose Ω is a topological space and B is the Borel σ-field on Ω. A sequence of probability measures {µ_n} on (Ω, B) satisfies the large deviation principle (LDP) if there exists a rate function I : Ω → [0, ∞] such that the following hold:
(i) For all closed subsets F ⊂ Ω,
\[ \limsup_{n\to\infty}\frac{1}{n}\ln \mu_n(F) \le -\inf_{x\in F} I(x); \]
(ii) For all open subsets G ⊂ Ω,
\[ \liminf_{n\to\infty}\frac{1}{n}\ln \mu_n(G) \ge -\inf_{x\in G} I(x). \]
Definition 2.2.3. Let µ be the probability measure of a q-dimensional random vector X; then the logarithmic generating function for µ is defined as
\[ \Lambda(\lambda) := \log M(\lambda) := \log E[\exp\langle \lambda, X \rangle], \qquad \lambda \in R^q. \]

Definition 2.2.4. The Fenchel-Legendre transform of Λ(λ), which we denote by Λ*(x), is defined by
\[ \Lambda^*(x) := \sup_{\lambda\in R^q}\{ \langle \lambda, x \rangle - \Lambda(\lambda) \}, \qquad x \in R^q. \]
Varadhan's Lemma and Cramér's theorem are also needed, so we state them here but omit the proofs; see Chapter III in [8].

Lemma 2.2.5 (Varadhan's Lemma). Let µ_n be a sequence of probability measures on (Ω, B) satisfying the LDP with rate function I : Ω → [0, ∞]. Then if G : Ω → R is continuous and bounded above, we have
\[ \lim_{n\to\infty}\frac{1}{n}\ln\int_{\Omega} e^{nG(\omega)}\,\mu_n(d\omega) = \sup_{x\in\Omega}\,[G(x) - I(x)]. \]

Theorem 2.2.6 (Cramér's Theorem). Let \(\{X_n\}_{n=1}^{\infty} = \{(X_n^1, X_n^2, \cdots, X_n^q)\}_{n=1}^{\infty}\) be a sequence of i.i.d. q-dimensional random vectors. Then the sequence of probability measures {µ_n} for \(\hat S_n := \frac{1}{n}\sum_{j=1}^{n} X_j\) satisfies the LDP with convex rate function Λ*(·), where Λ*(·) is the Fenchel-Legendre transform of the logarithmic generating function of X_1.
We now apply this to a single spin X with the one-spin distribution above, and write ν = (ν_1, ν_2, · · · , ν_q) for a probability vector. From the above, we get
\[ \Lambda(\lambda) = \log E[\exp\langle \lambda, X \rangle] = \log\Big[ \frac{1}{q}\sum_{i=1}^{q} e^{\lambda_i} \Big]. \]
Hence, the rate function is
\[ I(\nu) = \sup_{\lambda\in R^q}\Big\{ \langle \lambda, \nu \rangle - \log\Big( \frac{1}{q}\sum_{i=1}^{q} e^{\lambda_i} \Big) \Big\} = \sup_{\lambda\in R^q}\Big\{ \sum_{i=1}^{q}\lambda_i\nu_i - \log\sum_{i=1}^{q} e^{\lambda_i} + \log q \Big\}. \]
Denote \(H = \sum_{i=1}^{q}\lambda_i\nu_i - \log\sum_{i=1}^{q} e^{\lambda_i} + \log q\); then for any 1 ≤ k ≤ q,
\[ \frac{\partial H}{\partial \lambda_k} = \nu_k - \frac{e^{\lambda_k}}{\sum_{i=1}^{q} e^{\lambda_i}}, \]
so at the maximizer we may take λ_k = log ν_k, and thus
\[ I(\nu) = \sum_{i=1}^{q}\nu_i\log\nu_i + \log q. \]
Recall that \(Z_n(\beta) = \int_{\Omega_n}\exp[-\beta H_n(\omega)]\prod_{j=1}^{n}\rho(d\omega_j)\). Using (2.13) and Varadhan's Lemma, we get
\begin{align*}
-\beta\psi(\beta) = \lim_{n\to\infty}\frac{1}{n}\log Z_n(\beta) &= \sup_{\nu\in M}\Big\{ \frac{1}{2}\beta\langle \nu, \nu \rangle - I(\nu) \Big\} \\
&= \sup_{\nu\in M}\Big\{ \frac{1}{2}\beta\langle \nu, \nu \rangle - \sum_{i=1}^{q}\nu_i\log\nu_i - \log q \Big\} = \sup_{\nu\in M}\Big\{ \frac{1}{2}\beta\langle \nu, \nu \rangle - \sum_{i=1}^{q}\nu_i\log(\nu_i q) \Big\}.
\end{align*}
If we denote
\[ \alpha_\beta(\nu) = \frac{1}{2}\beta\langle \nu, \nu \rangle - \sum_{i=1}^{q}\nu_i\log(\nu_i q), \tag{2.15} \]
then
\[ -\beta\psi(\beta) = \sup_{\nu\in M}\alpha_\beta(\nu). \tag{2.16} \]
To get another representation of the formula (2.16), we need some facts about convex duality.

Let X be a real Banach space and F_1 : X → R ∪ {+∞} a convex functional on X. We assume that S_{F_1} = {x : F_1(x) < ∞} ≠ ∅. We say that F_1 is closed if the epigraph of F_1,
\[ \mathcal E(F_1) = \{ (x, u) \in S_{F_1} \times R : u \ge F_1(x) \}, \]
is closed in X × R, where S_{F_1} is the domain of F_1. We denote by X* the dual space of X. The Legendre transformation of F_1 is the function F_1* with the domain
\[ S_{F_1^*} = \Big\{ \alpha \in X^* : \sup_{x\in X}\,[\alpha(x) - F_1(x)] < \infty \Big\}. \]
For α ∈ X*, we define
\[ F_1^*(\alpha) = \sup_{x\in X}\,[\alpha(x) - F_1(x)]. \]
Since F_1 = +∞ on X\S_{F_1}, we can replace X in this formula by S_{F_1}.

Theorem 2.2.7. Suppose that F_1 and F_2 are closed convex functionals on X. Then S_{F_1^*} ≠ ∅ and
\[ \sup_{x\in S_{F_2}}\,[F_1(x) - F_2(x)] = \sup_{\alpha\in S_{F_2^*}}\,[F_2^*(\alpha) - F_1^*(\alpha)]. \]
Proof. See Appendix C in [4].

Now, by Theorem 2.2.7, we get another representation of the formula (2.16):
\[ \beta\psi(\beta) = \min_{u\in R^q} G_\beta(u) + \log q, \tag{2.17} \]
where
\[ G_\beta(u) = \frac{1}{2}\beta\langle u, u \rangle - \log\sum_{i=1}^{q} e^{\beta u_i}. \tag{2.18} \]
Let φ(s) denote the function mapping s ∈ [0, 1] into R^q defined as
\[ \phi(s) = \big( q^{-1}[1 + (q-1)s],\ q^{-1}(1-s),\ \cdots,\ q^{-1}(1-s) \big), \tag{2.19} \]
where the last (q − 1) components all equal q^{-1}(1 − s).

We quote the following results from Ellis and Wang [7].

Theorem 2.2.8. Let \(\beta_c = \frac{2(q-1)}{q-2}\log(q-1)\), and for β > 0 let s(β) be the largest solution of the equation
\[ s = \frac{1 - e^{-\beta s}}{1 + (q-1)e^{-\beta s}}. \tag{2.20} \]
Let K_β denote the set of global minimum points of the symmetric function G_β(u), u ∈ R^q. Then the following conclusions hold.
(i) The quantity s(β) is well-defined. It is positive, strictly increasing, and differentiable in β on an open interval containing [β_c, ∞), s(β_c) = (q−2)/(q−1), and \(\lim_{\beta\to\infty} s(\beta) = 1\).
(ii) Define ν^0 = φ(0) = (q^{-1}, q^{-1}, · · · , q^{-1}). For β ≥ β_c, define ν^1(β) = φ(s(β)) and let ν^i(β), i = 2, · · · , q, denote the points in R^q obtained by interchanging the first and ith coordinates of ν^1(β). Then
\[ K_\beta = \begin{cases} \{\nu^0\} & \text{for } 0 < \beta < \beta_c, \\ \{\nu^1(\beta), \nu^2(\beta), \cdots, \nu^q(\beta)\} & \text{for } \beta > \beta_c, \\ \{\nu^0, \nu^1(\beta_c), \nu^2(\beta_c), \cdots, \nu^q(\beta_c)\} & \text{for } \beta = \beta_c. \end{cases} \]
For β ≥ β_c, the points in K_β are all distinct. The point ν^1(β_c) equals φ(s(β_c)) = φ((q−2)/(q−1)).
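The mean-field equation (2.20) is easy to solve numerically. The sketch below (ours) iterates the right-hand side, which is increasing in s, starting from s = 1, so the iteration converges down to the largest solution; it recovers s(β_c) = (q−2)/(q−1), though convergence is slow at β = β_c because the fixed point there is a tangency.

```python
import numpy as np

def s_of_beta(beta, q, iters=5000):
    s = 1.0
    for _ in range(iters):
        s = (1 - np.exp(-beta * s)) / (1 + (q - 1) * np.exp(-beta * s))
    return s

q = 3
beta_c = (2 * (q - 1) / (q - 2)) * np.log(q - 1)
print(beta_c)                 # 4*log(2) ~ 2.7726 for q = 3
print(s_of_beta(beta_c, q))   # -> (q-2)/(q-1) = 0.5, slowly, from above
print(s_of_beta(10.0, q))     # -> close to 1 as beta grows
```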
We denote by D²G_β(u) the Hessian matrix \(\{\partial^2 G_\beta(u)/\partial u_i\partial u_j,\ i, j = 1, 2, \cdots, q\}\) of G_β at u.

Proposition 2.2.9. For any β > 0, let ν̄ denote a global minimum point of G_β(u). Then D²G_β(ν̄) is positive definite.
We can calculate the matrix D²G_β(u) at ν^0 as follows; that is, we calculate \(\partial^2 G_\beta(u)/\partial u_i\partial u_j\) for each i, j = 1, 2, · · · , q. From \(G_\beta(u) = \frac12\beta\langle u,u\rangle - \log\sum_{i=1}^{q} e^{\beta u_i}\), we have
\[ \frac{\partial G_\beta(u)}{\partial u_i} = \beta u_i - \frac{\beta e^{\beta u_i}}{\sum_{k=1}^{q} e^{\beta u_k}}, \]
and hence, for any i, j = 1, 2, · · · , q,
\[ \frac{\partial^2 G_\beta(u)}{\partial u_i^2} = \beta - \frac{\beta^2 e^{\beta u_i}\big( \sum_{k=1}^{q} e^{\beta u_k} - e^{\beta u_i} \big)}{\big( \sum_{k=1}^{q} e^{\beta u_k} \big)^2}, \qquad \frac{\partial^2 G_\beta(u)}{\partial u_i \partial u_j} = \frac{\beta^2 e^{\beta(u_i + u_j)}}{\big( \sum_{k=1}^{q} e^{\beta u_k} \big)^2} \quad (i \ne j). \]
At u = ν^0 = (1/q, 1/q, · · · , 1/q), we have
\[ \frac{\partial^2 G_\beta(u)}{\partial u_i^2}\Big|_{\nu^0} = \beta - \frac{\beta^2 e^{\beta/q}(q-1)e^{\beta/q}}{(q e^{\beta/q})^2} = \frac{\beta^2 + \beta q(q-\beta)}{q^2}, \qquad \frac{\partial^2 G_\beta(u)}{\partial u_i \partial u_j}\Big|_{\nu^0} = \frac{\beta^2 e^{2\beta/q}}{(q e^{\beta/q})^2} = \frac{\beta^2}{q^2} \quad (i \ne j). \]
Hence the matrix D²G_β(u)|_{ν^0} is
\[ D^2 G_\beta(u)\big|_{\nu^0} = \frac{1}{q^2}\begin{pmatrix} \beta^2 + \beta q(q-\beta) & \beta^2 & \cdots & \beta^2 \\ \beta^2 & \beta^2 + \beta q(q-\beta) & \cdots & \beta^2 \\ \vdots & & \ddots & \vdots \\ \beta^2 & \beta^2 & \cdots & \beta^2 + \beta q(q-\beta) \end{pmatrix}, \tag{2.21} \]
that is, a matrix with diagonal entries \(\frac{\beta^2+\beta q(q-\beta)}{q^2}\) and off-diagonal entries \(\frac{\beta^2}{q^2}\).
Now we give a limit theorem, which gives the law of large numbers and its breakdown for the empirical vector L_n. It was also established in Ellis and Wang [7].

Theorem 2.2.10. (i) For 0 < β < β_c,
\[ P_{n,\beta}\{L_n \in d\nu\} \Rightarrow \delta_{\nu^0}(d\nu) \quad \text{as } n \to \infty. \]
(ii) Define
\[ \kappa_1 = \big( \det D^2 G_{\beta_c}(\nu^1(\beta_c)) \big)^{-1/2}, \quad \kappa_0 = \big( \det D^2 G_{\beta_c}(\nu^0) \big)^{-1/2}, \quad \lambda_0 = \frac{\kappa_0}{\kappa_0 + q\kappa_1}, \quad \lambda = \frac{\kappa_1}{\kappa_0 + q\kappa_1}. \]
Then for β = β_c,
\[ P_{n,\beta_c}\{L_n \in d\nu\} \Rightarrow \lambda_0\,\delta_{\nu^0}(d\nu) + \lambda\sum_{i=1}^{q}\delta_{\nu^i(\beta_c)}(d\nu) \quad \text{as } n \to \infty. \]
For a non-negative semidefinite q × q matrix A, we denote by N(0, A) the multinormal distribution on R^q with mean 0 and covariance matrix A. The following result states the central limit theorem for 0 < β < β_c.

Theorem 2.2.11. For 0 < β < β_c,
\[ P_{n,\beta}\{\sqrt n\,(L_n - \nu^0) \in dx\} \Rightarrow N\big( 0,\ [D^2 G_\beta(\nu^0)]^{-1} - \beta^{-1} I \big) \quad \text{as } n \to \infty, \]
where I is the q × q identity matrix. The limiting covariance matrix is non-negative semidefinite and has rank (q − 1).
By (2.21), we can calculate the inverse of D²G_β(u)|_{ν^0}:
\[ [D^2 G_\beta(\nu^0)]^{-1} = \begin{pmatrix} \frac{q^2-\beta}{\beta q(q-\beta)} & -\frac{\beta}{\beta q(q-\beta)} & \cdots & -\frac{\beta}{\beta q(q-\beta)} \\ -\frac{\beta}{\beta q(q-\beta)} & \frac{q^2-\beta}{\beta q(q-\beta)} & \cdots & -\frac{\beta}{\beta q(q-\beta)} \\ \vdots & & \ddots & \vdots \\ -\frac{\beta}{\beta q(q-\beta)} & -\frac{\beta}{\beta q(q-\beta)} & \cdots & \frac{q^2-\beta}{\beta q(q-\beta)} \end{pmatrix}, \tag{2.22} \]
that is, a matrix with diagonal entries \(\frac{q^2-\beta}{\beta q(q-\beta)}\) and off-diagonal entries \(-\frac{\beta}{\beta q(q-\beta)} = -\frac{1}{q(q-\beta)}\). Hence we obtain
\[ [D^2 G_\beta(\nu^0)]^{-1} - \beta^{-1} I = \begin{pmatrix} \frac{q-1}{q^2-q\beta} & -\frac{1}{q^2-q\beta} & \cdots & -\frac{1}{q^2-q\beta} \\ -\frac{1}{q^2-q\beta} & \frac{q-1}{q^2-q\beta} & \cdots & -\frac{1}{q^2-q\beta} \\ \vdots & & \ddots & \vdots \\ -\frac{1}{q^2-q\beta} & -\frac{1}{q^2-q\beta} & \cdots & \frac{q-1}{q^2-q\beta} \end{pmatrix}, \tag{2.23} \]
that is, a matrix with diagonal entries \(\frac{q-1}{q^2-q\beta}\) and off-diagonal entries \(-\frac{1}{q^2-q\beta}\).
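These matrix identities can be checked numerically. The sketch below (ours) builds D²G_β(ν^0) from (2.21), forms [D²G_β(ν^0)]^{-1} − β^{-1}I, and confirms the entries of (2.23) as well as the eigenvalues 0 (simple) and 1/(q−β) (multiplicity q−1) used later in the proof of Theorem 2.2.11.

```python
import numpy as np

q, beta = 4, 1.5                      # any 0 < beta < q will do here

H = np.full((q, q), beta**2 / q**2)   # off-diagonal entries of (2.21)
np.fill_diagonal(H, (beta**2 + beta * q * (q - beta)) / q**2)

C = np.linalg.inv(H) - np.eye(q) / beta
print(C[0, 0], (q - 1) / (q**2 - q * beta))   # diagonal entry of (2.23)
print(C[0, 1], -1 / (q**2 - q * beta))        # off-diagonal entry of (2.23)
print(np.linalg.eigvalsh(C))   # one eigenvalue 0, the rest equal 1/(q-beta)
```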
We sketch below the key ingredients needed to prove Theorem 2.2.11. First we recall some lemmas involving the function
\[ G_\beta(u) = \frac{1}{2}\beta\langle u, u \rangle - \log\sum_{i=1}^{q} e^{\beta u_i}. \]
All the proofs are omitted here; see [7] for details.

The first lemma gives a useful lower bound on G_β(u).

Lemma 2.2.12. For β > 0, G_β(u) is a real analytic function of u ∈ R^q. There exists M_β > 0 such that
\[ G_\beta(u) \ge \frac{1}{4}\beta\langle u, u \rangle \quad \text{whenever } \|u\| \ge M_\beta. \]

The next lemma expresses the distribution of the empirical vector L_n(ω) in terms of G_β(u). The spins {ω_i, i = 1, 2, · · · , n} are assumed to have the joint distribution P_{n,β} defined in (2.8).
Lemma 2.2.13. Let I be the q × q identity matrix. For β > 0, choose a random vector W such that L(W), the law of W, equals N(0, β^{-1}I) and W is independent of {ω_i, i = 1, 2, · · · , n}. Then for any point m ∈ R^q, any γ ∈ R, and any n = 1, 2, · · · ,
\[ \mathscr L\Big( \frac{W}{n^{1/2-\gamma}} + \frac{n(L_n - m)}{n^{\gamma}} \Big)(dx) = \exp\Big[ -nG_\beta\Big( m + \frac{x}{n^{\gamma}} \Big) \Big]\,dx\,\Big( \int_{R^q}\exp\Big[ -nG_\beta\Big( m + \frac{x}{n^{\gamma}} \Big) \Big]\,dx \Big)^{-1}. \tag{2.24} \]
(2.24)
In the next lemma we give a bound on certain integrals that occur in the proofs of
the limit theorems.
¯ β = minu∈Rq Gβ (u). Then for any closed subset V
Lemma 2.2.14. For β > 0, let G
of Rq that contains no global minimum point of Gβ (u) and for any t ∈ Rq , there exists
ε > 0 such that
¯
e−nGβ (u)+
enGβ
√
n t,u
du
Ce−nε
as
n → ∞,
V
where C is a constant independent of n and V .
Lemma 2.2.15. For β > 0, let ν̄ be a global minimum point of G_β(u), i.e. \(G_\beta(\bar\nu) = \bar G_\beta = \min_{u\in R^q} G_\beta(u)\). Then there exists a positive number b_{ν̄} such that the following hold.
(i) For all x ∈ B(0, √n b_{ν̄}) and all τ ∈ [0, 1],
\[ \big\langle x,\ D^2 G_\beta(\bar\nu + \tau x/\sqrt n)\,x \big\rangle \ge \frac{1}{2}\mu_\beta\langle x, x \rangle, \]
where µ_β > 0 denotes the minimum eigenvalue of D²G_β(ν̄).
(ii) For any t ∈ R^q, any b ∈ (0, b_{ν̄}], and any bounded continuous function f : R^q → R,
\begin{align*}
\lim_{n\to\infty} e^{-\sqrt n\langle t, \bar\nu\rangle}\, n^{q/2}\, e^{n\bar G_\beta} \int_{B(\bar\nu, b)} f(u)\, e^{-nG_\beta(u) + \sqrt n\langle t, u\rangle}\, du
&= \lim_{n\to\infty} e^{n\bar G_\beta} \int_{B(0, \sqrt n\, b)} f(\bar\nu + x/\sqrt n)\,\exp\big[-nG_\beta(\bar\nu + x/\sqrt n) + \langle t, x\rangle\big]\, dx \\
&= f(\bar\nu)\int_{R^q}\exp\Big[ -\frac{1}{2}\big\langle x, D^2 G_\beta(\bar\nu)\,x \big\rangle + \langle t, x \rangle \Big]\, dx.
\end{align*}
We now prove the central limit theorem, that is, Theorem 2.2.11.

Proof of Theorem 2.2.11: According to Lemma 2.2.13 with γ = 1/2, for each t ∈ R^q,
\[ \int \exp\big[ \langle t,\ W + \sqrt n(L_n - \nu^0) \rangle \big]\,dP = \int_{R^q}\exp\big[ -nG_\beta(\nu^0 + x/\sqrt n) + \langle t, x \rangle \big]\,dx\,\Big( \int_{R^q}\exp\big[ -nG_\beta(\nu^0 + x/\sqrt n) \big]\,dx \Big)^{-1}. \]
We multiply the numerator and denominator on the right-hand side by \(e^{n\bar G_\beta}\) and write each integral over R^q as an integral over B(0, √n b_0) and over R^q \ B(0, √n b_0), where b_0 = b_{ν^0} is defined in Lemma 2.2.15. The change of variables x = √n(u − ν^0) converts the two integrals over R^q \ B(0, √n b_0) into integrals to which the bound in Lemma 2.2.14 may be applied. Using Lemma 2.2.15(ii), we see that
\begin{align*}
\lim_{n\to\infty} E\big\{ \exp[\langle t, W + \sqrt n(L_n - \nu^0)\rangle] \big\} &= \int_{R^q}\exp\Big[ -\frac12\langle x, D^2 G_\beta(\nu^0)x\rangle + \langle t, x\rangle \Big]dx\,\Big( \int_{R^q}\exp\Big[ -\frac12\langle x, D^2 G_\beta(\nu^0)x\rangle \Big]dx \Big)^{-1} \\
&= \exp\Big[ \frac12\big\langle t,\ [D^2 G_\beta(\nu^0)]^{-1} t \big\rangle \Big].
\end{align*}
Since W and L_n are independent and
\[ E\{ e^{\langle t, W \rangle} \} = e^{(1/2\beta)\langle t, t \rangle}, \]
we get that
\[ P_{n,\beta}\{\sqrt n(L_n - \nu^0) \in dx\} \Rightarrow N\big( 0,\ [D^2 G_\beta(\nu^0)]^{-1} - \beta^{-1} I \big). \]
The matrix [D²G_β(ν^0)]^{-1} − β^{-1}I has a simple eigenvalue at 0 and an eigenvalue of multiplicity q − 1 at 1/(q − β), which is positive since 0 < β < β_c < q. Thus the covariance matrix is non-negative semidefinite and has rank q − 1. The proof is complete.
Chapter 3
Stein’s Method and Its Application
Stein's method is a way of deriving estimates of the accuracy of the approximation of one probability distribution by another. It is used to obtain bounds on the distance between two probability distributions with respect to some probability metric. It was introduced by Charles Stein, who first published it in 1972 ([13]), to obtain a bound between the distribution of a sum of an m-dependent sequence of random variables and a standard normal distribution in the Kolmogorov (uniform) metric, and hence to prove not only a central limit theorem but also bounds on the rates of convergence for the given metric. Later, his Ph.D. student Louis Chen Hsiao Yun modified the method so as to obtain approximation results for the Poisson distribution ([2]); therefore the method is often referred to as the Stein-Chen method.

In this chapter, we will introduce Stein's method and then give some examples of its application. These are mostly taken from [1].
3.1 The Stein Operator

Stein's method is a way of bounding the distance between two probability distributions in a specific probability metric, so to use the method we first need the metric. We define the distance in the following form:
\[ d(P, Q) = \sup_{h\in H}\Big| \int h\,dP - \int h\,dQ \Big| = \sup_{h\in H}\,| Eh(W) - Eh(Y) |. \tag{3.1} \]
Here, P and Q are probability measures on a measurable space X, H is a set of functions from X to the real numbers, E is the usual expectation operator, and W and Y are random variables with distributions P and Q respectively. The set H should be large enough so that the above definition indeed yields a metric. Important examples are the total variation metric, where we let H consist of all the characteristic functions¹ of measurable sets; the Kolmogorov (uniform) metric for probability measures on the real numbers, where we consider all the half-line characteristic functions; and the Lipschitz (first order Wasserstein; Kantorovich) metric, where the underlying space is itself a metric space and we take the set H to be all Lipschitz-continuous functions with Lipschitz-constant 1.
In what follows in this section, we think of P as the distribution of a sum of dependent random variables, which we want to approximate by a much simpler and tractable
distribution Q (e.g. the standard normal distribution to obtain a central limit theorem).
Now we assume that the distribution Q is a fixed distribution; in what follows we
shall in particular consider the case when Q is the standard normal distribution, which
serves as a classical example of the application of Stein’s method.
First of all, we need an operator L (see P.62-P.64 in [1]) which acts on functions f from X to the real numbers, and which "characterizes" the distribution Q in the sense that the following equivalence holds:
\[ E(Lf)(Y) = 0 \ \text{for all } f \quad \Longleftrightarrow \quad Y \text{ has distribution } Q. \tag{3.2} \]
We call such an operator the Stein operator. For the standard normal distribution, Stein's lemma² exactly yields such an operator:
\[ E\big(f'(Y) - Yf(Y)\big) = 0 \ \text{for all } f \in C_b^1 \quad \Longleftrightarrow \quad Y \text{ has standard normal distribution.} \tag{3.3} \]
Thus we can take
\[ (Lf)(x) = f'(x) - xf(x). \tag{3.4} \]
We note that there are in general infinitely many such operators, and it still remains an open question which one to choose. However, it seems that for many distributions there is a particularly good one, like (3.4) for the normal distribution.

¹ A characteristic function is a function defined on a set X that indicates membership of an element in a subset A ⊂ X, having the value 1 for all elements of A and the value 0 for all elements of X not in A; that is, 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A.
There are different ways to construct Stein operators, but by far the most important one is via generators. This approach was introduced by Barbour and Götze. Assume that Z = (Z_t)_{t≥0} is a (homogeneous) continuous time Markov process taking values in X. If Z has stationary distribution Q, it is easy to see that, if L is the generator of Z, then E(Lf)(Y) = 0 for a large set of functions f. Thus, generators are natural candidates for Stein operators, and this approach will also help us in later computations.
3.2 The Stein Equation

Saying that P is close to Q with respect to the metric d is equivalent to saying that the difference of expectations in (3.1) is close to 0; indeed, if P = Q it is equal to 0. We now hope that the operator L exhibits the same behavior: if P = Q we have E(Lf)(W) = 0, and hopefully if P ≈ Q we have E(Lf)(W) ≈ 0. To make this statement rigorous we could find a function f such that, for a given function h,
\[ E(Lf)(W) = Eh(W) - Eh(Y), \tag{3.5} \]
so that the behavior of the right hand side is reproduced by the operator L and f.

² Stein's Lemma: Suppose X is a normally distributed random variable with expectation µ and variance σ². Further suppose g is a function for which the two expectations E(g(X)(X − µ)) and E(g′(X)) both exist. Then E(g(X)(X − µ)) = σ²E(g′(X)).

However, this equation is too general. We solve the more specific equation
\[ (Lf)(x) = h(x) - Eh(Y) \quad \text{for all } x, \tag{3.6} \]
which is called the Stein equation (see P.63 in [1]). Replacing x by W and taking expectations with respect to W, we are back to (3.5), which is what we effectively want. Now all the effort is worthwhile only if the left hand side of (3.5) is easier to bound than the right hand side. This is, surprisingly, often the case.

If Q is the standard normal distribution, then by (3.4) the corresponding Stein equation is
\[ f'(x) - xf(x) = h(x) - Eh(Z) \quad \text{for all } x, \tag{3.7} \]
which is just an ordinary differential equation.
Now we need to solve the Stein equation; the following can be found in [1]. In general, we cannot say much about how the equation (3.6) is to be solved. However, there are important cases where we can.

Analytic method: We see from (3.7) that equation (3.6) can in particular be a differential equation (if Q is concentrated on the integers, it will often turn out to be a difference equation). As there are many methods available to treat such equations, we can use them to solve the equation. For example, (3.7) can be easily solved explicitly:
\[ f(x) = e^{x^2/2}\int_{-\infty}^{x}\big[ h(s) - Eh(Y) \big]\,e^{-s^2/2}\,ds. \tag{3.8} \]

Generator method: If L is the generator of a Markov process (Z_t)_{t≥0} as explained before, we can give a general solution to (3.6):
\[ f(x) = -\int_{0}^{\infty}\big[ E^x h(Z_t) - Eh(Y) \big]\,dt, \tag{3.9} \]
where E^x denotes expectation with respect to the process Z started in x. However, one still has to prove that the solution (3.9) exists for all desired functions h ∈ H.
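For a concrete test function, (3.8) can be evaluated by quadrature and checked against the Stein equation (3.7). The sketch below (ours, with h(x) = |x| as an arbitrary Lipschitz choice) compares f′(x) − xf(x) with h(x) − Eh(Y) at a few points.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

h = np.abs
Eh = quad(lambda s: h(s) * norm.pdf(s), -np.inf, np.inf)[0]  # = sqrt(2/pi)

def f(x):
    # f(x) = e^{x^2/2} * int_{-inf}^{x} [h(s) - Eh] e^{-s^2/2} ds, as in (3.8)
    val = quad(lambda s: (h(s) - Eh) * np.exp(-s**2 / 2), -np.inf, x)[0]
    return np.exp(x**2 / 2) * val

for x in (-1.0, 0.3, 2.0):
    eps = 1e-5
    fprime = (f(x + eps) - f(x - eps)) / (2 * eps)
    print(fprime - x * f(x), h(x) - Eh)   # the two columns should agree
```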
In the following, we give some properties of the solution to the Stein equation. Usually, one tries to give bounds on f and its derivatives (which have to be carefully defined if X is a more complicated space) or differences, in terms of h and its derivatives or differences; that is, inequalities of the form
\[ \|D^k f\| \le C_{k,l}\,\|D^l h\|, \tag{3.10} \]
for some specific k, l = 0, 1, 2, · · · (typically k ≥ l or k ≥ l − 1, respectively, depending on the form of the Stein operator), where often ‖·‖ is taken to be the supremum norm. Here, D^k denotes the differential operator, but in discrete settings it usually refers to a difference operator. The constants C_{k,l} may contain the parameters of the distribution Q. If there are any, they are often referred to as Stein factors or magic factors.

In the case of (3.8) we can prove for the supremum norm that
\[ \|f\| \le \min\Big( \sqrt{\pi/2}\,\|h\|_\infty,\ 2\|h'\|_\infty \Big), \qquad \|f'\| \le \min\Big( 2\|h\|_\infty,\ 4\|h'\|_\infty \Big), \qquad \|f''\| \le 2\|h'\|_\infty, \tag{3.11} \]
where the last bound is of course only applicable if h is differentiable (or at least Lipschitz-continuous, which, for example, is not the case if we consider the total variation metric or the Kolmogorov metric). As the standard normal distribution has no extra parameters, in this specific case the constants are free of additional parameters.

Note that, up to this point, we did not make use of the random variable W. So the steps up to here in general have to be carried out only once, for a specific combination of distribution Q, metric d and Stein operator L. However, if we have bounds in the general form (3.10), we usually are able to treat many probability metrics together. Furthermore, as there is often a particular 'good' Stein operator for a distribution (e.g., no other operator than (3.4) has been used for the standard normal distribution up to now), one can often just start with the next step below, if bounds of the form (3.10) are already available (which is the case for many distributions).
3.3 An Approximation Theorem

We are now in a position to bound the left hand side of (3.5). As this step heavily depends on the form of the Stein operator, we directly regard the case of the standard normal distribution.

At this point we could directly plug in the random variable W which we want to approximate and try to find upper bounds. However, it is often fruitful to formulate a more general theorem using only abstract properties of W. Let us consider here the case of local dependence.

To this end, assume that \(W = \sum_{i=1}^{n} X_i\) is a sum of random variables such that E(W) = 0 and Var(W) = 1. Assume that, for every i = 1, 2, · · · , n, there is a set A_i ⊂ {1, 2, · · · , n} such that X_i is independent of all random variables X_j with j ∉ A_i. We call this set the 'neighborhood' of X_i. Likewise, let B_i ⊂ {1, 2, · · · , n} be a set such that all X_j with j ∈ A_i are independent of all X_k, k ∉ B_i. We can think of B_i as the neighbors in the neighborhood of X_i — a second-order neighborhood, so to speak. For a set A ⊂ {1, 2, · · · , n} define now the sum
\[ X_A := \sum_{j\in A} X_j. \]
Using basically only Taylor expansion, it is possible to prove that
\[ \big| E\big( f'(W) - Wf(W) \big) \big| \le \|f''\|_\infty \sum_{i=1}^{n}\Big( \frac{1}{2}E|X_i X_{A_i}^2| + E|X_i X_{A_i} X_{B_i\setminus A_i}| + E|X_i X_{A_i}|\,E|X_{B_i}| \Big). \tag{3.12} \]
Note that, if we follow this line of argument, we can bound (3.1) only for functions h where ‖h′‖ is bounded, because of the third inequality of (3.11) (and in fact, if h has discontinuities, so will f″). To obtain a bound similar to (3.12) which contains only the expressions ‖f‖_∞ and ‖f′‖_∞, the argument is much more involved and the result is not as simple as (3.12); however, it can be done, see P.70-P.75 in [1].
Theorem 3.3.1. If W is as described above, we have for the Lipschitz metric d_W that
\[ d_W\big( \mathscr L(W), N(0,1) \big) \le 2\sum_{i=1}^{n}\Big( \frac12 E|X_i X_{A_i}^2| + E|X_i X_{A_i} X_{B_i\setminus A_i}| + E|X_i X_{A_i}|\,E|X_{B_i}| \Big). \tag{3.13} \]
Proof. Recall that the Lipschitz metric is of the form (3.1) where the functions h are Lipschitz-continuous with Lipschitz-constant 1, thus ‖h′‖ ≤ 1. Combining this with (3.12) and the last bound in (3.11) proves the theorem.

Thus, roughly speaking, we have proved that, to calculate the Lipschitz distance between a W with local dependence structure and a standard normal distribution, we only need to know the third moments of the X_i and the sizes of the neighborhoods A_i and B_i.

We can treat the case of sums of independent and identically distributed random variables with Theorem 3.3.1. So assume now that E(X_i) = 0, Var(X_i) = 1 and \(W = n^{-1/2}\sum X_i\). We can take A_i = B_i = {i}, and we obtain from Theorem 3.3.1 that
\[ d_W\big( \mathscr L(W), N(0,1) \big) \le \frac{5\,E|X_1|^3}{n^{1/2}}. \tag{3.14} \]
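The rate in (3.14) can be watched empirically. In the sketch below (ours), scipy's wasserstein_distance estimates d_W from samples of W for the symmetric ±1 coin, for which E|X_1|³ = 1, so the bound is 5/√n; the simulated distance also contains some Monte Carlo error.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    W = rng.choice([-1, 1], size=(20000, n)).sum(axis=1) / np.sqrt(n)
    Z = rng.standard_normal(20000)
    # simulated distance vs. the bound (3.14)
    print(n, wasserstein_distance(W, Z), 5 / np.sqrt(n))
```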
3.4 An Application of Stein's Method
To give an application of Stein's method, we introduce Stein's method for dependent random variables occurring in statistical mechanics. We first introduce Stein's method with exchangeable pairs (P.19-P.23 in [1]) for normal approximation, and then give the Berry-Esseen bounds for the classical Curie-Weiss model.

Given a random variable W, Stein's method of exchangeable pairs is based on the construction of another random variable W′ on the same probability space such that the pair (W, W′) is exchangeable, i.e. their joint distribution is symmetric. The approach essentially uses the elementary fact that if (W, W′) is an exchangeable pair, then E[g(W, W′)] = 0 for all antisymmetric measurable functions g(x, y) such that the expectation exists. Stein also assumed the linear regression property
\[ E(W'\,|\,W) = (1-\lambda)W \]
for some 0 < λ < 1. Here we develop Stein's method by replacing the linear regression property with
\[ E(W'\,|\,W) = W + \lambda\psi(W) + R(W), \]
where ψ(x) depends on the continuous distribution under consideration.

For measuring the distance between the distribution of W and the standard normal distribution (or any other distribution), we would like to bound |Eh(W) − Φ(h)| for a class of test functions h ∈ H, where \(\Phi(h) := \int_{-\infty}^{\infty} h(z)\,\Phi(dz)\) and Φ is the standard normal distribution function.
Theorem 3.4.1 (Berry-Esseen bounds for the high temperature Curie-Weiss model). Let \(\rho = \frac12\delta_{-1} + \frac12\delta_{1}\) and 0 < β < 1. Then
\[ \sup_{z\in R}\Big| P_n\Big( \frac{S_n}{\sqrt n} \le z \Big) - \Phi_\beta(z) \Big| \le C n^{-1/2}, \tag{3.15} \]
where Φ_β denotes the distribution function of the normal distribution with expectation 0 and variance (1 − β)^{-1}, and C is a constant depending only on β.

To prove this theorem (see Theorem 1.2 in [3] for details), we need some other theorems and corollaries, but only Corollary 3.4.6 is used in Chapter 4. We state them here without the proofs. For Lemma 3.4.2, see Lemma 1.13 in [3].
Lemma 3.4.2. Let \(\rho = \frac12\delta_{-1} + \frac12\delta_{1}\), and let α be one of the global minima of maximal type k, k ≥ 1, and strength µ of G_ρ, where
\[ G_\rho(\beta, s) := \frac{\beta s^2}{2} - \phi_\rho(\beta s) \qquad \text{with} \qquad \phi_\rho(s) := \log\int \exp(s x)\,d\rho(x). \]
Then for
\[ W := \frac{S_n - n\alpha}{n^{1 - 1/2k}}, \]
we have for any l ∈ N,
\[ E|W|^l \le \mathrm{const.}(l). \]
In the following theorems, Theorem 3.4.3 is from [11], Theorem 3.4.4 is from [12],
Theorem 3.4.5, Corollary 3.4.6 and Corollary 3.4.7 are from [3].
Theorem 3.4.3. Suppose W is a random variable with E(W) = 0 and E(W²) = 1. Let (W, W′) be an exchangeable pair, and define a random variable R = R(W) by
\[ E(W'\,|\,W) = (1-\lambda)W + R, \]
where λ is a number satisfying 0 < λ < 1. If moreover |W − W′| ≤ A for some constant A, then
\[ \sup_{z\in R}\,| P(W \le z) - \Phi(z) | \le \frac{12}{\lambda}\sqrt{\operatorname{var}\big\{ E[(W-W')^2\,|\,W] \big\}} + \frac{37\sqrt{E(R^2)}}{\lambda} + 48\sqrt{2/\pi}\,\frac{A^3}{\lambda} + 2\sqrt{2/\pi}\,\frac{A^2}{\sqrt\lambda}. \tag{3.16} \]

Theorem 3.4.4. Suppose W is a random variable with E(W) = 0 and E(W²) ≤ 1. Let (W, W′) be an exchangeable pair such that
\[ E(W'\,|\,W) = (1-\lambda)W, \]
with 0 < λ < 1. Then for any a > 0,
\[ \sup_{z\in R}\,| P(W \le z) - \Phi(z) | \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \frac{0.41a^3}{\lambda} + 1.5a + \frac{1}{2\lambda}E\big[ (W-W')^2\,\mathbf 1_{\{|W-W'|\ge a\}} \big]. \tag{3.17} \]
If |W − W′| ≤ A, then the bound reduces to
\[ \sup_{z\in R}\,| P(W \le z) - \Phi(z) | \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \frac{0.41A^3}{\lambda} + 1.5A. \tag{3.18} \]
Suppose X and Y are two random variables defined on a common probability space; the Kolmogorov distance of the distributions of X and Y is defined by
\[ d_K(X, Y) := \sup_{z\in R}\,| P(X \le z) - P(Y \le z) |. \]
Theorem 3.4.5. Let (W, W′) be an exchangeable pair of real-valued random variables such that
\[ E(W'\,|\,W) = (1-\lambda)W + R \]
for some random variable R = R(W) and 0 < λ < 1. Assume that E(W²) ≤ 1. Let Z be a random variable with standard normal distribution. Then for any A > 0,
\[ d_K(W, Z) \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda} + \frac{0.41A^3}{\lambda} + 1.5A + \frac{1}{2\lambda}E\big[ (W-W')^2\,\mathbf 1_{\{|W-W'|\ge A\}} \big]. \tag{3.19} \]
If |W − W′| ≤ A for some constant A, we obtain the bound
\[ d_K(W, Z) \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda} + \frac{0.41A^3}{\lambda} + 1.5A. \tag{3.20} \]
Note that in this theorem we have assumed E(W²) ≤ 1. Alternatively, if we assume that E(W²) is finite, then the third and the fourth summands of the bound (3.19) change to
\[ \frac{A^3}{\lambda}\Big( \frac{\sqrt{2\pi}}{16} + \frac{E(W^2)}{4} \Big) + 1.5A\,E(|W|). \]
In the following corollary, we discuss the Kolmogorov distance of the distribution of a random variable W to a random variable distributed according to N(0, σ²), the normal distribution with mean 0 and variance σ².

Corollary 3.4.6. Let σ² > 0 and let (W, W′) be an exchangeable pair of real-valued random variables such that
\[ E(W'\,|\,W) = \Big( 1 - \frac{\lambda}{\sigma^2} \Big)W + R \]
for some random variable R = R(W) and with 0 < λ < 1. Assume that E(W²) is finite. Let Z_σ be a random variable distributed according to N(0, σ²). If |W − W′| ≤ A for a constant A, we obtain the bound
\[ d_K(W, Z_\sigma) \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \sigma\Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda}\sqrt{E(W^2)} + \frac{A^3}{\lambda}\Big( \frac{\sqrt{2\pi\sigma^2}}{16} + \frac{E(W^2)}{4} \Big) + 1.5A\,\frac{E(W^2)}{\sigma^2}. \tag{3.21} \]
An alternative bound can be obtained comparing with a N (0, E(W 2 ))-distribution.
Corollary 3.4.7. In the situation of Corollary 3.4.6, let Z_W denote a random variable with the N(0, E(W²)) distribution. We obtain
\[ d_K(W, Z_W) \le \frac{\sigma^2}{2\lambda}\sqrt{\operatorname{Var}\big( E[(W-W')^2\,|\,W] \big)} + \sigma^2\frac{A^3}{\lambda}\Big( \frac{\sqrt{E(W^2)}\sqrt{2\pi}}{16} + \frac{E(W^2)}{4} \Big) + \sigma^2\Big( \frac{\sqrt{E(W^2)}\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda} + \big( E(W^2) + \sigma^2 \big)\,1.5A\,\frac{\sqrt{E(W^2)}\sqrt{E(R^2)}}{\lambda}. \tag{3.22} \]
Let \(\rho = \frac12\delta_{-1} + \frac12\delta_{1}\) and 0 < β < 1. Then, according to Theorem 3.4.1,
\[ W := W_n := \frac{1}{\sqrt n}\sum_{i=1}^{n} X_i \]
converges in distribution to N(0, σ²) with σ² = (1 − β)^{-1}. Now we prove Theorem 3.4.1.
Proof of Theorem 3.4.1: We consider the usual construction of an exchangeable pair. We produce a spin collection X′ = (X_i′)_{i≥1} via a Gibbs sampling procedure: select a coordinate, say i, at random and replace X_i by X_i′ drawn from the conditional distribution of the i-th coordinate given (X_j)_{j≠i}. Let I be a random variable taking values in {1, 2, · · · , n} with equal probability, and independent of all other random variables. Consider
\[ W' := W - \frac{X_I}{\sqrt n} + \frac{X_I'}{\sqrt n} = \frac{1}{\sqrt n}\sum_{j\ne I} X_j + \frac{X_I'}{\sqrt n}. \]
Hence (W, W′) is an exchangeable pair and
\[ W - W' = \frac{X_I - X_I'}{\sqrt n}. \]
Let F := σ(X_1, X_2, · · · , X_n) be the σ-algebra generated by {X_1, X_2, · · · , X_n}. Now we have
\[ E[W - W'\,|\,\mathcal F] = \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n} E[X_i - X_i'\,|\,\mathcal F] = \frac{1}{n}W - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n} E[X_i'\,|\,\mathcal F]. \]
The conditional distribution at site i is given by
\[ P_n\big( x_i\,\big|\,(x_j)_{j\ne i} \big) = \frac{\exp(x_i\,\beta m_i(x))}{\exp(\beta m_i(x)) + \exp(-\beta m_i(x))}, \qquad \text{where } m_i(x) = \frac{1}{n}\sum_{j\ne i} x_j,\quad i = 1, 2, \cdots, n. \]
It follows that
\[ E[X_i'\,|\,\mathcal F] = E[X_i'\,|\,(X_j)_{j\ne i}] = \tanh(\beta m_i(X)). \]
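The Gibbs-sampling step used here is straightforward to implement; the following sketch (ours) draws one resampled spin from the conditional distribution above, producing one realization of the pair (W, W′). For a faithful draw of the pair, the configuration X should itself be sampled from the Gibbs measure, e.g. by running many such updates first.

```python
import numpy as np

def resample_one_site(X, beta, rng):
    """Pick I uniformly and redraw X_I from its conditional law given the rest."""
    n = len(X)
    I = rng.integers(n)
    m_I = (X.sum() - X[I]) / n          # m_I(X) = (1/n) sum_{j != I} X_j
    p_plus = np.exp(beta * m_I) / (2 * np.cosh(beta * m_I))
    Xp = X.copy()
    Xp[I] = 1 if rng.random() < p_plus else -1
    return Xp

rng = np.random.default_rng(2)
n, beta = 500, 0.5
X = rng.choice([-1, 1], size=n)         # a rough starting configuration
W = X.sum() / np.sqrt(n)
W_prime = resample_one_site(X, beta, rng).sum() / np.sqrt(n)
print(W, W_prime)
```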
Now
\[ \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\tanh(\beta m_i(X)) = \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\big( \tanh(\beta m_i(X)) - \tanh(\beta m(X)) \big) + \frac{1}{\sqrt n}\tanh(\beta m(X)) =: R_1 + R_2, \]
with \(m(X) = \frac{1}{n}\sum_{i=1}^{n} X_i\). Taylor-expanding tanh(x) = x + O(x³) leads to
\[ R_2 = \frac{1}{\sqrt n}\beta m(X) + \frac{1}{\sqrt n}O(m(X)^3) = \frac{\beta}{n}W + O\Big( \frac{W^3}{n^2} \Big). \]
Hence
\[ E[W - W'\,|\,W] = \frac{1-\beta}{n}W + R = \frac{\lambda}{\sigma^2}W + R \]
with λ = 1/n, σ² = (1 − β)^{-1} and R = O(W³/n²) − R_1. Since |W − W′| = |X_I − X_I′|/√n ≤ 2/√n =: A, we are able to apply Corollary 3.4.6. From Lemma 3.4.2 we know that, for ρ the symmetric Bernoulli distribution and for 0 < β < 1, we have E(W⁴) ≤ const. Applying this, it follows that the fourth term in (3.21) can be bounded by
\[ 1.5A\,\frac{E(W^2)}{\sigma^2} \le \frac{(1-\beta)\,\mathrm{const.}}{\sqrt n}, \]
and the third summand in (3.21) can be estimated as follows:
\[ \frac{A^3}{\lambda}\Big( \frac{\sqrt{2\pi\sigma^2}}{16} + \frac{E(W^2)}{4} \Big) \le \frac{\mathrm{const.}(\beta)}{\sqrt n}. \]
Moreover, we obtain
\[ E|R| \le E|R_1| + O\Big( \frac{E|W^3|}{n^2} \Big). \]
Since tanh(x) is 1-Lipschitz, we get
\[ |R_1| \le \frac{1}{\sqrt n}\,\max_{1\le i\le n}\beta\,| m_i(X) - m(X) | \le \frac{\beta}{n^{3/2}}, \]
using |m_i(X) − m(X)| = |X_i|/n ≤ 1/n. Hence, with Lemma 3.4.2, we obtain √(E(R²)) = O(n^{-3/2}), and thus the second summand in (3.21) can be bounded by
\[ \sigma\Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda}\sqrt{E(W^2)} = O\Big( \frac{1}{\sqrt n} \Big). \]
To bound the first summand in (3.21), we have
\[ (W - W')^2 = \frac{X_I^2}{n} - \frac{2X_I X_I'}{n} + \frac{(X_I')^2}{n}, \]
hence
\[ E[(W-W')^2\,|\,\mathcal F] = \frac{2}{n} - \frac{2}{n^2}\sum_{i=1}^{n} X_i\tanh(\beta m_i(X)), \]
and therefore
\[ 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,\mathcal F] = \frac{1}{n}\sum_{i=1}^{n} X_i\tanh(\beta m_i(X)) = \frac{1}{n}\sum_{i=1}^{n} X_i\big( \tanh(\beta m_i(X)) - \tanh(\beta m(X)) \big) + m(X)\tanh(\beta m(X)) =: \tilde R_1 + \tilde R_2. \]
By Taylor expansion we get
\[ \tilde R_2 = \frac{\beta}{n}W^2 + O\Big( \frac{W^4}{n^2} \Big), \]
and by Lemma 3.4.2 we get E|\(\tilde R_2\)| = O(n^{-1}). Since tanh(x) is 1-Lipschitz, we obtain
\[ |\tilde R_1| \le \frac{1}{n}. \]
Hence E|\(\tilde R_1 + \tilde R_2\)| = O(n^{-1}) (and likewise E(\(\tilde R_1 + \tilde R_2\))² = O(n^{-2})), so the first summand in (3.21) is O(n^{-1}), and the theorem is proved.
Chapter 4
Main Results

In this chapter, we use Stein's method to bound the convergence rate in the central limit theorem for the total magnetization in the Curie-Weiss-Potts model at high temperature.

Theorem 4.0.1. For the Curie-Weiss-Potts model, let S_n be the sum of the spin random variables, where (ω_1, · · · , ω_n) has joint distribution P_{n,β} defined as in (2.8). If 0 < β < 1, then for the first component S_n^{(1)} of the vector S_n we have
\[ \sup_{z\in R}\Big| P_{n,\beta}\Big( \frac{S_n^{(1)} - n/q}{\sqrt n} \le z \Big) - \Phi_\beta(z) \Big| \le C\,n^{-1/2}, \tag{4.1} \]
where Φ_β denotes the distribution function of the normal distribution with expectation 0 and variance \(\frac{q-1}{q^2-q\beta}\), and C is a constant depending only on β.
Similarly to Section 3.4, we will apply Corollary 3.4.6. In the rest of this chapter, we prove the above theorem.
Proof. Let e_i ∈ Z^q, i = 1, 2, · · · , q, be the vector with the ith entry 1 and the other entries 0, and let ω_i ∈ {e_1, e_2, · · · , e_q}, i = 1, 2, · · · , n, as stated before. In equation (2.9) of Chapter 2 we have seen that the Hamiltonian is
\[ H_n(\omega) = -\frac{1}{2n}\sum_{i,j=1}^{n}\delta(\omega_i, \omega_j) = -\frac{1}{2n}\sum_{i,j=1}^{n}\langle \omega_i, \omega_j \rangle, \]
where δ(·, ·) denotes the Kronecker delta.

The Gibbs weight is
\[ \frac{e^{-\beta H_n(\omega)}}{Z_n(\beta)} = \frac{e^{\frac{\beta}{2n}\sum_{i,j=1}^{n}\delta(\omega_i,\omega_j)}}{Z_n(\beta)} = \frac{e^{\frac{\beta n}{2}(s_1^2 + s_2^2 + \cdots + s_q^2)}}{Z_n(\beta)}, \]
where Z_n(β) is defined in (2.11) and \(s_i = \frac{1}{n}\sum_{j=1}^{n}\delta(e_i, \omega_j)\).
Now we consider the usual construction of an exchangeable pair. We produce a spin collection ω′ = (ω_i′)_{i≥1} via a Gibbs sampling procedure: select a coordinate, say i, at random and replace ω_i by ω_i′ drawn from the conditional distribution of the i-th coordinate given (ω_j)_{j≠i}. Let I be a random variable taking values in {1, 2, · · · , n} with equal probability and independent of all other random variables. For 0 < β < 1, we denote
\[ W := W_n = \frac{1}{\sqrt n}\sum_{i=1}^{n}\Big( \omega_i - \frac{1}{q}(e_1 + e_2 + \cdots + e_q) \Big) = \sqrt n\,\Big[ \Big(s_1 - \frac1q\Big)e_1 + \Big(s_2 - \frac1q\Big)e_2 + \cdots + \Big(s_q - \frac1q\Big)e_q \Big], \]
with each component
\[ W^{(i)} = \frac{1}{\sqrt n}\sum_{j=1}^{n}\Big( \langle \omega_j, e_i \rangle - \frac{1}{q} \Big), \qquad 1 \le i \le q. \]
Now let
\[ W' := W - \frac{1}{\sqrt n}\,\omega_I + \frac{1}{\sqrt n}\,\omega_I', \]
where ω_I′ has the conditional distribution
\[ P\big( \omega_I' = e_k \,\big|\, \{\omega_1, \omega_2, \cdots, \omega_n\}\setminus\{\omega_I\} \big) = \frac{e^{\beta s_k^I}}{e^{\beta s_1^I} + \cdots + e^{\beta s_q^I}}, \qquad \text{with } s_k^I = \frac{1}{n}\sum_{\substack{j=1 \\ j\ne I}}^{n}\delta(e_k, \omega_j). \]
Hence (W, W′) is an exchangeable pair and
\[ W - W' = \frac{\omega_I - \omega_I'}{\sqrt n}. \]
Then
\begin{align*}
E[W - W'\,|\,W] &= \frac{1}{\sqrt n} E[\omega_I - \omega_I'\,|\,\omega] = \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n} E[\omega_i - \omega_i'\,|\,\omega] \\
&= \frac{1}{n}\Big( W + \frac{\sqrt n}{q}(e_1 + \cdots + e_q) \Big) - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\frac{e^{\beta s_1^i}e_1 + e^{\beta s_2^i}e_2 + \cdots + e^{\beta s_q^i}e_q}{e^{\beta s_1^i} + e^{\beta s_2^i} + \cdots + e^{\beta s_q^i}}. \tag{4.2}
\end{align*}
Let us denote
\[ f(s) = f(s_1, s_2, \cdots, s_q) = \Big( \frac{e^{\beta s_1}}{e^{\beta s_1}+\cdots+e^{\beta s_q}},\ \frac{e^{\beta s_2}}{e^{\beta s_1}+\cdots+e^{\beta s_q}},\ \cdots,\ \frac{e^{\beta s_q}}{e^{\beta s_1}+\cdots+e^{\beta s_q}} \Big)^{T}, \]
and let \(\Delta_i = \frac{1}{n}\omega_i\), so that s^i = s − Δ_i; then we can continue the calculation of equation (4.2):
\begin{align*}
E[W - W'\,|\,W] &= \frac{1}{n}\Big( W + \frac{\sqrt n}{q}(e_1 + \cdots + e_q) \Big) - \frac{1}{\sqrt n}\Big[ f(s) - f\Big(\frac1q, \cdots, \frac1q\Big) + f\Big(\frac1q, \cdots, \frac1q\Big) \Big] - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\big[ f(s - \Delta_i) - f(s) \big] \\
&= \frac{1}{n}W - \frac{1}{\sqrt n}\Big[ f(s_1, \cdots, s_q) - f\Big(\frac1q, \cdots, \frac1q\Big) \Big] - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\big[ f(s - \Delta_i) - f(s) \big], \tag{4.3}
\end{align*}
where we used f(1/q, · · · , 1/q) = (1/q)(e_1 + · · · + e_q). By the Taylor expansion of f(s) around (1/q, · · · , 1/q), we get
\begin{align*}
E[W - W'\,|\,W] &= \frac{1}{\sqrt n}\begin{pmatrix} s_1 - \frac1q \\ s_2 - \frac1q \\ \vdots \\ s_q - \frac1q \end{pmatrix} - \frac{1}{\sqrt n}\begin{pmatrix} \frac{\beta}{q}(s_1 - \frac1q) \\ \frac{\beta}{q}(s_2 - \frac1q) \\ \vdots \\ \frac{\beta}{q}(s_q - \frac1q) \end{pmatrix} + O\Big( \frac{1}{n^{3/2}} \Big) \\
&= \frac{1 - \frac{\beta}{q}}{\sqrt n}\begin{pmatrix} s_1 - \frac1q \\ \vdots \\ s_q - \frac1q \end{pmatrix} + O\Big( \frac{1}{n^{3/2}} \Big) = \frac{1 - \frac{\beta}{q}}{n}\,W + O\Big( \frac{1}{n^{3/2}} \Big). \tag{4.4}
\end{align*}
Now we still need that E|W|^l is uniformly bounded in n for l = 1, 2, 3, 4. We can use the Hubbard-Stratonovich transformation to prove this, just as in the proof of Lemma 1.13 in [3]. So the third and the fourth terms in (3.21) can be bounded by \(\frac{1}{\sqrt n}C_1(\beta)\), where C_1(β) is a constant depending only on β.

According to the above calculation, we know R = O(1/n^{3/2}), so the second summand in (3.21) can be bounded by \(\frac{1}{\sqrt n}C_2(\beta)\), where C_2(β) is a constant depending only on β.

To bound the first summand in (3.21), we observe
\[ \langle W - W',\ W - W' \rangle = \frac{\|\omega_I\|^2}{n} - \frac{2\langle \omega_I, \omega_I' \rangle}{n} + \frac{\|\omega_I'\|^2}{n}. \]
CHAPTER 4. MAIN RESULTS
Let W (1) =
n
i=1 (ωi (1)
√1
n
35
− 1q ), W (1) = W (1) −
√1 ωI (1)
n
+
√1 ω (1),
n I
then we have
E[(W (1) − W (1))2 |W ]
=
=
1
E[(ωI (1) − ωI (1))2 |W ]
n
n
1
E[ωi (1) − 2ωi (1)ωi (1) + ωi (1)|W ]
n2
i=1
=
=
=
1
ns1 +
n2
1
ns1 +
n2
1
ns1 +
n2
n
i
(1 − 2ωi (1))
i=1
n
eβs1
i
i
i
eβs1 + eβs2 (1) + · · · + eβsq
(1 − 2ωi (1))
i=1
n
(1 − 2ωi (1))
i=1
eβs1
1
eβs1
+ O( )
βs
βs
q
2
n
+ e + ··· + e
1
1
1 β
+ (s1 − ) + O( )
q
q
q
n
=
1
ns1 + n (1 − 2s1 )
n2
1 β
1
1
+ (s1 − ) + O( )
q
q
q
n
=
1 2
2
− 2 +O
n q q
+O
W
√
n
2
W
√
n
,
(4.5)
and thus
1−
E
1
E[(W (1) − W (1))2 |W ]
2λ
2
2
1 1 2
1−
− 2 +O
2λ n q q
W
√
n
= E
Choose λ =
1
− 12
q
q
n
E
=
q−1
,
nq 2
then
1−
1
E[(W (1) − W (1))2 |W ]
2λ
= E
q2
O
2(q − 1)
W
√
n
+
+O
2
W2
n
.
(4.6)
2
q2
O
2(q − 1)
and therefore the first summand of (3.21) is bounded by
W2
n
2
√1 C3 (β),
n
,
(4.7)
where C3 (β) is a
constant depending only on β. Now let C = C1 + C2 + C3 , then the theorem is proved.
Finally, we identify σ². Since \(E[W - W'\,|\,W] = \frac{1-\beta/q}{n}W + O(1/n^{3/2})\), we have
\[ E[W(1) - W'(1)\,|\,W] = \frac{1 - \frac{\beta}{q}}{n}\,W(1) + O\Big( \frac{1}{n^{3/2}} \Big) = \frac{\lambda}{\sigma^2}\,W(1) + R, \]
thus we get
\[ \frac{\lambda}{\sigma^2} = \frac{1 - \frac{\beta}{q}}{n}, \]
and therefore
\[ \sigma^2 = \frac{n\lambda}{1 - \frac{\beta}{q}} = \frac{q-1}{q^2 - q\beta}. \]
Bibliography

[1] A.D. Barbour and L.H.Y. Chen, An Introduction to Stein's Method (Singapore University Press, Singapore, 2005).

[2] L.H.Y. Chen, Poisson approximation for dependent trials, Annals of Probability 3 (1975), no. 3, 534-545.

[3] P. Eichelsbacher and M. Löwe, Stein's method for dependent random variables occurring in statistical mechanics, Electronic Journal of Probability 15 (2010), paper no. 30, 962-988.

[4] T. Eisele and R.S. Ellis, Symmetry breaking and random waves for magnetic systems on a circle, Z. Wahrsch. Verw. Gebiete 63 (1983), 297-348.

[5] R.S. Ellis, Entropy, Large Deviations, and Statistical Mechanics (Springer, New York, 1985).

[6] R.S. Ellis, A unified approach to large deviations for Markov chains and applications to statistical mechanics, in: D. Merlini, ed., Proc. 2nd Internat. Ascona/Locarno Conf. on Stochastic Processes, Physics, and Geometry, July 4-9, 1988, Lecture Notes in Phys. (Springer, Berlin, 1989).

[7] R.S. Ellis and K.M. Wang, Limit theorems for the empirical vector of the Curie-Weiss-Potts model, Stochastic Processes and their Applications 35 (1990), 59-79.

[8] F. den Hollander, Large Deviations (American Mathematical Society, Providence, Rhode Island, 2000).

[9] H. Kesten and R.H. Schonmann, Behavior in large dimensions of the Potts and Heisenberg models, Reviews in Mathematical Physics 1 (1990), no. 2-3.

[10] P.A. Pearce and R.B. Griffiths, Potts model in the many-component limit, J. Phys. A 13 (1980), 2143-2148.

[11] Y. Rinott and V. Rotar, On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted U-statistics, Ann. Appl. Probab. 7 (1997), no. 4, 1080-1105.

[12] Q.-M. Shao and Z.-G. Su, The Berry-Esseen bound for character ratios, Proc. Amer. Math. Soc. 134 (2006), no. 7, 2153-2159.

[13] C. Stein, A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (1972), 583-602.

[14] F.Y. Wu, The Potts model, Reviews of Modern Physics 54 (1982), no. 1, 235-268.