CONVERGENCE RATE IN THE CENTRAL LIMIT THEOREM
FOR THE CURIE-WEISS-POTTS MODEL

HAN HAN
(HT080869E)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF MATHEMATICS
NATIONAL UNIVERSITY OF SINGAPORE
Acknowledgements
First and foremost, it is my great honor to work under Assistant Professor Sun Rongfeng, for he has been more than just a supervisor to me, but also a supportive friend; never in my life have I met another person who is so knowledgeable and yet so humble. Apart from the inspiring ideas and endless support that Prof. Sun has given me, I would like to express my sincere thanks and heartfelt appreciation for his patient and selfless sharing of his knowledge of probability theory and statistical mechanics, which has tremendously enlightened me. I would also like to thank him for entertaining all my impromptu visits to his office for consultation.

Many thanks to all the professors in the Mathematics department who have taught me. Special thanks to Professors Yu Shih-Hsien and Xu Xingwang for patiently answering my questions when I attended their classes.

I would also like to take this opportunity to thank the administrative staff of the Department of Mathematics for all their kindness in offering administrative assistance to me throughout my master's study at NUS. Special mention goes to Ms. Shanthi D/O D Devadas, Mdm. Tay Lee Lang and Mdm. Lum Yi Lei for always entertaining my requests with a smile.

Last but not least, to my family and my classmates, Wang Xiaoyan, Huang Xiaofeng and Hou Likun: thanks for all the laughter and support you have given me throughout my master's study. It will be a memorable chapter of my life.
Han Han
Summer 2010
Contents

Acknowledgements
Summary
1 Introduction
2 The Curie-Weiss-Potts Model
  2.1 The Curie-Weiss-Potts Model
  2.2 The Phase Transition
3 Stein's Method and Its Application
  3.1 The Stein Operator
  3.2 The Stein Equation
  3.3 An Approximation Theorem
  3.4 An Application of Stein's Method
4 Main Results
Bibliography
Summary
There is a long tradition of considering mean-field models in statistical mechanics. The Curie-Weiss-Potts model is famous because it exhibits, explicitly, a number of properties of real substances, such as multiple phases and metastable states. The aim of this thesis is to prove Berry-Esseen bounds for the sums of the random variables occurring in a statistical mechanical model called the Curie-Weiss-Potts model, or mean-field Potts model. To this end, we will apply Stein's method using exchangeable pairs.

More precisely, the aim is to calculate the convergence rate in the central limit theorem for the Curie-Weiss-Potts model. In Chapter 1, we give an introduction to this problem. In Chapter 2, we introduce the Curie-Weiss-Potts model, together with the Ising model and the Curie-Weiss model, and then give some results about the phase transition of the Curie-Weiss-Potts model. In Chapter 3, we state Stein's method, then give the Stein operator and an approximation theorem; in Section 3.4 we give an application of Stein's method. In Chapter 4, we state the main result of this thesis and prove it.
Chapter 1
Introduction
There is a long tradition of considering mean-field models in statistical mechanics. The Curie-Weiss-Potts model is famous because it exhibits, explicitly, a number of properties of real substances, such as multiple phases and metastable states. The aim of this thesis is to prove Berry-Esseen bounds for the sums of the random variables occurring in a statistical mechanical model called the Curie-Weiss-Potts model, or mean-field Potts model. To this end, we will apply Stein's method using exchangeable pairs.
In statistical mechanics, the Potts model, a generalization of the Ising model (1925), is a model of interacting spins on a crystalline lattice, so we first introduce the Ising model. The Ising model is defined on a discrete collection of variables called spins, which can take on the value 1 or −1. The spins S_i interact in pairs, with an energy that has one value when the two spins are the same, and a second value when the two spins are different. The energy of the Ising model is defined to be
\[ E = -\sum_{i \neq j} J_{ij} S_i S_j, \tag{1.1} \]
where the sum counts each pair of spins only once (this condition, which is often left out according to a different convention, introduces a factor 1/2). Notice that the product of spins is either 1 if the two spins are the same, or −1 if they are different. J_{ij} is called the coupling between the spins S_i and S_j. Magnetic interactions seek to align spins relative to one another. Spins become effectively "randomized" when thermal fluctuation dominates the spin-spin interaction.
For each pair, if
Jij > 0, the interaction is called ferromagnetic;
Jij < 0, the interaction is called antiferromagnetic;
Jij = 0, the spins are noninteracting.
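To make the energy (1.1) concrete, here is a minimal sketch (ours, not part of the thesis) that evaluates E for a spin configuration and an arbitrary symmetric coupling matrix; the names ising_energy, J and spins are illustrative.

```python
import numpy as np

def ising_energy(spins, J):
    """Evaluate (1.1): E = -sum over pairs i < j of J_ij * S_i * S_j."""
    # spins @ J @ spins counts every pair twice, hence the factor 1/2
    return -0.5 * spins @ J @ spins

rng = np.random.default_rng(0)
n = 10
J = np.triu(rng.normal(size=(n, n)), k=1)
J = J + J.T                      # symmetric coupling, zero diagonal
spins = rng.choice([-1, 1], size=n)
print(ising_energy(spins, J))
```

With all J_ij > 0 (ferromagnetic), the energy is minimized by aligned configurations, matching the classification above.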
The Potts model is named after Renfrey B. Potts who described the model near the
end of his 1952 Ph.D. thesis. The model was related to the "planar Potts" or "clock
model", which was suggested to him by his advisor Cyril Domb. It is sometimes known
as the Ashkin-Teller model (after Julius Ashkin and Edward Teller), as they considered
a four component version in 1943.
The Potts model consists of spins that are placed on a lattice; the lattice is usually
taken to be a two-dimensional rectangular Euclidean lattice, but is often generalized to
other dimensions or other lattices. Domb originally suggested that each spin takes one
of q possible values on the unit circle, at angles
\[ \theta_n = \frac{2\pi n}{q}, \qquad 1 \le n \le q, \tag{1.2} \]
and the interaction Hamiltonian be given by
\[ H_c = -J_c \sum_{(i,j)} \cos(\theta_{s_i} - \theta_{s_j}) \tag{1.3} \]
with the sum running over the nearest neighbor pairs (i, j) on the lattice. The site colors
si take on values ranging from 1, · · · , q. Here, Jc is the coupling constant, determining
the interaction strength. This model is now known as the vector Potts model or the
clock model. Potts provided a solution for two dimensions, for q = 2, 3 and 4. In the
limit as q approaches infinity, this becomes the so-called XY model.
What is now known as the standard Potts model was suggested by Potts in the course of the solution above, and uses a simpler Hamiltonian:
\[ H_p = -J_p \sum_{(i,j)} \delta(s_i, s_j), \tag{1.4} \]
where δ(s_i, s_j) is the Kronecker delta, which equals one whenever s_i = s_j and zero otherwise.
The q = 2 standard Potts model is equivalent to the 2D Ising model and the 2-state
vector Potts model, with Jp = −2Jc . The q = 3 standard Potts model is equivalent to
the three-state vector Potts model, with Jp = −3Jc /2.
A common generalization is to introduce an external "magnetic field" term h, moving the parameters inside the sums and allowing them to vary across the model:
\[ \beta H_g = -\beta \sum_{(i,j)} J_{ij}\,\delta(s_i, s_j) - \sum_i h_i s_i, \tag{1.5} \]
where β = 1/kT is the inverse temperature, k the Boltzmann constant and T the temperature. The summation may run over more distant neighbors on the lattice, or may in fact have infinite range.
Chapter 2
The Curie-Weiss-Potts Model

2.1 The Curie-Weiss-Potts Model
Now we introduce the Curie-Weiss-Potts model [7]. Section I Part C of Wu [14] introduces an approximation to the Potts model, obtained by replacing the nearest neighbor
interaction by a mean interaction averaged over all the sites in the model, and we call
this approximation the Curie-Weiss-Potts model. Pearce and Griffiths [10] and Kesten
and Schonmann [9] discuss two ways in which the Curie-Weiss-Potts model approximates
the nearest neighbor Potts model.
The Curie-Weiss-Potts model generalizes the Curie-Weiss model, which is a well
known mean-field approximation to the Ising model [5]. One reason for the interest in
the Curie-Weiss-Potts model is its more intricate phase transition structure; namely, a
first-order phase transition at the critical inverse temperature compared to a second-order
phase transition for the Curie-Weiss model, which we will discuss soon.
The Curie-Weiss model and the Curie-Weiss-Potts model are both defined by sequences of finite-volume Gibbs states {Pn,β , n = 1, 2, · · · }. They are probability distributions, depending on a positive parameter β, of n spin random variables that for the
first model may occupy one of two different states and for the second model may occupy
one of q different states, where q ∈ {3, 4, · · · } is fixed. The parameter β is the inverse
temperature. For β large, the spin random variables are strongly dependent while for
β small they are weakly dependent. This change in the dependence structure manifests
itself in the phase transition for each model, which may be seen probabilistically by
considering law of large numbers-type results.
For the Curie-Weiss model, there exists a critical value of β, denoted by β_c. For 0 < β < β_c, the sample mean of the spin random variables, n^{-1}S_n, satisfies the law of large numbers
\[ P_{n,\beta}\{ n^{-1}S_n \in dx \} \Rightarrow \delta_0(dx) \quad \text{as } n \to \infty. \tag{2.1} \]
However, for β > β_c, the law of large numbers breaks down and is replaced by the limit
\[ P_{n,\beta}\{ n^{-1}S_n \in dx \} \Rightarrow \Big( \tfrac12\delta_{m(\beta)} + \tfrac12\delta_{-m(\beta)} \Big)(dx) \quad \text{as } n \to \infty, \tag{2.2} \]
where m(β) is a positive quantity. The second-order phase transition for the model corresponds to the fact that
\[ \lim_{\beta\to\beta_c^+} m(\beta) = 0, \qquad \lim_{\beta\to\beta_c^+} m'(\beta) = \infty. \tag{2.3} \]
At β = β_c, the limit (2.1) holds.
For the Curie-Weiss-Potts model, there also exists a critical inverse temperature β_c. For 0 < β < β_c, the empirical vector of the spin random variables L_n, counting the number of spins of each type, satisfies the law of large numbers
\[ P_{n,\beta}\Big\{ \frac{L_n}{n} \in d\nu \Big\} \Rightarrow \delta_{\nu^0}(d\nu) \quad \text{as } n \to \infty, \tag{2.4} \]
where ν^0 denotes the constant probability vector (q^{-1}, q^{-1}, · · · , q^{-1}) ∈ R^q. As in the Curie-Weiss model, for β > β_c the law of large numbers breaks down. It is replaced by the limit
\[ P_{n,\beta}\Big\{ \frac{L_n}{n} \in d\nu \Big\} \Rightarrow \frac{1}{q}\sum_{i=1}^{q}\delta_{\nu^i(\beta)}(d\nu), \tag{2.5} \]
where {ν^i(β), i = 1, 2, · · · , q} are q distinct probability vectors in R^q, all distinct from ν^0. However, in contrast to the Curie-Weiss model, the Curie-Weiss-Potts model exhibits a first-order phase transition at β = β_c, which corresponds to the fact that for i = 1, 2, · · · , q,
\[ \lim_{\beta\to\beta_c^+}\nu^i(\beta) \neq \nu^0. \tag{2.6} \]
At β = β_c, (2.4) and (2.5) are replaced by the limit
\[ P_{n,\beta_c}\Big\{ \frac{L_n}{n} \in d\nu \Big\} \Rightarrow \lambda_0\,\delta_{\nu^0}(d\nu) + \lambda\sum_{i=1}^{q}\delta_{\nu^i(\beta_c)}(d\nu), \tag{2.7} \]
where λ_0 > 0, λ > 0, λ_0 + qλ = 1, and ν^i(β_c) = lim_{β→β_c^+} ν^i(β).
The three models, Curie-Weiss-Potts, Curie-Weiss, and Ising, represent three levels
of difficulty. Their large deviation behaviors may be analyzed in terms of the three
respective levels of large deviations for i.i.d. random variables; namely, the sample mean,
the empirical vector, and the empirical field. These and related issues are discussed in
[6].
2.2 The Phase Transition

Now we state some known results about the Curie-Weiss-Potts model. Let q ≥ 3 be a fixed integer and let {θ^i, i = 1, 2, · · · , q} be q different vectors in R^q. Let Σ denote the set {e_1, e_2, · · · , e_q}, where e_i ∈ Z^q, i = 1, 2, · · · , q, is the vector with the ith entry 1 and the other entries 0. Let Ω_n, n ∈ N, denote the set of sequences {ω : ω = (ω_1, ω_2, · · · , ω_n), each ω_i ∈ Σ}. The Curie-Weiss-Potts model is defined by the sequence of probability measures on Ω_n,
\[ P_{n,\beta}\{d\omega\} = \frac{1}{Z_n(\beta)}\exp[-\beta H_n(\omega)]\prod_{j=1}^{n}\rho(d\omega_j). \tag{2.8} \]
In this formula, β is a positive parameter, the inverse temperature,
\[ H_n(\omega) = -\frac{1}{2n}\sum_{i,j=1}^{n}\delta(\omega_i, \omega_j) = -\frac{1}{2n}\sum_{i,j=1}^{n}\langle \omega_i, \omega_j \rangle, \tag{2.9} \]
where δ(·, ·) denotes the Kronecker delta, ρ is the uniform distribution on Σ with
\[ \rho(d\omega_j) = \frac{1}{q}\sum_{i=1}^{q}\delta_{\theta^i}(d\omega_j), \tag{2.10} \]
and Z_n(β) is the normalization
\[ Z_n(\beta) = \int_{\Omega_n}\exp[-\beta H_n(\omega)]\prod_{j=1}^{n}\rho(d\omega_j). \tag{2.11} \]
For q = 2, if we let Σ = {1, −1}, then θ^1 = −1, θ^2 = 1 yield a model that is equivalent to the Curie-Weiss model.
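Since Ω_n is finite, the Gibbs state (2.8) can be computed exactly for tiny n by enumeration. The sketch below (ours, not part of the thesis) does this for the illustrative values q = 3, n = 6, β = 0.5, and prints Z_n(β) together with the induced distribution of the type counts.

```python
import itertools
from collections import Counter
import numpy as np

q, n, beta = 3, 6, 0.5

Z = 0.0
law = Counter()                       # weight of each vector of type counts
for omega in itertools.product(range(q), repeat=n):
    counts = np.bincount(omega, minlength=q)
    H = -(counts @ counts) / (2 * n)  # H_n = -(1/2n) sum_{i,j} delta(w_i, w_j)
    w = np.exp(-beta * H) * q ** (-n) # rho assigns mass q^{-n} to each omega
    Z += w
    law[tuple(counts)] += w

print("Z_n(beta) =", Z)
for c, w in sorted(law.items()):
    print(c, w / Z)
```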
With respect to P_{n,β}, let the empirical vector L_n(ω) = (L_{n,1}(ω), L_{n,2}(ω), · · · , L_{n,q}(ω)) be defined by
\[ L_{n,i}(\omega) = \frac{1}{n}\sum_{j=1}^{n}\delta(\omega_j, \theta^i), \qquad i = 1, 2, \cdots, q. \tag{2.12} \]
L_n(ω) takes values in the set of probability vectors
\[ M = \Big\{ \nu \in R^q : \nu = (\nu_1, \nu_2, \cdots, \nu_q),\ \text{each } \nu_i \ge 0,\ \sum_{i=1}^{q}\nu_i = 1 \Big\}. \]
A key to the analysis of the Curie-Weiss-Potts model is the fact that
\[ H_n(\omega) = -\frac{n}{2}\,\langle L_n(\omega), L_n(\omega) \rangle, \tag{2.13} \]
where ⟨·, ·⟩ denotes the R^q inner product.
The specific Gibbs free energy for the model is the quantity ψ(β) defined by the limit
\[ -\beta\psi(\beta) = \lim_{n\to\infty}\frac{1}{n}\log Z_n(\beta). \tag{2.14} \]
Now we do some large deviation analysis to derive the free energy. See [5] for details.
Definition 2.2.1. A rate function I is a lower semi-continuous mapping I : Ω → [0, ∞] such that the level set Ψ_α := {x : I(x) ≤ α} is a closed subset of Ω for every α.

Definition 2.2.2. Suppose Ω is a topological space and B is the Borel σ-field on Ω. A sequence of probability measures {µ_n} on (Ω, B) satisfies the large deviation principle (LDP) if there exists a rate function I : Ω → [0, ∞] such that the following hold:
(i) For all closed subsets F ⊂ Ω,
\[ \limsup_{n\to\infty}\frac{1}{n}\ln \mu_n(F) \le -\inf_{x\in F} I(x); \]
(ii) For all open subsets G ⊂ Ω,
\[ \liminf_{n\to\infty}\frac{1}{n}\ln \mu_n(G) \ge -\inf_{x\in G} I(x). \]
Definition 2.2.3. Let µ be the probability measure of a q-dimensional random vector X; then the logarithmic generating function for µ is defined as
\[ \Lambda(\lambda) := \log M(\lambda) := \log E[\exp\langle \lambda, X \rangle], \qquad \lambda \in R^q. \]

Definition 2.2.4. The Fenchel-Legendre transform of Λ(λ), which we denote by Λ*(x), is defined by
\[ \Lambda^*(x) := \sup_{\lambda\in R^q}\{ \langle \lambda, x \rangle - \Lambda(\lambda) \}, \qquad x \in R^q. \]
Varadhan's Lemma and Cramér's theorem are also needed, so we state them here but omit the proofs; see Chapter III in [8].

Lemma 2.2.5 (Varadhan's Lemma). Let µ_n be a sequence of probability measures on (Ω, B) satisfying the LDP with rate function I : Ω → [0, ∞]. Then if G : Ω → R is continuous and bounded above, we have
\[ \lim_{n\to\infty}\frac{1}{n}\ln\int_{\Omega} e^{nG(\omega)}\,\mu_n(d\omega) = \sup_{x\in\Omega}\,[G(x) - I(x)]. \]

Theorem 2.2.6 (Cramér's Theorem). Let \(\{X_n\}_{n=1}^{\infty} = \{(X_n^1, X_n^2, \cdots, X_n^q)\}_{n=1}^{\infty}\) be a sequence of i.i.d. q-dimensional random vectors. Then the sequence of probability measures {µ_n} for \(\hat S_n := \frac{1}{n}\sum_{j=1}^{n} X_j\) satisfies the LDP with convex rate function Λ*(·), where Λ*(·) is the Fenchel-Legendre transform of the logarithmic generating function of X_1.
We now apply this to a single spin X with the one-spin distribution above, and write ν = (ν_1, ν_2, · · · , ν_q) for a probability vector. From the above, we get
\[ \Lambda(\lambda) = \log E[\exp\langle \lambda, X \rangle] = \log\Big[ \frac{1}{q}\sum_{i=1}^{q} e^{\lambda_i} \Big]. \]
Hence, the rate function is
\[ I(\nu) = \sup_{\lambda\in R^q}\Big\{ \langle \lambda, \nu \rangle - \log\Big( \frac{1}{q}\sum_{i=1}^{q} e^{\lambda_i} \Big) \Big\} = \sup_{\lambda\in R^q}\Big\{ \sum_{i=1}^{q}\lambda_i\nu_i - \log\sum_{i=1}^{q} e^{\lambda_i} + \log q \Big\}. \]
Denote \(H = \sum_{i=1}^{q}\lambda_i\nu_i - \log\sum_{i=1}^{q} e^{\lambda_i} + \log q\); then for any 1 ≤ k ≤ q,
\[ \frac{\partial H}{\partial \lambda_k} = \nu_k - \frac{e^{\lambda_k}}{\sum_{i=1}^{q} e^{\lambda_i}}, \]
so at the maximizer we may take λ_k = log ν_k, and thus
\[ I(\nu) = \sum_{i=1}^{q}\nu_i\log\nu_i + \log q. \]
Recall that \(Z_n(\beta) = \int_{\Omega_n}\exp[-\beta H_n(\omega)]\prod_{j=1}^{n}\rho(d\omega_j)\). Using (2.13) and Varadhan's Lemma, we get
\begin{align*}
-\beta\psi(\beta) = \lim_{n\to\infty}\frac{1}{n}\log Z_n(\beta) &= \sup_{\nu\in M}\Big\{ \frac{1}{2}\beta\langle \nu, \nu \rangle - I(\nu) \Big\} \\
&= \sup_{\nu\in M}\Big\{ \frac{1}{2}\beta\langle \nu, \nu \rangle - \sum_{i=1}^{q}\nu_i\log\nu_i - \log q \Big\} = \sup_{\nu\in M}\Big\{ \frac{1}{2}\beta\langle \nu, \nu \rangle - \sum_{i=1}^{q}\nu_i\log(\nu_i q) \Big\}.
\end{align*}
If we denote
\[ \alpha_\beta(\nu) = \frac{1}{2}\beta\langle \nu, \nu \rangle - \sum_{i=1}^{q}\nu_i\log(\nu_i q), \tag{2.15} \]
then
\[ -\beta\psi(\beta) = \sup_{\nu\in M}\alpha_\beta(\nu). \tag{2.16} \]
To get another representation of the formula (2.16), we need some facts about convex duality.

Let X be a real Banach space and F_1 : X → R ∪ {+∞} a convex functional on X. We assume that S_{F_1} = {x : F_1(x) < ∞} ≠ ∅. We say that F_1 is closed if the epigraph of F_1,
\[ \mathcal E(F_1) = \{ (x, u) \in S_{F_1} \times R : u \ge F_1(x) \}, \]
is closed in X × R, where S_{F_1} is the domain of F_1. We denote by X* the dual space of X. The Legendre transformation of F_1 is the function F_1* with the domain
\[ S_{F_1^*} = \Big\{ \alpha \in X^* : \sup_{x\in X}\,[\alpha(x) - F_1(x)] < \infty \Big\}. \]
For α ∈ X*, we define
\[ F_1^*(\alpha) = \sup_{x\in X}\,[\alpha(x) - F_1(x)]. \]
Since F_1 = +∞ on X\S_{F_1}, we can replace X in this formula by S_{F_1}.

Theorem 2.2.7. Suppose that F_1 and F_2 are closed convex functionals on X. Then S_{F_1^*} ≠ ∅ and
\[ \sup_{x\in S_{F_2}}\,[F_1(x) - F_2(x)] = \sup_{\alpha\in S_{F_2^*}}\,[F_2^*(\alpha) - F_1^*(\alpha)]. \]
Proof. See Appendix C in [4].

Now, by Theorem 2.2.7, we get another representation of the formula (2.16):
\[ \beta\psi(\beta) = \min_{u\in R^q} G_\beta(u) + \log q, \tag{2.17} \]
where
\[ G_\beta(u) = \frac{1}{2}\beta\langle u, u \rangle - \log\sum_{i=1}^{q} e^{\beta u_i}. \tag{2.18} \]
Let φ(s) denote the function mapping s ∈ [0, 1] into R^q defined as
\[ \phi(s) = \big( q^{-1}[1 + (q-1)s],\ q^{-1}(1-s),\ \cdots,\ q^{-1}(1-s) \big), \tag{2.19} \]
where the last (q − 1) components all equal q^{-1}(1 − s).

We quote the following results from Ellis and Wang [7].

Theorem 2.2.8. Let \(\beta_c = \frac{2(q-1)}{q-2}\log(q-1)\), and for β > 0 let s(β) be the largest solution of the equation
\[ s = \frac{1 - e^{-\beta s}}{1 + (q-1)e^{-\beta s}}. \tag{2.20} \]
Let K_β denote the set of global minimum points of the symmetric function G_β(u), u ∈ R^q. Then the following conclusions hold.
(i) The quantity s(β) is well-defined. It is positive, strictly increasing, and differentiable in β on an open interval containing [β_c, ∞), s(β_c) = (q−2)/(q−1), and \(\lim_{\beta\to\infty} s(\beta) = 1\).
(ii) Define ν^0 = φ(0) = (q^{-1}, q^{-1}, · · · , q^{-1}). For β ≥ β_c, define ν^1(β) = φ(s(β)) and let ν^i(β), i = 2, · · · , q, denote the points in R^q obtained by interchanging the first and ith coordinates of ν^1(β). Then
\[ K_\beta = \begin{cases} \{\nu^0\} & \text{for } 0 < \beta < \beta_c, \\ \{\nu^1(\beta), \nu^2(\beta), \cdots, \nu^q(\beta)\} & \text{for } \beta > \beta_c, \\ \{\nu^0, \nu^1(\beta_c), \nu^2(\beta_c), \cdots, \nu^q(\beta_c)\} & \text{for } \beta = \beta_c. \end{cases} \]
For β ≥ β_c, the points in K_β are all distinct. The point ν^1(β_c) equals φ(s(β_c)) = φ((q−2)/(q−1)).
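The mean-field equation (2.20) is easy to solve numerically. The sketch below (ours) iterates the right-hand side, which is increasing in s, starting from s = 1, so the iteration converges down to the largest solution; it recovers s(β_c) = (q−2)/(q−1), though convergence is slow at β = β_c because the fixed point there is a tangency.

```python
import numpy as np

def s_of_beta(beta, q, iters=5000):
    s = 1.0
    for _ in range(iters):
        s = (1 - np.exp(-beta * s)) / (1 + (q - 1) * np.exp(-beta * s))
    return s

q = 3
beta_c = (2 * (q - 1) / (q - 2)) * np.log(q - 1)
print(beta_c)                 # 4*log(2) ~ 2.7726 for q = 3
print(s_of_beta(beta_c, q))   # -> (q-2)/(q-1) = 0.5, slowly, from above
print(s_of_beta(10.0, q))     # -> close to 1 as beta grows
```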
We denote by D²G_β(u) the Hessian matrix \(\{\partial^2 G_\beta(u)/\partial u_i\partial u_j,\ i, j = 1, 2, \cdots, q\}\) of G_β at u.

Proposition 2.2.9. For any β > 0, let ν̄ denote a global minimum point of G_β(u). Then D²G_β(ν̄) is positive definite.
We can calculate the matrix D²G_β(u) at ν^0 as follows; that is, we calculate \(\partial^2 G_\beta(u)/\partial u_i\partial u_j\) for each i, j = 1, 2, · · · , q. From \(G_\beta(u) = \frac12\beta\langle u,u\rangle - \log\sum_{i=1}^{q} e^{\beta u_i}\), we have
\[ \frac{\partial G_\beta(u)}{\partial u_i} = \beta u_i - \frac{\beta e^{\beta u_i}}{\sum_{k=1}^{q} e^{\beta u_k}}, \]
and hence, for any i, j = 1, 2, · · · , q,
\[ \frac{\partial^2 G_\beta(u)}{\partial u_i^2} = \beta - \frac{\beta^2 e^{\beta u_i}\big( \sum_{k=1}^{q} e^{\beta u_k} - e^{\beta u_i} \big)}{\big( \sum_{k=1}^{q} e^{\beta u_k} \big)^2}, \qquad \frac{\partial^2 G_\beta(u)}{\partial u_i \partial u_j} = \frac{\beta^2 e^{\beta(u_i + u_j)}}{\big( \sum_{k=1}^{q} e^{\beta u_k} \big)^2} \quad (i \ne j). \]
At u = ν^0 = (1/q, 1/q, · · · , 1/q), we have
\[ \frac{\partial^2 G_\beta(u)}{\partial u_i^2}\Big|_{\nu^0} = \beta - \frac{\beta^2 e^{\beta/q}(q-1)e^{\beta/q}}{(q e^{\beta/q})^2} = \frac{\beta^2 + \beta q(q-\beta)}{q^2}, \qquad \frac{\partial^2 G_\beta(u)}{\partial u_i \partial u_j}\Big|_{\nu^0} = \frac{\beta^2 e^{2\beta/q}}{(q e^{\beta/q})^2} = \frac{\beta^2}{q^2} \quad (i \ne j). \]
Hence the matrix D²G_β(u)|_{ν^0} is
\[ D^2 G_\beta(u)\big|_{\nu^0} = \frac{1}{q^2}\begin{pmatrix} \beta^2 + \beta q(q-\beta) & \beta^2 & \cdots & \beta^2 \\ \beta^2 & \beta^2 + \beta q(q-\beta) & \cdots & \beta^2 \\ \vdots & & \ddots & \vdots \\ \beta^2 & \beta^2 & \cdots & \beta^2 + \beta q(q-\beta) \end{pmatrix}, \tag{2.21} \]
that is, a matrix with diagonal entries \(\frac{\beta^2+\beta q(q-\beta)}{q^2}\) and off-diagonal entries \(\frac{\beta^2}{q^2}\).
Now we give a limit theorem, which gives the law of large numbers and its breakdown for the empirical vector L_n. It was also established in Ellis and Wang [7].

Theorem 2.2.10. (i) For 0 < β < β_c,
\[ P_{n,\beta}\{L_n \in d\nu\} \Rightarrow \delta_{\nu^0}(d\nu) \quad \text{as } n \to \infty. \]
(ii) Define
\[ \kappa_1 = \big( \det D^2 G_{\beta_c}(\nu^1(\beta_c)) \big)^{-1/2}, \quad \kappa_0 = \big( \det D^2 G_{\beta_c}(\nu^0) \big)^{-1/2}, \quad \lambda_0 = \frac{\kappa_0}{\kappa_0 + q\kappa_1}, \quad \lambda = \frac{\kappa_1}{\kappa_0 + q\kappa_1}. \]
Then for β = β_c,
\[ P_{n,\beta_c}\{L_n \in d\nu\} \Rightarrow \lambda_0\,\delta_{\nu^0}(d\nu) + \lambda\sum_{i=1}^{q}\delta_{\nu^i(\beta_c)}(d\nu) \quad \text{as } n \to \infty. \]
For a non-negative semidefinite q × q matrix A, we denote by N(0, A) the multinormal distribution on R^q with mean 0 and covariance matrix A. The following result states the central limit theorem for 0 < β < β_c.

Theorem 2.2.11. For 0 < β < β_c,
\[ P_{n,\beta}\{\sqrt n\,(L_n - \nu^0) \in dx\} \Rightarrow N\big( 0,\ [D^2 G_\beta(\nu^0)]^{-1} - \beta^{-1} I \big) \quad \text{as } n \to \infty, \]
where I is the q × q identity matrix. The limiting covariance matrix is non-negative semidefinite and has rank (q − 1).
By (2.21), we can calculate the inverse of D²G_β(u)|_{ν^0}:
\[ [D^2 G_\beta(\nu^0)]^{-1} = \begin{pmatrix} \frac{q^2-\beta}{\beta q(q-\beta)} & -\frac{\beta}{\beta q(q-\beta)} & \cdots & -\frac{\beta}{\beta q(q-\beta)} \\ -\frac{\beta}{\beta q(q-\beta)} & \frac{q^2-\beta}{\beta q(q-\beta)} & \cdots & -\frac{\beta}{\beta q(q-\beta)} \\ \vdots & & \ddots & \vdots \\ -\frac{\beta}{\beta q(q-\beta)} & -\frac{\beta}{\beta q(q-\beta)} & \cdots & \frac{q^2-\beta}{\beta q(q-\beta)} \end{pmatrix}, \tag{2.22} \]
that is, a matrix with diagonal entries \(\frac{q^2-\beta}{\beta q(q-\beta)}\) and off-diagonal entries \(-\frac{\beta}{\beta q(q-\beta)} = -\frac{1}{q(q-\beta)}\). Hence we obtain
\[ [D^2 G_\beta(\nu^0)]^{-1} - \beta^{-1} I = \begin{pmatrix} \frac{q-1}{q^2-q\beta} & -\frac{1}{q^2-q\beta} & \cdots & -\frac{1}{q^2-q\beta} \\ -\frac{1}{q^2-q\beta} & \frac{q-1}{q^2-q\beta} & \cdots & -\frac{1}{q^2-q\beta} \\ \vdots & & \ddots & \vdots \\ -\frac{1}{q^2-q\beta} & -\frac{1}{q^2-q\beta} & \cdots & \frac{q-1}{q^2-q\beta} \end{pmatrix}, \tag{2.23} \]
that is, a matrix with diagonal entries \(\frac{q-1}{q^2-q\beta}\) and off-diagonal entries \(-\frac{1}{q^2-q\beta}\).
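These matrix identities can be checked numerically. The sketch below (ours) builds D²G_β(ν^0) from (2.21), forms [D²G_β(ν^0)]^{-1} − β^{-1}I, and confirms the entries of (2.23) as well as the eigenvalues 0 (simple) and 1/(q−β) (multiplicity q−1) used later in the proof of Theorem 2.2.11.

```python
import numpy as np

q, beta = 4, 1.5                      # any 0 < beta < q will do here

H = np.full((q, q), beta**2 / q**2)   # off-diagonal entries of (2.21)
np.fill_diagonal(H, (beta**2 + beta * q * (q - beta)) / q**2)

C = np.linalg.inv(H) - np.eye(q) / beta
print(C[0, 0], (q - 1) / (q**2 - q * beta))   # diagonal entry of (2.23)
print(C[0, 1], -1 / (q**2 - q * beta))        # off-diagonal entry of (2.23)
print(np.linalg.eigvalsh(C))   # one eigenvalue 0, the rest equal 1/(q-beta)
```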
We sketch below the key ingredients needed to prove Theorem 2.2.11. First we recall some lemmas involving the function
\[ G_\beta(u) = \frac{1}{2}\beta\langle u, u \rangle - \log\sum_{i=1}^{q} e^{\beta u_i}. \]
All the proofs are omitted here; see [7] for details.

The first lemma gives a useful lower bound on G_β(u).

Lemma 2.2.12. For β > 0, G_β(u) is a real analytic function of u ∈ R^q. There exists M_β > 0 such that
\[ G_\beta(u) \ge \frac{1}{4}\beta\langle u, u \rangle \quad \text{whenever } \|u\| \ge M_\beta. \]

The next lemma expresses the distribution of the empirical vector L_n(ω) in terms of G_β(u). The spins {ω_i, i = 1, 2, · · · , n} are assumed to have the joint distribution P_{n,β} defined in (2.8).
Lemma 2.2.13. Let I be the q × q identity matrix. For β > 0, choose a random vector W such that L(W), the law of W, equals N(0, β^{-1}I) and W is independent of {ω_i, i = 1, 2, · · · , n}. Then for any point m ∈ R^q, any γ ∈ R, and any n = 1, 2, · · · ,
\[ \mathscr L\Big( \frac{W}{n^{1/2-\gamma}} + \frac{n(L_n - m)}{n^{\gamma}} \Big)(dx) = \exp\Big[ -nG_\beta\Big( m + \frac{x}{n^{\gamma}} \Big) \Big]\,dx\,\Big( \int_{R^q}\exp\Big[ -nG_\beta\Big( m + \frac{x}{n^{\gamma}} \Big) \Big]\,dx \Big)^{-1}. \tag{2.24} \]
(2.24)
In the next lemma we give a bound on certain integrals that occur in the proofs of
the limit theorems.
¯ β = minu∈Rq Gβ (u). Then for any closed subset V
Lemma 2.2.14. For β > 0, let G
of Rq that contains no global minimum point of Gβ (u) and for any t ∈ Rq , there exists
ε > 0 such that
¯
e−nGβ (u)+
enGβ
√
n t,u
du
Ce−nε
as
n → ∞,
V
where C is a constant independent of n and V .
Lemma 2.2.15. For β > 0, let ν̄ be a global minimum point of G_β(u), i.e. \(G_\beta(\bar\nu) = \bar G_\beta = \min_{u\in R^q} G_\beta(u)\). Then there exists a positive number b_{ν̄} such that the following hold.
(i) For all x ∈ B(0, √n b_{ν̄}) and all τ ∈ [0, 1],
\[ \big\langle x,\ D^2 G_\beta(\bar\nu + \tau x/\sqrt n)\,x \big\rangle \ge \frac{1}{2}\mu_\beta\langle x, x \rangle, \]
where µ_β > 0 denotes the minimum eigenvalue of D²G_β(ν̄).
(ii) For any t ∈ R^q, any b ∈ (0, b_{ν̄}], and any bounded continuous function f : R^q → R,
\begin{align*}
\lim_{n\to\infty} e^{-\sqrt n\langle t, \bar\nu\rangle}\, n^{q/2}\, e^{n\bar G_\beta} \int_{B(\bar\nu, b)} f(u)\, e^{-nG_\beta(u) + \sqrt n\langle t, u\rangle}\, du
&= \lim_{n\to\infty} e^{n\bar G_\beta} \int_{B(0, \sqrt n\, b)} f(\bar\nu + x/\sqrt n)\,\exp\big[-nG_\beta(\bar\nu + x/\sqrt n) + \langle t, x\rangle\big]\, dx \\
&= f(\bar\nu)\int_{R^q}\exp\Big[ -\frac{1}{2}\big\langle x, D^2 G_\beta(\bar\nu)\,x \big\rangle + \langle t, x \rangle \Big]\, dx.
\end{align*}
We now prove the central limit theorem, that is, Theorem 2.2.11.

Proof of Theorem 2.2.11: According to Lemma 2.2.13 with γ = 1/2, for each t ∈ R^q,
\[ \int \exp\big[ \langle t,\ W + \sqrt n(L_n - \nu^0) \rangle \big]\,dP = \int_{R^q}\exp\big[ -nG_\beta(\nu^0 + x/\sqrt n) + \langle t, x \rangle \big]\,dx\,\Big( \int_{R^q}\exp\big[ -nG_\beta(\nu^0 + x/\sqrt n) \big]\,dx \Big)^{-1}. \]
We multiply the numerator and denominator on the right-hand side by \(e^{n\bar G_\beta}\) and write each integral over R^q as an integral over B(0, √n b_0) and over R^q \ B(0, √n b_0), where b_0 = b_{ν^0} is defined in Lemma 2.2.15. The change of variables x = √n(u − ν^0) converts the two integrals over R^q \ B(0, √n b_0) into integrals to which the bound in Lemma 2.2.14 may be applied. Using Lemma 2.2.15(ii), we see that
\begin{align*}
\lim_{n\to\infty} E\big\{ \exp[\langle t, W + \sqrt n(L_n - \nu^0)\rangle] \big\} &= \int_{R^q}\exp\Big[ -\frac12\langle x, D^2 G_\beta(\nu^0)x\rangle + \langle t, x\rangle \Big]dx\,\Big( \int_{R^q}\exp\Big[ -\frac12\langle x, D^2 G_\beta(\nu^0)x\rangle \Big]dx \Big)^{-1} \\
&= \exp\Big[ \frac12\big\langle t,\ [D^2 G_\beta(\nu^0)]^{-1} t \big\rangle \Big].
\end{align*}
Since W and L_n are independent and
\[ E\{ e^{\langle t, W \rangle} \} = e^{(1/2\beta)\langle t, t \rangle}, \]
we get that
\[ P_{n,\beta}\{\sqrt n(L_n - \nu^0) \in dx\} \Rightarrow N\big( 0,\ [D^2 G_\beta(\nu^0)]^{-1} - \beta^{-1} I \big). \]
The matrix [D²G_β(ν^0)]^{-1} − β^{-1}I has a simple eigenvalue at 0 and an eigenvalue of multiplicity q − 1 at 1/(q − β), which is positive since 0 < β < β_c < q. Thus the covariance matrix is non-negative semidefinite and has rank q − 1. The proof is complete.
Chapter 3
Stein’s Method and Its Application
Stein's method is a way of deriving estimates of the accuracy of the approximation of one probability distribution by another. It is used to obtain bounds on the distance between two probability distributions with respect to some probability metric. It was introduced by Charles Stein, who first published it in 1972 ([13]), to obtain a bound between the distribution of a sum of an m-dependent sequence of random variables and a standard normal distribution in the Kolmogorov (uniform) metric, and hence to prove not only a central limit theorem but also bounds on the rates of convergence for the given metric. Later, his Ph.D. student Louis Chen Hsiao Yun modified the method so as to obtain approximation results for the Poisson distribution ([2]); therefore the method is often referred to as the Stein-Chen method.

In this chapter, we will introduce Stein's method and then give some examples of its application. These are mostly taken from [1].
3.1 The Stein Operator

Stein's method is a way of bounding the distance between two probability distributions in a specific probability metric, so to use the method we first need the metric. We define the distance in the following form:
\[ d(P, Q) = \sup_{h\in H}\Big| \int h\,dP - \int h\,dQ \Big| = \sup_{h\in H}\,| Eh(W) - Eh(Y) |. \tag{3.1} \]
Here, P and Q are probability measures on a measurable space X, H is a set of functions from X to the real numbers, E is the usual expectation operator, and W and Y are random variables with distributions P and Q respectively. The set H should be large enough so that the above definition indeed yields a metric. Important examples are the total variation metric, where we let H consist of all the characteristic functions¹ of measurable sets; the Kolmogorov (uniform) metric for probability measures on the real numbers, where we consider all the half-line characteristic functions; and the Lipschitz (first order Wasserstein; Kantorovich) metric, where the underlying space is itself a metric space and we take the set H to be all Lipschitz-continuous functions with Lipschitz-constant 1.
In what follows in this section, we think of P as the distribution of a sum of dependent random variables, which we want to approximate by a much simpler and tractable
distribution Q (e.g. the standard normal distribution to obtain a central limit theorem).
Now we assume that the distribution Q is a fixed distribution; in what follows we
shall in particular consider the case when Q is the standard normal distribution, which
serves as a classical example of the application of Stein’s method.
First of all, we need an operator L (see P.62-P.64 in [1]) which acts on functions f from X to the real numbers, and which "characterizes" the distribution Q in the sense that the following equivalence holds:
\[ E(Lf)(Y) = 0 \ \text{for all } f \quad \Longleftrightarrow \quad Y \text{ has distribution } Q. \tag{3.2} \]
We call such an operator the Stein operator. For the standard normal distribution, Stein's lemma² exactly yields such an operator:
\[ E\big(f'(Y) - Yf(Y)\big) = 0 \ \text{for all } f \in C_b^1 \quad \Longleftrightarrow \quad Y \text{ has standard normal distribution.} \tag{3.3} \]
Thus we can take
\[ (Lf)(x) = f'(x) - xf(x). \tag{3.4} \]
We note that there are in general infinitely many such operators, and it still remains an open question which one to choose. However, it seems that for many distributions there is a particularly good one, like (3.4) for the normal distribution.

¹ A characteristic function is a function defined on a set X that indicates membership of an element in a subset A ⊂ X, having the value 1 for all elements of A and the value 0 for all elements of X not in A; that is, 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A.
There are different ways to construct Stein operators, but by far the most important one is via generators. This approach was introduced by Barbour and Götze. Assume that Z = (Z_t)_{t≥0} is a (homogeneous) continuous time Markov process taking values in X. If Z has stationary distribution Q, it is easy to see that, if L is the generator of Z, then E(Lf)(Y) = 0 for a large set of functions f. Thus, generators are natural candidates for Stein operators, and this approach will also help us in later computations.
3.2 The Stein Equation

Saying that P is close to Q with respect to the metric d is equivalent to saying that the difference of expectations in (3.1) is close to 0; indeed, if P = Q it is equal to 0. We now hope that the operator L exhibits the same behavior: if P = Q we have E(Lf)(W) = 0, and hopefully if P ≈ Q we have E(Lf)(W) ≈ 0. To make this statement rigorous we could find a function f such that, for a given function h,
\[ E(Lf)(W) = Eh(W) - Eh(Y), \tag{3.5} \]
so that the behavior of the right hand side is reproduced by the operator L and f.

² Stein's Lemma: Suppose X is a normally distributed random variable with expectation µ and variance σ². Further suppose g is a function for which the two expectations E(g(X)(X − µ)) and E(g′(X)) both exist. Then E(g(X)(X − µ)) = σ²E(g′(X)).

However, this equation is too general. We solve the more specific equation
\[ (Lf)(x) = h(x) - Eh(Y) \quad \text{for all } x, \tag{3.6} \]
which is called the Stein equation (see P.63 in [1]). Replacing x by W and taking expectations with respect to W, we are back to (3.5), which is what we effectively want. Now all the effort is worthwhile only if the left hand side of (3.5) is easier to bound than the right hand side. This is, surprisingly, often the case.

If Q is the standard normal distribution, then by (3.4) the corresponding Stein equation is
\[ f'(x) - xf(x) = h(x) - Eh(Z) \quad \text{for all } x, \tag{3.7} \]
which is just an ordinary differential equation.
Now we need to solve the Stein equation; the following can be found in [1]. In general, we cannot say much about how the equation (3.6) is to be solved. However, there are important cases where we can.

Analytic method: We see from (3.7) that equation (3.6) can in particular be a differential equation (if Q is concentrated on the integers, it will often turn out to be a difference equation). As there are many methods available to treat such equations, we can use them to solve the equation. For example, (3.7) can be easily solved explicitly:
\[ f(x) = e^{x^2/2}\int_{-\infty}^{x}\big[ h(s) - Eh(Y) \big]\,e^{-s^2/2}\,ds. \tag{3.8} \]

Generator method: If L is the generator of a Markov process (Z_t)_{t≥0} as explained before, we can give a general solution to (3.6):
\[ f(x) = -\int_{0}^{\infty}\big[ E^x h(Z_t) - Eh(Y) \big]\,dt, \tag{3.9} \]
where E^x denotes expectation with respect to the process Z started in x. However, one still has to prove that the solution (3.9) exists for all desired functions h ∈ H.
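For a concrete test function, (3.8) can be evaluated by quadrature and checked against the Stein equation (3.7). The sketch below (ours, with h(x) = |x| as an arbitrary Lipschitz choice) compares f′(x) − xf(x) with h(x) − Eh(Y) at a few points.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

h = np.abs
Eh = quad(lambda s: h(s) * norm.pdf(s), -np.inf, np.inf)[0]  # = sqrt(2/pi)

def f(x):
    # f(x) = e^{x^2/2} * int_{-inf}^{x} [h(s) - Eh] e^{-s^2/2} ds, as in (3.8)
    val = quad(lambda s: (h(s) - Eh) * np.exp(-s**2 / 2), -np.inf, x)[0]
    return np.exp(x**2 / 2) * val

for x in (-1.0, 0.3, 2.0):
    eps = 1e-5
    fprime = (f(x + eps) - f(x - eps)) / (2 * eps)
    print(fprime - x * f(x), h(x) - Eh)   # the two columns should agree
```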
In the following, we give some properties of the solution to the Stein equation. Usually, one tries to give bounds on f and its derivatives (which have to be carefully defined if X is a more complicated space) or differences, in terms of h and its derivatives or differences; that is, inequalities of the form
\[ \|D^k f\| \le C_{k,l}\,\|D^l h\|, \tag{3.10} \]
for some specific k, l = 0, 1, 2, · · · (typically k ≥ l or k ≥ l − 1, respectively, depending on the form of the Stein operator), where often ‖·‖ is taken to be the supremum norm. Here, D^k denotes the differential operator, but in discrete settings it usually refers to a difference operator. The constants C_{k,l} may contain the parameters of the distribution Q. If there are any, they are often referred to as Stein factors or magic factors.

In the case of (3.8) we can prove for the supremum norm that
\[ \|f\| \le \min\Big( \sqrt{\pi/2}\,\|h\|_\infty,\ 2\|h'\|_\infty \Big), \qquad \|f'\| \le \min\Big( 2\|h\|_\infty,\ 4\|h'\|_\infty \Big), \qquad \|f''\| \le 2\|h'\|_\infty, \tag{3.11} \]
where the last bound is of course only applicable if h is differentiable (or at least Lipschitz-continuous, which, for example, is not the case if we consider the total variation metric or the Kolmogorov metric). As the standard normal distribution has no extra parameters, in this specific case the constants are free of additional parameters.

Note that, up to this point, we did not make use of the random variable W. So the steps up to here in general have to be carried out only once, for a specific combination of distribution Q, metric d and Stein operator L. However, if we have bounds in the general form (3.10), we usually are able to treat many probability metrics together. Furthermore, as there is often a particular 'good' Stein operator for a distribution (e.g., no other operator than (3.4) has been used for the standard normal distribution up to now), one can often just start with the next step below, if bounds of the form (3.10) are already available (which is the case for many distributions).
3.3 An Approximation Theorem

We are now in a position to bound the left hand side of (3.5). As this step heavily depends on the form of the Stein operator, we directly regard the case of the standard normal distribution.

At this point we could directly plug in the random variable W which we want to approximate and try to find upper bounds. However, it is often fruitful to formulate a more general theorem using only abstract properties of W. Let us consider here the case of local dependence.

To this end, assume that \(W = \sum_{i=1}^{n} X_i\) is a sum of random variables such that E(W) = 0 and Var(W) = 1. Assume that, for every i = 1, 2, · · · , n, there is a set A_i ⊂ {1, 2, · · · , n} such that X_i is independent of all random variables X_j with j ∉ A_i. We call this set the 'neighborhood' of X_i. Likewise, let B_i ⊂ {1, 2, · · · , n} be a set such that all X_j with j ∈ A_i are independent of all X_k, k ∉ B_i. We can think of B_i as the neighbors in the neighborhood of X_i — a second-order neighborhood, so to speak. For a set A ⊂ {1, 2, · · · , n} define now the sum
\[ X_A := \sum_{j\in A} X_j. \]
Using basically only Taylor expansion, it is possible to prove that
\[ \big| E\big( f'(W) - Wf(W) \big) \big| \le \|f''\|_\infty \sum_{i=1}^{n}\Big( \frac{1}{2}E|X_i X_{A_i}^2| + E|X_i X_{A_i} X_{B_i\setminus A_i}| + E|X_i X_{A_i}|\,E|X_{B_i}| \Big). \tag{3.12} \]
Note that, if we follow this line of argument, we can bound (3.1) only for functions h where ‖h′‖ is bounded, because of the third inequality of (3.11) (and in fact, if h has discontinuities, so will f″). To obtain a bound similar to (3.12) which contains only the expressions ‖f‖_∞ and ‖f′‖_∞, the argument is much more involved and the result is not as simple as (3.12); however, it can be done, see P.70-P.75 in [1].
Theorem 3.3.1. If W is as described above, we have for the Lipschitz metric d_W that
\[ d_W\big( \mathscr L(W), N(0,1) \big) \le 2\sum_{i=1}^{n}\Big( \frac12 E|X_i X_{A_i}^2| + E|X_i X_{A_i} X_{B_i\setminus A_i}| + E|X_i X_{A_i}|\,E|X_{B_i}| \Big). \tag{3.13} \]
Proof. Recall that the Lipschitz metric is of the form (3.1) where the functions h are Lipschitz-continuous with Lipschitz-constant 1, thus ‖h′‖ ≤ 1. Combining this with (3.12) and the last bound in (3.11) proves the theorem.

Thus, roughly speaking, we have proved that, to calculate the Lipschitz distance between a W with local dependence structure and a standard normal distribution, we only need to know the third moments of the X_i and the sizes of the neighborhoods A_i and B_i.

We can treat the case of sums of independent and identically distributed random variables with Theorem 3.3.1. So assume now that E(X_i) = 0, Var(X_i) = 1 and \(W = n^{-1/2}\sum X_i\). We can take A_i = B_i = {i}, and we obtain from Theorem 3.3.1 that
\[ d_W\big( \mathscr L(W), N(0,1) \big) \le \frac{5\,E|X_1|^3}{n^{1/2}}. \tag{3.14} \]
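The rate in (3.14) can be watched empirically. In the sketch below (ours), scipy's wasserstein_distance estimates d_W from samples of W for the symmetric ±1 coin, for which E|X_1|³ = 1, so the bound is 5/√n; the simulated distance also contains some Monte Carlo error.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    W = rng.choice([-1, 1], size=(20000, n)).sum(axis=1) / np.sqrt(n)
    Z = rng.standard_normal(20000)
    # simulated distance vs. the bound (3.14)
    print(n, wasserstein_distance(W, Z), 5 / np.sqrt(n))
```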
3.4 An Application of Stein's Method
To give an application of Stein's method, we introduce Stein's method for dependent random variables occurring in statistical mechanics. We first introduce Stein's method with exchangeable pairs (P.19-P.23 in [1]) for normal approximation, and then give the Berry-Esseen bounds for the classical Curie-Weiss model.

Given a random variable W, Stein's method of exchangeable pairs is based on the construction of another random variable W′ on the same probability space such that the pair (W, W′) is exchangeable, i.e. their joint distribution is symmetric. The approach essentially uses the elementary fact that if (W, W′) is an exchangeable pair, then E[g(W, W′)] = 0 for all antisymmetric measurable functions g(x, y) such that the expectation exists. Stein also assumed the linear regression property
\[ E(W'\,|\,W) = (1-\lambda)W \]
for some 0 < λ < 1. Here we develop Stein's method by replacing the linear regression property with
\[ E(W'\,|\,W) = W + \lambda\psi(W) + R(W), \]
where ψ(x) depends on the continuous distribution under consideration.

For measuring the distance between the distribution of W and the standard normal distribution (or any other distribution), we would like to bound |Eh(W) − Φ(h)| for a class of test functions h ∈ H, where \(\Phi(h) := \int_{-\infty}^{\infty} h(z)\,\Phi(dz)\) and Φ is the standard normal distribution function.
Theorem 3.4.1 (Berry-Esseen bounds for the high temperature Curie-Weiss model). Let \(\rho = \frac12\delta_{-1} + \frac12\delta_{1}\) and 0 < β < 1. Then
\[ \sup_{z\in R}\Big| P_n\Big( \frac{S_n}{\sqrt n} \le z \Big) - \Phi_\beta(z) \Big| \le C n^{-1/2}, \tag{3.15} \]
where Φ_β denotes the distribution function of the normal distribution with expectation 0 and variance (1 − β)^{-1}, and C is a constant depending only on β.

To prove this theorem (see Theorem 1.2 in [3] for details), we need some other theorems and corollaries, but only Corollary 3.4.6 is used in Chapter 4. We state them here without the proofs. For Lemma 3.4.2, see Lemma 1.13 in [3].
Lemma 3.4.2. Let \(\rho = \frac12\delta_{-1} + \frac12\delta_{1}\), and let α be one of the global minima of maximal type k, k ≥ 1, and strength µ of G_ρ, where
\[ G_\rho(\beta, s) := \frac{\beta s^2}{2} - \phi_\rho(\beta s) \qquad \text{with} \qquad \phi_\rho(s) := \log\int \exp(s x)\,d\rho(x). \]
Then for
\[ W := \frac{S_n - n\alpha}{n^{1 - 1/2k}}, \]
we have for any l ∈ N,
\[ E|W|^l \le \mathrm{const.}(l). \]
In the following theorems, Theorem 3.4.3 is from [11], Theorem 3.4.4 is from [12],
Theorem 3.4.5, Corollary 3.4.6 and Corollary 3.4.7 are from [3].
Theorem 3.4.3. Suppose W is a random variable with E(W) = 0 and E(W²) = 1. Let (W, W′) be an exchangeable pair, and define a random variable R = R(W) by
\[ E(W'\,|\,W) = (1-\lambda)W + R, \]
where λ is a number satisfying 0 < λ < 1. If moreover |W − W′| ≤ A for some constant A, then
\[ \sup_{z\in R}\,| P(W \le z) - \Phi(z) | \le \frac{12}{\lambda}\sqrt{\operatorname{var}\big\{ E[(W-W')^2\,|\,W] \big\}} + \frac{37\sqrt{E(R^2)}}{\lambda} + 48\sqrt{2/\pi}\,\frac{A^3}{\lambda} + 2\sqrt{2/\pi}\,\frac{A^2}{\sqrt\lambda}. \tag{3.16} \]

Theorem 3.4.4. Suppose W is a random variable with E(W) = 0 and E(W²) ≤ 1. Let (W, W′) be an exchangeable pair such that
\[ E(W'\,|\,W) = (1-\lambda)W, \]
with 0 < λ < 1. Then for any a > 0,
\[ \sup_{z\in R}\,| P(W \le z) - \Phi(z) | \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \frac{0.41a^3}{\lambda} + 1.5a + \frac{1}{2\lambda}E\big[ (W-W')^2\,\mathbf 1_{\{|W-W'|\ge a\}} \big]. \tag{3.17} \]
If |W − W′| ≤ A, then the bound reduces to
\[ \sup_{z\in R}\,| P(W \le z) - \Phi(z) | \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \frac{0.41A^3}{\lambda} + 1.5A. \tag{3.18} \]
Suppose X and Y are two random variables defined on a common probability space; the Kolmogorov distance of the distributions of X and Y is defined by
\[ d_K(X, Y) := \sup_{z\in R}\,| P(X \le z) - P(Y \le z) |. \]
Theorem 3.4.5. Let (W, W′) be an exchangeable pair of real-valued random variables such that
\[ E(W'\,|\,W) = (1-\lambda)W + R \]
for some random variable R = R(W) and 0 < λ < 1. Assume that E(W²) ≤ 1. Let Z be a random variable with standard normal distribution. Then for any A > 0,
\[ d_K(W, Z) \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda} + \frac{0.41A^3}{\lambda} + 1.5A + \frac{1}{2\lambda}E\big[ (W-W')^2\,\mathbf 1_{\{|W-W'|\ge A\}} \big]. \tag{3.19} \]
If |W − W′| ≤ A for some constant A, we obtain the bound
\[ d_K(W, Z) \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda} + \frac{0.41A^3}{\lambda} + 1.5A. \tag{3.20} \]
Note that in this theorem we have assumed E(W²) ≤ 1. Alternatively, if we assume that E(W²) is finite, then the third and the fourth summands of the bound (3.19) change to
\[ \frac{A^3}{\lambda}\Big( \frac{\sqrt{2\pi}}{16} + \frac{E(W^2)}{4} \Big) + 1.5A\,E(|W|). \]
In the following corollary, we discuss the Kolmogorov distance of the distribution of a random variable W to a random variable distributed according to N(0, σ²), the normal distribution with mean 0 and variance σ².

Corollary 3.4.6. Let σ² > 0 and let (W, W′) be an exchangeable pair of real-valued random variables such that
\[ E(W'\,|\,W) = \Big( 1 - \frac{\lambda}{\sigma^2} \Big)W + R \]
for some random variable R = R(W) and with 0 < λ < 1. Assume that E(W²) is finite. Let Z_σ be a random variable distributed according to N(0, σ²). If |W − W′| ≤ A for a constant A, we obtain the bound
\[ d_K(W, Z_\sigma) \le \sqrt{E\Big( 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,W] \Big)^2} + \sigma\Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda}\sqrt{E(W^2)} + \frac{A^3}{\lambda}\Big( \frac{\sqrt{2\pi\sigma^2}}{16} + \frac{E(W^2)}{4} \Big) + 1.5A\,\frac{E(W^2)}{\sigma^2}. \tag{3.21} \]
An alternative bound can be obtained comparing with a N (0, E(W 2 ))-distribution.
Corollary 3.4.7. In the situation of Corollary 3.4.6, let Z_W denote a random variable with the N(0, E(W²)) distribution. We obtain
\[ d_K(W, Z_W) \le \frac{\sigma^2}{2\lambda}\sqrt{\operatorname{Var}\big( E[(W-W')^2\,|\,W] \big)} + \sigma^2\frac{A^3}{\lambda}\Big( \frac{\sqrt{E(W^2)}\sqrt{2\pi}}{16} + \frac{E(W^2)}{4} \Big) + \sigma^2\Big( \frac{\sqrt{E(W^2)}\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda} + \big( E(W^2) + \sigma^2 \big)\,1.5A\,\frac{\sqrt{E(W^2)}\sqrt{E(R^2)}}{\lambda}. \tag{3.22} \]
Let \(\rho = \frac12\delta_{-1} + \frac12\delta_{1}\) and 0 < β < 1. Then, according to Theorem 3.4.1,
\[ W := W_n := \frac{1}{\sqrt n}\sum_{i=1}^{n} X_i \]
converges in distribution to N(0, σ²) with σ² = (1 − β)^{-1}. Now we prove Theorem 3.4.1.
Proof of Theorem 3.4.1: We consider the usual construction of an exchangeable pair. We produce a spin collection X′ = (X_i′)_{i≥1} via a Gibbs sampling procedure: select a coordinate, say i, at random and replace X_i by X_i′ drawn from the conditional distribution of the i-th coordinate given (X_j)_{j≠i}. Let I be a random variable taking values in {1, 2, · · · , n} with equal probability, and independent of all other random variables. Consider
\[ W' := W - \frac{X_I}{\sqrt n} + \frac{X_I'}{\sqrt n} = \frac{1}{\sqrt n}\sum_{j\ne I} X_j + \frac{X_I'}{\sqrt n}. \]
Hence (W, W′) is an exchangeable pair and
\[ W - W' = \frac{X_I - X_I'}{\sqrt n}. \]
Let F := σ(X_1, X_2, · · · , X_n) be the σ-algebra generated by {X_1, X_2, · · · , X_n}. Now we have
\[ E[W - W'\,|\,\mathcal F] = \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n} E[X_i - X_i'\,|\,\mathcal F] = \frac{1}{n}W - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n} E[X_i'\,|\,\mathcal F]. \]
The conditional distribution at site i is given by
\[ P_n\big( x_i\,\big|\,(x_j)_{j\ne i} \big) = \frac{\exp(x_i\,\beta m_i(x))}{\exp(\beta m_i(x)) + \exp(-\beta m_i(x))}, \qquad \text{where } m_i(x) = \frac{1}{n}\sum_{j\ne i} x_j,\quad i = 1, 2, \cdots, n. \]
It follows that
\[ E[X_i'\,|\,\mathcal F] = E[X_i'\,|\,(X_j)_{j\ne i}] = \tanh(\beta m_i(X)). \]
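The Gibbs-sampling step used here is straightforward to implement; the following sketch (ours) draws one resampled spin from the conditional distribution above, producing one realization of the pair (W, W′). For a faithful draw of the pair, the configuration X should itself be sampled from the Gibbs measure, e.g. by running many such updates first.

```python
import numpy as np

def resample_one_site(X, beta, rng):
    """Pick I uniformly and redraw X_I from its conditional law given the rest."""
    n = len(X)
    I = rng.integers(n)
    m_I = (X.sum() - X[I]) / n          # m_I(X) = (1/n) sum_{j != I} X_j
    p_plus = np.exp(beta * m_I) / (2 * np.cosh(beta * m_I))
    Xp = X.copy()
    Xp[I] = 1 if rng.random() < p_plus else -1
    return Xp

rng = np.random.default_rng(2)
n, beta = 500, 0.5
X = rng.choice([-1, 1], size=n)         # a rough starting configuration
W = X.sum() / np.sqrt(n)
W_prime = resample_one_site(X, beta, rng).sum() / np.sqrt(n)
print(W, W_prime)
```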
Now
\[ \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\tanh(\beta m_i(X)) = \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\big( \tanh(\beta m_i(X)) - \tanh(\beta m(X)) \big) + \frac{1}{\sqrt n}\tanh(\beta m(X)) =: R_1 + R_2, \]
with \(m(X) = \frac{1}{n}\sum_{i=1}^{n} X_i\). Taylor-expanding tanh(x) = x + O(x³) leads to
\[ R_2 = \frac{1}{\sqrt n}\beta m(X) + \frac{1}{\sqrt n}O(m(X)^3) = \frac{\beta}{n}W + O\Big( \frac{W^3}{n^2} \Big). \]
Hence
\[ E[W - W'\,|\,W] = \frac{1-\beta}{n}W + R = \frac{\lambda}{\sigma^2}W + R \]
with λ = 1/n, σ² = (1 − β)^{-1} and R = O(W³/n²) − R_1. Since |W − W′| = |X_I − X_I′|/√n ≤ 2/√n =: A, we are able to apply Corollary 3.4.6. From Lemma 3.4.2 we know that, for ρ the symmetric Bernoulli distribution and for 0 < β < 1, we have E(W⁴) ≤ const. Applying this, it follows that the fourth term in (3.21) can be bounded by
\[ 1.5A\,\frac{E(W^2)}{\sigma^2} \le \frac{(1-\beta)\,\mathrm{const.}}{\sqrt n}, \]
and the third summand in (3.21) can be estimated as follows:
\[ \frac{A^3}{\lambda}\Big( \frac{\sqrt{2\pi\sigma^2}}{16} + \frac{E(W^2)}{4} \Big) \le \frac{\mathrm{const.}(\beta)}{\sqrt n}. \]
Moreover, we obtain
\[ E|R| \le E|R_1| + O\Big( \frac{E|W^3|}{n^2} \Big). \]
Since tanh(x) is 1-Lipschitz, we get
\[ |R_1| \le \frac{1}{\sqrt n}\,\max_{1\le i\le n}\beta\,| m_i(X) - m(X) | \le \frac{\beta}{n^{3/2}}, \]
using |m_i(X) − m(X)| = |X_i|/n ≤ 1/n. Hence, with Lemma 3.4.2, we obtain √(E(R²)) = O(n^{-3/2}), and thus the second summand in (3.21) can be bounded by
\[ \sigma\Big( \frac{\sqrt{2\pi}}{4} + 1.5A \Big)\frac{\sqrt{E(R^2)}}{\lambda}\sqrt{E(W^2)} = O\Big( \frac{1}{\sqrt n} \Big). \]
To bound the first summand in (3.21), we have
\[ (W - W')^2 = \frac{X_I^2}{n} - \frac{2X_I X_I'}{n} + \frac{(X_I')^2}{n}, \]
hence
\[ E[(W-W')^2\,|\,\mathcal F] = \frac{2}{n} - \frac{2}{n^2}\sum_{i=1}^{n} X_i\tanh(\beta m_i(X)), \]
and therefore
\[ 1 - \frac{1}{2\lambda}E[(W-W')^2\,|\,\mathcal F] = \frac{1}{n}\sum_{i=1}^{n} X_i\tanh(\beta m_i(X)) = \frac{1}{n}\sum_{i=1}^{n} X_i\big( \tanh(\beta m_i(X)) - \tanh(\beta m(X)) \big) + m(X)\tanh(\beta m(X)) =: \tilde R_1 + \tilde R_2. \]
By Taylor expansion we get
\[ \tilde R_2 = \frac{\beta}{n}W^2 + O\Big( \frac{W^4}{n^2} \Big), \]
and by Lemma 3.4.2 we get E|\(\tilde R_2\)| = O(n^{-1}). Since tanh(x) is 1-Lipschitz, we obtain
\[ |\tilde R_1| \le \frac{1}{n}. \]
Hence E|\(\tilde R_1 + \tilde R_2\)| = O(n^{-1}) (and likewise E(\(\tilde R_1 + \tilde R_2\))² = O(n^{-2})), so the first summand in (3.21) is O(n^{-1}), and the theorem is proved.
Chapter 4
Main Results

In this chapter, we use Stein's method to bound the convergence rate in the central limit theorem for the total magnetization in the Curie-Weiss-Potts model at high temperature.

Theorem 4.0.1. For the Curie-Weiss-Potts model, let S_n be the sum of the spin random variables, where (ω_1, · · · , ω_n) has joint distribution P_{n,β} defined as in (2.8). If 0 < β < 1, then for the first component S_n^{(1)} of the vector S_n we have
\[ \sup_{z\in R}\Big| P_{n,\beta}\Big( \frac{S_n^{(1)} - n/q}{\sqrt n} \le z \Big) - \Phi_\beta(z) \Big| \le C\,n^{-1/2}, \tag{4.1} \]
where Φ_β denotes the distribution function of the normal distribution with expectation 0 and variance \(\frac{q-1}{q^2-q\beta}\), and C is a constant depending only on β.
Similarly to Section 3.4, we will apply Corollary 3.4.6. In the rest of this chapter, we prove the above theorem.
Proof. Let e_i ∈ Z^q, i = 1, 2, · · · , q, be the vector with the ith entry 1 and the other entries 0, and let ω_i ∈ {e_1, e_2, · · · , e_q}, i = 1, 2, · · · , n, as stated before. In equation (2.9) of Chapter 2 we have seen that the Hamiltonian is
\[ H_n(\omega) = -\frac{1}{2n}\sum_{i,j=1}^{n}\delta(\omega_i, \omega_j) = -\frac{1}{2n}\sum_{i,j=1}^{n}\langle \omega_i, \omega_j \rangle, \]
where δ(·, ·) denotes the Kronecker delta.

The Gibbs weight is
\[ \frac{e^{-\beta H_n(\omega)}}{Z_n(\beta)} = \frac{e^{\frac{\beta}{2n}\sum_{i,j=1}^{n}\delta(\omega_i,\omega_j)}}{Z_n(\beta)} = \frac{e^{\frac{\beta n}{2}(s_1^2 + s_2^2 + \cdots + s_q^2)}}{Z_n(\beta)}, \]
where Z_n(β) is defined in (2.11) and \(s_i = \frac{1}{n}\sum_{j=1}^{n}\delta(e_i, \omega_j)\).
Now we consider the usual construction of an exchangeable pair. We produce a spin collection ω′ = (ω_i′)_{i≥1} via a Gibbs sampling procedure: select a coordinate, say i, at random and replace ω_i by ω_i′ drawn from the conditional distribution of the i-th coordinate given (ω_j)_{j≠i}. Let I be a random variable taking values in {1, 2, · · · , n} with equal probability and independent of all other random variables. For 0 < β < 1, we denote
\[ W := W_n = \frac{1}{\sqrt n}\sum_{i=1}^{n}\Big( \omega_i - \frac{1}{q}(e_1 + e_2 + \cdots + e_q) \Big) = \sqrt n\,\Big[ \Big(s_1 - \frac1q\Big)e_1 + \Big(s_2 - \frac1q\Big)e_2 + \cdots + \Big(s_q - \frac1q\Big)e_q \Big], \]
with each component
\[ W^{(i)} = \frac{1}{\sqrt n}\sum_{j=1}^{n}\Big( \langle \omega_j, e_i \rangle - \frac{1}{q} \Big), \qquad 1 \le i \le q. \]
Now let
\[ W' := W - \frac{1}{\sqrt n}\,\omega_I + \frac{1}{\sqrt n}\,\omega_I', \]
where ω_I′ has the conditional distribution
\[ P\big( \omega_I' = e_k \,\big|\, \{\omega_1, \omega_2, \cdots, \omega_n\}\setminus\{\omega_I\} \big) = \frac{e^{\beta s_k^I}}{e^{\beta s_1^I} + \cdots + e^{\beta s_q^I}}, \qquad \text{with } s_k^I = \frac{1}{n}\sum_{\substack{j=1 \\ j\ne I}}^{n}\delta(e_k, \omega_j). \]
Hence (W, W′) is an exchangeable pair and
\[ W - W' = \frac{\omega_I - \omega_I'}{\sqrt n}. \]
Then
\begin{align*}
E[W - W'\,|\,W] &= \frac{1}{\sqrt n} E[\omega_I - \omega_I'\,|\,\omega] = \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n} E[\omega_i - \omega_i'\,|\,\omega] \\
&= \frac{1}{n}\Big( W + \frac{\sqrt n}{q}(e_1 + \cdots + e_q) \Big) - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\frac{e^{\beta s_1^i}e_1 + e^{\beta s_2^i}e_2 + \cdots + e^{\beta s_q^i}e_q}{e^{\beta s_1^i} + e^{\beta s_2^i} + \cdots + e^{\beta s_q^i}}. \tag{4.2}
\end{align*}
Let us denote
\[ f(s) = f(s_1, s_2, \cdots, s_q) = \Big( \frac{e^{\beta s_1}}{e^{\beta s_1}+\cdots+e^{\beta s_q}},\ \frac{e^{\beta s_2}}{e^{\beta s_1}+\cdots+e^{\beta s_q}},\ \cdots,\ \frac{e^{\beta s_q}}{e^{\beta s_1}+\cdots+e^{\beta s_q}} \Big)^{T}, \]
and let \(\Delta_i = \frac{1}{n}\omega_i\), so that s^i = s − Δ_i; then we can continue the calculation of equation (4.2):
\begin{align*}
E[W - W'\,|\,W] &= \frac{1}{n}\Big( W + \frac{\sqrt n}{q}(e_1 + \cdots + e_q) \Big) - \frac{1}{\sqrt n}\Big[ f(s) - f\Big(\frac1q, \cdots, \frac1q\Big) + f\Big(\frac1q, \cdots, \frac1q\Big) \Big] - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\big[ f(s - \Delta_i) - f(s) \big] \\
&= \frac{1}{n}W - \frac{1}{\sqrt n}\Big[ f(s_1, \cdots, s_q) - f\Big(\frac1q, \cdots, \frac1q\Big) \Big] - \frac{1}{\sqrt n}\frac{1}{n}\sum_{i=1}^{n}\big[ f(s - \Delta_i) - f(s) \big], \tag{4.3}
\end{align*}
where we used f(1/q, · · · , 1/q) = (1/q)(e_1 + · · · + e_q). By the Taylor expansion of f(s) around (1/q, · · · , 1/q), we get
\begin{align*}
E[W - W'\,|\,W] &= \frac{1}{\sqrt n}\begin{pmatrix} s_1 - \frac1q \\ s_2 - \frac1q \\ \vdots \\ s_q - \frac1q \end{pmatrix} - \frac{1}{\sqrt n}\begin{pmatrix} \frac{\beta}{q}(s_1 - \frac1q) \\ \frac{\beta}{q}(s_2 - \frac1q) \\ \vdots \\ \frac{\beta}{q}(s_q - \frac1q) \end{pmatrix} + O\Big( \frac{1}{n^{3/2}} \Big) \\
&= \frac{1 - \frac{\beta}{q}}{\sqrt n}\begin{pmatrix} s_1 - \frac1q \\ \vdots \\ s_q - \frac1q \end{pmatrix} + O\Big( \frac{1}{n^{3/2}} \Big) = \frac{1 - \frac{\beta}{q}}{n}\,W + O\Big( \frac{1}{n^{3/2}} \Big). \tag{4.4}
\end{align*}
Now we still need that E|W|^l is uniformly bounded in n for l = 1, 2, 3, 4. We can use the Hubbard-Stratonovich transformation to prove this, just as in the proof of Lemma 1.13 in [3]. So the third and the fourth terms in (3.21) can be bounded by \(\frac{1}{\sqrt n}C_1(\beta)\), where C_1(β) is a constant depending only on β.

According to the above calculation, we know R = O(1/n^{3/2}), so the second summand in (3.21) can be bounded by \(\frac{1}{\sqrt n}C_2(\beta)\), where C_2(β) is a constant depending only on β.

To bound the first summand in (3.21), we observe
\[ \langle W - W',\ W - W' \rangle = \frac{\|\omega_I\|^2}{n} - \frac{2\langle \omega_I, \omega_I' \rangle}{n} + \frac{\|\omega_I'\|^2}{n}. \]
CHAPTER 4. MAIN RESULTS
Let W (1) =
n
i=1 (ωi (1)
√1
n
35
− 1q ), W (1) = W (1) −
√1 ωI (1)
n
+
√1 ω (1),
n I
then we have
E[(W (1) − W (1))2 |W ]
=
=
1
E[(ωI (1) − ωI (1))2 |W ]
n
n
1
E[ωi (1) − 2ωi (1)ωi (1) + ωi (1)|W ]
n2
i=1
=
=
=
1
ns1 +
n2
1
ns1 +
n2
1
ns1 +
n2
n
i
(1 − 2ωi (1))
i=1
n
eβs1
i
i
i
eβs1 + eβs2 (1) + · · · + eβsq
(1 − 2ωi (1))
i=1
n
(1 − 2ωi (1))
i=1
eβs1
1
eβs1
+ O( )
βs
βs
q
2
n
+ e + ··· + e
1
1
1 β
+ (s1 − ) + O( )
q
q
q
n
=
1
ns1 + n (1 − 2s1 )
n2
1 β
1
1
+ (s1 − ) + O( )
q
q
q
n
=
1 2
2
− 2 +O
n q q
+O
W
√
n
2
W
√
n
,
(4.5)
and thus
1−
E
1
E[(W (1) − W (1))2 |W ]
2λ
2
2
1 1 2
1−
− 2 +O
2λ n q q
W
√
n
= E
Choose λ =
1
− 12
q
q
n
E
=
q−1
,
nq 2
then
1−
1
E[(W (1) − W (1))2 |W ]
2λ
= E
q2
O
2(q − 1)
W
√
n
+
+O
2
W2
n
.
(4.6)
2
q2
O
2(q − 1)
and therefore the first summand of (3.21) is bounded by
W2
n
2
√1 C3 (β),
n
,
(4.7)
where C3 (β) is a
constant depending only on β. Now let C = C1 + C2 + C3 , then the theorem is proved.
Finally, we identify σ². Since \(E[W - W'\,|\,W] = \frac{1-\beta/q}{n}W + O(1/n^{3/2})\), we have
\[ E[W(1) - W'(1)\,|\,W] = \frac{1 - \frac{\beta}{q}}{n}\,W(1) + O\Big( \frac{1}{n^{3/2}} \Big) = \frac{\lambda}{\sigma^2}\,W(1) + R, \]
thus we get
\[ \frac{\lambda}{\sigma^2} = \frac{1 - \frac{\beta}{q}}{n}, \]
and therefore
\[ \sigma^2 = \frac{n\lambda}{1 - \frac{\beta}{q}} = \frac{q-1}{q^2 - q\beta}. \]
Bibliography

[1] A.D. Barbour and L.H.Y. Chen, An Introduction to Stein's Method (Singapore University Press, Singapore, 2005).

[2] L.H.Y. Chen, Poisson approximation for dependent trials, Annals of Probability 3 (1975), no. 3, 534-545.

[3] P. Eichelsbacher and M. Löwe, Stein's method for dependent random variables occurring in statistical mechanics, Electronic Journal of Probability 15 (2010), paper no. 30, 962-988.

[4] T. Eisele and R.S. Ellis, Symmetry breaking and random waves for magnetic systems on a circle, Z. Wahrsch. Verw. Gebiete 63 (1983), 297-348.

[5] R.S. Ellis, Entropy, Large Deviations, and Statistical Mechanics (Springer, New York, 1985).

[6] R.S. Ellis, A unified approach to large deviations for Markov chains and applications to statistical mechanics, in: D. Merlini, ed., Proc. 2nd Internat. Ascona/Locarno Conf. on Stochastic Processes, Physics, and Geometry, July 4-9, 1988, Lecture Notes in Phys. (Springer, Berlin, 1989).

[7] R.S. Ellis and K.M. Wang, Limit theorems for the empirical vector of the Curie-Weiss-Potts model, Stochastic Processes and their Applications 35 (1990), 59-79.

[8] F. den Hollander, Large Deviations (American Mathematical Society, Providence, Rhode Island, 2000).

[9] H. Kesten and R.H. Schonmann, Behavior in large dimensions of the Potts and Heisenberg models, Reviews in Mathematical Physics 1 (1990), no. 2-3.

[10] P.A. Pearce and R.B. Griffiths, Potts model in the many-component limit, J. Phys. A 13 (1980), 2143-2148.

[11] Y. Rinott and V. Rotar, On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted U-statistics, Ann. Appl. Probab. 7 (1997), no. 4, 1080-1105.

[12] Q.-M. Shao and Z.-G. Su, The Berry-Esseen bound for character ratios, Proc. Amer. Math. Soc. 134 (2006), no. 7, 2153-2159.

[13] C. Stein, A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (1972), 583-602.

[14] F.Y. Wu, The Potts model, Reviews of Modern Physics 54 (1982), no. 1, 235-268.