Chapter 2

Conditional Expectation

Please see Hull's book (Section 9.6).

2.1 A Binomial Model for Stock Price Dynamics

Stock prices are assumed to follow this simple binomial model: the initial stock price during the period under study is denoted $S_0$. At each time step, the stock price either goes up by a factor of $u$ or down by a factor of $d$. It will be useful to visualize tossing a coin at each time step, and to say that the stock price moves up by a factor of $u$ if the coin comes out heads ($H$), and down by a factor of $d$ if it comes out tails ($T$). Note that we are not specifying the probability of heads here.

Consider a sequence of 3 tosses of the coin (see Fig. 2.1). The collection of all possible outcomes (i.e., sequences of tosses of length 3) is
$$\Omega = \{HHH,\ HHT,\ HTH,\ HTT,\ THH,\ THT,\ TTH,\ TTT\}.$$
A typical sequence of $\Omega$ will be denoted $\omega$, and $\omega_k$ will denote the $k$th element in the sequence $\omega$. We write $S_k(\omega)$ to denote the stock price at "time" $k$ (i.e., after $k$ tosses) under the outcome $\omega$. Note that $S_k(\omega)$ depends only on $\omega_1, \omega_2, \ldots, \omega_k$. Thus in the 3-coin-toss example we write for instance,
$$S_1(\omega) \triangleq S_1(\omega_1, \omega_2, \omega_3) = S_1(\omega_1),$$
$$S_2(\omega) \triangleq S_2(\omega_1, \omega_2, \omega_3) = S_2(\omega_1, \omega_2).$$
Each $S_k$ is a random variable defined on the set $\Omega$. More precisely, let $\mathcal{F} = \mathcal{P}(\Omega)$. Then $\mathcal{F}$ is a $\sigma$-algebra and $(\Omega, \mathcal{F})$ is a measurable space. Each $S_k$ is an $\mathcal{F}$-measurable function $\Omega \to \mathbb{R}$, that is, $S_k^{-1}$ is a function $\mathcal{B} \to \mathcal{F}$, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $\mathbb{R}$. We will see later that $S_k$ is in fact measurable under a sub-$\sigma$-algebra of $\mathcal{F}$.

Figure 2.1: A three coin period binomial model. (The tree shows the eight paths of the stock price, from $S_1(H) = uS_0$ and $S_1(T) = dS_0$ through $S_2(HH) = u^2 S_0$, $S_2(HT) = S_2(TH) = udS_0$, $S_2(TT) = d^2 S_0$, up to $S_3(HHH) = u^3 S_0, \ldots, S_3(TTT) = d^3 S_0$.)

Recall that the Borel $\sigma$-algebra $\mathcal{B}$ is the $\sigma$-algebra generated by the open intervals of $\mathbb{R}$. In this course we will always deal with subsets of $\mathbb{R}$ that belong to $\mathcal{B}$.

For any random variable $X$ defined on a sample space $\Omega$ and any $y \in \mathbb{R}$, we will use the notation
$$\{X \le y\} \triangleq \{\omega \in \Omega;\ X(\omega) \le y\}.$$
The sets $\{X < y\}$, $\{X \ge y\}$, $\{X = y\}$, etc., are defined similarly. Similarly, for any subset $B$ of $\mathbb{R}$, we define
$$\{X \in B\} \triangleq \{\omega \in \Omega;\ X(\omega) \in B\}.$$

Assumption 2.1 $u > d > 0$.

2.2 Information

Definition 2.1 (Sets determined by the first $k$ tosses.) We say that a set $A \subset \Omega$ is determined by the first $k$ coin tosses if, knowing only the outcome of the first $k$ tosses, we can decide whether the outcome of all tosses is in $A$. In general we denote the collection of sets determined by the first $k$ tosses by $\mathcal{F}_k$. It is easy to check that $\mathcal{F}_k$ is a $\sigma$-algebra.

Note that the random variable $S_k$ is $\mathcal{F}_k$-measurable, for each $k = 1, 2, \ldots, n$.

Example 2.1 In the 3 coin-toss example, the collection $\mathcal{F}_1$ of sets determined by the first toss consists of:

1. $A_H \triangleq \{HHH, HHT, HTH, HTT\}$,
2. $A_T \triangleq \{THH, THT, TTH, TTT\}$,
3. $\Omega$,
4. $\emptyset$.

The collection $\mathcal{F}_2$ of sets determined by the first two tosses consists of:

1. $A_{HH} \triangleq \{HHH, HHT\}$,
2. $A_{HT} \triangleq \{HTH, HTT\}$,
3. $A_{TH} \triangleq \{THH, THT\}$,
4. $A_{TT} \triangleq \{TTH, TTT\}$,
5. the complements of the above sets,
6. any union of the above sets (including the complements),
7. $\Omega$ and $\emptyset$.
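The $\sigma$-algebras $\mathcal{F}_k$ can be made concrete on a computer. The following is a minimal sketch, not part of the original notes, that enumerates $\Omega$ for three tosses, groups the outcomes into the atoms of $\mathcal{F}_k$ (outcomes sharing the same first $k$ tosses), and checks that $S_k$ is constant on each atom, i.e., that $S_k$ is $\mathcal{F}_k$-measurable. The numerical values $u = 2$, $d = 1/2$, $S_0 = 4$ are illustrative assumptions only; the text leaves them general.

from itertools import product

u, d, S0 = 2.0, 0.5, 4.0      # illustrative values only; not specified in the text

Omega = [''.join(w) for w in product('HT', repeat=3)]

def S(k, w):
    # Stock price after k tosses under outcome w; it depends only on w[:k].
    heads = w[:k].count('H')
    return S0 * u**heads * d**(k - heads)

for k in range(1, 4):
    # Atoms of F_k: sets of outcomes sharing the same first k tosses.
    atoms = {}
    for w in Omega:
        atoms.setdefault(w[:k], []).append(w)
    for atom in atoms.values():
        # S_k takes a single value on each atom, hence S_k is F_k-measurable.
        assert len({S(k, w) for w in atom}) == 1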
Definition 2.2 (Information carried by a random variable.) Let $X$ be a random variable $\Omega \to \mathbb{R}$. We say that a set $A \subset \Omega$ is determined by the random variable $X$ if, knowing only the value $X(\omega)$ of the random variable, we can decide whether or not $\omega \in A$. Another way of saying this is that for every $y \in \mathbb{R}$, either $X^{-1}(y) \subset A$ or $X^{-1}(y) \cap A = \emptyset$. The collection of subsets of $\Omega$ determined by $X$ is a $\sigma$-algebra, which we call the $\sigma$-algebra generated by $X$ and denote by $\sigma(X)$.

If the random variable $X$ takes finitely many different values, then $\sigma(X)$ is generated by the collection of sets
$$\{X^{-1}(X(\omega))\ |\ \omega \in \Omega\};$$
these sets are called the atoms of the $\sigma$-algebra $\sigma(X)$.

In general, if $X$ is a random variable $\Omega \to \mathbb{R}$, then $\sigma(X)$ is given by
$$\sigma(X) = \{X^{-1}(B);\ B \in \mathcal{B}\}.$$

Example 2.2 (Sets determined by $S_2$.) The $\sigma$-algebra generated by $S_2$ consists of the following sets:

1. $A_{HH} = \{HHH, HHT\} = \{\omega \in \Omega;\ S_2(\omega) = u^2 S_0\}$,
2. $A_{TT} = \{TTH, TTT\} = \{S_2 = d^2 S_0\}$,
3. $A_{HT} \cup A_{TH} = \{S_2 = udS_0\}$,
4. complements of the above sets,
5. any union of the above sets,
6. $\emptyset = \{S_2(\omega) \in \emptyset\}$,
7. $\Omega = \{S_2(\omega) \in \mathbb{R}\}$.

2.3 Conditional Expectation

In order to talk about conditional expectation, we need to introduce a probability measure on our coin-toss sample space $\Omega$. Let us define: $p \in (0,1)$ is the probability of $H$; $q \triangleq 1 - p$ is the probability of $T$; the coin tosses are independent, so that, e.g., $\mathbb{P}(HHT) = p^2 q$, etc.; and $\mathbb{P}(A) \triangleq \sum_{\omega \in A} \mathbb{P}(\omega)$, $\forall A \subset \Omega$.

Definition 2.3 (Expectation.)
$$\mathbb{E}X \triangleq \sum_{\omega \in \Omega} X(\omega)\,\mathbb{P}(\omega).$$
If $A \subset \Omega$, then
$$I_A(\omega) \triangleq \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$
and
$$\mathbb{E}(I_A X) = \int_A X\, d\mathbb{P} = \sum_{\omega \in A} X(\omega)\,\mathbb{P}(\omega).$$
We can think of $\mathbb{E}(I_A X)$ as a partial average of $X$ over the set $A$.

2.3.1 An example

Let us estimate $S_1$, given $S_2$. Denote the estimate by $\mathbb{E}(S_1|S_2)$. From elementary probability, $\mathbb{E}(S_1|S_2)$ is a random variable $Y$ whose value at $\omega$ is defined by
$$Y(\omega) = \mathbb{E}(S_1|S_2 = y), \quad \text{where } y = S_2(\omega).$$
Properties of $\mathbb{E}(S_1|S_2)$:

- $\mathbb{E}(S_1|S_2)$ should depend on $\omega$, i.e., it is a random variable.
- If the value of $S_2$ is known, then the value of $\mathbb{E}(S_1|S_2)$ should also be known. In particular:
  - If $\omega = HHH$ or $\omega = HHT$, then $S_2(\omega) = u^2 S_0$. If we know that $S_2(\omega) = u^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = uS_0$. We define
    $$\mathbb{E}(S_1|S_2)(HHH) = \mathbb{E}(S_1|S_2)(HHT) = uS_0.$$
  - If $\omega = TTT$ or $\omega = TTH$, then $S_2(\omega) = d^2 S_0$. If we know that $S_2(\omega) = d^2 S_0$, then even without knowing $\omega$, we know that $S_1(\omega) = dS_0$. We define
    $$\mathbb{E}(S_1|S_2)(TTT) = \mathbb{E}(S_1|S_2)(TTH) = dS_0.$$
  - If $\omega \in A = \{HTH, HTT, THH, THT\}$, then $S_2(\omega) = udS_0$. If we know that $S_2(\omega) = udS_0$, then we do not know whether $S_1 = uS_0$ or $S_1 = dS_0$. We then take a weighted average. First,
    $$\mathbb{P}(A) = p^2 q + pq^2 + p^2 q + pq^2 = 2pq.$$
    Furthermore,
    $$\int_A S_1\, d\mathbb{P} = p^2 q\, uS_0 + pq^2\, uS_0 + p^2 q\, dS_0 + pq^2\, dS_0 = pq(u + d)S_0.$$
    For $\omega \in A$ we define
    $$\mathbb{E}(S_1|S_2)(\omega) = \frac{\int_A S_1\, d\mathbb{P}}{\mathbb{P}(A)} = \tfrac{1}{2}(u + d)S_0.$$
    Then
    $$\int_A \mathbb{E}(S_1|S_2)\, d\mathbb{P} = \int_A S_1\, d\mathbb{P}.$$

In conclusion, we can write
$$\mathbb{E}(S_1|S_2)(\omega) = g(S_2(\omega)),$$
where
$$g(x) = \begin{cases} uS_0 & \text{if } x = u^2 S_0, \\ \tfrac{1}{2}(u + d)S_0 & \text{if } x = udS_0, \\ dS_0 & \text{if } x = d^2 S_0. \end{cases}$$
In other words, $\mathbb{E}(S_1|S_2)$ is random only through its dependence on $S_2$. We also write
$$\mathbb{E}(S_1|S_2 = x) = g(x),$$
where $g$ is the function defined above.

The random variable $\mathbb{E}(S_1|S_2)$ has two fundamental properties:

- $\mathbb{E}(S_1|S_2)$ is $\sigma(S_2)$-measurable.
- For every set $A \in \sigma(S_2)$,
  $$\int_A \mathbb{E}(S_1|S_2)\, d\mathbb{P} = \int_A S_1\, d\mathbb{P}.$$

2.3.2 Definition of Conditional Expectation

Please see Williams, p. 83.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mathbb{E}(X|\mathcal{G})$ is defined to be any random variable $Y$ that satisfies:

(a) $Y$ is $\mathcal{G}$-measurable;

(b) for every set $A \in \mathcal{G}$, we have the "partial averaging property"
$$\int_A Y\, d\mathbb{P} = \int_A X\, d\mathbb{P}.$$

Existence. There is always a random variable $Y$ satisfying the above properties (provided that $\mathbb{E}|X| < \infty$), i.e., conditional expectations always exist.
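On a finite sample space the definition can be carried out constructively: average $X$ over each atom of $\mathcal{G}$, using the probabilities restricted to that atom. The following sketch, added for illustration and not part of the original notes, does this for $\mathbb{E}(S_1|\sigma(S_2))$ from Section 2.3.1 and checks the partial averaging property on every atom. The values $p = 1/3$, $u = 2$, $d = 1/2$, $S_0 = 4$ are assumptions made only for the example.

from itertools import product
from fractions import Fraction

p, u, d, S0 = Fraction(1, 3), 2, Fraction(1, 2), 4   # illustrative values only
q = 1 - p

Omega = [''.join(w) for w in product('HT', repeat=3)]
P = {w: p**w.count('H') * q**w.count('T') for w in Omega}

def S(k, w):
    heads = w[:k].count('H')
    return S0 * u**heads * d**(k - heads)

# Atoms of sigma(S_2): outcomes grouped by the value of S_2.
atoms = {}
for w in Omega:
    atoms.setdefault(S(2, w), []).append(w)

# E(S_1 | sigma(S_2)): on each atom, the P-weighted average of S_1.
cond = {}
for atom in atoms.values():
    avg = sum(S(1, w) * P[w] for w in atom) / sum(P[w] for w in atom)
    for w in atom:
        cond[w] = avg

# Partial averaging: the integral of E(S_1|S_2) over each atom equals that of S_1.
for atom in atoms.values():
    lhs = sum(cond[w] * P[w] for w in atom)
    rhs = sum(S(1, w) * P[w] for w in atom)
    assert lhs == rhs

print(sorted(set(cond.values())))   # expect dS0 = 2, (u+d)S0/2 = 5, uS0 = 8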
Uniqueness. There can be more than one random variable $Y$ satisfying the above properties, but if $Y'$ is another one, then $Y = Y'$ almost surely, i.e.,
$$\mathbb{P}\{\omega \in \Omega;\ Y(\omega) = Y'(\omega)\} = 1.$$

Notation 2.1 For random variables $X, Y$, it is standard notation to write
$$\mathbb{E}(X|Y) \triangleq \mathbb{E}(X|\sigma(Y)).$$

Here are some useful ways to think about $\mathbb{E}(X|\mathcal{G})$:

- A random experiment is performed, i.e., an element $\omega$ of $\Omega$ is selected. The value of $\omega$ is partially but not fully revealed to us, and thus we cannot compute the exact value of $X(\omega)$. Based on what we know about $\omega$, we compute an estimate of $X(\omega)$. Because this estimate depends on the partial information we have about $\omega$, it depends on $\omega$, i.e., $\mathbb{E}(X|\mathcal{G})(\omega)$ is a function of $\omega$, although the dependence on $\omega$ is often not shown explicitly.

- If the $\sigma$-algebra $\mathcal{G}$ contains finitely many sets, there will be a "smallest" set $A$ in $\mathcal{G}$ containing $\omega$, which is the intersection of all sets in $\mathcal{G}$ containing $\omega$. The way $\omega$ is partially revealed to us is that we are told it is in $A$, but not told which element of $A$ it is. We then define $\mathbb{E}(X|\mathcal{G})(\omega)$ to be the average (with respect to $\mathbb{P}$) value of $X$ over this set $A$. Thus, for all $\omega$ in this set $A$, $\mathbb{E}(X|\mathcal{G})(\omega)$ will be the same.

2.3.3 Further discussion of Partial Averaging

The partial averaging property is
$$\int_A \mathbb{E}(X|\mathcal{G})\, d\mathbb{P} = \int_A X\, d\mathbb{P}, \quad \forall A \in \mathcal{G}. \qquad (3.1)$$
We can rewrite this as
$$\mathbb{E}\left[I_A \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[I_A \cdot X\right]. \qquad (3.2)$$
Note that $I_A$ is a $\mathcal{G}$-measurable random variable. In fact the following holds:

Lemma 3.10 If $V$ is any $\mathcal{G}$-measurable random variable, then provided $\mathbb{E}|V \cdot \mathbb{E}(X|\mathcal{G})| < \infty$,
$$\mathbb{E}\left[V \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[V \cdot X\right]. \qquad (3.3)$$

Proof: To see this, first use (3.2) and linearity of expectations to prove (3.3) when $V$ is a simple $\mathcal{G}$-measurable random variable, i.e., $V$ is of the form $V = \sum_{k=1}^{n} c_k I_{A_k}$, where each $A_k$ is in $\mathcal{G}$ and each $c_k$ is constant. Next consider the case that $V$ is a nonnegative $\mathcal{G}$-measurable random variable, but is not necessarily simple. Such a $V$ can be written as the limit of an increasing sequence of simple random variables $V_n$; we write (3.3) for each $V_n$ and then pass to the limit, using the Monotone Convergence Theorem (see Williams), to obtain (3.3) for $V$. Finally, the general $\mathcal{G}$-measurable random variable $V$ can be written as the difference of two nonnegative random variables $V = V^+ - V^-$, and since (3.3) holds for $V^+$ and $V^-$, it must hold for $V$ as well. Williams calls this argument the "standard machine" (p. 56).

Based on this lemma, we can replace the second condition in the definition of a conditional expectation (Section 2.3.2) by:

(b') For every $\mathcal{G}$-measurable random variable $V$, we have
$$\mathbb{E}\left[V \cdot \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}\left[V \cdot X\right]. \qquad (3.4)$$
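Condition (b') can be checked numerically in the binomial example by taking $\mathcal{G} = \sigma(S_2)$ and a $\mathcal{G}$-measurable $V$ that is not an indicator, say $V = S_2$. The sketch below is an illustration only, not part of the original notes, and again assumes the hypothetical values $p = 1/3$, $u = 2$, $d = 1/2$, $S_0 = 4$.

from itertools import product
from fractions import Fraction

p, u, d, S0 = Fraction(1, 3), 2, Fraction(1, 2), 4   # illustrative values only
q = 1 - p

Omega = [''.join(w) for w in product('HT', repeat=3)]
P = {w: p**w.count('H') * q**w.count('T') for w in Omega}

def S(k, w):
    heads = w[:k].count('H')
    return S0 * u**heads * d**(k - heads)

# Build E(S_1 | sigma(S_2)) by averaging S_1 over each atom {S_2 = value}.
atoms = {}
for w in Omega:
    atoms.setdefault(S(2, w), []).append(w)
cond = {}
for atom in atoms.values():
    avg = sum(S(1, w) * P[w] for w in atom) / sum(P[w] for w in atom)
    for w in atom:
        cond[w] = avg

# Check (b') with the G-measurable random variable V = S_2.
lhs = sum(S(2, w) * cond[w] * P[w] for w in Omega)    # E[ V * E(S_1|G) ]
rhs = sum(S(2, w) * S(1, w) * P[w] for w in Omega)    # E[ V * S_1 ]
assert lhs == rhs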
2.3.4 Properties of Conditional Expectation

Please see Williams, p. 88. Proof sketches of some of the properties are provided below.

(a) $\mathbb{E}\left[\mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}X$.

Proof: Just take $A$ in the partial averaging property to be $\Omega$.

The conditional expectation of $X$ is thus an unbiased estimator of the random variable $X$.

(b) If $X$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(X|\mathcal{G}) = X.$$

Proof: The partial averaging property holds trivially when $Y$ is replaced by $X$. And since $X$ is $\mathcal{G}$-measurable, $X$ satisfies requirement (a) of a conditional expectation as well.

If the information content of $\mathcal{G}$ is sufficient to determine $X$, then the best estimate of $X$ based on $\mathcal{G}$ is $X$ itself.

(c) (Linearity)
$$\mathbb{E}(a_1 X_1 + a_2 X_2|\mathcal{G}) = a_1 \mathbb{E}(X_1|\mathcal{G}) + a_2 \mathbb{E}(X_2|\mathcal{G}).$$

(d) (Positivity) If $X \ge 0$ almost surely, then
$$\mathbb{E}(X|\mathcal{G}) \ge 0.$$

Proof: Take $A = \{\omega \in \Omega;\ \mathbb{E}(X|\mathcal{G})(\omega) < 0\}$. This set is in $\mathcal{G}$ since $\mathbb{E}(X|\mathcal{G})$ is $\mathcal{G}$-measurable. Partial averaging implies $\int_A \mathbb{E}(X|\mathcal{G})\, d\mathbb{P} = \int_A X\, d\mathbb{P}$. The right-hand side is greater than or equal to zero, and the left-hand side is strictly negative unless $\mathbb{P}(A) = 0$. Therefore, $\mathbb{P}(A) = 0$.

(h) (Jensen's Inequality) If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\varphi(X)| < \infty$, then
$$\mathbb{E}\left(\varphi(X)|\mathcal{G}\right) \ge \varphi\left(\mathbb{E}(X|\mathcal{G})\right).$$
Recall the usual Jensen's Inequality: $\mathbb{E}\varphi(X) \ge \varphi(\mathbb{E}X)$.

(i) (Tower Property) If $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$, then
$$\mathbb{E}\left[\mathbb{E}(X|\mathcal{G})\,|\,\mathcal{H}\right] = \mathbb{E}(X|\mathcal{H}).$$
$\mathcal{H}$ being a sub-$\sigma$-algebra of $\mathcal{G}$ means that $\mathcal{G}$ contains more information than $\mathcal{H}$. If we estimate $X$ based on the information in $\mathcal{G}$, and then estimate the estimator based on the smaller amount of information in $\mathcal{H}$, we get the same result as if we had estimated $X$ directly based on the information in $\mathcal{H}$.

(j) (Taking out what is known) If $Z$ is $\mathcal{G}$-measurable, then
$$\mathbb{E}(ZX|\mathcal{G}) = Z \cdot \mathbb{E}(X|\mathcal{G}).$$
When conditioning on $\mathcal{G}$, the $\mathcal{G}$-measurable random variable $Z$ acts like a constant.

Proof: Let $Z$ be a $\mathcal{G}$-measurable random variable. A random variable $Y$ is $\mathbb{E}(ZX|\mathcal{G})$ if and only if (a) $Y$ is $\mathcal{G}$-measurable; (b) $\int_A Y\, d\mathbb{P} = \int_A ZX\, d\mathbb{P}$, $\forall A \in \mathcal{G}$. Take $Y = Z \cdot \mathbb{E}(X|\mathcal{G})$. Then $Y$ satisfies (a), since a product of $\mathcal{G}$-measurable random variables is $\mathcal{G}$-measurable. $Y$ also satisfies property (b), as we can check:
$$\int_A Y\, d\mathbb{P} = \mathbb{E}[I_A Y] = \mathbb{E}\left[I_A Z\, \mathbb{E}(X|\mathcal{G})\right] = \mathbb{E}[I_A Z X] = \int_A ZX\, d\mathbb{P},$$
where the third equality is (b') with $V = I_A Z$.

(k) (Role of Independence) If $\mathcal{H}$ is independent of $\sigma(X, \mathcal{G})$, then
$$\mathbb{E}\left(X\,|\,\sigma(\mathcal{G}, \mathcal{H})\right) = \mathbb{E}(X|\mathcal{G}).$$
In particular, if $X$ is independent of $\mathcal{H}$, then
$$\mathbb{E}(X|\mathcal{H}) = \mathbb{E}X.$$
If $\mathcal{H}$ is independent of $X$ and $\mathcal{G}$, then nothing is gained by including the information content of $\mathcal{H}$ in the estimation of $X$.

2.3.5 Examples from the Binomial Model

Recall that $\mathcal{F}_1 = \{\emptyset, A_H, A_T, \Omega\}$. Notice that $\mathbb{E}(S_2|\mathcal{F}_1)$ must be constant on $A_H$ and on $A_T$. Now since $\mathbb{E}(S_2|\mathcal{F}_1)$ must satisfy the partial averaging property,
$$\int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \int_{A_H} S_2\, d\mathbb{P}, \qquad \int_{A_T} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \int_{A_T} S_2\, d\mathbb{P}.$$
We compute
$$\int_{A_H} \mathbb{E}(S_2|\mathcal{F}_1)\, d\mathbb{P} = \mathbb{P}(A_H) \cdot \mathbb{E}(S_2|\mathcal{F}_1)(\omega) = p\, \mathbb{E}(S_2|\mathcal{F}_1)(\omega), \quad \forall \omega \in A_H.$$
On the other hand,
$$\int_{A_H} S_2\, d\mathbb{P} = p^2 u^2 S_0 + pq\, udS_0.$$
Therefore,
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = p u^2 S_0 + q\, udS_0, \quad \forall \omega \in A_H.$$
We can also write
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = p u^2 S_0 + q\, udS_0 = (pu + qd)uS_0 = (pu + qd)S_1(\omega), \quad \forall \omega \in A_H.$$
Similarly,
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)S_1(\omega), \quad \forall \omega \in A_T.$$
Thus in both cases we have
$$\mathbb{E}(S_2|\mathcal{F}_1)(\omega) = (pu + qd)S_1(\omega), \quad \forall \omega \in \Omega.$$
A similar argument one time step later shows that
$$\mathbb{E}(S_3|\mathcal{F}_2)(\omega) = (pu + qd)S_2(\omega).$$
We leave the verification of this equality as an exercise.

We can also verify the Tower Property; from the previous equations we have
$$\mathbb{E}\left[\mathbb{E}(S_3|\mathcal{F}_2)\,\big|\,\mathcal{F}_1\right] = \mathbb{E}\left[(pu + qd)S_2\,\big|\,\mathcal{F}_1\right] = (pu + qd)\,\mathbb{E}(S_2|\mathcal{F}_1) = (pu + qd)^2 S_1,$$
where the second equality uses linearity. This final expression is $\mathbb{E}(S_3|\mathcal{F}_1)$.
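The identity $\mathbb{E}(S_{k+1}|\mathcal{F}_k) = (pu + qd)S_k$ can also be confirmed numerically. The sketch below, added for illustration only, computes the partial average of $S_{k+1}$ on each atom of $\mathcal{F}_k$ and compares it with $(pu + qd)S_k$; the values $p = 1/3$, $u = 2$, $d = 1/2$, $S_0 = 4$ are assumptions, not part of the text.

from itertools import product
from fractions import Fraction

p, u, d, S0 = Fraction(1, 3), 2, Fraction(1, 2), 4    # illustrative values only
q = 1 - p

Omega = [''.join(w) for w in product('HT', repeat=3)]
P = {w: p**w.count('H') * q**w.count('T') for w in Omega}

def S(k, w):
    heads = w[:k].count('H')
    return S0 * u**heads * d**(k - heads)

for k in range(1, 3):
    # Atoms of F_k: outcomes sharing the same first k tosses.
    atoms = {}
    for w in Omega:
        atoms.setdefault(w[:k], []).append(w)
    for atom in atoms.values():
        # E(S_{k+1} | F_k) on this atom: the partial average of S_{k+1} over the atom.
        cond = sum(S(k + 1, w) * P[w] for w in atom) / sum(P[w] for w in atom)
        # Compare with (pu + qd) S_k, which is constant on the atom.
        assert cond == (p * u + q * d) * S(k, atom[0])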
2.4 Martingales

The ingredients are:

- A probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
- A sequence of $\sigma$-algebras $\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_n$, with the property that $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n \subset \mathcal{F}$. Such a sequence of $\sigma$-algebras is called a filtration.
- A sequence of random variables $M_0, M_1, \ldots, M_n$. This is called a stochastic process.

Conditions for a martingale:

1. Each $M_k$ is $\mathcal{F}_k$-measurable. If you know the information in $\mathcal{F}_k$, then you know the value of $M_k$. We say that the process $\{M_k\}$ is adapted to the filtration $\{\mathcal{F}_k\}$.
2. For each $k$, $\mathbb{E}(M_{k+1}|\mathcal{F}_k) = M_k$. Martingales tend to go neither up nor down.

A supermartingale tends to go down, i.e., the second condition above is replaced by $\mathbb{E}(M_{k+1}|\mathcal{F}_k) \le M_k$; a submartingale tends to go up, i.e., $\mathbb{E}(M_{k+1}|\mathcal{F}_k) \ge M_k$.

Example 2.3 (Example from the binomial model.) For $k = 1, 2$ we already showed that
$$\mathbb{E}(S_{k+1}|\mathcal{F}_k) = (pu + qd)S_k.$$
For $k = 0$, we set $\mathcal{F}_0 = \{\emptyset, \Omega\}$, the "trivial $\sigma$-algebra". This $\sigma$-algebra contains no information, and any $\mathcal{F}_0$-measurable random variable must be constant (nonrandom). Therefore, by definition, $\mathbb{E}(S_1|\mathcal{F}_0)$ is that constant which satisfies the averaging property
$$\int_\Omega \mathbb{E}(S_1|\mathcal{F}_0)\, d\mathbb{P} = \int_\Omega S_1\, d\mathbb{P}.$$
The right-hand side is $\mathbb{E}S_1 = (pu + qd)S_0$, and so we have
$$\mathbb{E}(S_1|\mathcal{F}_0) = (pu + qd)S_0.$$
In conclusion:

- If $pu + qd = 1$, then $\{S_k, \mathcal{F}_k;\ k = 0, 1, 2, 3\}$ is a martingale.
- If $pu + qd \ge 1$, then $\{S_k, \mathcal{F}_k;\ k = 0, 1, 2, 3\}$ is a submartingale.
- If $pu + qd \le 1$, then $\{S_k, \mathcal{F}_k;\ k = 0, 1, 2, 3\}$ is a supermartingale.
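As a quick numerical check of this conclusion: with $u$ and $d$ fixed, choosing $p = (1 - d)/(u - d)$ makes $pu + qd = 1$, and the stock price process then satisfies the martingale condition exactly. The sketch below is illustrative only; the values $u = 2$, $d = 1/2$ (hence $p = 1/3$) are assumptions, not taken from the text.

from itertools import product
from fractions import Fraction

u, d, S0 = Fraction(2), Fraction(1, 2), Fraction(4)   # illustrative values only
p = (1 - d) / (u - d)                                 # this choice makes pu + qd = 1
q = 1 - p
assert p * u + q * d == 1

Omega = [''.join(w) for w in product('HT', repeat=3)]
P = {w: p**w.count('H') * q**w.count('T') for w in Omega}

def S(k, w):
    heads = w[:k].count('H')
    return S0 * u**heads * d**(k - heads)

# Martingale property: E(S_{k+1} | F_k) = S_k on every atom of F_k, for k = 0, 1, 2
# (k = 0 gives the single atom Omega, i.e. the trivial sigma-algebra F_0).
for k in range(0, 3):
    atoms = {}
    for w in Omega:
        atoms.setdefault(w[:k], []).append(w)
    for atom in atoms.values():
        cond = sum(S(k + 1, w) * P[w] for w in atom) / sum(P[w] for w in atom)
        assert cond == S(k, atom[0])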