Let (Ω, F, P) be a probability space, and let A ∈ F be an event such that P(A) > 0. As for finite probability spaces, the conditional probability of B with respect to A (denoted by P(B∣A)) means P(BA)/P(A), and the conditional probability of B with respect to a finite or countable decomposition D = {D1, D2, …} with P(Di) > 0, i ≥ 1 (denoted by P(B∣D)) is the random variable equal to P(B∣Di) for ω ∈ Di, i ≥ 1:

$$P(B \mid D)(\omega) = \sum_{i \ge 1} P(B \mid D_i)\, I_{D_i}(\omega).$$

In a similar way, if ξ is a random variable for which Eξ is defined, the conditional expectation of ξ with respect to the event A with P(A) > 0 (denoted by E(ξ∣A)) is E(ξ I_A)/P(A).
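The definitions above can be checked concretely on a finite probability space. The following sketch (a hypothetical illustration with a fair die, not from the text) computes P(B∣A) = P(BA)/P(A) and evaluates the random variable P(B∣D) cell by cell:

```python
from fractions import Fraction

# Hypothetical finite sample space: a fair die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

# Decomposition D = {D1, D2}: odd and even outcomes.
D1 = {1, 3, 5}
D2 = {2, 4, 6}
B = {4, 5, 6}  # the event "at least 4"

def prob(event):
    return sum(P[w] for w in event)

def cond_prob(B, A):
    # P(B | A) = P(B ∩ A) / P(A), defined when P(A) > 0
    return prob(B & A) / prob(A)

def cond_prob_given_decomposition(B, decomposition, w):
    # P(B | D) is the random variable equal to P(B | D_i) on D_i
    for Di in decomposition:
        if w in Di:
            return cond_prob(B, Di)

print(cond_prob_given_decomposition(B, [D1, D2], 3))  # 1/3, since ω = 3 lies in D1
print(cond_prob_given_decomposition(B, [D1, D2], 2))  # 2/3, since ω = 2 lies in D2
```

Note that averaging P(B∣D) over Ω recovers P(B) = 1/2, a first instance of the total-probability identity that the later properties generalize.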
The random variable P(B∣D) is evidently measurable with respect to the σ-algebra G=σ(D), and is consequently also denoted by P(B∣G). However, in probability theory we may have to consider conditional probabilities with respect to events whose probabilities are zero.
Example. Consider the following experiment. Let ξ be a random variable uniformly distributed on [0, 1]. If ξ = x, toss a coin for which the probability of head is x and the probability of tail is 1 − x. Let ν be the number of heads in n independent tosses of this coin. What is the "conditional probability P(ν = k ∣ ξ = x)"? Since P(ξ = x) = 0, the conditional probability P(ν = k ∣ ξ = x) is undefined, although it is intuitively plausible that "it ought to be $C_n^k x^k (1-x)^{n-k}$."
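One sanity check on the intuitive answer: averaging the conjectured conditional probability over the uniform distribution of ξ should give the unconditional probability P(ν = k). The integral $\int_0^1 C_n^k x^k(1-x)^{n-k}\,dx$ is a Beta integral equal to 1/(n + 1), so ν is uniform on {0, 1, …, n}. The sketch below verifies this exactly with rational arithmetic (the Beta-integral formula k!(n−k)!/(n+1)! is standard, not taken from the text):

```python
from fractions import Fraction
from math import comb, factorial

def total_prob_heads(n, k):
    # P(ν = k) = ∫₀¹ C(n,k) x^k (1-x)^{n-k} dx, evaluated exactly via the
    # Beta integral ∫₀¹ x^k (1-x)^{n-k} dx = k!(n-k)!/(n+1)!
    beta = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    return comb(n, k) * beta

n = 10
probs = [total_prob_heads(n, k) for k in range(n + 1)]
assert all(p == Fraction(1, n + 1) for p in probs)
print(probs[0])  # 1/11 — ν is uniform on {0, 1, ..., n}
```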
Formal definition
Let (Ω, F, P) be a probability space, G a σ-algebra with G ⊆ F (G is a σ-subalgebra of F), and ξ = ξ(ω) a random variable. The expectation Eξ was defined in two stages: first for a nonnegative random variable ξ, then in the general case by

$$E\xi = E\xi^{+} - E\xi^{-},$$

and only under the assumption that

$$\min(E\xi^{-}, E\xi^{+}) < \infty.$$

A similar two-stage construction is also used to define the conditional expectation E(ξ∣G).
Definition A. The conditional expectation of a nonnegative random variable ξ with respect to the σ-algebra G is a nonnegative extended random variable, denoted by E(ξ∣G) or E(ξ∣G)(ω), such that
1. E(ξ∣G) is G-measurable;
2. for every A ∈ G,

$$\int_A \xi \, dP = \int_A E(\xi \mid G) \, dP. \tag{1}$$

B. The conditional expectation E(ξ∣G), or E(ξ∣G)(ω), of an arbitrary random variable ξ with respect to the σ-algebra G is considered to be defined if

$$\min\bigl(E(\xi^{+} \mid G),\ E(\xi^{-} \mid G)\bigr) < \infty \quad \text{(P-a.s.)},$$

and it is given by the formula

$$E(\xi \mid G) \equiv E(\xi^{+} \mid G) - E(\xi^{-} \mid G),$$

where, on the set (of probability zero) of sample points for which E(ξ⁺∣G) = E(ξ⁻∣G) = ∞, the difference E(ξ⁺∣G) − E(ξ⁻∣G) is given an arbitrary value, for example zero.
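On a finite space the defining property can be verified by direct summation. The sketch below (a hypothetical example: a fair die with ξ(ω) = ω and G generated by the parity decomposition) builds E(ξ∣G) as the cell-wise average E(ξ I_{Di})/P(Di) and checks the identity ∫_A ξ dP = ∫_A E(ξ∣G) dP for every A ∈ G:

```python
from fractions import Fraction

# Hypothetical finite illustration: fair die, ξ(ω) = ω, G = σ({odd, even}).
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
xi = {w: w for w in omega}
D = [{1, 3, 5}, {2, 4, 6}]          # decomposition generating G

def integral(f, A):
    # on a finite space, ∫_A f dP is a sum
    return sum(f[w] * P[w] for w in A)

# E(ξ | G)(ω) = E(ξ I_{D_i}) / P(D_i) on the cell D_i containing ω
cond_exp = {}
for Di in D:
    value = integral(xi, Di) / sum(P[w] for w in Di)
    for w in Di:
        cond_exp[w] = value

# G consists of the unions of cells: ∅, D1, D2, Ω.
G = [set(), D[0], D[1], D[0] | D[1]]
for A in G:
    assert integral(xi, A) == integral(cond_exp, A)   # defining property (1)
print(cond_exp[1], cond_exp[2])  # 3 4 — constant 3 on odds, 4 on evens
```

The G-measurability requirement is what forces E(ξ∣G) to be constant on each cell of the decomposition; the integral identity then pins down that constant value.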
We begin by showing that, for nonnegative random variables, E(ξ∣G) actually exists. By (6.36) the set function

$$Q(A) = \int_A \xi \, dP, \quad A \in G, \tag{2}$$

is a measure on (Ω, G), and is absolutely continuous with respect to P (considered on (Ω, G), G ⊆ F). Therefore (by the Radon–Nikodým theorem) there is a nonnegative G-measurable extended random variable E(ξ∣G) such that

$$Q(A) = \int_A E(\xi \mid G) \, dP. \tag{3}$$

Then (1) follows from (2) and (3).
Properties
(We shall suppose that the expectations are defined for all the random variables that we consider, and that G ⊆ F.)

A. If C is a constant and ξ = C (a.s.), then E(ξ∣G) = C (a.s.).

B. If ξ ≤ η (a.s.), then E(ξ∣G) ≤ E(η∣G) (a.s.).

C. ∣E(ξ∣G)∣ ≤ E(∣ξ∣ ∣ G) (a.s.).

D. If a, b are constants and aEξ + bEη is defined, then

$$E(a\xi + b\eta \mid G) = aE(\xi \mid G) + bE(\eta \mid G) \quad \text{(a.s.)}.$$

E. Let F* = {∅, Ω} be the trivial σ-algebra. Then E(ξ∣F*) = Eξ (a.s.).

F. E(ξ∣F) = ξ (a.s.).

G. E(E(ξ∣G)) = Eξ.

H. If G1 ⊆ G2, then

$$E[E(\xi \mid G_2) \mid G_1] = E(\xi \mid G_1) \quad \text{(a.s.)}.$$

I. If G1 ⊇ G2, then

$$E[E(\xi \mid G_2) \mid G_1] = E(\xi \mid G_2) \quad \text{(a.s.)}.$$

J. Let a random variable ξ for which Eξ is defined be independent of the σ-algebra G (i.e., independent of I_B, B ∈ G). Then E(ξ∣G) = Eξ (a.s.).

K. Let η be a G-measurable random variable with E∣ξ∣ < ∞ and E∣ξη∣ < ∞. Then

$$E(\xi\eta \mid G) = \eta E(\xi \mid G) \quad \text{(a.s.)}.$$

Let us prove these properties.
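The tower property H is the one most often used in practice, and it is easy to test numerically. The following sketch (a hypothetical example with an 8-point space, not from the text) takes nested σ-algebras G1 ⊆ G2 generated by coarser and finer decompositions and checks that conditioning on G2 and then on G1 is the same as conditioning on G1 directly:

```python
from fractions import Fraction

# Hypothetical check of the tower property H on a finite space:
# Ω = {0,...,7} with equal weights; G1 = σ(halves) ⊆ G2 = σ(quarters).
omega = list(range(8))
P = {w: Fraction(1, 8) for w in omega}
xi = {w: w * w for w in omega}

halves = [{0, 1, 2, 3}, {4, 5, 6, 7}]        # cells generating G1
quarters = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]  # cells generating G2

def cond_exp(f, cells):
    # E(f | σ(cells)) is the cell-wise average, constant on each cell
    out = {}
    for C in cells:
        val = sum(f[w] * P[w] for w in C) / sum(P[w] for w in C)
        for w in C:
            out[w] = val
    return out

lhs = cond_exp(cond_exp(xi, quarters), halves)  # E[E(ξ|G2) | G1]
rhs = cond_exp(xi, halves)                      # E(ξ|G1)
assert lhs == rhs                               # Property H
print(rhs[0], rhs[4])  # 7/2 63/2
```

Note the asymmetry with Property I: conditioning always "collapses to the coarser σ-algebra," whichever order the two conditionings are applied in.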
A. A constant function is measurable with respect to G. Therefore we need only verify that

$$\int_A \xi \, dP = \int_A C \, dP, \quad A \in G.$$

But, by the hypothesis ξ = C (a.s.) and Property G of Mathematical Expectation, this equation is obviously satisfied.

B. If ξ ≤ η (a.s.), then by Property B of Mathematical Expectation

$$\int_A \xi \, dP \le \int_A \eta \, dP, \quad A \in G,$$

and therefore

$$\int_A E(\xi \mid G) \, dP \le \int_A E(\eta \mid G) \, dP, \quad A \in G.$$

The required inequality now follows from Property I of Mathematical Expectation.

C. This follows from the preceding property if we observe that −∣ξ∣ ≤ ξ ≤ ∣ξ∣.

D. If A ∈ G, then

$$\int_A (a\xi + b\eta) \, dP = a\int_A \xi \, dP + b\int_A \eta \, dP = a\int_A E(\xi \mid G) \, dP + b\int_A E(\eta \mid G) \, dP = \int_A [aE(\xi \mid G) + bE(\eta \mid G)] \, dP,$$

which establishes D.

E. This property follows from the remark that Eξ is an F*-measurable function and the evident fact that if A = Ω or A = ∅, then

$$\int_A \xi \, dP = \int_A E\xi \, dP.$$

F. Since ξ is F-measurable and

$$\int_A \xi \, dP = \int_A \xi \, dP, \quad A \in F,$$

we have E(ξ∣F) = ξ (a.s.).

G. This follows from E and H by taking G1 = {∅, Ω} and G2 = G.

H. Let A ∈ G1; then

$$\int_A E(\xi \mid G_1) \, dP = \int_A \xi \, dP.$$

Since G1 ⊆ G2, we have A ∈ G2 and therefore

$$\int_A E[E(\xi \mid G_2) \mid G_1] \, dP = \int_A E(\xi \mid G_2) \, dP = \int_A \xi \, dP.$$

Consequently, when A ∈ G1,

$$\int_A E(\xi \mid G_1) \, dP = \int_A E[E(\xi \mid G_2) \mid G_1] \, dP,$$

and by Property I of Mathematical Expectation

$$E(\xi \mid G_1) = E[E(\xi \mid G_2) \mid G_1] \quad \text{(a.s.)}.$$

I. If A ∈ G1, then by the definition of E[E(ξ∣G2)∣G1],

$$\int_A E[E(\xi \mid G_2) \mid G_1] \, dP = \int_A E(\xi \mid G_2) \, dP.$$

The function E(ξ∣G2) is G2-measurable and, since G2 ⊆ G1, also G1-measurable. It follows that E(ξ∣G2) is a variant of the conditional expectation E[E(ξ∣G2)∣G1], which proves Property I.

J. Since Eξ is a G-measurable function, we have only to verify that

$$\int_B \xi \, dP = \int_B E\xi \, dP, \quad B \in G,$$

i.e. that E[ξ·I_B] = Eξ·EI_B. If E∣ξ∣ < ∞.
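The key identity in the proof of Property J, E[ξ·I_B] = Eξ·EI_B for every B ∈ G, can be checked directly on a product space. In the hypothetical sketch below, ξ depends only on the first coordinate while G is generated by the second, so ξ is independent of every I_B with B ∈ G, and E(ξ∣G) reduces to the constant Eξ:

```python
from fractions import Fraction
from itertools import product

# Hypothetical check of Property J: ξ independent of G implies E(ξ|G) = Eξ.
# Ω is a product space; ξ depends only on the first coordinate,
# G is generated by the second coordinate, so ξ is independent of G.
omega = list(product([0, 1], [0, 1, 2]))
P = {w: Fraction(1, 6) for w in omega}
xi = {w: 5 if w[0] == 0 else 11 for w in omega}

cells = [{w for w in omega if w[1] == j} for j in (0, 1, 2)]  # cells of G

E_xi = sum(xi[w] * P[w] for w in omega)        # Eξ = 8
for C in cells:
    # E(ξ | G) on the cell C, i.e. E(ξ I_C) / P(C)
    cond = sum(xi[w] * P[w] for w in C) / sum(P[w] for w in C)
    assert cond == E_xi                         # E(ξ|G) = Eξ on every cell
print(E_xi)  # 8
```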
Theorem (On Taking Limits Under the Expectation Sign). Let {ξn}, n ≥ 1, be a sequence of extended random variables.

(a) If ∣ξn∣ ≤ η, Eη < ∞ and ξn → ξ (a.s.), then