S 3.4 Expectation values



[A] Expectation value: definition

Key Point 3.6

Let g(x) be some function of a random variable X. The expectation value of g(x) is defined by

\begin{displaymath}
\langle g(x) \rangle=\left\{\begin{array}{ll}
\sum_i p(x_i) g(x_i) & \mbox{{\rm if $X$\space is discrete}} \\ &\\
\int_{-\infty}^{\infty} dx f(x) g(x) & \mbox{{\rm if $X$\space is continuous}} \hspace*{1cm}\end{array}\right.\end{displaymath}

Commentary:

  • The expectation value of g(x) coincides with the average of a large number of observations of g(x).

  • We will establish this link explicitly in the particular case where g(x) is simply x itself, to be considered below.

  • Note that in some texts (including RHB) you will find

    \begin{displaymath}
E[g(x)] \hspace*{1cm} \mbox{{\rm in place of}} \hspace*{1cm} \langle g(x) \rangle\end{displaymath}
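The discrete branch of the definition can be sketched in a few lines of Python. This is only an illustration: the helper name `expectation` and the three-valued distribution `dist` are assumptions made for the example, not part of the notes.

```python
from fractions import Fraction

def expectation(dist, g):
    """Return <g(x)> = sum_i p(x_i) g(x_i) for a discrete PD given as {x_i: p(x_i)}."""
    return sum(p * g(x) for x, p in dist.items())

# Hypothetical PD chosen purely for illustration: values 0, 1, 2.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mean_square = expectation(dist, lambda x: x**2)   # g(x) = x^2
print(mean_square)  # 3/2
```

Exact fractions are used so that the result can be compared with a hand calculation without rounding error.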



[B] Expectation value: rules

  • Expectation values satisfy a number of simple but important rules (see EQ7 for proofs):

    Key Point 3.7

    For any functions $g_1$ and $g_2$ of the random variable X

    \begin{displaymath}
\langle g_1(x)+ g_2(x) \rangle = \langle g_1(x) \rangle +\langle g_2(x) \rangle\end{displaymath}

    and, if $\alpha$ is constant (independent of x)

    \begin{displaymath}
\langle \alpha \rangle =\alpha
\hspace*{1cm} \mbox{{\rm and}} \hspace*{1cm}
\langle \alpha g(x) \rangle =\alpha \langle g(x) \rangle\end{displaymath}
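
As a quick illustration of how these rules combine, take $g(x) = 2x+3$ for a die throw, borrowing the result $\langle x \rangle = \frac{7}{2}$ derived in [C] below. The choice of $g$ is an assumption made purely for the example:

\begin{displaymath}
\langle 2x+3 \rangle = \langle 2x \rangle + \langle 3 \rangle = 2\langle x \rangle + 3 = 2 \times \frac{7}{2} + 3 = 10\end{displaymath}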



[C] Mean

Key Point 3.8

The mean of a random variable X is (by definition) its expectation value:

\begin{displaymath}
\langle x \rangle=\left\{\begin{array}{ll}
\sum_i p(x_i) x_i & \mbox{{\rm if $X$\space is discrete}} \\ &\\
\int_{-\infty}^{\infty} dx f(x) x & \mbox{{\rm if $X$\space is continuous}} \hspace*{1cm}\end{array}\right.\end{displaymath}

The mean coincides with the average of a large number of observations of x.

Proof:

  • Consider N measurements of a discrete variable X, with possible values $x_i$, $i=1,2,\ldots,\Omega$. Let $X_t$ denote the value of X observed in measurement $t=1, \ldots, N$.

  • Then the average of the N observations can be written as a sum over observations:

    \begin{displaymath}
x_{av} = \frac{1}{N}\sum_{t=1}^{N} X_t\end{displaymath}

  • Divide the N observations into groups, corresponding to the possible values $x_1, \ldots, x_\Omega$, and denote by $n_i$ the number of measurements yielding value $x_i$.

  • Then the average may be re-written as a sum over the possible discrete values:

    \begin{displaymath}
x_{av} = \frac{1}{N}\sum_{i=1}^{\Omega} n_i x_i\end{displaymath}

  • Appealing to the frequency definition of probability (KP2.2) we identify

    \begin{displaymath}
p(x_i) = \lim_{N\rightarrow \infty}\frac{n_i}{N}\end{displaymath}

  • It follows that

    \begin{displaymath}
\lim_{N\rightarrow \infty} x_{av} = 
\lim_{N\rightarrow \infty} \sum_{i=1}^{\Omega} \frac{n_i}{N}x_i=
\sum_{i=1}^{\Omega} p(x_i)x_i =\langle x \rangle\end{displaymath}

    so that the mean is indeed the average in the long-run (large number) limit.
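
The two ways of writing the average in this proof can be compared numerically. The following is a minimal sketch, assuming a fair die simulated with Python's standard `random` module; the seed and the value of `N` are arbitrary choices made for the example.

```python
import random
from collections import Counter

rng = random.Random(0)      # fixed seed so the run is reproducible
N = 100_000                 # number of measurements
observations = [rng.randint(1, 6) for _ in range(N)]   # X_t for t = 1..N

# Sum over observations: x_av = (1/N) sum_t X_t
avg_over_observations = sum(observations) / N

# Sum over possible values: x_av = (1/N) sum_i n_i x_i
counts = Counter(observations)                          # n_i for each x_i
avg_over_values = sum(n_i * x_i for x_i, n_i in counts.items()) / N

print(avg_over_observations, avg_over_values)
```

The two averages agree exactly, since they are the same sum regrouped, and for large `N` both lie close to $\langle x \rangle = \frac{7}{2}$.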

Commentary:

  • The mean is uniquely defined by the PD of the variable in question, and is sometimes referred to as the `mean of the PD'.

  • It may be thought of as a single-parameter indicator of what (in some sense) to `expect' from a single observation of X.

  • Its utility depends on the extent to which observed values deviate from the mean.

Examples:

1. Mean of a discrete variable

  • Let X be the result of a die throw.

  • Then X assumes values $x_i = i$ $(i=1, \ldots, 6)$ with probabilities $p(x_i)=p=1/6$.

  • The mean (or expectation value) of X is

    \begin{displaymath}
\langle x \rangle = \sum_{i=1}^{6} p(x_i) x_i =p \sum_{i=1}^{6}x_i = \frac{1+2+3+4+5+6}{6} = \frac{7}{2} \end{displaymath}

  • Note that in this case no single observation will coincide with the `expectation value'.

2. Mean of a continuous variable

  • Let X be a continuous variable chosen uniformly at random from the interval [1,6].

  • Then X has PDF

    \begin{displaymath}
f(x) = \left\{
\begin{array}
{ll}
\frac{1}{5} & 1\le x\le 6\\ &\\ 0& \mbox{otherwise}\end{array}\right .\end{displaymath}

  • The mean (or expectation value) of X is

    \begin{displaymath}
\langle x \rangle = \int_{-\infty}^{\infty} dx f(x) x = 
\frac{1}{5} \int_{1}^{6} dx x
= \frac{1}{5} \times \frac{x^2}{2}\mid_1^{6} =\frac{7}{2}\end{displaymath}
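
The integral in this example can also be checked numerically. The sketch below assumes a simple midpoint rule; the helper `integrate` and the step count `n` are illustrative choices, not part of the notes.

```python
def integrate(h, a, b, n=100_000):
    """Midpoint-rule estimate of the integral of h(x) from a to b."""
    dx = (b - a) / n
    return sum(h(a + (k + 0.5) * dx) for k in range(n)) * dx

def f(x):
    """PDF of the example: constant 1/5 on [1, 6], zero otherwise."""
    return 1 / 5 if 1 <= x <= 6 else 0.0

norm = integrate(f, 1, 6)                     # normalisation, close to 1
mean = integrate(lambda x: f(x) * x, 1, 6)    # <x>, close to 7/2
print(norm, mean)
```

Because $f(x)$ vanishes outside [1,6], integrating over that interval is equivalent to integrating over the whole real line.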

3. Further examples


See HQ3 for a more physically-interesting application.



[D] Variance and standard deviation

Key Point 3.9

The variance V[x] of a random variable is the expectation value of the square of the deviation of the variable from its mean:

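\begin{displaymath}
V[x] \left\{\begin{array}{ll}
=\sum_i p(x_i) (\Delta x_i)^2 & \mbox{{\rm if $x$\space is discrete}} \hspace*{1cm}\\ &\\
=\int_{-\infty}^{\infty} dx f(x) (\Delta x)^2 & \mbox{{\rm if $x$\space is continuous}} \hspace*{1cm}\\ &\\ \end{array}\right.\end{displaymath}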

Abbreviated form: $V[x] = \langle (\Delta x)^2 \rangle$, where $\Delta x \equiv x - \langle x \rangle$ is the deviation from the mean

Alternative form: $V[x] = \langle x ^2 \rangle - \langle x \rangle ^2$

Commentary:

  • The variance is constructed to reflect how spread-out a PD is.

  • We need to use the square of $\Delta x$ because (see EQ7)

    \begin{displaymath}
\langle \Delta x \rangle= 0\end{displaymath}

  • The variance thus measures the typical `square of the deviation from the mean'.

  • Proving the `alternative form' is part of EQ7.

  • To provide a measure of the typical `deviation from the mean' we take the square root of the variance to give the standard deviation:

Key Point 3.10

The standard deviation $\sigma[x]$ of a random variable is the square root of its variance:

\begin{displaymath}
\sigma[x] = \sqrt{ V[x] }\end{displaymath}

The standard deviation measures the `spread' or `width' of the PD.

Examples:

1. Variance and standard deviation of a discrete variable

  • Let X be the result of a die throw (as in the earlier example).

  • Then the expectation value of $X^2$ is

    \begin{displaymath}
\langle x^2 \rangle = \sum_{i=1}^{6} p(x_i) x_i^2 =p \sum_{i=1}^{6}x_i^2 =
 \frac{1^2+2^2+3^2+
4^2+5^2+6^2}{6} = \frac{91}{6} \end{displaymath}

  • Recalling that in this case $\langle x \rangle = \frac{7}{2}$ and using the `alternative form' in KP3.9 we find

    \begin{displaymath}
V[x] = \langle x^2 \rangle - \langle x \rangle ^2 = \frac{91}{6} - \left[ \frac{7}{2}\right ] ^2
= \frac{35}{12}\end{displaymath}

    while

    \begin{displaymath}
\sigma[x] = \sqrt{ V[x] } = \sqrt{\frac{35}{12}} =1.71 
\hspace*{0.5cm} \mbox{{\rm (3 sf)}} \hspace*{0.5cm} \end{displaymath}

  • This seems sensible given the perceived spread of the distribution.
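
The die-throw numbers above can be reproduced exactly, and the definition of the variance compared with the `alternative form'. This is a minimal sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

values = range(1, 7)      # x_i = 1, ..., 6
p = Fraction(1, 6)        # p(x_i) for a fair die

mean = sum(p * x for x in values)                          # <x>
mean_sq = sum(p * x**2 for x in values)                    # <x^2>
var_definition = sum(p * (x - mean)**2 for x in values)    # <(Delta x)^2>
var_alternative = mean_sq - mean**2                        # <x^2> - <x>^2
sigma = sqrt(var_definition)                               # standard deviation

print(mean, mean_sq, var_definition, var_alternative, round(sigma, 2))
# 7/2 91/6 35/12 35/12 1.71
```

Both routes give the same variance here, which is a numerical check of the alternative form rather than a proof of it.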

2. Variance and standard deviation of a continuous variable

  • Let X be a continuous variable chosen uniformly at random from the interval [1,6] (as in the earlier example).

  • Then the expectation value of $X^2$ is

    \begin{displaymath}
\langle x^2 \rangle 
= \int_{-\infty}^{\infty} dx f(x) x^2 =
\frac{1}{5} \int_{1}^{6} dx x^2
= \frac{1}{5} \times \frac{x^3}{3}\mid_1^{6} =\frac{43}{3}\end{displaymath}

  • Recall that in this case $\langle x \rangle = \frac{7}{2}$ and appeal to the `alternative form' for the variance. It then follows that

    \begin{displaymath}
V[x] = \langle x^2 \rangle - \langle x \rangle ^2 =\frac{43}{3} - \left[ \frac{7}{2}\right ] ^2
= \frac{25}{12}\end{displaymath}

    while

    \begin{displaymath}
\sigma[x] = \sqrt{ V[x] } = \sqrt{\frac{25}{12}} =1.44 
\hspace*{0.5cm} \mbox{{\rm (3 sf)}} \hspace*{0.5cm} \end{displaymath}
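
As a cross-check on the `alternative form', the same variance can be obtained directly from the definition in KP3.9, taking the deviation from the mean to be $\Delta x = x - \frac{7}{2}$:

\begin{displaymath}
V[x] = \int_{-\infty}^{\infty} dx f(x) (\Delta x)^2
= \frac{1}{5} \int_{1}^{6} dx \left( x - \frac{7}{2} \right)^2
= \frac{1}{5} \times \frac{1}{3} \left( x - \frac{7}{2} \right)^3 \mid_1^{6}
= \frac{1}{15} \left[ \frac{125}{8} + \frac{125}{8} \right]
= \frac{25}{12}\end{displaymath}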

3. Further examples


See TQ5 for a more physically-interesting application.

  • The nature of the information provided by a PD depends crucially on the relative sizes of the mean and the standard deviation.

  • If the mean and standard deviation are of similar sizes (the PD is `broad') the mean is a relatively uninformative indicator of what to expect from one observation of X.

  • If the mean is large on the scale of the standard deviation (the PD is `sharp') the mean is an extremely good indicator of what to expect from one observation of X.
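
The contrast between `broad' and `sharp' can be made concrete by comparing the ratio of standard deviation to mean for two distributions. In this sketch the `sharp` distribution is a hypothetical example invented for illustration; the `broad` one is the die throw used earlier.

```python
from math import sqrt

def mean_and_sigma(dist):
    """Return (<x>, sigma[x]) for a discrete PD given as {x_i: p(x_i)}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, sqrt(var)

broad = {x: 1 / 6 for x in range(1, 7)}      # die throw from the examples
sharp = {99: 0.25, 100: 0.5, 101: 0.25}      # hypothetical, for illustration only

ratios = {}
for name, dist in [("broad", broad), ("sharp", sharp)]:
    mean, sigma = mean_and_sigma(dist)
    ratios[name] = sigma / mean              # relative width of the PD
    print(name, round(mean, 3), round(sigma, 3), round(ratios[name], 3))
```

For the die the standard deviation is about half the mean, so a single observation is poorly predicted by the mean. For the hypothetical sharp distribution the ratio is below one percent.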

\includegraphics [scale=0.6]{{/Home/alastair/teaching/probstats}/source/figures/sharpness.eps}

\includegraphics [scale = 0.5 ]{{/Home/alastair/teaching/probstats}/source/figures/Iflag.eps}
Broad and narrow distributions

  • The mean and the standard deviation are the two most important parameters of a PD. But there are others.

\includegraphics [scale = 0.5 ]{{/Home/alastair/teaching/probstats}/source/figures/Xflag.eps}
Beyond the mean and the variance



[E] Independent random variables

  • Consider two statistically independent random variables X and Y.

    Meaning of statistically independent:

    • In words: knowledge of the value of one tells us nothing about the likely value of the other.

    • In algebra (discrete case):

      \begin{displaymath}
P(X=x\vert Y=y)= P(X=x) 
\hspace*{1cm} \mbox{{\rm and}} \hspace*{1cm}
P(Y=y\vert X=x)= P(Y=y) \end{displaymath}

    • These are specific instances of `mutually-independent' assertions (S2.2).

  • Then (a reformulation of KP2.6)

    \begin{displaymath}
p(x,y)\equiv P([X=x],[Y=y]) =P(X=x) \times P(Y=y) =p(x) \times p(y)
\end{displaymath} (3.1)

    In words: the joint distribution (p(x,y)) is the product of the distributions (p(x) and p(y)) of the two variables.

  • For continuous variables

    \begin{displaymath}
f(x,y) =f(x) \times f(y)
\end{displaymath} (3.2)

  • It is then straightforward to show that (EQ7):

    Key Point 3.11

    For two (statistically) independent random variables X and Y

    \begin{displaymath}
\langle xy \rangle =\langle x \rangle \times\langle y \rangle\end{displaymath}

  • In words: the expectation value of the product of two independent variables is equal to the product of their expectation values.
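
Key Point 3.11 can be checked numerically for a simple case. The sketch below assumes two independent fair dice, an example chosen for illustration, and builds the joint distribution from equation (3.1).

```python
from fractions import Fraction
from itertools import product

p_x = {x: Fraction(1, 6) for x in range(1, 7)}   # first die
p_y = {y: Fraction(1, 6) for y in range(1, 7)}   # second die

# Independence: p(x, y) = p(x) * p(y), as in equation (3.1)
p_xy = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

mean_x = sum(p * x for x, p in p_x.items())              # <x>
mean_y = sum(p * y for y, p in p_y.items())              # <y>
mean_xy = sum(p * x * y for (x, y), p in p_xy.items())   # <xy>

print(mean_xy, mean_x * mean_y)  # 49/4 49/4
```

This confirms the result for one particular pair of distributions. The general argument is the subject of EQ7.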
Questions for you to do at this point: TQ5 EQ7