[A] Expectation value: definition
- The probability distribution/density (PD for short) expresses
all there is to know about the associated random variable.
- But the PD may be simply (if partially) characterised by two of its properties: its mean and its variance.
- Both represent special cases of the more general concept of expectation value.
Key Point 3.6
|
Let g(x) be some function of a random variable X.
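The expectation value of g(x) is defined by
\begin{displaymath}
\langle g(x) \rangle = \left\{\begin{array}{ll}
\sum_i p(x_i)\, g(x_i) & \mbox{if $x$ is discrete} \\
\int p(x)\, g(x)\, dx & \mbox{if $x$ is continuous}
\end{array}\right.
\end{displaymath}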
|
Commentary:
- The expectation value of g(x) coincides with the average of a large number of observations of g(x).
- We will establish this link explicitly in the particular case where g(x) is simply x itself, to be considered below.
- Note that in some texts (including RHB) you will find
![\begin{displaymath}
E[g(x)] \hspace*{1cm} \mbox{{\rm in place of}} \hspace*{1cm} \langle g(x) \rangle\end{displaymath}](img3.gif)
|
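A minimal sketch of the definition for a discrete variable, assuming the usual probability-weighted sum $\sum_i p(x_i)\, g(x_i)$. The helper name `expectation` and the example distribution (the number of heads in two fair coin tosses) are illustrative choices, not taken from these notes.

```python
from fractions import Fraction

def expectation(values, probs, g=lambda x: x):
    """Probability-weighted sum of g over the possible values (discrete case)."""
    return sum(p * g(x) for x, p in zip(values, probs))

# Illustrative distribution (an assumption): number of heads in two fair coin tosses.
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

mean_x = expectation(values, probs)                    # g(x) = x
mean_x2 = expectation(values, probs, lambda x: x * x)  # g(x) = x^2
print(mean_x, mean_x2)
```

Exact fractions are used so that the results can be compared with hand calculations without rounding.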
[B] Expectation value: rules
- Expectation values satisfy a number of simple but important rules (see EQ7 for proofs):
Key Point 3.7
|
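For any functions $g_1$ and $g_2$ of the random variable X
\begin{displaymath}
\langle g_1(x) + g_2(x) \rangle = \langle g_1(x) \rangle + \langle g_2(x) \rangle
\end{displaymath}
and, if $c$ is constant (independent of x)
\begin{displaymath}
\langle c\, g_1(x) \rangle = c\, \langle g_1(x) \rangle
\end{displaymath}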
|
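The following sketch checks the rules numerically, assuming they are the standard linearity properties: the expectation value of a sum is the sum of the expectation values, and a constant factor can be taken outside the brackets. The functions `g1`, `g2`, the constant `c` and the fair-die distribution are arbitrary illustrative choices.

```python
from fractions import Fraction

def expectation(values, probs, g):
    """Probability-weighted sum of g over the possible values (discrete case)."""
    return sum(p * g(x) for x, p in zip(values, probs))

# Illustrative distribution (an assumption): a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

g1 = lambda x: x * x        # arbitrary test function
g2 = lambda x: 3 * x + 1    # arbitrary test function
c = Fraction(5, 2)          # arbitrary constant

# Rule 1: <g1 + g2> = <g1> + <g2>
lhs_sum = expectation(values, probs, lambda x: g1(x) + g2(x))
rhs_sum = expectation(values, probs, g1) + expectation(values, probs, g2)

# Rule 2: <c g1> = c <g1>
lhs_scale = expectation(values, probs, lambda x: c * g1(x))
rhs_scale = c * expectation(values, probs, g1)

print(lhs_sum == rhs_sum, lhs_scale == rhs_scale)
```

A numerical check of this kind is not a proof; it only confirms the rules for one particular distribution and pair of functions.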
[C] Mean
Key Point 3.8
|
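The mean of a random variable X is (by definition) its
expectation value:
\begin{displaymath}
\langle x \rangle = \left\{\begin{array}{ll}
\sum_i p(x_i)\, x_i & \mbox{if $x$ is discrete} \\
\int p(x)\, x \, dx & \mbox{if $x$ is continuous}
\end{array}\right.
\end{displaymath}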
The mean coincides with the
average of a large number of observations of x.
|
Proof:
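- Consider N measurements of a discrete variable X, with possible values $x_1, x_2, \ldots$. Let $X_t$ denote the value of X observed in measurement $t$ ($t = 1, 2, \ldots, N$).
- Then the average of the N observations can be written as a sum over observations:
\begin{displaymath}
\frac{1}{N} \sum_{t=1}^{N} X_t
\end{displaymath}
- Divide the N observations into groups, corresponding to the possible values $x_i$; and denote by $n_i$ the number of measurements yielding value $x_i$.
- Then the average may be re-written as a sum over the possible discrete values:
\begin{displaymath}
\frac{1}{N} \sum_{t=1}^{N} X_t = \frac{1}{N} \sum_i n_i\, x_i = \sum_i \frac{n_i}{N}\, x_i
\end{displaymath}
- Appealing to the frequency definition of probability (KP2.2) we identify
\begin{displaymath}
p(x_i) = \lim_{N \rightarrow \infty} \frac{n_i}{N}
\end{displaymath}
- It follows that
\begin{displaymath}
\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{t=1}^{N} X_t = \sum_i p(x_i)\, x_i = \langle x \rangle
\end{displaymath}
so that the mean is indeed the average in the long-run (large number) limit.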
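The argument of the proof can be illustrated by simulation. The sketch below assumes a fair die as the discrete variable, with an arbitrary number of throws `N` and a fixed seed. It computes the average once as a sum over observations and once as a sum over possible values weighted by the observed frequencies $n_i/N$.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable
N = 100_000     # number of simulated measurements (arbitrary choice)
observations = [random.randint(1, 6) for _ in range(N)]

# Average written as a sum over observations.
average_by_observation = sum(observations) / N

# Average re-written as a sum over possible values x_i,
# each weighted by its observed frequency n_i / N.
counts = Counter(observations)
average_by_value = sum(x * n / N for x, n in counts.items())

exact_mean = 3.5  # sum over i of (1/6) * i for a fair die
print(average_by_observation, average_by_value, exact_mean)
```

The two averages agree up to floating-point rounding, and both approach the exact mean as `N` grows.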
Commentary:
- The mean is uniquely defined by the PD of the variable in question, and is sometimes referred to as the `mean of the PD'.
- It may be thought of as a single-parameter indicator of what (in some sense) to `expect' from a single observation of X.
- Its utility depends on the extent to which observed values deviate
from the mean.
Examples:
1. Mean of a discrete variable
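- Let X be the result of a die throw.
- Then X assumes values $x_i = i$ ($i = 1, \ldots, 6$) with probabilities $p(x_i) = p = 1/6$.
- The mean (or expectation value) of X is
\begin{displaymath}
\langle x \rangle = \sum_{i=1}^{6} p(x_i)\, x_i = \frac{1}{6} (1+2+3+4+5+6) = \frac{7}{2}
\end{displaymath}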
- Note that in this case no single observation will coincide with the
`expectation value'.
2. Mean of a continuous variable
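- Let X be a continuous variable chosen randomly (with uniform probability) from the interval [1,6].
- Then X has PDF
\begin{displaymath}
p(x) = \left\{\begin{array}{ll}
\frac{1}{5} & \mbox{if $1 \le x \le 6$} \\
0 & \mbox{otherwise}
\end{array}\right.
\end{displaymath}
- The mean (or expectation value) of X is
\begin{displaymath}
\langle x \rangle = \int_1^6 \frac{1}{5}\, x \, dx = \frac{1}{5} \left[ \frac{x^2}{2} \right]_1^6 = \frac{7}{2}
\end{displaymath}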
3. Further examples
See HQ3 for a more physically-interesting application.
|
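For the continuous example, the mean can be checked by numerical integration. This is a sketch only: it assumes that `chosen randomly' means uniformly, so that the PDF is 1/5 on [1,6], and it uses a simple midpoint rule. The function names and the number of integration steps are illustrative.

```python
def uniform_pdf(x, a=1.0, b=6.0):
    """Density of a variable chosen uniformly from [a, b] (assumed form of the PDF)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def expectation_continuous(g, pdf, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of pdf(x) * g(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += pdf(x) * g(x)
    return total * h

normalisation = expectation_continuous(lambda x: 1.0, uniform_pdf, 1.0, 6.0)
mean_x = expectation_continuous(lambda x: x, uniform_pdf, 1.0, 6.0)
print(normalisation, mean_x)
```

The normalisation integral should be close to 1 and the mean close to 7/2, the same value as for the die.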
[D] Variance and standard deviation
Key Point 3.9
|
The variance V[x] of a random variable is the expectation value of the square of the
deviation of the variable from its mean:
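![\begin{displaymath}
V[x] \left\{\begin{array}
{ll}
=\sum_i p(x_i) (\Delta x_i)^2 & \mbox{{\rm if $x$ is discrete}} \hspace*{1cm}\\ &\\
=\int p(x) (\Delta x)^2 \, dx & \mbox{{\rm if $x$ is continuous}} \hspace*{1cm}\\ &\\ \end{array}\right.\end{displaymath}](img21.gif)
where $\Delta x = x - \langle x \rangle$ is the deviation of the variable from its mean.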
Abbreviated form: ![$V[x] = \langle (\Delta x)^2 \rangle$](img22.gif)
Alternative form: ![$V[x] = \langle x ^2 \rangle - \langle x \rangle ^2$](img23.gif)
|
Commentary:
- The variance is constructed to reflect how spread-out a PD is.
- We need to use the square of $\Delta x$ because (see EQ7) the deviation itself averages to zero:
\begin{displaymath}
\langle \Delta x \rangle = \langle x \rangle - \langle x \rangle = 0
\end{displaymath}
- The variance thus measures the typical `square of the deviation from the mean'.
- Proving the `alternative form' is part of EQ7.
- To provide a measure of the typical `deviation from the mean' we take the square root of the variance to give the standard deviation:
|
Key Point 3.10
The standard deviation of a random variable is the square-root of its variance:
![\begin{displaymath}
\sigma[x] = \sqrt{ V[x] }\end{displaymath}](img27.gif)
The standard deviation measures the `spread' or `width' of the PD.
|
Examples:
1. Variance and standard deviation of a discrete variable
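- Let X be the result of a die throw (as in the earlier example).
- Then the expectation value of $X^2$ is
\begin{displaymath}
\langle x^2 \rangle = \sum_{i=1}^{6} p(x_i)\, x_i^2 = \frac{1}{6} (1+4+9+16+25+36) = \frac{91}{6}
\end{displaymath}
- Recalling that in this case $\langle x \rangle = 7/2$ and using the `alternative form' in KP3.9 we find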
![\begin{displaymath}
V[x] = \langle x^2 \rangle - \langle x \rangle ^2 = \frac{91}{6} - \left[ \frac{7}{2}\right ] ^2
= \frac{35}{12}\end{displaymath}](img30.gif)
while
![\begin{displaymath}
\sigma[x] = \sqrt{ V[x] } = \sqrt{\frac{35}{12}} =1.71
\hspace*{0.5cm} \mbox{{\rm (3 sf)}} \hspace*{0.5cm} \end{displaymath}](img31.gif)
- This seems sensible given the perceived spread of the distribution.
2. Variance and standard deviation of a continuous variable
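- Let X be a continuous variable chosen randomly from the interval [1,6] (as in the earlier example).
- Then the expectation value of $X^2$ is
\begin{displaymath}
\langle x^2 \rangle = \int_1^6 \frac{1}{5}\, x^2 \, dx = \frac{1}{5} \left[ \frac{x^3}{3} \right]_1^6 = \frac{43}{3}
\end{displaymath}
- Recall that in this case $\langle x \rangle = 7/2$ and appeal to the `alternative form' for the variance. It then follows that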
![\begin{displaymath}
V[x] = \langle x^2 \rangle - \langle x \rangle ^2 =\frac{43}{3} - \left[ \frac{7}{2}\right ] ^2
= \frac{25}{12}\end{displaymath}](img33.gif)
while
![\begin{displaymath}
\sigma[x] = \sqrt{ V[x] } = \sqrt{\frac{25}{12}} =1.44
\hspace*{0.5cm} \mbox{{\rm (3 sf)}} \hspace*{0.5cm} \end{displaymath}](img34.gif)
3. Further examples
See TQ5
for a more physically-interesting application.
|
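The two worked examples can be reproduced with a short script. This is a sketch under stated assumptions: the die is fair, the continuous variable is uniform on [1,6] with PDF 1/5, and the integrals are approximated with a midpoint rule. The helper `integrate` and all variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Discrete case: a fair die, using exact fractions.
values = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)
mean_x = sum(p * x for x in values)
mean_x2 = sum(p * x * x for x in values)

var_definition = sum(p * (x - mean_x) ** 2 for x in values)  # <(Delta x)^2>
var_alternative = mean_x2 - mean_x ** 2                      # <x^2> - <x>^2
sigma_die = sqrt(float(var_alternative))

# Continuous case: uniform on [1, 6], integrals by the midpoint rule.
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

pdf = 1.0 / 5.0  # assumed constant density on [1, 6]
mean_c = integrate(lambda x: pdf * x, 1.0, 6.0)
mean_c2 = integrate(lambda x: pdf * x * x, 1.0, 6.0)
var_uniform = mean_c2 - mean_c ** 2
sigma_uniform = sqrt(var_uniform)

print(var_definition, var_alternative, round(sigma_die, 2))
print(round(var_uniform, 4), round(sigma_uniform, 2))
```

For the die, the definition and the `alternative form' give the same exact fraction. The uniform variable has the same mean but a smaller variance, because the die places all of its probability on the six integer values, including the two end points.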
- The nature of the information provided by a PD depends crucially on the relative sizes of the mean and the standard deviation.
- If the mean and standard deviation are of similar sizes (the PD is `broad') the mean is a relatively uninformative indicator of what to expect from one observation of X.
- If the mean is large on the scale of the standard deviation (the PD is `sharp') the mean is an extremely good indicator of what to expect from one observation of X.
|
![Figure (sharpness): a broad PD compared with a sharp PD](img35.gif) |
- The mean and the standard deviation are the two most important parameters of a PD. But there are others.
[E] Independent random variables
- Consider two statistically independent random variables X and Y.
Meaning of statistically independent:
- In words: knowledge of the value of one tells us nothing about the
likely value of the other.
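- In algebra (discrete case), for all x and y:
\begin{displaymath}
P([X=x] \mid [Y=y]) = P(X=x) \hspace*{1cm} \mbox{{\rm and}} \hspace*{1cm} P([Y=y] \mid [X=x]) = P(Y=y)
\end{displaymath}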
- These are specific instances of `mutually-independent' assertions (S2.2).
- Then (a reformulation of KP2.6)
| ![\begin{displaymath}
p(x,y)\equiv P([X=x],[Y=y]) =P(X=x) \times P(Y=y) =p(x) \times p(y)
\end{displaymath}](img39.gif) |
(3.1) |
In words: the joint distribution (p(x,y)) is the product
of the distributions (p(x) and p(y)) of the two variables.
- For continuous variables
| \begin{displaymath} p(x,y) = p(x) \times p(y) \end{displaymath} |
(3.2) |
- It is then straightforward to show that (EQ7):
Key Point 3.11
|
For two (statistically) independent random variables X and Y
\begin{displaymath}
\langle xy \rangle = \langle x \rangle \langle y \rangle
\end{displaymath}
|
- In words: the expectation value of the product of two independent
variables is equal to the product of their expectation values.
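The product rule can be checked by direct enumeration. The sketch below assumes two fair dice thrown independently, so that the joint distribution is the product of the individual distributions. For contrast it also evaluates the fully dependent case Y = X, where the rule fails. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]
p = {x: Fraction(1, 6) for x in values}  # distribution of a single fair die

# Joint distribution of two independent dice: p(x, y) = p(x) * p(y).
joint = {(x, y): p[x] * p[y] for x, y in product(values, values)}

mean_x = sum(p[x] * x for x in values)
mean_y = sum(p[y] * y for y in values)
mean_xy = sum(pxy * x * y for (x, y), pxy in joint.items())

# Dependent case for contrast: Y is the same throw as X, so <xy> = <x^2>.
mean_xx = sum(p[x] * x * x for x in values)

print(mean_xy, mean_x * mean_y, mean_xx)
```

For the independent dice the expectation value of the product equals the product of the expectation values. For Y = X it equals $\langle x^2 \rangle$ instead, which is larger by exactly the variance of X.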
Questions for you to do at this point: TQ5, EQ7