MATH3010 Online Resources
Note that MATH3010 has now been superseded by MATH3067.
Information Theory 2004
This is the home page for MATH3010 Information Theory
for 2nd semester 2004. The page will be updated with
tutorial solutions and so forth as the semester progresses.
Any important announcements will
also be posted here. Consequently you are urged to bookmark
this page and consult it regularly.
The textbook for the course is a set of lecture notes by
Dr Nigel O'Brian, which is available for purchase from
It is essential for students to obtain a copy of the book.
There were some minor misprints and other errors in last
year's version of the book. If anyone finds any in this
year's version, please tell me.
For a list of reference books, see the
of the Senior Maths Handbook.
Please read and retain a copy of
the information sheet.
The lecturer is A/Prof Bob Howlett, whose room is Carslaw 709.
His consultation times are shown on his
(The tutorial sheets and solutions are no longer available.)
A sample exam has been released.
A probability distribution is essentially a collection
of nonnegative numbers that add up to 1. The associated
information entropy is obtained by calculating
–p log₂ p for each of these
numbers p and summing over all p (excluding
p = 0).
Note that –log₂ p can be regarded
as a measure of the amount of information one receives when an
event of probability p occurs. The entropy is the weighted
average of this over all possible outcomes. That is, it is the
expectation, or expected value, of the amount of information
that will be obtained when the outcome is known.
So if X is a random variable, the entropy
H(X) is to be thought
of as the amount of information you expect to get from
discovering the value of X.
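As a rough illustration of this definition, the short Python sketch below computes the entropy of a distribution given as a list of probabilities. The function name entropy and the example distributions are made up for illustration and are not taken from the course notes.

```python
from math import log2

def entropy(probs):
    """Entropy in bits: sum of -p * log2(p) over all nonzero p."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Illustrative distributions (assumed examples, not course data).
fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
four_equal_outcomes = [0.25, 0.25, 0.25, 0.25]

print(entropy(fair_coin))            # 1.0 bit
print(entropy(biased_coin))          # roughly 0.469 bits
print(entropy(four_equal_outcomes))  # 2.0 bits
```

The biased coin has lower entropy than the fair one: its outcome is easier to predict, so on average you learn less by discovering it.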
If X and Y are two random variables, their joint entropy
H(X,Y) is the entropy of the joint distribution. That is, you sum
–p(x,y) log₂ p(x,y)
over all x and y. This is the total amount of
information you expect to get from knowing both X and Y.
The conditional entropy of X given Y is the amount of
additional information you expect to get from learning the
value of X given that you already know Y. It is
denoted by H(X|Y), and equals
H(X,Y) – H(Y).
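The following Python sketch shows one way to compute the joint entropy and then the conditional entropy via H(X|Y) = H(X,Y) – H(Y). The joint distribution used here is a hypothetical example chosen so the numbers come out neatly; the variable names are likewise just for illustration.

```python
from collections import defaultdict
from math import log2

def entropy(probs):
    """Entropy in bits; zero probabilities are skipped."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y), keyed by the pair (x, y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

# Marginal distribution of Y: p(y) is the sum of p(x, y) over all x.
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_y[y] += p

h_xy = entropy(joint.values())  # joint entropy H(X,Y)
h_y = entropy(p_y.values())     # entropy H(Y)
h_x_given_y = h_xy - h_y        # conditional entropy H(X|Y)

print(h_xy, h_y, h_x_given_y)   # 1.5 1.0 0.5
```

In this example, knowing both variables is worth 1.5 bits, of which Y alone supplies 1 bit, so learning X afterwards adds only half a bit.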
For each particular value y of Y there is an
associated probability distribution for X, made up of
the probabilities of the various values of X given that
Y = y. The entropy of this is the conditional
entropy of X given that Y = y. It is denoted by
H(X|Y = y), and is found by
summing –p(x|y) log₂
p(x|y) over all values of x.
(Remember that y is fixed here.) Recall also that
p(x|y) = p(x,y)/p(y). The conditional
entropy of X given Y (see above) is equal to the
weighted average, over all possible values for y, of the
conditional entropy of X given that Y = y, with weights p(y).
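To see the weighted-average description in action, the Python sketch below computes H(X|Y = y) for each y from the conditional probabilities p(x|y) = p(x,y)/p(y), averages these with weights p(y), and compares the result with H(X,Y) – H(Y). The joint distribution and the helper name h_x_given are assumptions made purely for illustration.

```python
from math import log2

def entropy(probs):
    """Entropy in bits; zero probabilities are skipped."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y), keyed by the pair (x, y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal p(y), obtained by summing p(x, y) over x.
p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}

def h_x_given(y):
    """H(X|Y = y): entropy of the distribution p(x|y) = p(x,y)/p(y)."""
    return entropy(joint.get((x, y), 0.0) / p_y[y] for x in xs)

# Weighted average of H(X|Y = y) over y, with weights p(y).
weighted_average = sum(p_y[y] * h_x_given(y) for y in ys if p_y[y] > 0)

# The same quantity computed as H(X,Y) - H(Y).
difference = entropy(joint.values()) - entropy(p_y.values())

print(weighted_average, difference)
```

Here X is completely determined when Y = 0, so H(X|Y = 0) is zero, while X is a fair coin when Y = 1. Both ways of computing H(X|Y) give half a bit.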
The mutual information of X and Y is the
amount of information about X that you expect to get
by learning the value of Y. It also equals the amount of
information about Y that you expect to get by learning
the value of X. This is clearly no more than the total
amount of information you expect to get by learning the value of
Y, or the total amount of information you expect to get by
learning the value of X. Indeed, it is given by
I(X;Y) = H(X) –
H(X|Y). Do not confuse the conditional
entropy H(X|Y) (additional information
expected to be gained from learning X when you
already know Y) with the mutual information
(expected information about X gained by learning the
value of Y).
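The symmetry of mutual information can be checked numerically. The Python sketch below computes I(X;Y) both as H(X) – H(X|Y) and as H(Y) – H(Y|X) for a hypothetical joint distribution. The helper marginal and all variable names are illustrative assumptions.

```python
from math import log2

def entropy(probs):
    """Entropy in bits; zero probabilities are skipped."""
    return sum(-p * log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal distribution of X (axis=0) or of Y (axis=1)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

# Hypothetical joint distribution p(x, y), keyed by the pair (x, y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_xy = entropy(joint.values())

h_x_given_y = h_xy - h_y  # H(X|Y)
h_y_given_x = h_xy - h_x  # H(Y|X)

mi_from_x = h_x - h_x_given_y  # I(X;Y) = H(X) - H(X|Y)
mi_from_y = h_y - h_y_given_x  # I(X;Y) = H(Y) - H(Y|X)

print(mi_from_x, mi_from_y)    # both roughly 0.311 bits
print(h_x_given_y)             # 0.5 bits: a different quantity
```

For this distribution the mutual information (about 0.311 bits) and the conditional entropy H(X|Y) (0.5 bits) are visibly different numbers, and the mutual information does not exceed either H(X) or H(Y).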