Next: Coupled-, Overlapping- and Higher-Order
Up: Modelling with Markov Chains
Previous: Modelling with Markov Chains
The construction of a Markov chain requires two basic
ingredients, namely a transition matrix and an initial distribution.
We start with the definition of the transition matrix.
Assume a finite set
of states. Assign to each
pair
of states a real number pi j such that the properties
 |
(2.1) |
 |
(2.2) |
are satisfied and define the transition matrix
by
Let
be a sequence of random variables with values
in S. Here, n denotes the time at which the state Xn occurs.
Definition 2.1 (Markov Chain)
The sequence

is called
a homogeneous Markov chain with
discrete time, state space
S, and transition matrix

,
if for every

the condition
![\begin{displaymath}
P\left[ X_{n+1}=j \Big\vert X_0=i_0,\ldots,X_n=i_n \right] =
P\left[ X_{n+1}=j \Big\vert X_n=i_n \right] = p_{i_n j} \end{displaymath}](img71.gif) |
(2.3) |
is satisfied for all

,
for which
![$P[X_0=i_0,\ldots,X_n=i_n]>0$](img73.gif)
.
The first identity in (2.3), which is also called ``Markov property'',
defines the ``memory'' or ``order'' of the chain. In this case, the order
equals one since the transition probabilities
are entirely determined by the preceding state. As we will see,
the restriction to order-one chains is no serious limitation since processes with
arbitrary finite memory s can be interpreted as order-one MCs on
the product space Ss.
The second identity in (2.3) is called homogeneity condition.
It assures that the transition probabilities do not vary with the time n.
So far, we have only specified the ingredients for the evolvement of probabilities
throughout the time.
To complete the construction of a Markov chain we need to specify an initial
distribution. Hence denote by DS the set
of discrete distributions on S,
where we represent distributions as row vectors.
We call
the initial distribution of the chain
if
P[X0=i] = P0 i for all states
.
Consider now the so-called n-th order transition probabilities
p(n)i j=P[Xm+n=j | Xm=i],
,
which are computed by summing over all possible intermediate states,
that is, over all paths of length n+1 with initial state i and final state j,
whence the matrix
equals the n'th power
of
the transition matrix
.
We further define
so that
equals the
identity matrix Im.
Given the distribution
of X0, the distribution of Xnthus calculates to
,
.
As we will see, Perron's formula (see Section 2.3) proves to be a
powerful tool for the analysis of powers of the matrix
.
The construction of a probability space
on which a given chain
can be realized
is given by the Existence Theorem, see [3, Theorem 8.1] for example.
We may think of
to consist of all possible paths
in
.
The following type of Markov chains deserves special attention.
Definition 2.2 (Independent chain)
Let

and define a Markov chain with state
space
S, transition matrix

given by
pi j=
Pj,

,
and arbitrary initial
distribution

.
Then for
all

,
We call this the ``independent chain'' with
respect to

.
Corollary 2.1
In an independent chain the sequence of states

is a sequence
of independent random variables.
Independent chains have no memory whence they are also called zero-order Markov chains.
It is quite clear how the above generalizes to the case of countable state space
.
We shortly recall the following definitions and lemmas which
help to identify the different types of chains. They are discussed in much more
detail in every reasonable text book on Markov chains, see e.g. [36], where
the reader can also find proofs.
Definition 2.3 (Recurrence and Transience)
Consider the probability
of an eventual visit in state
j starting in
i. A state
i is called
recurrent (or persistent) if
ri i=1 and transient if
ri i<1.
A chain is called recurrent (transient) if all states are recurrent (transient).
The following alternative conditions follow by the Borel-Cantelli Lemma:
Lemma 2.2
A state
i is
recurrent iff
or, equivalently,
iff

.
On the other hand, it is transient iff
![$P[X_n = i \mbox{ infinitely often }\vert X_0 = i] = 0$](img96.gif)
or, equivalently, iff

.
In the case of finite state space S, a state i is transient if and only if it
is absorbing, that is, if pi i=1.
Definition 2.4 (Irreducibility)
A MC is called irreducible (or undecomposable) if for all pairs of states

there exists an integer
n such that
p(n)i j>0.
Proposition 2.3
A MC is irreducible if and only if for arbitrary

and

there exists an integer

such that
pi j(n) >0.
Proof:
The ``if'' part being obvious, the ``only if'' part follows from the
fact that for
there exist integers
ni i,ni j such that
p(ni i)i i>0 and
p(ni j)i j>0 and consequently
for
all
.
Irreducible MCs cannot be decomposed into parts which do not interact.
It can be shown that either all the states of an irreducible MC are transient or all
states are recurrent. In this case, the chain itself is either recurrent or transient.
Finite irreducible chains are always recurrent,
see [3, Example 8.7, p. 118].
Unless indicated otherwise we will assume irreducibility in the following.
In irreducible chains there may still exist a periodic structure such that
for each state
,
the set of possible return times to i when starting
in i is a subset of the set
,
containing all but a finite set of these elements. The smallest number
p with this property is the so-called period of the chain,
For p=1, this gives reason for the following definition.
Definition 2.5 (Aperiodicity)
An irreducible chain is called aperiodic (or acyclic) if
the period
p equals 1 or, equivalently, if
for all pairs

of states there is an integer
ni j such that
for all

,
the probability
p(n)i j>0.
The state space S of an irreducible periodic chain with period p>1 can be partitioned
into p mutually disjoint classes
of states such that if the system is in a state
at time n,
it will be in a state
at time n+1, where
.
The following lemma gives a class of irreducible and aperiodic chains.
Lemma 2.4
If

is strictly positive,
Pi > 0,

,
then the independent chain with respect to

is irreducible and
aperiodic.
Proof: For every
,
the probability
p(n)i j=Pj is positive.
A central role in the analysis of MCs is played by so-called stationary
distributions. Such distributions are invariant under the transition matrix.
Definition 2.6 (Stationary distribution)
A distribution

which satisfies

for all states

,
or

in matrix notation,
is called a stationary distribution.
By induction we immediately get
for
all
,
so that we have the following lemma:
Lemma 2.5
If the initial distribution of a chain is a stationary distribution, then
the process

of states is a stationary process,
i.e. each
Xn has the same distribution

.
Lemma 2.6
For an independent chain with respect to

,

itself is a stationary distribution.
Proof:
To further analyze stable distributions put
the time of the first visit of the chain in state i and let Ei[Ti] denote
the expectation conditional on X0=i, that is,
the expected return time to state i.
Definition 2.7 (Positive recurrence)
A state

is called positive recurrent (null recurrent) if
![$E_i[T_i]<\infty$](img118.gif)
(
![$E_i[T_i]=\infty$](img119.gif)
). A recurrent chain is called positive recurrent,
if all states are positive recurrent.
Note, that we do not rely on irreducibility in the above definition of a positive
recurrent chain.
Lemma 2.7 (Existence and uniqueness of stable distributions)
Positive recurrence of a chain

guarantees the existence of a stable distribution

.
If, in addition,

is irreducible, then the
stable distribution is unique and can be written in the form

with
Pi = 1/
Ei[
Ti],

.
In particular, every finite chain
is positive recurrent and thus has a stable distribution.
For positive recurrent irreducible aperiodic chains
we have the following limiting result:
Lemma 2.8
Let

be irreducible, positive recurrent, and aperiodic, with
stable distribution

.
Then

is given by

for all pairs

of states.
In this case,
converges element-wise to the transition matrix of the
independent chain with respect to the stable distribution
.
For the corresponding l1-result on the sequence of distributions
,
an arbitrary initial distribution,
see [6, Theorem 6.1] and [42].
If S is finite in addition, we have the following Lemma on the speed of convergence:
Lemma 2.9
Let

be a finite irreducible aperiodic MC with
stationary distribution

.
Then there exist

and

,
such that
Any arbitrary initial distribution converges exponentially to
in this case.
This class of chains allows a particular simple form of the Central Limit Theorem
for the number of visits in each state. For the corresponding result (see
Theorem 2.18) both, the finiteness
and irreducibility are essential prerequisites. Although aperiodicity may
be omitted, we will have this property in the examples discussed in
Chapter 3.
Next: Coupled-, Overlapping- and Higher-Order
Up: Modelling with Markov Chains
Previous: Modelling with Markov Chains
Stefan Wegenkittl
1998-05-19