next up previous contents
Next: Coupled-, Overlapping- and Higher-Order Up: Modelling with Markov Chains Previous: Modelling with Markov Chains

Markov Chains

The construction of a Markov chain requires two basic ingredients, namely a transition matrix and an initial distribution. We start with the definition of the transition matrix. Assume a finite set $S=\{1,\ldots,m\}$ of states. Assign to each pair $(i,j) \in S^2$ of states a real number pi j such that the properties

 \begin{displaymath}
p_{i j} \ge 0 \quad \forall (i,j) \in S^2 \end{displaymath} (2.1)


 \begin{displaymath}
\sum_{j \in S} p_{i j} = 1 \quad \forall i \in S \end{displaymath} (2.2)

are satisfied and define the transition matrix ${\rm I\!P}$ by

\begin{displaymath}{\rm I\!P}=\left( \begin{array}{ccccc}
p_{1 1} & p_{1 2} & \c...
...\\
p_{m 1} & p_{m 2} & \cdots & p_{m m}\\
\end{array} \right)\end{displaymath}

Let $(X_n)_{n\in{\rm I\!N}_0}$ be a sequence of random variables with values in S. Here, n denotes the time at which the state Xn occurs.

Definition 2.1 (Markov Chain)   The sequence $(X_n)_{n\in{\rm I\!N}_0}$ is called a homogeneous Markov chain with discrete time, state space S, and transition matrix ${\rm I\!P}$, if for every $n \in {\rm I\!N}_0$ the condition

 \begin{displaymath}
P\left[ X_{n+1}=j \Big\vert X_0=i_0,\ldots,X_n=i_n \right] =
P\left[ X_{n+1}=j \Big\vert X_n=i_n \right] = p_{i_n j} \end{displaymath} (2.3)

is satisfied for all $(i_0,\ldots,i_n,j) \in S^{n+2}$, for which $P[X_0=i_0,\ldots,X_n=i_n]>0$.

The first identity in (2.3), which is also called ``Markov property'', defines the ``memory'' or ``order'' of the chain. In this case, the order equals one since the transition probabilities are entirely determined by the preceding state. As we will see, the restriction to order-one chains is no serious limitation since processes with arbitrary finite memory s can be interpreted as order-one MCs on the product space Ss. The second identity in (2.3) is called homogeneity condition. It assures that the transition probabilities do not vary with the time n.

So far, we have only specified the ingredients for the evolvement of probabilities throughout the time. To complete the construction of a Markov chain we need to specify an initial distribution. Hence denote by DS the set of discrete distributions on S,

\begin{displaymath}D_S=\{{\bf P}=(P_{i})_{i\in S}: \; P_{i} \ge 0, \sum_{i \in S} P_{i} = 1\},\end{displaymath}

where we represent distributions as row vectors. We call ${\bf P}_0=(P_{0 i})_{i\in S} \in D_S$the initial distribution of the chain $(X_n)_{n\in{\rm I\!N}_0}$ if P[X0=i] = P0 i for all states $i \in S$.

Consider now the so-called n-th order transition probabilities p(n)i j=P[Xm+n=j | Xm=i], $n \in {\rm I\!N}$, which are computed by summing over all possible intermediate states, that is, over all paths of length n+1 with initial state i and final state j,

\begin{eqnarray*}p^{(n)}_{i j} & = & \sum_{(i_1,\ldots,i_{n-1}) \in S^{n-1}} \!\...
...} \cdot
\prod_{j=1}^{n-2} p_{i_j i_{j+1}}
\cdot p_{i_{n-1} j},
\end{eqnarray*}


whence the matrix $(p^{(n)}_{i j})_{(i,j)\in S^2}$ equals the n'th power ${\rm I\!P}^n$ of the transition matrix ${\rm I\!P}$. We further define $p^{(0)}_{i j}:= \delta_{i j}$ so that ${\rm I\!P}^0$ equals the identity matrix Im. Given the distribution ${\bf P}_0$ of X0, the distribution of Xnthus calculates to ${\bf P}_0 {\rm I\!P}^n$, $n \in {\rm I\!N}_0$. As we will see, Perron's formula (see Section 2.3) proves to be a powerful tool for the analysis of powers of the matrix ${\rm I\!P}$.

The construction of a probability space $(\Omega,{\cal A},\mu)$on which a given chain $(X_n)_{n\in{\rm I\!N}_0}$ can be realized is given by the Existence Theorem, see [3, Theorem 8.1] for example. We may think of $\Omega$ to consist of all possible paths $(x_0,x_1,x_2,\ldots)$ in $S^{{\rm I\!N}_0}$.

The following type of Markov chains deserves special attention.

Definition 2.2 (Independent chain)   Let $\; {\bf P}=(P_{1},\ldots,P_{m}) \in D_S$ and define a Markov chain with state space S, transition matrix ${\rm I\!P}=(p_{i j})_{(i,j) \in S^2}$ given by pi j=Pj, $i \in S$, and arbitrary initial distribution ${\bf P}_0 \in D_S$. Then for all $n \in {\rm I\!N}_0$,

\begin{displaymath}P\left[ X_{n+1}=j \Big\vert X_0=i_0,\ldots,X_n=i_n \right] =
P\left[ X_{n+1}=j \Big\vert X_n=i_n \right] = P_{j}.\end{displaymath}

We call this the ``independent chain'' with respect to ${\bf P}$.

Corollary 2.1   In an independent chain the sequence of states $(X_n)_{n\in{\rm I\!N}_0}$ is a sequence of independent random variables.

Independent chains have no memory whence they are also called zero-order Markov chains.

It is quite clear how the above generalizes to the case of countable state space $S={\rm I\!N}$. We shortly recall the following definitions and lemmas which help to identify the different types of chains. They are discussed in much more detail in every reasonable text book on Markov chains, see e.g. [36], where the reader can also find proofs.

Definition 2.3 (Recurrence and Transience)   Consider the probability

\begin{displaymath}r_{i j}=P \left[\bigcup_{n \in {\rm I\!N}} \{X_n = j \Big\vert X_0=i \} \right]\end{displaymath}

of an eventual visit in state j starting in i. A state i is called recurrent (or persistent) if ri i=1 and transient if ri i<1. A chain is called recurrent (transient) if all states are recurrent (transient).

The following alternative conditions follow by the Borel-Cantelli Lemma:

Lemma 2.2   A state i is recurrent iff $P[X_n = i \mbox{ infinitely often }\vert X_0 = i] = 1$
or, equivalently, iff $\sum_{n \in {\rm I\!N}} p^{(n)}_{i i} = \infty$. On the other hand, it is transient iff
$P[X_n = i \mbox{ infinitely often }\vert X_0 = i] = 0$or, equivalently, iff $\sum_{n \in {\rm I\!N}} p^{(n)}_{i i} < \infty$.

In the case of finite state space S, a state i is transient if and only if it is absorbing, that is, if pi i=1.

Definition 2.4 (Irreducibility)   A MC is called irreducible (or undecomposable) if for all pairs of states $(i,j) \in S^2$ there exists an integer n such that p(n)i j>0.

Proposition 2.3   A MC is irreducible if and only if for arbitrary $k \in {\rm I\!N}$ and $(i,j) \in S^2$there exists an integer $n=n_{i j} \ge k$ such that pi j(n) >0.

Proof:      The ``if'' part being obvious, the ``only if'' part follows from the fact that for $(i,j) \in S^2$ there exist integers ni i,ni j such that p(ni i)i i>0 and p(ni j)i j>0 and consequently $p^{(ln_{i i}+n_{i j})}_{i j} \ge (p^{(n_{i i})}_{i i})^l p^{(n_{i j})}_{i j} >0$ for all $l \in {\rm I\!N}$.

Irreducible MCs cannot be decomposed into parts which do not interact. It can be shown that either all the states of an irreducible MC are transient or all states are recurrent. In this case, the chain itself is either recurrent or transient. Finite irreducible chains are always recurrent, see [3, Example 8.7, p. 118]. Unless indicated otherwise we will assume irreducibility in the following.

In irreducible chains there may still exist a periodic structure such that for each state $i \in S$, the set of possible return times to i when starting in i is a subset of the set $p \cdot {\rm I\!N}= \{p, 2 p, 3 p, \ldots \}$, containing all but a finite set of these elements. The smallest number p with this property is the so-called period of the chain,

\begin{displaymath}p=\gcd\{n \in {\rm I\!N}\; : \; p^{(n)}_{i i} > 0\}.\end{displaymath}

For p=1, this gives reason for the following definition.

Definition 2.5 (Aperiodicity)   An irreducible chain is called aperiodic (or acyclic) if the period p equals 1 or, equivalently, if for all pairs $(i,j) \in S^2$ of states there is an integer ni j such that for all $n \ge n_{i j}$, the probability p(n)i j>0.

The state space S of an irreducible periodic chain with period p>1 can be partitioned into p mutually disjoint classes ${\cal P}_0,\ldots,{\cal P}_{p-1}$of states such that if the system is in a state $i \in {\cal P}_l$ at time n, it will be in a state $j \in {\cal P}_k$ at time n+1, where $k \equiv l+1 \pmod p$.

The following lemma gives a class of irreducible and aperiodic chains.

Lemma 2.4   If ${\bf P}=(P_{1},\ldots,P_{m})$ is strictly positive, Pi > 0, $i \in S$, then the independent chain with respect to ${\bf P}$ is irreducible and aperiodic.

Proof:     For every $n \in N$, the probability p(n)i j=Pj is positive.

A central role in the analysis of MCs is played by so-called stationary distributions. Such distributions are invariant under the transition matrix.

Definition 2.6 (Stationary distribution)   A distribution ${\bf P}\in D_S$ which satisfies $\sum_{i \in S} P_{i} p_{i j} = P_{j}$ for all states $j \in S$, or ${\bf P}{\rm I\!P}= {\bf P}$ in matrix notation, is called a stationary distribution.

By induction we immediately get ${\bf P}{\rm I\!P}^n = {\bf P}$ for all $n \in {\rm I\!N}$, so that we have the following lemma:

Lemma 2.5   If the initial distribution of a chain is a stationary distribution, then the process $(X_n)_{n\in{\rm I\!N}_0}$ of states is a stationary process, i.e. each Xn has the same distribution ${\bf P}$.

Lemma 2.6   For an independent chain with respect to ${\bf P}$, ${\bf P}$ itself is a stationary distribution.

Proof:      $\sum_{i \in S} P_{i} p_{i j} = \sum_{i \in S} P_{i} P_{j} = P_{j}$

To further analyze stable distributions put

\begin{displaymath}T_i = \left\{ \begin{array}{ccl}
\infty & : & \mbox{ if } X_...
...\; : \; X_n = i \} & : & \mbox { otherwise}
\end{array} \right.\end{displaymath}

the time of the first visit of the chain in state i and let Ei[Ti] denote the expectation conditional on X0=i, that is, the expected return time to state i.

Definition 2.7 (Positive recurrence)   A state $i \in S$ is called positive recurrent (null recurrent) if $E_i[T_i]<\infty$ ( $E_i[T_i]=\infty$). A recurrent chain is called positive recurrent, if all states are positive recurrent.

Note, that we do not rely on irreducibility in the above definition of a positive recurrent chain.

Lemma 2.7 (Existence and uniqueness of stable distributions)   Positive recurrence of a chain $(S,{\rm I\!P})$ guarantees the existence of a stable distribution ${\bf P}$. If, in addition, $(S,{\rm I\!P})$ is irreducible, then the stable distribution is unique and can be written in the form ${\bf P}=(P_{1},P_{2},\ldots)$ with Pi = 1/Ei[Ti], $i \in S$.

In particular, every finite chain is positive recurrent and thus has a stable distribution. For positive recurrent irreducible aperiodic chains we have the following limiting result:

Lemma 2.8   Let $(S,{\rm I\!P})$ be irreducible, positive recurrent, and aperiodic, with stable distribution ${\bf P}$. Then ${\bf P}=(P_{1},P_{2},\ldots)$is given by $P_{j} = \lim_{n \to \infty} p^{(n)}_{i j}$for all pairs $(i,j) \in S^2$ of states.

In this case, ${\rm I\!P}^n$ converges element-wise to the transition matrix of the independent chain with respect to the stable distribution ${\bf P}$. For the corresponding l1-result on the sequence of distributions ${\bf P}_0 {\rm I\!P}^n$, ${\bf P}_0 \in D_S$ an arbitrary initial distribution, see [6, Theorem 6.1] and [42]. If S is finite in addition, we have the following Lemma on the speed of convergence:

Lemma 2.9   Let $(S,{\rm I\!P})$ be a finite irreducible aperiodic MC with stationary distribution ${\bf P}$. Then there exist $c \ge 0$ and $0 \le \rho < 1$, such that

\begin{displaymath}\vert p^{(n)}_{i j} - P_{j}\vert \le c \rho^n, \quad \forall (i,j) \in S^2, \forall n \in {\rm I\!N}.\end{displaymath}

Any arbitrary initial distribution converges exponentially to ${\bf P}$ in this case. This class of chains allows a particular simple form of the Central Limit Theorem for the number of visits in each state. For the corresponding result (see Theorem 2.18) both, the finiteness and irreducibility are essential prerequisites. Although aperiodicity may be omitted, we will have this property in the examples discussed in Chapter 3.


next up previous contents
Next: Coupled-, Overlapping- and Higher-Order Up: Modelling with Markov Chains Previous: Modelling with Markov Chains
Stefan Wegenkittl
1998-05-19