`bayesianmc`

– Markov Chain Inference¶

The `cmpy.inference.bayesianmc`

submodule provides a fully Bayesian
framework for inferring order-r
Markov chains
from a sequence of observations. It also provides model comparision
techniques for Markov chains of various orders. The general method is
described in [strelioff2007].

## Introduction¶

In this document, we will walk through a sample inference, model comparison, and sampling of the posterior distribution. Throughout, we’ll assume the user has the following imports:

```
>>> from __future__ import division
>>> import cmpy.machines as cmm
>>> import cmpy.inference.bayesianmc as bmc
```

Let’s begin by generating some data from the Even process.

```
>>> m = cmm.Even()
>>> data = m.symbols(5000)
```

One noteworthy feature of the Even process is that it is it cannot be
represented by any finite order Markov chain. One can easily verify this in
`CMPy`

via:

```
>>> m.markov_order()
inf
```

However, suppose we didn’t know the symbols in data had come from the
Even process and further, that we believed the data to have been generated
from a order-3 Markov process. Then, a natural next step would be to infer
a 3rd order Markov chain from the data, and `cmpy.inference.bayesianmc`

is just the right tool for that task.

## Inferring A Markov Chain¶

The basic building block of Markov chain inference is the
`InferMC`

class. It treats each
word of length as a state and considers words of length
to determine transitions between the states.

For example, consider the string . We start with the first three symbols and by shifting to the right, we transition to on symbol . We take this dynamic over words as the dynamic over states. Visually, we can see this as the following graph structure:

The goal of the inference algorithm is to assign probabilities to the edges, such that the total probability of the edges leaving a particular state is 1. The probabilities are generally obtained by counting the number of times a word, for example , is followed by a or a .

The simplest way to infer a Markov chain from this data is to create an
instance of `InferMC`

:

```
>>> alphabet = m.alphabet()
>>> order = 3
>>> mc = bmc.InferMC(alphabet, data, order)
```

Immediately, the data sequence is scanned to obtain counts of words and their transitions to other words. We can view these counts too. Note, your results will be different from what appears below:

```
>>> print mc.counts_string()
n(0,0,0 -> *) = 457
n(0,0,0 -> 0) = 239
n(0,0,0 -> 1) = 218
n(0,0,1 -> *) = 417
n(0,0,1 -> 1) = 417
n(0,1,1 -> *) = 834
n(0,1,1 -> 0) = 428
n(0,1,1 -> 1) = 406
n(1,0,0 -> *) = 417
n(1,0,0 -> 0) = 218
n(1,0,0 -> 1) = 199
n(1,0,1 -> *) = 417
n(1,0,1 -> 1) = 417
n(1,1,0 -> *) = 834
n(1,1,0 -> 0) = 417
n(1,1,0 -> 1) = 417
n(1,1,1 -> *) = 1621
n(1,1,1 -> 0) = 406
n(1,1,1 -> 1) = 1215
```

### Maximum Likelihood Estimate¶

In the non-Bayesian approach, one would simply take those counts to compute probabilities for the transitions. This corresponds to a maximum likelihood estimate. For example, we can ask for MLE transition probability of of a following .

```
>>> 239/457
0.52297592997811815
>>> mc.get_transition_probability_MLE( ('0', '0', '0'), '0' )
(0.52297592997811815, 0.00054589082416113923)
```

The second quantity is the variance, as computed by a Gaussian approximation of the log likelihood. We can obtain all probabilities by:

```
>>> print mc.transition_probability_MLE_string()
Pr( 0 | 0,0,0 ) = 0.52297593 , StdDev = 0.02336431
Pr( 1 | 0,0,0 ) = 0.47702407 , StdDev = 0.02336431
Pr( 0 | 0,0,1 ) = 0.00000000 , StdDev = 0.00000000
Pr( 1 | 0,0,1 ) = 1.00000000 , StdDev = 0.00000000
Pr( 0 | 0,1,0 ) = 0.00000000 , StdDev = 0.00000000
Pr( 1 | 0,1,0 ) = 0.00000000 , StdDev = 0.00000000
Pr( 0 | 0,1,1 ) = 0.51318945 , StdDev = 0.01730756
Pr( 1 | 0,1,1 ) = 0.48681055 , StdDev = 0.01730756
Pr( 0 | 1,0,0 ) = 0.52278177 , StdDev = 0.02445968
Pr( 1 | 1,0,0 ) = 0.47721823 , StdDev = 0.02445968
Pr( 0 | 1,0,1 ) = 0.00000000 , StdDev = 0.00000000
Pr( 1 | 1,0,1 ) = 1.00000000 , StdDev = 0.00000000
Pr( 0 | 1,1,0 ) = 0.50000000 , StdDev = 0.01731358
Pr( 1 | 1,1,0 ) = 0.50000000 , StdDev = 0.01731358
Pr( 0 | 1,1,1 ) = 0.25046268 , StdDev = 0.01076159
Pr( 1 | 1,1,1 ) = 0.74953732 , StdDev = 0.01076159
```

And we can obtain a `MealyHMM`

of this inferred Markov
chain as well.

```
>>> m_MLE = mc.generate_MealyHMM(method='MLE')
```

So let’s compare the entropy rates:

```
>>> m.entropy_rate()
0.66666666666666663
>>> m_MLE.entropy_rate()
0.77176896639047399
```

So we see that this inferred model overestimates the true entropy rate.

### Posterior Mean Estimate¶

We can (and should) also make use of the Bayesian estimate. The technique incorporates the priors that were defaultly set to uniform.

```
>>> mc.prior_type
'uniform'
>>> print mc.prior_string()
alpha(0,0,0 -> *) = 2
alpha(0,0,0 -> 0) = 1
alpha(0,0,0 -> 1) = 1
alpha(0,0,1 -> *) = 2
alpha(0,0,1 -> 0) = 1
alpha(0,0,1 -> 1) = 1
alpha(0,1,0 -> *) = 2
alpha(0,1,0 -> 0) = 1
alpha(0,1,0 -> 1) = 1
alpha(0,1,1 -> *) = 2
alpha(0,1,1 -> 0) = 1
alpha(0,1,1 -> 1) = 1
alpha(1,0,0 -> *) = 2
alpha(1,0,0 -> 0) = 1
alpha(1,0,0 -> 1) = 1
alpha(1,0,1 -> *) = 2
alpha(1,0,1 -> 0) = 1
alpha(1,0,1 -> 1) = 1
alpha(1,1,0 -> *) = 2
alpha(1,1,0 -> 0) = 1
alpha(1,1,0 -> 1) = 1
alpha(1,1,1 -> *) = 2
alpha(1,1,1 -> 0) = 1
alpha(1,1,1 -> 1) = 1
```

The transition probabilities are:

```
>>> print mc.transition_probability_PME_string()
Pr( 0 | 0,0,0 ) = 0.52287582 , StdDev = 0.02328821
Pr( 1 | 0,0,0 ) = 0.47712418 , StdDev = 0.02328821
Pr( 0 | 0,0,1 ) = 0.00238663 , StdDev = 0.00238095
Pr( 1 | 0,0,1 ) = 0.99761337 , StdDev = 0.00238095
Pr( 0 | 0,1,0 ) = 0.50000000 , StdDev = 0.28867513
Pr( 1 | 0,1,0 ) = 0.50000000 , StdDev = 0.28867513
Pr( 0 | 0,1,1 ) = 0.51315789 , StdDev = 0.01727654
Pr( 1 | 0,1,1 ) = 0.48684211 , StdDev = 0.01727654
Pr( 0 | 1,0,0 ) = 0.52267303 , StdDev = 0.02437241
Pr( 1 | 1,0,0 ) = 0.47732697 , StdDev = 0.02437241
Pr( 0 | 1,0,1 ) = 0.00238663 , StdDev = 0.00238095
Pr( 1 | 1,0,1 ) = 0.99761337 , StdDev = 0.00238095
Pr( 0 | 1,1,0 ) = 0.50000000 , StdDev = 0.01728253
Pr( 1 | 1,1,0 ) = 0.50000000 , StdDev = 0.01728253
Pr( 0 | 1,1,1 ) = 0.25077018 , StdDev = 0.01075604
Pr( 1 | 1,1,1 ) = 0.74922982 , StdDev = 0.01075604
```

In particular, note that transitions which had no counts are not necessarily zero. Let’s compare the entropy rate of this inferred Markov chain.

```
>>> m_PME = mc.generate_MealyHMM(method='PME')
>>> m_PME.entropy_rate()
0.77583338814575054
```

Note, you can visualize the machine with:

```
>>> m.draw()
```

However, to draw the comparison back to the original diagram, the above probabilities have been manually overlayed.

### Sampling From The Posterior Distribution¶

One nice feature of the Bayesian approach is that we can sample from the posterior distribution. Now, we use this to generate a histogram of entropy rates.

```
>>> spmc = bmc.SamplePosteriorMC(inferMC=mc)
>>> spmc.histogram_entropy_rate(1000)
```

If you are following along in an IPython session with Matplotlib interactive drawing enabled, a histogram will pop up. If you are typing this in a script, add the following to the end of your script:

```
>>> import matplotlib.pyplot as plt
>>> plt.show()
```

In any event, the histogram should look (something) like this: