bayesianmc – Markov Chain Inference

The cmpy.inference.bayesianmc submodule provides a fully Bayesian framework for inferring order-r Markov chains from a sequence of observations. It also provides model comparison techniques for Markov chains of various orders. The general method is described in [strelioff2007].

Introduction

In this document, we will walk through a sample inference, model comparison, and sampling of the posterior distribution. Throughout, we’ll assume the user has the following imports:

>>> from __future__ import division
>>> import cmpy.machines as cmm
>>> import cmpy.inference.bayesianmc as bmc

Let’s begin by generating some data from the Even process.

>>> m = cmm.Even()
>>> data = m.symbols(5000)

One noteworthy feature of the Even process is that it cannot be represented by any finite-order Markov chain. One can easily verify this in CMPy via:

>>> m.markov_order()
inf

However, suppose we didn’t know that the symbols in data had come from the Even process and, further, that we believed the data to have been generated by an order-3 Markov process. Then a natural next step would be to infer a 3rd-order Markov chain from the data, and cmpy.inference.bayesianmc is just the right tool for that task.

Inferring A Markov Chain

The basic building block of Markov chain inference is the InferMC class. It treats each word of length L = order as a state and considers words of length L = order + 1 to determine transitions between the states.

For example, consider the string 0010. We start with the first three symbols 001 and by shifting to the right, we transition to 010 on symbol 0. We take this dynamic over words as the dynamic over states. Visually, we can see this as the following graph structure:

../../_images/mc3_tr.png

The goal of the inference algorithm is to assign probabilities to the edges, such that the total probability of the edges leaving a particular state is 1. The probabilities are generally obtained by counting the number of times a word, for example 001, is followed by a 0 or a 1.
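The counting step itself amounts to sliding a window of length order + 1 over the data. Here is a minimal sketch of the idea, assuming the data is simply a sequence of symbols (purely illustrative; this is not CMPy's internal code, and the function name is made up):

>>> from collections import defaultdict
>>> def count_word_transitions(seq, order=3):
...     # The first `order` symbols of each window form the current state;
...     # the final symbol labels the transition out of that state.
...     counts = defaultdict(lambda: defaultdict(int))
...     for i in range(len(seq) - order):
...         word = tuple(seq[i:i + order])
...         counts[word][seq[i + order]] += 1
...     return counts

In practice you never need to do this by hand, as the next step shows.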

The simplest way to infer a Markov chain from this data is to create an instance of InferMC:

>>> alphabet = m.alphabet()
>>> order = 3
>>> mc = bmc.InferMC(alphabet, data, order)

Upon construction, the data sequence is scanned to obtain counts of words and of their transitions to other words. We can view these counts directly. Note that your results will differ from those shown below, since the data sequence is randomly generated:

>>> print mc.counts_string()
n(0,0,0 -> *) = 457
n(0,0,0 -> 0) = 239
n(0,0,0 -> 1) = 218
n(0,0,1 -> *) = 417
n(0,0,1 -> 1) = 417
n(0,1,1 -> *) = 834
n(0,1,1 -> 0) = 428
n(0,1,1 -> 1) = 406
n(1,0,0 -> *) = 417
n(1,0,0 -> 0) = 218
n(1,0,0 -> 1) = 199
n(1,0,1 -> *) = 417
n(1,0,1 -> 1) = 417
n(1,1,0 -> *) = 834
n(1,1,0 -> 0) = 417
n(1,1,0 -> 1) = 417
n(1,1,1 -> *) = 1621
n(1,1,1 -> 0) = 406
n(1,1,1 -> 1) = 1215

Maximum Likelihood Estimate

In the non-Bayesian approach, one would simply take those counts to compute probabilities for the transitions. This corresponds to a maximum likelihood estimate. For example, we can ask for the MLE transition probability of a 0 following 000:

>>> 239/457
0.52297592997811815
>>> mc.get_transition_probability_MLE( ('0', '0', '0'), '0' )
(0.52297592997811815, 0.00054589082416113923)

The second quantity is the variance, as computed from a Gaussian approximation of the log likelihood; a quick numeric check appears after the full listing below. We can obtain all of the probabilities at once with:

>>> print mc.transition_probability_MLE_string()
Pr( 0 | 0,0,0 ) = 0.52297593 , StdDev = 0.02336431
Pr( 1 | 0,0,0 ) = 0.47702407 , StdDev = 0.02336431
Pr( 0 | 0,0,1 ) = 0.00000000 , StdDev = 0.00000000
Pr( 1 | 0,0,1 ) = 1.00000000 , StdDev = 0.00000000
Pr( 0 | 0,1,0 ) = 0.00000000 , StdDev = 0.00000000
Pr( 1 | 0,1,0 ) = 0.00000000 , StdDev = 0.00000000
Pr( 0 | 0,1,1 ) = 0.51318945 , StdDev = 0.01730756
Pr( 1 | 0,1,1 ) = 0.48681055 , StdDev = 0.01730756
Pr( 0 | 1,0,0 ) = 0.52278177 , StdDev = 0.02445968
Pr( 1 | 1,0,0 ) = 0.47721823 , StdDev = 0.02445968
Pr( 0 | 1,0,1 ) = 0.00000000 , StdDev = 0.00000000
Pr( 1 | 1,0,1 ) = 1.00000000 , StdDev = 0.00000000
Pr( 0 | 1,1,0 ) = 0.50000000 , StdDev = 0.01731358
Pr( 1 | 1,1,0 ) = 0.50000000 , StdDev = 0.01731358
Pr( 0 | 1,1,1 ) = 0.25046268 , StdDev = 0.01076159
Pr( 1 | 1,1,1 ) = 0.74953732 , StdDev = 0.01076159
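As a quick sanity check of these numbers (plain arithmetic, not a CMPy call): the MLE probability is just the count ratio, and the quoted standard deviation agrees with the familiar binomial form sqrt(p(1-p)/n) evaluated at the MLE:

>>> p = 239 / 457                # MLE of Pr( 0 | 0,0,0 ): just the count ratio
>>> var = p * (1 - p) / 457      # Gaussian (binomial) approximation to the variance
>>> print '%.8f , %.8f' % (p, var ** 0.5)
0.52297593 , 0.02336431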

And we can obtain a MealyHMM of this inferred Markov chain as well.

>>> m_MLE = mc.generate_MealyHMM(method='MLE')

So let’s compare the entropy rates:

>>> m.entropy_rate()
0.66666666666666663
>>> m_MLE.entropy_rate()
0.77176896639047399

So we see that this inferred model overestimates the true entropy rate.
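For reference, the true value of 2/3 can be recovered by hand from the Even process's two recurrent states: an unbiased state that emits 0 or 1 with equal probability, and a forced state that must emit a 1. A short illustrative calculation:

>>> # Stationary probabilities of the unbiased and forced states are 2/3 and 1/3.
>>> pi_unbiased, pi_forced = 2/3, 1/3
>>> # Entropy rate = stationary average of the per-state branching entropies
>>> # (1 bit for the fair coin, 0 bits for the forced emission of a 1).
>>> pi_unbiased * 1 + pi_forced * 0
0.66666666666666663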

Posterior Mean Estimate

We can (and should) also make use of the Bayesian estimate. This technique incorporates the priors, which are set to uniform by default.

>>> mc.prior_type
'uniform'
>>> print mc.prior_string()
alpha(0,0,0 -> *) = 2
alpha(0,0,0 -> 0) = 1
alpha(0,0,0 -> 1) = 1
alpha(0,0,1 -> *) = 2
alpha(0,0,1 -> 0) = 1
alpha(0,0,1 -> 1) = 1
alpha(0,1,0 -> *) = 2
alpha(0,1,0 -> 0) = 1
alpha(0,1,0 -> 1) = 1
alpha(0,1,1 -> *) = 2
alpha(0,1,1 -> 0) = 1
alpha(0,1,1 -> 1) = 1
alpha(1,0,0 -> *) = 2
alpha(1,0,0 -> 0) = 1
alpha(1,0,0 -> 1) = 1
alpha(1,0,1 -> *) = 2
alpha(1,0,1 -> 0) = 1
alpha(1,0,1 -> 1) = 1
alpha(1,1,0 -> *) = 2
alpha(1,1,0 -> 0) = 1
alpha(1,1,0 -> 1) = 1
alpha(1,1,1 -> *) = 2
alpha(1,1,1 -> 0) = 1
alpha(1,1,1 -> 1) = 1

The transition probabilities are:

>>> print mc.transition_probability_PME_string()
Pr( 0 | 0,0,0 ) = 0.52287582 , StdDev = 0.02328821
Pr( 1 | 0,0,0 ) = 0.47712418 , StdDev = 0.02328821
Pr( 0 | 0,0,1 ) = 0.00238663 , StdDev = 0.00238095
Pr( 1 | 0,0,1 ) = 0.99761337 , StdDev = 0.00238095
Pr( 0 | 0,1,0 ) = 0.50000000 , StdDev = 0.28867513
Pr( 1 | 0,1,0 ) = 0.50000000 , StdDev = 0.28867513
Pr( 0 | 0,1,1 ) = 0.51315789 , StdDev = 0.01727654
Pr( 1 | 0,1,1 ) = 0.48684211 , StdDev = 0.01727654
Pr( 0 | 1,0,0 ) = 0.52267303 , StdDev = 0.02437241
Pr( 1 | 1,0,0 ) = 0.47732697 , StdDev = 0.02437241
Pr( 0 | 1,0,1 ) = 0.00238663 , StdDev = 0.00238095
Pr( 1 | 1,0,1 ) = 0.99761337 , StdDev = 0.00238095
Pr( 0 | 1,1,0 ) = 0.50000000 , StdDev = 0.01728253
Pr( 1 | 1,1,0 ) = 0.50000000 , StdDev = 0.01728253
Pr( 0 | 1,1,1 ) = 0.25077018 , StdDev = 0.01075604
Pr( 1 | 1,1,1 ) = 0.74922982 , StdDev = 0.01075604
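These values are consistent with the standard Dirichlet posterior moments: each posterior mean is the count plus its alpha, divided by the total count plus the total alpha, and the standard deviations likewise follow from the Dirichlet posterior variance. A quick numeric check against a few entries above (plain arithmetic, not a CMPy call):

>>> print '%.8f' % ((239 + 1) / (457 + 2))   # Pr( 0 | 0,0,0 ): counts plus alphas
0.52287582
>>> print '%.8f' % ((0 + 1) / (417 + 2))     # Pr( 0 | 0,0,1 ): transition never observed
0.00238663
>>> # The word 0,1,0 never occurs at all, so only the prior contributes; the
>>> # posterior variance is p*(1-p)/(counts + alphas + 1) = 1/12.
>>> std = (0.5 * 0.5 / (0 + 2 + 1)) ** 0.5
>>> print '%.8f' % std
0.28867513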

In particular, note that transitions which had no counts are not necessarily zero. Let’s compare the entropy rate of this inferred Markov chain.

>>> m_PME = mc.generate_MealyHMM(method='PME')
>>> m_PME.entropy_rate()
0.77583338814575054

Note that you can visualize the inferred machine with:

>>> m_PME.draw()

However, to make the comparison with the original diagram easier, the probabilities above have been manually overlaid on it:

../../_images/even_inferred.png

Sampling From The Posterior Distribution

One nice feature of the Bayesian approach is that we can sample from the posterior distribution. Now, we use this to generate a histogram of entropy rates.

>>> spmc = bmc.SamplePosteriorMC(inferMC=mc)
>>> spmc.histogram_entropy_rate(1000)

If you are following along in an IPython session with Matplotlib interactive drawing enabled, a histogram will pop up. If you are running this from a script, add the following to the end of your script:

>>> import matplotlib.pyplot as plt
>>> plt.show()

In any event, the histogram should look something like this:

../../_images/even_histogram.png
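Conceptually, each posterior sample corresponds to drawing every state's outgoing transition distribution from its Dirichlet posterior (counts plus prior alphas), building the resulting Markov chain, and recording its entropy rate. A rough illustrative sketch of a single such draw with NumPy (not the actual SamplePosteriorMC code), using the 0,0,0 counts from above:

>>> import numpy as np
>>> # Dirichlet posterior parameters for the 0,0,0 state: observed counts plus
>>> # the uniform prior alphas (alpha = 1 per symbol).
>>> alphas = [239 + 1, 218 + 1]
>>> # One posterior draw of ( Pr(0 | 0,0,0), Pr(1 | 0,0,0) ).  Repeating this
>>> # for every state yields one Markov chain sampled from the posterior, whose
>>> # entropy rate contributes a single point to the histogram.
>>> sample = np.random.dirichlet(alphas)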