Epsilon Machine Inference Tutorial

This tutorial shows how to infer an Epsilon Machine (eM) from a string of data using the two inference methods available in CMPy: Subtree Merging (also called simply Tree Merging, Topological Merging or TM) and Causal State-Splitting Reconstruction (also called CSSR, pronounced “scissor”, or State Splitting or SS).

First we create a string of data from our favorite process, in this case the Even process. To do this we import the Even process from CMPy and create a data string of 100,000 symbols:

from cmpy.machines import Even
data = Even().symbols(100000)
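For readers without CMPy at hand, the Even process itself can be sketched in plain Python. This is only an illustration of the standard two-state definition of the process (1s occur in blocks of even length, separated by any number of 0s). The function name `even_process_symbols` and its `seed` argument are hypothetical and are not part of CMPy. The sketch also always starts in state A rather than sampling the start state from the stationary distribution.

```python
import random

def even_process_symbols(n, seed=None):
    """Generate n symbols from the Even process (hypothetical helper, not CMPy's)."""
    rng = random.Random(seed)
    state = "A"  # assumption: always start in state A
    out = []
    for _ in range(n):
        if state == "A":
            # From A: emit 0 and stay in A, or emit 1 and move to B,
            # each with probability 1/2.
            if rng.random() < 0.5:
                out.append(0)
            else:
                out.append(1)
                state = "B"
        else:
            # From B: always emit 1 and return to A, completing a pair of 1s.
            out.append(1)
            state = "A"
    return out

sample = even_process_symbols(20, seed=0)
print("".join(map(str, sample)))
```

Every block of 1s enclosed by 0s in the output has even length; only the final block can be cut short by the end of the sample.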

Next we import the State Splitter from the inference submodule:

from cmpy.inference import Splitter

The Splitter function takes a list of symbols and returns an Epsilon Machine that we call ssem below. The main parameter of Splitter is history_length, which we leave at its default value of 3:

ssem = Splitter(data)
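To give a feel for what a history length of 3 means, the sketch below estimates the next-symbol distribution that follows each observed history of a given length. CSSR groups histories whose next-symbol distributions are statistically indistinguishable; this sketch covers only the counting step, not the statistical tests or the state splitting. The function `next_symbol_distributions` is a hypothetical helper and makes no claim about how CMPy's Splitter is implemented.

```python
from collections import Counter, defaultdict

def next_symbol_distributions(symbols, history_length=3):
    """Estimate P(next symbol | preceding history) from a symbol sequence."""
    counts = defaultdict(Counter)
    for i in range(history_length, len(symbols)):
        # The history is the block of symbols immediately before position i.
        history = tuple(symbols[i - history_length:i])
        counts[history][symbols[i]] += 1
    # Normalize the raw counts into conditional probabilities.
    return {
        history: {s: c / sum(ctr.values()) for s, c in ctr.items()}
        for history, ctr in counts.items()
    }

toy = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
for history, dist in sorted(next_symbol_distributions(toy, 2).items()):
    print(history, dist)
```

Histories with matching distributions are candidates for the same causal state, which is why longer histories can resolve more structure but need more data per history.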

Now we try inferring the process using Tree Merging. As before, we first import the Tree Merger, also called the Topological Merger:

from cmpy.inference import TopologicalMerger

We feed the Tree Merger the same data that we used for the Splitter, and it returns an Epsilon Machine that we call tmem below. TopologicalMerger has two main parameters, morph_length and tree_depth. Here we set morph_length to 3 and omit tree_depth, which is then determined for us automatically:

tmem = TopologicalMerger(data, 3)
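The idea behind merging by morphs can be sketched roughly as follows: for each observed history, collect the set of words of length morph_length that are seen to follow it, then group histories whose sets are identical. This is a simplified, purely topological illustration (it ignores probabilities and does not build an explicit parse tree). The helper names and the `history_length` argument are hypothetical, and the sketch is not CMPy's TopologicalMerger.

```python
from collections import defaultdict

def topological_morphs(symbols, history_length, morph_length):
    """Map each observed history to the set of future words that follow it."""
    futures = defaultdict(set)
    span = history_length + morph_length
    for i in range(len(symbols) - span + 1):
        window = tuple(symbols[i:i + span])
        # Split the window into the history and the future word after it.
        futures[window[:history_length]].add(window[history_length:])
    return futures

def group_by_morph(futures):
    """Group histories that share exactly the same set of future words."""
    groups = defaultdict(list)
    for history, morph in futures.items():
        groups[frozenset(morph)].append(history)
    return list(groups.values())

toy = [0, 1] * 10  # period-2 sequence
for group in group_by_morph(topological_morphs(toy, 2, 2)):
    print(group)
```

For the period-2 toy sequence the two histories have different futures, so they end up in two separate groups, matching the two phases of the cycle.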

Now we can check various values of the inferred machines, such as the number of recurrent states, the entropy rate and the statistical complexity.
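As a reference point for such checks, the exact values for the Even process can be computed directly from its two-state definition. The sketch below is plain Python and does not use CMPy. The transition table is the standard Even process, and the variable names are illustrative only. The statistical complexity is the Shannon entropy of the stationary state distribution, and the entropy rate is the state-averaged entropy of the outgoing transition probabilities.

```python
from math import log2

# Even process: state -> list of (probability, symbol, next_state)
transitions = {
    "A": [(0.5, 0, "A"), (0.5, 1, "B")],
    "B": [(1.0, 1, "A")],
}

# Stationary state distribution by repeated application of the transition rule.
pi = {"A": 0.5, "B": 0.5}
for _ in range(200):
    new_pi = {state: 0.0 for state in transitions}
    for state, edges in transitions.items():
        for prob, _symbol, next_state in edges:
            new_pi[next_state] += pi[state] * prob
    pi = new_pi

# Statistical complexity: entropy of the stationary distribution (bits).
statistical_complexity = -sum(p * log2(p) for p in pi.values() if p > 0)

# Entropy rate: average per-state uncertainty in the next symbol (bits/symbol).
entropy_rate = sum(
    pi[state] * -sum(prob * log2(prob) for prob, _s, _n in edges if prob > 0)
    for state, edges in transitions.items()
)

print(len(transitions), entropy_rate, statistical_complexity)
```

A well-inferred machine should have two recurrent states, an entropy rate close to 2/3 bit per symbol and a statistical complexity close to 0.918 bits.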