Machines Documentation

This page contains the Machines Package documentation.

Subpackages

The dfa Module

Simple DFA class.

class cmpy.machines.dfa.DFA(data=None, name='', **attr)

Bases: cmpy.machines.base.BaseMachine

Machine where symbols are output on the edge transitions.

Methods

add_accept_node(n) Adds n to the set of accept nodes.
alphabet([u, v]) Returns the alphabet of the machine.
alphabet_by_node([succ]) Returns the alphabet for each node.
complement() Returns a DFA accepting the complement language of the DFA.
complete([reject, copy]) Returns the DFA as a complete DFA.
delta(ns, word) Returns the final state after seeing word from nodes ns.
get_accept_nodes() Returns the set of accept nodes.
intersection(other[, minimize, relabel_nodes]) Returns a DFA accepting the intersection of the languages.
is_deterministic() Returns True if the DFA is deterministic.
is_empty() Returns True if the DFA’s language is empty and False if not.
is_equivalent(other[, minimize]) Returns True if the languages of self and other are equal.
is_isomorphic(other[, style]) Returns True if the machine is isomorphic to other.
is_minimal() Returns True if the DFA is minimal.
is_subset(other) Returns True if the DFA language is a subset of the language of other.
is_unifilar() Returns True if the DFA is unifilar.
minimize([mapping]) Returns a minimized version of the DFA.
reduce([copy]) Reduces the machine so that it contains only the recurrent nodes.
relabel_symbols(mapping) Changes the alphabet of the machine, in-place.
remove_accept_node(n) Removes n from the set of accept nodes.
reverse([copy]) Reverses the machine so that it generates the reverse language.
set_accept_nodes(nbunch) Sets the set of accept nodes to the nodes in nbunch.
set_start_node(n) Sets the start node.
add_accept_node(n)

Adds n to the set of accept nodes.

Parameters:

n : node

The node that will be added to the set of accept nodes.

Raises:

CMPyException :

When n is not in the machine.

alphabet(u=None, v=None)

Returns the alphabet of the machine.

The alphabet between the from node u and the to node v is dynamically constructed by examining every edge between those nodes. In the default call, the alphabet of the entire machine is returned. Note that this is not the same as the alphabet from all nodes accessible from the start node.

Parameters:

u : node

The “from” node. If u is None, then all nodes are considered as “from” nodes. The result is that the alphabet over all incoming edges to v is constructed. Thus, specifying u and leaving v as None, will give the outgoing alphabet from node u.

v : node

The “to” node. If v is None, then all nodes are considered as “to” nodes. The result is that the alphabet over all outgoing edges from u is constructed. Thus, specifying v and leaving u as None, will give the incoming alphabet to node v.

Returns:

a : set

The alphabet of the machine.

Raises:

AlphabetException :

When the alphabet cannot be obtained from the machine.

alphabet_by_node(succ=True)

Returns the alphabet for each node.

Parameters:

succ : bool

If True, then we return the outgoing alphabet for each node. If False, then we return the incoming alphabet for each node.

Returns:

a : dict

A dictionary keyed by nodes in the machine. The values are sets containing all symbols that can be seen going out from (or into) the node.

complement()

Returns a DFA accepting the complement language of the DFA.

Returns:

m : DFA

The complement DFA.

complete(reject='REJECT', copy=True)

Returns the DFA as a complete DFA.

Parameters:

reject : str

The name given to the reject node.

copy : bool

If True, then a copy of the machine is made and returned. If False, the machine is updated in-place.

Returns:

m : DFA

The complete DFA.

delta(ns, word)

Returns the final state after seeing word from nodes ns.

Parameters:

ns : nodes

The nodes from which an attempt is made to recognize the word.

word : word

An iterable of symbols from the alphabet, used to advance the from nodes ns.

Returns:

nns : {nodes}

The nodes the DFA is in after seeing word from nodes ns. If the word is not recognized, then an empty set is returned.
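The behavior described above can be sketched with a plain transition table. This is a standalone illustration, not the cmpy implementation; the transition dict and node names are hypothetical:

```python
# Conceptual sketch of DFA delta(): advance a set of nodes through a
# word, one symbol at a time.  Missing (node, symbol) keys mean the
# word is not recognized, so an empty set is returned.

def delta(transitions, ns, word):
    """Advance the set of nodes `ns` by each symbol in `word`."""
    current = set(ns)
    for symbol in word:
        current = {transitions[(n, symbol)]
                   for n in current if (n, symbol) in transitions}
        if not current:
            return set()          # word not recognized
    return current

# Support of the Even process: from A, '0' self-loops and '1' goes
# to B; from B only '1' (back to A) is allowed.
T = {('A', '0'): 'A', ('A', '1'): 'B', ('B', '1'): 'A'}

print(delta(T, {'A'}, '011'))     # {'A'}
print(delta(T, {'A'}, '10'))      # set() -- '0' is forbidden from B
```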

edge_attr_aliases = {'i': 'output', 'input': 'output', 'o': 'output'}
edge_attr_required = set(['output'])
get_accept_nodes()

Returns the set of accept nodes.

graph_attr_aliases = {}
graph_attr_required = set([])
intersection(other, minimize=True, relabel_nodes=True)

Returns a DFA accepting the intersection of the languages.

Parameters:

other : DFA

The DFA whose language will be intersected with this machine’s language.

minimize : bool

If True, then we return the minimal DFA recognizing the intersection of the languages recognized by the two DFAs.

relabel_nodes : bool

If True, then the nodes are relabeled to integers.

Returns:

m : DFA

The DFA which recognizes the intersection language.

is_deterministic()

Returns True if the DFA is deterministic.

A DFA is deterministic if, given the current state, there is only one next state associated with each next symbol.

Notes

Determinism is also known as ‘unifilarity’ in the information theory literature and as ‘right-resolving’ in the symbolic dynamics literature.
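The definition above reduces to a simple check over the edge list. The sketch below is a hypothetical standalone helper, not the cmpy code:

```python
# A DFA is deterministic iff each (node, symbol) pair leads to at
# most one next state across all edges.

def is_deterministic(edges):
    """edges: iterable of (from_node, symbol, to_node) triples."""
    seen = {}
    for u, sym, v in edges:
        if (u, sym) in seen and seen[(u, sym)] != v:
            return False          # two next states for one symbol
        seen[(u, sym)] = v
    return True

deterministic = [('A', '0', 'A'), ('A', '1', 'B'), ('B', '1', 'A')]
nondeterministic = deterministic + [('A', '1', 'A')]   # A --1--> {A, B}

print(is_deterministic(deterministic))      # True
print(is_deterministic(nondeterministic))   # False
```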

is_empty()

Returns True if the DFA’s language is empty and False if not.

is_equivalent(other, minimize=True)

Returns True if the languages of self and other are equal.

Parameters:

other : DFA

The machine to be tested against for equivalence.

minimize : bool

We test if the minimal DFA for each machine are isomorphic. When minimize is True, then the input machines are not assumed to be minimal and are minimized before the isomorphism check.

Returns:

equiv : bool

True if the DFAs are equivalent and False otherwise.

is_isomorphic(other, style='node_with_symbol')

Returns True if the machine is isomorphic to other.

Parameters:

other : DFA

The machine to be tested against for an isomorphism.

style : str

The type of isomorphism to search for. Acceptable options are below.

‘node’:

Checks for a node isomorphism and ignores the symbols appearing on each edge. So the Golden Mean process is isomorphic to the Even process with this type of check.

‘node_with_symbol’ :

Checks for a node isomorphism but also demands that the output symbols on each edge match. So the Golden Mean process is not isomorphic to the Even process with this type of check.

‘nodesymbol’ :

Checks for a simultaneous node and symbol isomorphism.

Returns:

iso : bool

True if the machine is isomorphic to other and False otherwise.

is_minimal()

Returns True if the DFA is minimal.

Returns:

minimal : bool

If the DFA is minimal, then minimal is True. Otherwise, False.

is_subset(other)

Returns True if the DFA language is a subset the language of other.

is_unifilar()

Returns True if the DFA is unifilar.

Since unifilar is another name for ‘deterministic’, this is True for all DFAs.

minimize(mapping=False)

Returns a minimized version of the DFA.

The minimization method follows the Hopcroft algorithm. Worst-case running time is O(kn log(n)) where k is the alphabet size and n is the number of nodes in the DFA.

Parameters:

mapping : bool

If True, then the mapping from nodes in the original machine to nodes in the minimized machine is returned.

Returns:

mm : DFA

The minimized DFA.

mapp : dict

The mapping from nodes in the original DFA to nodes in the minimized DFA. This is returned only if mapping is True.
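The partition-refinement idea behind Hopcroft’s algorithm can be illustrated with the simpler Moore-style refinement below (worst case O(kn^2) rather than O(kn log n)). This is a conceptual sketch under assumed inputs, not the cmpy implementation; it assumes a complete DFA given as a dict of dicts:

```python
# Minimization by partition refinement: start by splitting accepting
# from non-accepting states, then repeatedly split states whose
# successors fall into different blocks, until nothing changes.

def minimize_classes(states, alphabet, trans, accept):
    """Return a mapping node -> equivalence-class id."""
    block = {s: (s in accept) for s in states}
    while True:
        # Signature: own block plus the blocks of all successors.
        sig = {s: (block[s], tuple(block[trans[s][a]] for a in alphabet))
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values()), key=repr))}
        new_block = {s: ids[sig[s]] for s in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block      # partition is stable
        block = new_block

# Hypothetical complete DFA in which B and C are equivalent.
states = {'A', 'B', 'C'}
alphabet = ['0', '1']
trans = {'A': {'0': 'B', '1': 'C'},
         'B': {'0': 'A', '1': 'A'},
         'C': {'0': 'A', '1': 'A'}}
cls = minimize_classes(states, alphabet, trans, accept={'A'})
# B and C collapse into one class; A stays separate.
```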

node_attr_aliases = {}
node_attr_required = set([])
reduce(copy=True)

Reduces the machine so that it contains only the recurrent nodes.

Parameters:

copy : bool

When True, we return a new machine. When False, we modify the machine in place, removing every node except the recurrent nodes.

Returns:

m : machine

The irreducible (or recurrent) portion of the original machine.

relabel_symbols(mapping)

Changes the alphabet of the machine, in-place.

Parameters:

mapping : dict

A dictionary mapping old symbols to new symbols.

remove_accept_node(n)

Removes n from the set of accept nodes.

No exception is raised if n is not in the set of accept nodes.

Parameters:

n : node

The node that will be removed from the set of accept nodes.

reverse(copy=True)

Reverses the machine so that it generates the reverse language.

Parameters:

copy : bool

When True, we return a new machine. When False, we modify the machine in-place.

Returns:

m : machine

The reversed machine.

TODO: make start and accept states copy over. This likely requires adjusting base to allow for multiple start nodes. This should be in an NFA base class – NFAs allow for de facto multiple start states via empty transitions.

set_accept_nodes(nbunch)

Sets the set of accept nodes to the nodes in nbunch.

Parameters:

nbunch : nodes

An iterable of nodes that will become the new set of accept nodes.

Raises:

CMPyException :

When a node n is in nbunch but not in the machine.

set_start_node(n)

Sets the start node.

Parameters:

n : node

The node that will be set as the start node.

The mealyhmm Module

Mealy Hidden Markov Models

Mealy hidden Markov models output symbols on the edges.

class cmpy.machines.mealyhmm.MealyHMM(data=None, name='', **attr)

Bases: cmpy.machines.hmm.HMM

Machine where symbols are output on the edge transitions.

Methods

allowable_words([nodes, sort]) A generator of sets of allowable words from nodes.
alphabet([u, v]) Returns the alphabet of the machine.
averaged_distribution([ndist, logs]) Returns the averaged distribution.
averaged_entropy() Returns the entropy of the averaged distribution.
block_entropies(lengths[, logs, ndist, ...]) Returns the block entropies for the specified word lengths.
cmech_quantities(quantities, maxL) Returns an array of quantities from computational mechanics.
cryptic_order() Returns the cryptic order of the process.
delta(n, word) Returns the set of current nodes after seeing word from node n.
distributions_iter([n, sdist]) Evolves and yields the current distribution n times.
emission_matrix([node_ordering, symbol_ordering]) Returns the node-to-symbol emission matrix.
entropy_rate([ndist, max_length, internal]) Returns the process entropy rate.
excess_entropy([max_length]) Returns the excess entropy of the process.
group_nodes(nodes[, name, style]) Groups every node in nodes into a single node.
is_asymptotic_sync() Returns True if the machine is asymptotically synchronizing.
is_counifilar() Returns True if the machine is counifilar and False if not.
is_detailed_balance() Returns True if the machine is in detailed balance.
is_equal_process(other[, ndist1, ndist2, ...]) Returns True if the generated process languages are the same.
is_ergodic([ndist, weak]) Returns True if the process is ergodic.
is_exactly_sync() Returns True if the machine is exactly synchronizable.
is_isomorphic(machine[, style]) Returns True if the machine is isomorphic to m.
is_minimal()
is_minimal_unifilar() Returns True if the machine is the minimal unifilar presentation.
is_periodic([ndist, eventually, max_length]) Returns the True if the process is periodic and False if not.
is_reversible([micro, max_length]) Returns True if the process language is causally reversible.
is_stationary([ndist, rtol, atol]) Returns True if the generated process language is stationary.
is_strictly_sofic() Returns True if the support of the machine is strictly sofic.
is_unifilar() Returns True if the machine is unifilar and False if not.
labeled_transition_matrices([node_ordering, ...]) Returns the labeled transition matrices.
merge_nodes() Returns a new machine with causally equivalent nodes merged.
minimize_unifilar([reduce, mapping]) Returns a minimized version the MealyHMM.
minsync_words() Return an iterator over the minimal synchronizing words.
period([ndist, max_length]) Returns the period of the generated process.
probabilities_iter(lengths[, logs, sparse, ...]) Yields the probability of every word, for all lengths in lengths.
probability(word[, logs, start, method]) Returns the probability of a word.
relabel_symbols(mapping) Changes the alphabet of the machine, in-place.
support() Returns a DFA recognizing the support of the MealyHMM.
symbol_topology() Returns the topology of the machine, keyed by symbols.
symbols(n[, out, prng]) Returns n symbols by evolving the HMM.
symbols_iter([n]) Yields n symbols by evolving the machine’s current node.
toMooreHMM()
to_recurrent_eM() Transforms this presentation into a recurrent epsilon machine, reducing and minimizing if necessary.
to_string() Export this machine to its string format, such that self == from_string(self.to_string()) is True.
topological_entropy_rate() Returns the topological entropy rate of the underlying Markov chain.
word_distribution_from_path(m, path) Returns the distribution of observed words induced by internal state
word_transition_matrix(word[, node_ordering]) Returns the transition matrix for a word.
allowable_words(nodes=None, sort=False)

A generator of sets of allowable words from nodes.

Parameters:

nodes : set

A set of nodes in the machine, specifying the nodes from which the words should be generated. If None, then every node in the machine will be used.

sort : bool

If True, then the set of words at each length will be sorted.

Examples

Use as an iterator and break out after L=5.

>>> for L, words in m.allowable_words(sort=True):
...     print words
...     if L == 5:
...         break

Save the generator and explicitly call next().

>>> m = cmpy.machines.Even()
>>> x = m.allowable_words(['B'], sort=True)
>>> print x.next()
0, [()]
>>> print x.next()
1, [('1',)]
>>> print x.next()
2, [('1', '0'), ('1', '1')]

alphabet(u=None, v=None)

Returns the alphabet of the machine.

The alphabet between the from node u and the to node v is dynamically constructed by examining every edge between those nodes. In the default call, the alphabet of the entire machine is returned.

Parameters:

u : node

The “from” node. If u is None, then all nodes are considered as “from” nodes. The result is that the alphabet over all incoming edges to v is constructed. Thus, specifying u and leaving v as None, will give the outgoing alphabet from node u.

v : node

The “to” node. If v is None, then all nodes are considered as “to” nodes. The result is that the alphabet over all outgoing edges from u is constructed. Thus, specifying v and leaving u as None, will give the incoming alphabet to node v.

Returns:

a : set

The alphabet of the machine.

Raises:

AlphabetException :

When the alphabet cannot be obtained from the machine.

averaged_distribution(ndist=None, logs=None)

Returns the averaged distribution.

Loosely, the averaged distribution is the appropriate generalization of the stationary distribution for nonergodic processes.

Let M be a machine, and let d be an initial node distribution.

  1. Given d, calculate the probability of each ergodic component.
  2. Calculate the stationary distribution of each ergodic component.

The averaged distribution is defined as the weighted stationary distribution of each ergodic component:

avg_d = ( Pr(pi_1 | d) pi_1, Pr(pi_2 | d) pi_2, ... )

Recall, the process language generated by M is specified by d.

If the process is ergodic, then the averaged distribution is equal to the stationary distribution of the induced ergodic component. Note the emphasis on “induced”. The machine can have multiple attracting components, but if the initial node distribution is such that only one attracting component is visited, then the process is ergodic.

In the case of an ergodic, nonstationary process, one interpretation of the averaged distribution is that it is the time-averaged node distribution from d. As an example, consider the period-2 process with initial node distribution peaked at node ‘A’. The generated process is nonstationary, but the process is still (weakly) ergodic. If one were to take the time-average of node counts for any realization of the process, then this would yield the stationary distribution. Since the process is ergodic, another interpretation is as the ensemble-average of node counts over multiple realizations of the process. Note that it is not correct to interpret the averaged distribution as the eventual state distribution that d.T^n tends to, since that limit does not exist in this case. If, however, the process were strongly ergodic instead, then this would be another valid interpretation.

If the process generated by d is nonergodic, recall that there is not a unique stationary distribution. As an example, the stationary distribution of any attracting component, extended to the entire machine, would qualify as a stationary distribution for the larger machine. In fact, any mixture of the stationary distributions from each ergodic component would itself be a stationary distribution. This degeneracy is why stationary_distribution() fails on machines with multiple attracting components.

However, the initial node distribution d fixes the probabilities of each ergodic component. In effect, this removes the degeneracy, and means that given a machine with multiple attracting components and an initial node distribution, one can identify a unique stationary distribution which has the same ergodic component occupation probabilities as the initial distribution induces. This is the averaged distribution. One can think of it as the ensemble-averaged node counts.
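The weighted-mixture definition above is easy to check numerically. The sketch below is a minimal standalone example, assuming a hypothetical three-node machine with two ergodic components whose stationary distributions and component probabilities are already known:

```python
import numpy as np

# avg_d = Pr(pi_1 | d) pi_1 + Pr(pi_2 | d) pi_2 + ...
pi_1 = np.array([1.0, 0.0, 0.0])   # stationary dist of component 1
pi_2 = np.array([0.0, 0.5, 0.5])   # stationary dist of component 2
p_1, p_2 = 0.25, 0.75              # Pr(component | d), assumed given

avg_d = p_1 * pi_1 + p_2 * pi_2
# avg_d is itself a distribution: (0.25, 0.375, 0.375)
```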

Parameters:

ndist : Distribution | LogDistribution | node | None

An initial node distribution, a node in the machine, or None. If None, then an attempt is made to grab one from the machine.

logs : bool, None

Specifies if logarithms should be used in the calculations of the averaged distribution. If None, then the default value is taken from cmpyParams[‘logs’].

averaged_entropy()

Returns the entropy of the averaged distribution.

block_entropies(lengths, logs=None, ndist=None, method=None, fprob_method=None, HX0LSL=False, condS0=False)

Returns the block entropies for the specified word lengths.

Parameters:

lengths : list

A list of word lengths. For each word length, we calculate the block entropy of all words of that length.

logs : bool, None

Specifies if logarithms should be used in the calculations of the block entropies. If None, then the default value is taken from cmpyParams[‘logs’].

Returns:

be : NumPy array

An array of block entropies of shape (L,) where L=len(lengths). The order of the entropies will match the order of the specified lengths in lengths. If method == ‘msp’ and HX0LSL is True, then the shape is (2,L,) and the first row represents the standard block entropies H[X_0^L] while the second row represents H[X_0^L, S_L].

cmech_quantities(quantities, maxL)

Returns an array of quantities from computational mechanics.

When a quantity is a function of the word length (e.g. block entropy), then its values at each L will be returned. When a quantity is a scalar value, then its estimates as a function of L will be returned.

Parameters:

quantities : [str]

A list of strings from the following list, specifying the desired quantities that will be returned at each iteration. In the list below, the states S are to be interpreted as states in the initial node distribution. Usually, this means that they are recurrent states (but not necessarily recurrent causal states).

‘HX0L’ - block entropy

H[X_0^L]

‘HX0LSL’ - block-state entropy

H[X_0^L, S_L]

‘HS0X0L’ - state-block entropy

H[S_0, X_0^L]

‘HX0LgS0’ - block entropy conditioned on state

H[X_0^L | S_0]

‘HX0LSLgS0’ - block-state entropy conditioned on state

H[X_0^L, S_L | S_0]

‘MSE’ - asymptotic mixed state entropy

H[X_L | (R_L | R_0 = \pi)]

‘El’ - lower bound estimate of excess entropy

H[X_0^L] - L h_\mu

‘Eu’ - upper bound estimate of excess entropy

H[X_0^L, S_L] - L h_\mu

‘T’ - transient information estimate

\sum_{l=0}^L ( E + l h_\mu - H[X_0^l] )

‘G’ - total predictability estimate

\sum_{l=2}^L ( \Delta^2 H(l) )

‘S’ - synchronization information estimate

\sum_{l=0}^L H[S_l|X_0^l]

‘hmu’ - entropy rate estimate

H[X_L | X_0^L] = H[X_0^L] - H[X_0^{L-1}]

‘pg’ - predictability gain

\Delta^2 H(L)

‘sL’ - synchronization at L

H[S_L|X_0^L]

‘In’ - nth block entropy convergence integral,

For example: ‘I0’, ‘I5’

‘Jn’ - nth block-state entropy convergence integral

For example: ‘J0’, ‘J5’

‘Kn’ - nth state-block entropy convergence integral

For example: ‘K0’, ‘K5’

‘phi’ - gauge information estimate

H[R_L] - E - chi - zeta

‘zeta’ - oracular information estimate

h_\mu L - H[X_0^L | R_0]

‘chi’ - crypticity estimate

h_\mu L - H[X_0^L | R_L]

‘HX0L~’ - block entropy asymptotic behavior

E + h_\mu L

‘HX0LSL~’ - block-state entropy asymptotic behavior

H[S] - I[S_0; X_{-\infty:0} | X_{0:\infty}] + h_\mu L

‘HS0X0L~’ - state-block entropy asymptotic behavior

H[S] - I[R_0; X_{0:\infty} | X_{-\infty:0}] + h_\mu L

‘hmuL’ - entropy rate line

h_\mu L

maxL : int

The maximum L value used to compute the quantities.

Returns:

estimates : NumPy masked array, shape (len(quantities), maxL+1)

The estimates of the requested quantities at the current value of L. For quantities which are discrete derivatives, the first few values may be masked.

cryptic_order()

Returns the cryptic order of the process.

Returns:

k : int, inf

The process’s cryptic order.

Notes

The machine must be able to be put into a finite, standard epsilon machine form.

delta(n, word)

Returns the set of current nodes after seeing word from node n.

Parameters:

n : node, set of nodes

The node from which word is seen. If n is a set, then we see word from each node in the set.

word : word

An iterable of symbols from the alphabet, used to advance the from node n.

Returns:

nodes : set

The set of nodes the machine is in after seeing word from n (or from each node in the set n). If the word occurs with zero probability, then the empty set is returned.

distributions_iter(n=None, sdist=None)

Evolves and yields the current distribution n times.

If the machine has no current distribution, one is chosen randomly. The distribution is evolved using a symbol distribution which describes the probability of observing a particular symbol.

For example, if x is the current distribution and T0 and T1 are the labeled transition matrices for symbols 0 and 1, then a possible symbol distribution is: {0:.75, 1:.25}. That is, the symbol distribution specifies the belief that there is a 75% chance that the next symbol we see is a 0 and a 25% chance that the next symbol is a 1. Then, x is updated as follows:

x’ = x * (.75 T0 + .25 T1)

and is then renormalized. In the default scenario, each labeled transition matrix is weighted equally:

x’ = x * (.5 * T0 + .5 * T1)

which, after renormalization, is equivalent to:

x’ = x * (T0 + T1) = x * T
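The update step described above can be written directly with NumPy. This is a standalone sketch using the Even process’s labeled transition matrices as an example, not a cmpy call:

```python
import numpy as np

# Even process: T0 holds edges emitting symbol 0, T1 symbol 1.
T0 = np.array([[0.5, 0.0],
               [0.0, 0.0]])
T1 = np.array([[0.0, 0.5],
               [1.0, 0.0]])

x = np.array([1.0, 0.0])           # current distribution, peaked at A
sdist = {0: 0.75, 1: 0.25}         # belief over the next symbol

# x' = x * (.75 T0 + .25 T1), then renormalize.
x_new = x @ (sdist[0] * T0 + sdist[1] * T1)
x_new /= x_new.sum()
print(x_new)                       # [0.75 0.25]
```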
Parameters:

n : int, iterable, None

The number of times to evolve the current distribution by the node transition matrix. If n is a tuple, then the current distribution is evolved by treating n as a word. For each symbol in the word, the current distribution is multiplied by the corresponding labeled transition matrix. This is akin to specifying sdist as ‘symbol’, but different in that the exact symbols used during the evolution are specified explicitly. If n is None, then the current distribution will be evolved indefinitely by the node transition matrix. This parameter takes precedence over sdist. Specifying both will result in only n being considered.

sdist : Distribution, LogDistribution, ‘symbol’, None

If sdist equals ‘symbol’, then n symbols are generated and the symbol distribution is set to match the symbol that was generated. For example, if 0 is generated, then the symbol distribution used to update the current distribution is {0:1.0}.

If sdist is None, then the current distribution is updated by node transition matrix. This is equivalent to giving equal weight to each symbol in the alphabet.

Notes

The current distribution is returned. If you want to save it, you should make a copy of it, since it will be updated when the current distribution is evolved again.

edge_attr_aliases = {'p': 'probability', 'value': 'probability', 'o': 'output'}
edge_attr_required = set(['output', 'probability'])
emission_matrix(node_ordering=None, symbol_ordering=None)

Returns the node-to-symbol emission matrix.

The node-to-symbol emission matrix is an n-by-x matrix where n is the number of nodes and x is the number of symbols in the alphabet. The ij entry represents the probability that symbol j is seen from node i.
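One way to realize this definition: entry (i, k) is the row sum of the labeled transition matrix for symbol k, i.e. Pr(symbol k | node i). The sketch below uses the Even process as a hypothetical example and is not the cmpy implementation:

```python
import numpy as np

# Labeled transition matrices of the Even process, keyed by symbol.
T = {0: np.array([[0.5, 0.0], [0.0, 0.0]]),
     1: np.array([[0.0, 0.5], [1.0, 0.0]])}
symbols = sorted(T)

# Column k of the emission matrix is the row sums of T^k.
em = np.column_stack([T[k].sum(axis=1) for k in symbols])
# Node A emits 0 or 1 with probability 1/2 each; node B always emits 1:
# [[0.5 0.5]
#  [0.  1. ]]
```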

Parameters:

node_ordering : list

A list of the nodes in the machine, which specifies the order of the rows in the matrix.

symbol_ordering : list

A list of the symbols in the machine’s alphabet, specifying the order of the columns in the matrix.

Returns:

em : NumPy array

The constructed emission matrix.

norder : list

The node ordering used to construct the rows of the matrix. This is returned only if node_ordering is None.

sorder : list

The symbol ordering used to construct the columns of the matrix. This is returned only if symbol_ordering is None.

entropy_rate(ndist=None, max_length=30, internal=False)

Returns the process entropy rate.

Calculates the entropy rate using:

h_\mu = H[X|S]
      = -\sum_i \pi_i \sum_{jk} T_{ij}^k \log_2 \sum_j T_{ij}^k

Note, this formula holds only when the machine is unifilar; it is not necessary that the machine be the minimal unifilar presentation. If the machine is not unifilar, then an attempt is made to construct the eM.

The entropy rate for the internal markov chain is:

h_\mu = H[S'|S]
      = -\sum_i \pi_i \sum_{jk} T_{ij}^k \log_2 \sum_{jk} T_{ij}^k
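As a numeric check of the unifilar formula, the sketch below computes the entropy rate of the Even process by hand (standalone, not a cmpy call; the stationary distribution is taken as known):

```python
import numpy as np

# Even process labeled transition matrices.
T0 = np.array([[0.5, 0.0], [0.0, 0.0]])
T1 = np.array([[0.0, 0.5], [1.0, 0.0]])

pi = np.array([2/3, 1/3])          # stationary node distribution

# h_mu = -sum_i pi_i sum_k Pr(k|i) log2 Pr(k|i), with 0 log 0 = 0.
h = 0.0
for Tk in (T0, T1):
    p = Tk.sum(axis=1)             # Pr(symbol k | node i)
    nz = p > 0
    h -= (pi[nz] * p[nz] * np.log2(p[nz])).sum()
print(h)                           # 2/3 bit per symbol
```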

Parameters:

ndist : distribution | node | None

The initial node distribution. If None, then an attempt is made to use a preset or default distribution.

max_length : int

The maximum length used to construct the eM if the machine is not unifilar.

internal : bool

If True, then the entropy rate of the internal Markov chain is calculated. If False, then the entropy rate of the hidden Markov model is calculated.

Returns:

er : float

The entropy rate.

Raises:

NonunifilarException :

When the machine is not unifilar.

excess_entropy(max_length=30)

Returns the excess entropy of the process.

The excess entropy is calculated by E = H[S^+,S^-] where S^+ represents the forward causal states and S^- represents the reverse causal states.

The eM of the machine is first constructed.

Parameters:

max_length : int

The maximum word length used in attempts to construct the recurrent portion of the forward and reverse epsilon machines.

Returns:

ee : float

The excess entropy of the process.

Raises:

CMPyException :

When the recurrent portion of the forward or reverse epsilon machines was not found after reaching L = max_length. As long as the forward and reverse processes are exactly synchronizable, increasing the value of max_length will eventually work. If the process is not exactly synchronizable, then no value of max_length will work.

graph_attr_aliases = {}
graph_attr_required = set([])
group_nodes(nodes, name=None, style=0)

Groups every node in nodes into a single node.

This is an in-place operation and the process is not, in general, preserved by this operation. Unifilarity is not preserved either.

Suppose an inference algorithm failed to distinguish nodes A and B. Then to gain an idea about the theoretically ‘’best’’ inferred machine, one can group nodes A and B together and study the properties of that machine.

Parameters:

nodes : list

A list of nodes that will be grouped.

name : hashable | None

The name to give the grouped nodes. If None, then tuple(nodes) is used as the name.

style : int

An integer representing the manner in which the nodes are grouped. Valid options are:

0 : Transition probabilities are weighted by the relative probability of each node. For example, merging nodes A and B in the Even process yields:

Pr(0|AB) = Pr(0,A|A)*relPr(A in AB) = 1/3
Pr(1|AB) = Pr(1,B|A)*relPr(A in AB) + Pr(1,A|B)*relPr(B in AB) = 2/3

1 : Transition probabilities are simply renormalized by giving each edge the same weight. For example, merging nodes A and B in the Even process yields:

Pr(0|AB) = Pr(0,A|A) / Z = 1/4
Pr(1|AB) = [Pr(1,B|A) + Pr(1,A|B)] / Z = 3/4

where Z = Pr(0,A|A) + Pr(1,B|A) + Pr(1,A|B).

Examples

>>> m = machines.Odd()
>>> m.group_nodes(['B','C'])
is_asymptotic_sync()

Returns True if the machine is asymptotically synchronizing.

Note: This is a statement about the presentation not the process.

is_counifilar()

Returns True if the machine is counifilar and False if not.

A machine is counifilar if, given the current state and previous symbol, the previous state is known with certainty. In the labeled transition matrices, each column of each labeled transition matrix can have only one non-zero element.

Formally, a machine is co-unifilar iff:

H[S_0 | X_0, S_1] = 0

When the machine is co-unifilar,

H[S_0 | X_0, X_1, ...] = 0
H[X_{-L}, ..., X_{-1} | S_0] = L H[X_{-1} | S_0] = L H[X_0 | S_1]

Notes

Counifilarity is known as ‘co-determinism’ in finite state automata theory and as ‘left-resolving’ in symbolic dynamics literature.
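The column condition stated above translates directly into code. The helper below is a hypothetical sketch, not the cmpy implementation:

```python
import numpy as np

# A machine is counifilar iff every column of every labeled
# transition matrix has at most one non-zero entry.

def is_counifilar(labeled_matrices):
    return all(((Tk != 0).sum(axis=0) <= 1).all()
               for Tk in labeled_matrices)

# Even process: counifilar.
T0 = np.array([[0.5, 0.0], [0.0, 0.0]])
T1 = np.array([[0.0, 0.5], [1.0, 0.0]])
print(is_counifilar([T0, T1]))     # True

# A machine where two states emit the same symbol into one state
# (two non-zeros in a column) is not counifilar.
U0 = np.array([[0.5, 0.0], [0.5, 0.0]])
U1 = np.array([[0.0, 0.5], [0.0, 0.5]])
print(is_counifilar([U0, U1]))     # False
```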

is_detailed_balance()

Returns True if the machine is in detailed balance.

If the machine is in detailed balance, then it is microscopically reversible as well.

is_equal_process(other, ndist1=None, ndist2=None, rtol=None, atol=None)

Returns True if the generated process languages are the same.

Crudely, if n is the maximum of the number of states between this machine and other, then the generated process languages are equal if they agree on the probability of every word of length less than 2n.

This algorithm relies on constructing history and future word lists, which only works with standard (non-log) probabilities. Note: Stationarity is not required.

Parameters:

ndist1 : distribution | node | None

The initial node distribution of self.

ndist2 : distribution | node | None

The initial node distribution of other.

rtol : None

The relative tolerance used to compare probabilities. If None, then cmpyParams[‘rtol’] is used.

atol : None

The absolute tolerance used to compare probabilities. If None, then cmpyParams[‘atol’] is used.

Returns:

equal : bool

True if the generated process languages are equal. False if not.

References

[upper1997]

is_ergodic(ndist=None, weak=True)

Returns True if the process is ergodic.

A process generated by a finite-state machine with initial node distribution ndist is weakly ergodic if there exists a single recurrent component. For a machine with multiple attracting components, this corresponds to every node in the space-averaged node distribution (aka ensemble-averaged node distribution) being part of the same attracting component. Any machine with a single attracting component is automatically weakly ergodic. Practically, weak ergodicity means that the time-averaged probability of any word or node is the same from any initial node distribution.

A process generated by a finite-state machine from initial node distribution ndist is strongly ergodic if the process is weakly ergodic and if the internal Markov chain is aperiodic. This definition is perhaps nonstandard, but relates to the fact that “ergodic” Markov chains are required to be aperiodic. Practically, strong ergodicity additionally means that any initial node distribution will converge to the unique stationary distribution when evolving it by the node transition matrix.

Parameters:

ndist : distribution | node | None

The initial distribution over the nodes. This distribution defines the process that is generated by the finite-state machine. The distribution should be a formal distribution object or a node in the machine. If None, then an attempt is made to grab an “obvious” distribution from the machine. See self._get_ndist().

weak : bool

If True, then we verify weak ergodicity. If False, then we verify strong ergodicity.

is_exactly_sync()

Returns True if the machine is exactly synchronizable.

is_isomorphic(machine, style='node_with_symbol')

Returns True if this machine is isomorphic to machine.

Parameters:

machine : MealyHMM

The machine being tested for isomorphism.

style : str

The type of isomorphism to search for. Acceptable options are below.

‘node’:

Checks for a node isomorphism and ignores the symbols appearing on each edge. So the Golden Mean process is isomorphic to the Even process with this type of check.

‘node_with_symbol’ :

Checks for a node isomorphism but also demands that the output symbols on each edge match. So the Golden Mean process is not isomorphic to the Even process with this type of check.

‘nodesymbol’ :

Checks for a simultaneous node and symbol isomorphism.

Returns:

iso : bool

True if this machine is isomorphic to machine and False otherwise.

is_minimal()
is_minimal_unifilar()

Returns True if the machine is the minimal unifilar presentation.

is_periodic(ndist=None, eventually=True, max_length=30)

Returns True if the process is periodic and False if not.

Note: This method is not concerned with the periodicity of a node.

There are (at least) two notions of periodicity to consider.

  1. eventually periodic
  2. periodic

Intuitively, it should be clear what is meant by each of these notions, but in implementation, it turns out that the notion you want will depend on your interpretation of the process. Most of the time, eventually periodic, as implemented here, will correspond to what is desired; hence, it is the default option.

Given a machine and initial distribution d, the generated process language is eventually periodic if the entropy rate is zero. The generated process language is periodic if there is no stochasticity in the generation of the language.

To begin, consider the eM of the period-2 process:

start state is T
T -> A on 0 with p=1/2
T -> B on 1 with p=1/2
A -> B on 1 with p=1
B -> A on 0 with p=1

If one interprets the transient states merely as a representation of the user’s knowledge (of phase in this case), then the transients do not correspond to the actual generation of the process. That is, the process has been running forever and there never has been, and never will be, any stochasticity in the generation. With this interpretation, the transient states are somewhat of an artifact, and the stochasticity present in the transient states should not keep us from concluding that this process is “periodic”.

However, if one interprets the transient states as actually specifying what happens when you start the process, then it is clear that one begins with a coin flip and then the process generates without stochasticity. In this interpretation, the transient states are real, and the stochasticity in them forces us to conclude that the process is “eventually periodic”.

Here, we take the latter view. Namely, the process generated by the above structure and initial start state is “eventually periodic”. The reason we do this is that we’d like to also consider processes like:

M1
start state is A
A -> B on 1 with p=1
B -> C on 2 with p=1
C -> C on 3 with p=1

M2
start state is A
A -> A on 1 with p=1/2
A -> B on 0 with p=1/2
B -> B on 1 with p=1

M3
start state is A
A -> B on 1 with p=1
B -> A on 0 with p=1

M1 and M2 are eventually periodic while M3 is periodic. Note that the transients in M1 and M2 do not correspond to the user’s state of knowledge. Rather, they represent real transient behavior.

Parameters:

ndist : distribution or node

The distribution from which the process language is generated. If None, then an attempt will be made to obtain a default. See _get_ndist().

eventually : bool

Specifies whether to check if the process is eventually periodic or periodic.

max_length : int

The maximum length to consider when constructing the eM.

Returns:

isp : bool

Boolean specifying if the process is periodic or not.

Raises:

CMPyException :

Raised when the periodicity of the process language could not be determined. If the eM has an infinite number of states, then you might be out of luck, but generally, it should not be necessary to construct the entire machine.

is_reversible(micro=False, max_length=None)

Returns True if the process language is causally reversible.

Parameters:

micro : bool

If True, then we test if the process language is microscopically reversible. Otherwise, we test if the process language is causally reversible.

max_length : int | None

When checking if the process language is causally reversible, we must build the forward and reverse eMs. This parameter specifies the maximum word length used to build the machine.

Returns:

rev : bool

True if the process language is causally (or microscopically) reversible and False otherwise.

Raises:

CMPyException :

Raised when the forward or reverse eMs could not be built. Try increasing max_length, but it is possible that the eMs have an infinite number of recurrent causal states.

is_stationary(ndist=None, rtol=None, atol=None)

Returns True if the generated process language is stationary.

Parameters:

ndist : distribution

The initial node distribution from which the process language is generated.

Returns:

s : bool

True if the generated process language is stationary and False if not.

is_strictly_sofic()

Returns True if the support of the machine is strictly sofic.

is_unifilar()

Returns True if the machine is unifilar and False if not.

A machine is unifilar if, given the current state and current symbol, the next state is known with certainty. In the labeled transition matrices, each row of each labeled transition matrix can have only one non-zero element.

Formally, a machine is unifilar iff:
H[S_1|S_0,X_0] = 0
When the machine is unifilar,
H[S_0|...,X_{-2},X_{-1}] = 0

H[X_0,...,X_{L-1}|S_0] = L H[X_0|S_0]

Notes

Unifilarity is also known as ‘determinism’ in finite state automata theory and as ‘right-resolving’ in symbolic dynamics literature.
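In matrix form the unifilarity test is mechanical: every row of every labeled transition matrix must contain at most one non-zero entry. A NumPy sketch (not cmpy's implementation; the matrices below encode the Golden Mean process and are illustrative):

```python
import numpy as np

def is_unifilar(ltms):
    """True iff each row of each labeled transition matrix has at most
    one non-zero entry (next state fixed by current state + symbol)."""
    return all(np.all(np.count_nonzero(T, axis=1) <= 1) for T in ltms.values())

# Golden Mean process over states (A, B):
# on '0': A -> A with p=1/2, B -> A with p=1
# on '1': A -> B with p=1/2
T0 = np.array([[0.5, 0.0], [1.0, 0.0]])
T1 = np.array([[0.0, 0.5], [0.0, 0.0]])
print(is_unifilar({'0': T0, '1': T1}))  # True
```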

labeled_transition_matrices(node_ordering=None, logs=False, cdist=False)

Returns the labeled transition matrices.

Each element in the matrix is given by:

T_{ij}^k = Pr(j,k|i)

where i and j are nodes and k is an output symbol. The value specifies the probability of seeing symbol k and transitioning to node j, given that the current state is node i.

There is one matrix for each symbol in the alphabet.

Parameters:

node_ordering : list

An ordering of nodes in the machine.

logs : bool

If True, then the base-2 log of the labeled transitions matrices is returned.

cdist : bool

If True, then return the labeled transition matrices as a formal conditional distribution from cmpy.infotheory.

Returns:

ltm : dict

A dictionary of NumPy arrays keyed by output symbols. The values are the labeled transition matrices.

norder : list

List of nodes, specifying the order of the rows/cols of the labeled transition matrices. This is returned only if node_ordering is None.

Raises:

CMPyException :

If a node in node_ordering does not exist in the machine.

merge_nodes()

Returns a new machine with causally equivalent nodes merged.

If the machine is unifilar, then all causally equivalent nodes will be merged. If the machine is not unifilar, then only some of the causally equivalent nodes are merged.

In detail, the algorithm is similar to DFA algorithms for identifying equivalent nodes in that it inductively identifies nodes which are distinct. Afterwards, any undistinguished nodes are causally equivalent. The difference from DFA algorithms is that probability, in addition to topology, can distinguish nodes.

Nodes u and v are distinct iff either of the following is true:

  1. Their future morphs are distinct:

\Pr(X_0,X_1\ldots|S_0=u) \neq \Pr(X_0,X_1\ldots|S_0=v)

  2. Their outgoing transitions are distinct:

\Pr(X_0, S_1|S_0=u) \neq \Pr(X_0, S_1|S_0=v)

minimize_unifilar(reduce=True, mapping=False)

Returns a minimized version of the MealyHMM.

The algorithm will merge states which have the same morphs. To obtain the minimal unifilar MealyHMM, the machine must be irreducible.

Parameters:

reduce : bool

If True, the machine is made irreducible before minimization.

mapping : bool

If True, then the mapping from nodes in the original machine to nodes in the minimized machine is returned.

Returns:

m : MealyHMM

The minimal unifilar MealyHMM.

mapp : dict

The mapping from nodes to nodes in the minimized machine.

Notes

The minimization method follows the Hopcroft algorithm. Worst-case running time is O(kn log(n)) where k is the alphabet size and n is the number of nodes in the machine.
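The idea behind state merging can be sketched with Moore-style partition refinement (cmpy's implementation follows the faster Hopcroft scheme; this O(k n^2) version only illustrates the refinement step, and the delta encoding is an assumption for illustration):

```python
def minimize_partition(states, symbols, delta):
    """Group equivalent states of a unifilar machine.
    delta[(state, symbol)] = (prob, next_state); absent keys mean p=0.
    States sharing per-symbol (prob, target-block) signatures merge."""
    def step(s, a):
        return delta.get((s, a), (0.0, None))
    block = {s: 0 for s in states}          # start with a single block
    while True:
        # signature: for each symbol, the emission prob and target block
        sig = {s: tuple((a, step(s, a)[0], block.get(step(s, a)[1]))
                        for a in symbols) for s in states}
        classes = {}
        for s in states:                    # stable relabeling of signatures
            classes.setdefault(sig[s], len(classes))
        new_block = {s: classes[sig[s]] for s in states}
        if new_block == block:              # fixed point: partition is stable
            return block
        block = new_block

# B and C behave identically, so they land in the same block
delta = {('A', '1'): (1.0, 'B'), ('B', '0'): (1.0, 'A'), ('C', '0'): (1.0, 'A')}
part = minimize_partition(['A', 'B', 'C'], ['0', '1'], delta)
print(part)  # A in its own block; B and C merged
```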

minsync_words()

Return an iterator over the minimal synchronizing words.

The words are iterated in lexicographical order.

Notes

This iterator is potentially infinite, so don’t do list(machine.minsync_words()) unless you know what you’re doing.

node_attr_aliases = {}
node_attr_required = set([])
period(ndist=None, max_length=30)

Returns the period of the generated process.

Note: This method is not concerned with the period of a node.

If the process is aperiodic, then np.inf is returned.

For machines with multiple recurrent components, the period of each recurrent component is weighted by the probability of that component. So the returned period might not be an integer, and if any recurrent component is aperiodic, then the returned period will be infinite.

Parameters:

ndist : distribution | node | None

The initial distribution over the nodes. This distribution defines the process that is generated by the finite-state machine. The distribution should be a formal distribution object or a node in the machine. If None, then an attempt is made to grab an “obvious” distribution from the machine. See self._get_ndist().

max_length : int

The maximum length to consider when determining if the process is eventually periodic.

Returns:

period : float

The average period of the process. The average is taken over all recurrent components.

periodicities : dict

A dictionary keyed by recurrent components with values of the form (p, period), where ‘p’ specifies the probability of the recurrent component and ‘period’ is the period of that recurrent component.
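Within a single strongly connected (recurrent) component, the period is the gcd of its cycle lengths, which a BFS computes directly. A sketch under that assumption (not cmpy's implementation):

```python
from math import gcd

def component_period(graph, start):
    """Period of a strongly connected graph: gcd over all edges (u, v)
    of level[u] + 1 - level[v], with levels taken from a BFS tree."""
    level, frontier = {start: 0}, [start]
    while frontier:                      # BFS to assign levels
        nxt = []
        for u in frontier:
            for v in graph[u]:
                if v not in level:
                    level[v] = level[u] + 1
                    nxt.append(v)
        frontier = nxt
    g = 0
    for u in graph:                      # fold every edge into the gcd
        for v in graph[u]:
            g = gcd(g, level[u] + 1 - level[v])
    return g

print(component_period({'A': ['B'], 'B': ['A']}, 'A'))       # 2: period-2 cycle
print(component_period({'A': ['A', 'B'], 'B': ['A']}, 'A'))  # 1: aperiodic
```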

probabilities_iter(lengths, logs=None, sparse=None, store=False, start=None, startfp=None, method=None, norder=None)

Yields the probability of every word, for all lengths in lengths.

Parameters:

lengths : list

A list of integers, specifying the desired word lengths. For each word length, we will calculate the probability of every word.

logs : bool

When True, we compute the log probabilities of every word, and when False, we compute standard probabilities. When None, we use the value from cmpyParams[‘logs’].

sparse : bool

When True, only words with probabilities greater than 0 (or -inf) are included in the return value. When False, every word will be included. When None, we use the value from cmpyParams[‘sparse’].

probability(word, logs=None, start=None, method=None)

Returns the probability of a word.

Parameters:

word : iterable

The word whose probability will be computed.

logs : bool

When True, we compute the log probabilities of every word, and when False, we compute standard probabilities. When None, we use the value from cmpyParams[‘logs’].

Returns:

p : float

The probability of the word.
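A word probability is a chain of labeled transition matrices sandwiched between the initial node distribution and a sum over final nodes. A NumPy sketch (the Golden Mean matrices and stationary pi are assumptions for illustration, not cmpy's internals):

```python
import numpy as np

def word_probability(pi, ltms, word):
    """Pr(word) = pi . T^(x_1) ... T^(x_L) . 1 for labeled transition
    matrices T^(x) and initial node distribution pi."""
    v = np.asarray(pi, dtype=float)
    for x in word:
        v = v @ ltms[x]
    return float(v.sum())

# Golden Mean process, stationary distribution pi = (2/3, 1/3)
T0 = np.array([[0.5, 0.0], [1.0, 0.0]])
T1 = np.array([[0.0, 0.5], [0.0, 0.0]])
pi = np.array([2/3, 1/3])
ltms = {'0': T0, '1': T1}
print(word_probability(pi, ltms, '0'))   # 2/3
print(word_probability(pi, ltms, '11'))  # 0.0 -- "11" is forbidden
```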

relabel_symbols(mapping)

Changes the alphabet of the machine, in-place.

Parameters:

mapping : dict

A dictionary mapping old symbols to new symbols.

support()

Returns a DFA recognizing the support of the MealyHMM.

Given a MealyHMM, the support of the generated process language is the set of all words with Pr(w) > 0. This function returns a DFA which recognizes all words in the support of the process.

Note: This algorithm does not return the minimal DFA recognizing the support.

Returns:

dfa : DFA

The DFA recognizing the support of the generated process language.

symbol_topology()

Returns the topology of the machine, keyed by symbols.

Returns:

st : dict

A dictionary keyed by symbols. Each value is a dictionary which maps “from” nodes to a set of “to” nodes which can be reached by following an edge which outputs the keyed symbol.

symbols(n, out=None, prng=None)

Returns n symbols by evolving the HMM.

If the machine has no current node, then one is chosen randomly according to the stationary distribution.

Note: This implementation is read-only. The final node of the machine after the method is called will equal the initial node of the machine when this method was first called.

Parameters:

n : int

The number of symbols to generate.

out : container

A list or NumPy array which is filled during iteration. If None, then we use a list since symbols need not be of the same type. However, if you know that your symbols are all integers or strings of the same length, then it might be useful to pass in a NumPy array instead.

prng : random number generator

A random number generator which returns k random numbers when prng.rand(k) is called. If unspecified, then we use the random number generator at cmpy.math.prng.

Returns:

s : list

The list of generated symbols.
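Generation itself is just repeated sampling from each node's outgoing edge distribution. A minimal sketch (the transition encoding is an assumption, not cmpy's internal format):

```python
import random

def sample_symbols(trans, node, n, rng=random):
    """Generate n symbols starting from `node`.
    trans[node] = [(prob, symbol, next_node), ...] with probs summing to 1."""
    out = []
    for _ in range(n):
        r, acc = rng.random(), 0.0
        for p, sym, nxt in trans[node]:
            acc += p
            if r < acc:                 # pick the edge whose interval covers r
                out.append(sym)
                node = nxt
                break
    return out

# Period-2 process: A emits 1 and moves to B, B emits 0 and moves to A
trans = {'A': [(1.0, '1', 'B')], 'B': [(1.0, '0', 'A')]}
print(sample_symbols(trans, 'A', 4))  # ['1', '0', '1', '0']
```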

symbols_iter(n=None)

Yields n symbols by evolving the machine’s current node.

If the machine has no current node, then one is chosen randomly according to the stationary distribution.

Note: This implementation is not read-only. That is, the node of the machine is updated as symbols are generated, and the final node of the machine will not necessarily be the same as when this method was first called.

Parameters:

n : int, None

The number of symbols to generate. If n is None, then the iterator will generate symbols indefinitely.

Notes

The algorithm builds a dictionary which stores the machine node/edge structure. This structure is assumed to stay the same throughout the yields.

toMooreHMM()
to_recurrent_eM()

Transforms this presentation into a recurrent epsilon machine, reducing and minimizing if necessary.

Returns:

m : RecurrentEpsilonMachine

The recurrent epsilon machine form of this process.

to_string()

Export this machine to its string format, such that self == from_string(self.to_string()) is True.

Returns:

string: str :

the machine as a string

topological_entropy_rate()

Returns the topological entropy rate of the underlying Markov chain.

The topological entropy rate is defined as:

h = \lim_{L \rightarrow \infty} \frac{1}{L} \log_2 N(L)

where N(L) is the number of allowable words of length L. It is a well-known result that the topological entropy rate equals the logarithm of the largest eigenvalue of the adjacency matrix. For a proof, see:

D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1995.

Notes

This does not compute the topological entropy rate of the HMM.
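The eigenvalue characterization makes this a one-liner with NumPy (a sketch, not cmpy's implementation; the Golden Mean adjacency matrix is used for illustration):

```python
import numpy as np

def topological_entropy_rate(adj):
    """log2 of the spectral radius of the adjacency matrix."""
    eigs = np.linalg.eigvals(np.asarray(adj, dtype=float))
    return float(np.log2(max(abs(eigs))))

# Golden Mean support graph: A -> A, A -> B, B -> A
h = topological_entropy_rate([[1, 1], [1, 0]])
print(round(h, 4))  # 0.6942 -- log2 of the golden ratio
```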

word_distribution_from_path(m, path)

Returns the distribution of observed words induced by the internal state sequence path.

Parameters:

path: list :

A list of nodes representing a path in the machine.

Returns:

word_dist: Distribution :

The distribution of words induced by path.

word_transition_matrix(word, node_ordering=None)

Returns the transition matrix for a word.

For each symbol in the word, we construct the product of the labeled transition matrices.

Parameters:

word : iterable

The word used to construct the word transition matrix.

node_ordering : list, None

An ordering of the nodes in the graph. The ordering determines the order of the rows and columns in the array. If None, then self.nodes() is used.

Returns:

wtm : NumPy array

The word transition matrix.

norder : list

The ordering used to construct the array. This is returned only if node_ordering was None.

class cmpy.machines.mealyhmm.MarkovChain(data=None, name='', **attr)

Bases: cmpy.machines.mealyhmm.MealyHMM

Methods

allowable_words([nodes, sort]) A generator of sets of allowable words from nodes.
alphabet([u, v]) Returns the alphabet of the machine.
averaged_distribution([ndist, logs]) Returns the averaged distribution.
averaged_entropy() Returns the entropy of the averaged distribution.
block_entropies(lengths[, logs, ndist, ...]) Returns the block entropies for the specified word lengths.
cmech_quantities(quantities, maxL) Returns an array of quantities from computational mechanics.
cryptic_order() Returns the cryptic order of the process.
delta(n, word) Returns the set of current nodes after seeing word from node n.
distributions_iter([n, sdist]) Evolves and yields the current distribution n times.
emission_matrix([node_ordering, symbol_ordering]) Returns the node-to-symbol emission matrix.
entropy_rate([ndist, max_length, internal]) Returns the process entropy rate.
excess_entropy([max_length]) Returns the excess entropy of the process.
group_nodes(nodes[, name, style]) Groups every node in nodes into a single node.
is_asymptotic_sync() Returns True if the machine is asymptotically synchronizing.
is_counifilar() Returns True if the machine is counifilar and False if not.
is_detailed_balance() Returns True if the machine is in detailed balance.
is_equal_process(other[, ndist1, ndist2, ...]) Returns True if the generated process languages are the same.
is_ergodic([ndist, weak]) Returns True if the process is ergodic.
is_exactly_sync() Returns True if the machine is exactly synchronizable.
is_isomorphic(machine[, style]) Returns True if this machine is isomorphic to machine.
is_minimal()
is_minimal_unifilar() Returns True if the machine is the minimal unifilar presentation.
is_periodic([ndist, eventually, max_length]) Returns the True if the process is periodic and False if not.
is_reversible([micro, max_length]) Returns True if the process language is causally reversible.
is_stationary([ndist, rtol, atol]) Returns True if the generated process language is stationary.
is_strictly_sofic() Returns True if the support of the machine is strictly sofic.
is_unifilar() Returns True if the machine is unifilar and False if not.
labeled_transition_matrices([node_ordering, ...]) Returns the labeled transition matrices.
merge_nodes() Returns a new machine with causally equivalent nodes merged.
minimize_unifilar([reduce, mapping]) Returns a minimized version the MealyHMM.
minsync_words() Return an iterator over the minimal synchronizing words.
period([ndist, max_length]) Returns the period of the generated process.
probabilities_iter(lengths[, logs, sparse, ...]) Yields the probability of every word, for all lengths in lengths.
probability(word[, logs, start, method]) Returns the probability of a word.
relabel_symbols(mapping) Changes the alphabet of the machine, in-place.
support() Returns a DFA recognizing the support of the MealyHMM.
symbol_topology() Returns the topology of the machine, keyed by symbols.
symbols(n[, out, prng]) Returns n symbols by evolving the HMM.
symbols_iter([n]) Yields n symbols by evolving the machine’s current node.
toMooreHMM()
to_recurrent_eM() Transforms this presentation into a recurrent epsilon machine, reducing and minimizing if necessary.
to_string() Export this machine to its string format, such that self == from_string(self.to_string()) is True.
topological_entropy_rate() Returns the topological entropy rate of the underlying Markov chain.
word_distribution_from_path(m, path) Returns the distribution of observed words induced by internal state
word_transition_matrix(word[, node_ordering]) Returns the transition matrix for a word.

The epsilonmachine Module

Epsilon Machine

class cmpy.machines.epsilonmachine.BaseEpsilonMachine

Bases: object

Methods

cryptic_order() Returns the cryptic order of the process.
crypticity([max_length]) Returns the crypticity of the process.
excess_entropy([max_length, tolerance]) Returns the excess entropy of the process.
excess_entropy_bounds(lengths[, return_be]) Returns lower and upper bounds for the excess entropy.
excess_entropy_exact([max_length]) Returns the excess entropy of the process.
is_markov() Returns True if the Markov order of the process is finite.
markov_order() Returns the Markov order of the process.
predictive_information_rate([N]) Returns the predictive information rate, b_mu, of the process [abdallah2010].
predictive_information_rate_exact() Returns the predictive information rate, b_mu, of the process [abdallah2010].
residual_entropy_rate([N]) Returns the residual entropy rate, r_mu, of the process [abdallah2010].
residual_entropy_rate_exact() Returns the residual entropy rate, r_mu, of the process [abdallah2010].
reverse([copy]) Reverses the machine so that it generates the ‘reverse’ process.
setcomplexity() Returns the set complexity of the process language.
statistical_complexity() Returns the statistical complexity of the process.
to_mealyhmm([copy]) Return this same presentation, but as a MealyHMM.
cryptic_order()

Returns the cryptic order of the process.

Returns:

k : int, inf

The process’s cryptic order.

Notes

The machine must be able to be put into a finite, standard epsilon machine form.

crypticity(max_length=30)

Returns the crypticity of the process.

The crypticity of the process is defined as H[S|Future]. The calculation of crypticity depends on the calculation of excess entropy. See the docstring for excess_entropy for more details.

Parameters:

max_length : int

The maximum word length used in attempts to construct the recurrent portion of the reverse epsilon machine.

Returns:

chi : float

The crypticity of the process.

Raises:

CMPyException :

Raised when the recurrent portion of the reverse epsilon machine was not found after reaching L=`max_length`. As long as the reverse process is exactly synchronizable, increasing the value of max_length will eventually work. If the process is not exactly synchronizable, then no value of max_length will work.

NonunifilarException :

Raised when the machine is not unifilar, as required.

excess_entropy(max_length=30, tolerance=1e-05)

Returns the excess entropy of the process.

Parameters:

max_length : int

The maximum length to use when computing the excess entropy, either via the exact method or via block entropies.

excess_entropy_bounds(lengths, return_be=False)

Returns lower and upper bounds for the excess entropy.

The lower and upper bounds on the excess entropy are given by:

H[X_0^L] - L h_\mu \leq E \leq H[X_0^L, S_L]

where S_L is the causal state at time t=L whose probabilities are given by the stationary distribution.

When lengths is a list of integers, we compute the lower and upper bounds for every length in lengths. If lengths is a float, it specifies the desired stopping difference. When the difference between the bounds is less than the specified difference, we stop at that particular L and return the array of bounds.

Note: If the two curves are equal, it doesn’t necessarily mean that we have converged to the value of the excess entropy. It simply means that the two curves give the same estimate.

Parameters:

lengths : [int], float

Either a list of integers or a float.

return_be : bool

If True, then we also return the computed block entropies.

Returns:

ee : NumPy array, shape (2,L)

The array of lower and upper bounds on the excess entropy. L is equal to len(lengths) if lengths was a list. If lengths was a float, then L is the length at which the difference between the upper and lower bounds became less than the specified float. The i-th column corresponds to the bounds at lengths[i]. The rows (0,1) correspond to (lower bound, upper bound).

be : NumPy array, shape (2,L)

The block entropies that were computed to obtain the bounds on the excess entropy. The first row corresponds to H[X_0^L] and the second row corresponds to H[X_0^L, S_L].

Raises:

TypeError :

Raised if lengths could not be converted to either a list of lengths or a float.

NonunifilarException :

Raised when the machine is not unifilar, as required.

See also

excess_entropy

excess_entropy_exact(max_length=30)

Returns the excess entropy of the process.

The excess entropy is calculated by E = H[S^+,S^-] where S^+ represents the forward causal states and S^- represents the reverse causal states. The calculation proceeds via the mixed state algorithm. Presently, the method is guaranteed to succeed only if:

  1. the presentation is unifilar
  2. the presentation is irreducible
  3. the reverse process is exactly synchronizable

Also, there is (currently) no way of knowing if the reverse process is exactly synchronizable. This is due to the non-unifilarity of the time-reversed presentation. When the reverse process is not exactly synchronizable, it is still possible to get good estimates of the excess entropy. See excess_entropy_bounds().

Parameters:

max_length : int

The maximum word length used in attempts to construct the recurrent portion of the reverse epsilon machine.

Returns:

ee : float

The excess entropy of the process.

Raises:

CMPyException :

When the recurrent portion of the reverse epsilon machine was not found after reaching L=`max_length`. As long as the reverse process is exactly synchronizable, increasing the value of max_length will eventually work. If the process is not exactly synchronizable, then no value of max_length will work.

NonunifilarException :

Raised when the machine is not unifilar, as required.

is_markov()

Returns True if the Markov order of the process is finite.

markov_order()

Returns the Markov order of the process.

Returns:

R : int, inf

The Markov order of the process.

Notes

The machine must have a finite number of recurrent states for this method to work.

predictive_information_rate(N=10)

Returns the predictive information rate, b_mu, of the process [abdallah2010].

b_mu = I[ X_0 : X_{1:infty} | X_{-infty:0} ]

Returns:

bmu : float

The predictive information rate of the process.

predictive_information_rate_exact()

Returns the predictive information rate, b_mu, of the process [abdallah2010].

b_mu = I[ X_0 : X_{1:infty} | X_{-infty:0} ]

Returns:

bmu : float

The predictive information rate of the process.

Raises:

CMPyException :

If either the forward or reverse process does not have a finite epsilon machine.

residual_entropy_rate(N=10)

Returns the residual entropy rate, r_mu, of the process [abdallah2010].

r_mu = H[ X_0 | X_{-infty:0}, X_{1:infty}]

Returns:

rmu : float

The residual entropy rate of the process.

residual_entropy_rate_exact()

Returns the residual entropy rate, r_mu, of the process [abdallah2010].

r_mu = H[ X_0 | X_{-infty:0}, X_{1:infty}]

Returns:

rmu : float

The residual entropy rate of the process.

Raises:

CMPyException :

If either the forward or reverse process does not have a finite epsilon machine.

reverse(copy=True)

Reverses the machine so that it generates the ‘reverse’ process.

Parameters:

copy : bool

When True, we return a new machine. When False, we modify the machine in-place.

Returns:

m : MealyHMM

The reversed machine.

Notes

The reverse machine is not, generally, the recurrent part of an epsilon machine. Thus, m is returned as a MealyHMM.

setcomplexity()

Returns the set complexity of the process language.

Let w = xs be a semi-infinite word ending in symbol s. The set complexity is the average Shannon information about x needed to determine whether s results in w being a valid word in the language.

The set complexity was defined by Grassberger in ...

statistical_complexity()

Returns the statistical complexity of the process.

Returns:

sc : float

The statistical complexity of the process.

Notes

Because the machine is an EpsilonMachine instance, we assume that the machine is the minimal unifilar presentation of the process.
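For a recurrent eM, the statistical complexity is the Shannon entropy of the stationary causal-state distribution. A NumPy sketch under that assumption (the Golden Mean matrix is illustrative, not taken from cmpy):

```python
import numpy as np

def statistical_complexity(T):
    """Entropy (bits) of the stationary distribution of the
    row-stochastic node transition matrix T."""
    T = np.asarray(T, dtype=float)
    w, v = np.linalg.eig(T.T)
    pi = np.real(v[:, np.argmax(np.real(w))])  # eigenvector for eigenvalue 1
    pi = pi / pi.sum()                         # normalize (also fixes sign)
    return float(-sum(p * np.log2(p) for p in pi if p > 0))

# Golden Mean eM: stationary causal-state distribution is (2/3, 1/3)
T = [[0.5, 0.5], [1.0, 0.0]]
print(round(statistical_complexity(T), 4))  # 0.9183 bits
```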

to_mealyhmm(copy=True)

Return this same presentation, but as a MealyHMM.

Parameters:

copy : bool

If True, then a copy of the machine is made. Otherwise, the class is dynamically changed to MealyHMM.

Returns:

m : MealyHMM

A machine generating the same process.

class cmpy.machines.epsilonmachine.EpsilonMachine(data=None, name='', **attr)

Bases: cmpy.machines.epsilonmachine.BaseEpsilonMachine, cmpy.machines.mealyhmm.MealyHMM

EpsilonMachine

Reducible, merged presentations of a process.

This presentation contains only causal states, both recurrent and transient.

Methods

causal_state(word) Returns the causal state induced by the specified word.
cryptic_order() Returns the cryptic order of the process.
crypticity([max_length]) Returns the crypticity of the process.
excess_entropy([max_length, tolerance]) Returns the excess entropy of the process.
excess_entropy_bounds(lengths[, return_be]) Returns lower and upper bounds for the excess entropy.
excess_entropy_exact([max_length]) Returns the excess entropy of the process.
is_markov() Returns True if the Markov order of the process is finite.
markov_order() Returns the Markov order of the process.
predictive_information_rate([N]) Returns the predictive information rate, b_mu, of the process [abdallah2010].
predictive_information_rate_exact() Returns the predictive information rate, b_mu, of the process [abdallah2010].
reduce([copy]) Returns a new machine containing only the recurrent nodes.
residual_entropy_rate([N]) Returns the residual entropy rate, r_mu, of the process [abdallah2010].
residual_entropy_rate_exact() Returns the residual entropy rate, r_mu, of the process [abdallah2010].
reverse([copy]) Reverses the machine so that it generates the ‘reverse’ process.
setcomplexity() Returns the set complexity of the process language.
statistical_complexity() Returns the statistical complexity of the process.
to_mealyhmm([copy]) Return this same presentation, but as a MealyHMM.
causal_state(word)

Returns the causal state induced by the specified word.

Parameters:

word : iterable

The word used to induce the causal state.

Returns:

cs : node

The causal state induced by the specified word.

Raises:

ForbiddenWord :

Raised if the word is forbidden.

reduce(copy=True)

Returns a new machine containing only the recurrent nodes.

Parameters:

copy : bool

If True, then a deepcopy of the machine is made. If False, then the transient nodes are removed and the class is dynamically changed to RecurrentEpsilonMachine

Returns:

m : machine

The recurrent portion of the machine.

Notes

If the number of transient nodes is much larger than the number of recurrent nodes, then a copy is generally quicker. If the number of recurrent nodes is much larger than the number of transient nodes, then it is generally quicker to remove the transient nodes in place (so set copy to False).

class cmpy.machines.epsilonmachine.RecurrentEpsilonMachine(data=None, name='', **attr)

Bases: cmpy.machines.epsilonmachine.BaseEpsilonMachine, cmpy.machines.mealyhmm.MealyHMM

Recurrent Epsilon Machine

This class represents the recurrent portion of an epsilon machine. This is particularly important since many epsilon machines have an infinite number of transient states. Note, this includes eMs with multiple recurrent components.

Each state in a Recurrent Epsilon Machine represents an equivalence class over infinite pasts.

Methods

cryptic_order() Returns the cryptic order of the process.
crypticity([max_length]) Returns the crypticity of the process.
excess_entropy([max_length, tolerance]) Returns the excess entropy of the process.
excess_entropy_bounds(lengths[, return_be]) Returns lower and upper bounds for the excess entropy.
excess_entropy_exact([max_length]) Returns the excess entropy of the process.
is_markov() Returns True if the Markov order of the process is finite.
is_minimal_unifilar() Returns True if the machine is the minimal unifilar presentation.
markov_order() Returns the Markov order of the process.
predictive_information_rate([N]) Returns the predictive information rate, b_mu, of the process [abdallah2010].
predictive_information_rate_exact() Returns the predictive information rate, b_mu, of the process [abdallah2010].
reduce([copy]) Returns a new copy of self.
residual_entropy_rate([N]) Returns the residual entropy rate, r_mu, of the process [abdallah2010].
residual_entropy_rate_exact() Returns the residual entropy rate, r_mu, of the process [abdallah2010].
reverse([copy]) Reverses the machine so that it generates the ‘reverse’ process.
setcomplexity() Returns the set complexity of the process language.
statistical_complexity() Returns the statistical complexity of the process.
to_mealyhmm([copy]) Return this same presentation, but as a MealyHMM.
is_minimal_unifilar()

Returns True if the machine is the minimal unifilar presentation.

Note that an eM with transient states is not the smallest unifilar presentation of the process.

reduce(copy=True)

Returns a new copy of self.

Returns:

m : RecurrentEpsilonMachine

The recurrent portion of the machine.

The hmm Module

Base class for hidden Markov models.

class cmpy.machines.hmm.HMM(data=None, name='', **attr)

Bases: cmpy.machines.stochasticmachine.StochasticMachine

Base class for hidden Markov models.

Methods

conditional_probabilities(lengths[, logs, ...]) Returns list of dictionaries, representing conditional distributions.
conditional_probability(word, condition[, ...]) Returns Pr(word|condition).
entropy_rate_estimates(lengths[, method, logs]) Returns estimates of the entropy rate.
info([n]) Prints out summary information about the machine.
probabilities(lengths[, logs, ndist, ...]) Returns the probability of every word, for all lengths in lengths.
reverse([copy]) Reverses the machine so that it generates the ‘reverse’ process.
symbol() Returns a generated symbol.
symbols(n) Returns n symbols by evolving the HMM.
alphabet(*args, **kwargs)

Returns the alphabet of the HMM.

conditional_probabilities(lengths, logs=None, ndist=None, sparse=None)

Returns list of dictionaries, representing conditional distributions.

Each conditional distribution has the following semantics:
Pr(future_word | history_word)
Parameters:

lengths : list

A list of 2-tuples. Each element in the list specifies a desired conditional distribution. Each element should be a 2-tuple, where the first element specifies the history word length and the second element specifies the future word length. For example, (1,2) means that the conditional distribution will be over future words of length two conditioned on all history words of length one.

logs : bool

When True, we compute the log probabilities of every word, and when False, we compute standard probabilities. When None, we use the value from cmpyParams[‘logs’].

ndist : Distribution, LogDistribution

Probabilities are calculated from a node distribution. Typically, this is the stationary distribution. So when ndist is None, we calculate the stationary probabilities. Optionally, one may specify a different distribution.

sparse : bool

When True, only words with probabilities greater than 0 (or -inf) are included in the return value. When False, every word will be included. When None, we use the value from cmpyParams[‘sparse’].

Returns:

cdists : list

The calculated conditional distributions. The distributions will be in the order of the length tuples specified in lengths. Each distribution is a dictionary keyed by history words. The values are dictionaries keyed by future words and the values are probabilities.

conditional_probability(word, condition, logs=None, ndist=None)

Returns Pr(word|condition).

Parameters:

word : iterable

The word in Pr(word|condition).

condition : iterable

The condition in Pr(word|condition).

logs : bool, None

Specifies if logarithms should be used in the calculations of the conditional probability. If None, then the default value is taken from cmpyParams[‘logs’].

ndist : Distribution, LogDistribution

Probabilities are calculated from a node distribution. Typically, this is the stationary distribution. So when ndist is None, we calculate from the stationary probabilities. Optionally, one may specify a different distribution.

Returns:

cp : float

The conditional probability.

Raises:

ZeroDivisionError :

Raised when Pr(condition) = 0.
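Independent of cmpy's internals, the computation reduces to the definition of conditional probability: Pr(word|condition) = Pr(condition + word) / Pr(condition). A minimal sketch (the `probability` callable here is a stand-in for the machine's own word-probability method):

```python
def conditional_probability(probability, word, condition):
    """Pr(word | condition) = Pr(condition + word) / Pr(condition).

    `probability` is any callable returning Pr(w) for a word w.
    Raises ZeroDivisionError when Pr(condition) = 0, matching the
    documented behavior of HMM.conditional_probability.
    """
    return probability(tuple(condition) + tuple(word)) / probability(tuple(condition))

# Toy i.i.d. biased-coin process: Pr('1') = 0.7 per symbol.
def probability(word):
    p = 1.0
    for s in word:
        p *= 0.7 if s == "1" else 0.3
    return p

cp = conditional_probability(probability, "1", "0")  # Pr('1' | '0') = 0.7
```

For an i.i.d. process the condition carries no information, so the result equals the marginal symbol probability.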

entropy_rate(*args, **kwargs)

Returns the entropy rate.

entropy_rate_estimates(lengths, method='diff', logs=None)

Returns estimates of the entropy rate.

Parameters:

lengths : list

A list of word lengths. For each word length, we calculate the entropy rate estimate at that word length.

method : str

A specification of the method used to compute the estimates. Valid methods are:

‘diff’ : h(L) = H(L) - H(L-1)
‘ratio’ : h’(L) = H(L) / L

Note that the difference method is more accurate:

h <= h(L) <= h’(L)

where h is the true entropy rate.

logs : bool

If True, then logarithms are used in the calculation of the block entropies. If None, then we use cmpyParams[‘logs’].

Returns:

estimates : NumPy array

An array of entropy rate estimates. The order of the estimates will match the order of the specified lengths in lengths.
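The two estimators can be sketched in plain Python from a set of word distributions. This is an illustration of the formulas above, not cmpy's implementation; `word_probs_by_length` is a hypothetical stand-in for the output of probabilities():

```python
from math import log2
from itertools import product

def block_entropy(probs):
    """Shannon block entropy H(L) of a word distribution {word: Pr(word)}."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)

def entropy_rate_estimates(word_probs_by_length, method="diff"):
    """Estimate the entropy rate from block entropies.

    word_probs_by_length : dict mapping length L -> {word: Pr(word)}
    'diff'  : h(L)  = H(L) - H(L-1)
    'ratio' : h'(L) = H(L) / L
    """
    H = {L: block_entropy(d) for L, d in word_probs_by_length.items()}
    if method == "ratio":
        return {L: H[L] / L for L in H}
    # The difference method needs H(L-1); H(0) = 0 for the empty word.
    return {L: H[L] - H.get(L - 1, 0.0) for L in H}

# Fair-coin word distributions for lengths 1..3: every length-L word
# has probability 2^-L, so H(L) = L and both estimators give h = 1.
probs = {L: {w: 0.5 ** L for w in product("01", repeat=L)} for L in (1, 2, 3)}
```

For the fair coin both estimators are exact at every length; for processes with longer correlations, h(L) converges to the true rate faster than h’(L), consistent with the bound h <= h(L) <= h’(L).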

excess_entropy(*args, **kwargs)

Returns the excess entropy.

info(n=None, **kwargs)

Prints out summary information about the machine.

Parameters:

n : node

When not None, we print out specific information about a particular node.

logs : bool

When True, probabilities in the summary information are displayed as log probabilities. The default value is determined by cmpyParams[‘logs’].

probabilities(lengths, logs=None, ndist=None, sparse=None, method=None, norder=None, sort=False)

Returns the probability of every word, for all lengths in lengths.

Using a depth-first traversal, we efficiently calculate the probability of every word of every length less than some maximum length. By pruning the parse tree, we can achieve nice speedups.

Parameters:

lengths : list

A list of integers, specifying the desired word lengths. For each word length, we will calculate the probability of every word.

logs : bool

When True, we compute the log probabilities of every word, and when False, we compute standard probabilities. When None, we use the value from cmpyParams[‘logs’].

Returns:

probs : dict

The calculated probabilities.

probabilities_iter(*args, **kwargs)

Iterator of probabilities for all words of various lengths.

probability(*args, **kwargs)

Returns the probability of a word.

reverse(copy=True)

Reverses the machine so that it generates the ‘reverse’ process.

Nonrecurrent nodes will be removed. The initial distribution of the reversed machine will be set equal to averaged distribution of the original machine.

Parameters:

copy : bool

When True, we return a new machine. When False, we modify the machine in-place.

Returns:

m : machine

The reversed machine.

symbol()

Returns a generated symbol.

Note: This is not a read-only operation. The current node of the machine will be different after this call.

symbols(n)

Returns n symbols by evolving the HMM.

If the machine has no current node, then one is chosen randomly according to the stationary distribution.

Note: This implementation is read-only. The final node of the machine after the method is called will equal the initial node of the machine when this method was first called.

Parameters:

n : int

The number of symbols to generate.

Returns:

s : list

The list of generated symbols.
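The generation loop amounts to a weighted random walk over the edges. A minimal sketch, assuming an edge representation of `{node: [(symbol, next_node, probability), ...]}` (a simplification of cmpy's actual machine structure):

```python
import random

def symbols(edges, start, n, rng=None):
    """Generate n symbols by walking edges {node: [(symbol, next, p), ...]}."""
    rng = rng or random.Random()
    node, out = start, []
    for _ in range(n):
        r, acc = rng.random(), 0.0
        for sym, nxt, p in edges[node]:
            acc += p
            if r < acc:  # pick this edge with probability p
                out.append(sym)
                node = nxt
                break
    return out

# The even process: from A emit 0 (stay) or 1 (go to B) with probability
# 1/2 each; from B always emit 1 and return to A.
even = {"A": [("0", "A", 0.5), ("1", "B", 0.5)], "B": [("1", "A", 1.0)]}
word = symbols(even, "A", 10, random.Random(0))
```

Unlike the documented symbols() method, this sketch mutates its own local state only; the read-only behavior in cmpy is achieved by restoring the current node afterward.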

symbols_iter(*args, **kwargs)

Yields n symbols by evolving the machine’s current node.

The transducer Module

Base class for transducers.

class cmpy.machines.transducer.Transducer(data=None, name='', **attr)

Bases: cmpy.machines.stochasticmachine.StochasticMachine

Base class for transducers.

Methods

input_alphabet() Returns the input alphabet of the transducer.
output_alphabet() Returns the output alphabet of the transducer.
edge_attr_aliases = {'i': 'input', 'p': 'probability', 'value': 'probability', 'o': 'output'}
edge_attr_required = set(['input', 'probability', 'output'])
graph_attr_aliases = {}
graph_attr_required = set([])
input_alphabet()

Returns the input alphabet of the transducer.

input_machine(*args, **kwargs)

Returns the topological eM describing the input language.

node_attr_aliases = {}
node_attr_required = set([])
output_alphabet()

Returns the output alphabet of the transducer.

output_machine(*args, **kwargs)

Returns the topological eM describing the output language.

The stochasticmachine Module

The base stochastic machine class.

class cmpy.machines.stochasticmachine.StochasticMachine(data=None, name='', **attr)

Bases: cmpy.machines.base.BaseMachine

The base stochastic machine class for CMPy.

Attributes

edge_attr_required
graph_attr_required
node_attr_required

Methods

distribution() Updates and returns the current distribution.
distributions(n) Returns n distributions by evolving the current distribution.
distributions_iter([n]) Evolves and yields the current distribution n times.
get_current_distribution([logs]) Returns the current distribution.
get_current_node() Returns the current node.
get_start_distribution([logs]) Returns the start distribution.
get_sync_mode() Returns the sync mode.
is_irreducible() Returns True if the machine is irreducible and False if not.
node_transition_matrix([node_ordering, validate]) Returns the node transition matrix as a NumPy array.
randomize_current_distribution([logs, ...]) Randomizes the current distribution.
randomize_current_node([dist, validate]) Randomizes the current node.
recurrent_components([ndist, logs]) Returns the recurrent components as a distribution.
recurrent_nodes([ndist]) Returns the recurrent nodes of the machine.
reduce([recurrent, copy, create_using]) Reduces the machine by removing transient nodes.
relabel_nodes(mapping) Returns a copy of the machine with nodes relabeled.
renormalize_edges([style]) Renormalizes all the outgoing edge probabilities for each node.
set_current_distribution(dist[, logs]) Sets the current distribution.
set_current_node(n) Sets the current node.
set_start_distribution(d) Sets the start distribution.
set_sync_mode(sync) Sets the sync mode.
stationary_distribution([logs, sparse, ...]) Returns the stationary distribution.
stationary_entropy() Returns the entropy of the stationary distribution.
subgraph(nbunch[, copy, create_using]) Returns a node-induced subgraph of the graph.
synchronize([logs]) Synchronizes the current distribution to the current node.
topological_entropy_rate() Returns the topological entropy rate of the underlying Markov chain.
distribution()

Updates and returns the current distribution.

distributions(n)

Returns n distributions by evolving the current distribution.

Parameters:

n : int

The number of distributions to generate.

Returns:

d : list

The list of generated distributions.

distributions_iter(n=None)

Evolves and yields the current distribution n times.

If the machine has no current distribution, one is chosen randomly. The current distribution is evolved by multiplying the current distribution by the node transition matrix.

Parameters:

n : int, None

The number of times to evolve the current distribution. If n is None, then the current distribution will be evolved indefinitely.

Notes

The current distribution is returned. If you want to save it, you should make a copy of it, since it will be updated when the current distribution is evolved again.

get_current_distribution(logs=None)

Returns the current distribution.

Parameters:

logs : bool, None

If True, then the current distribution is returned as a log distribution. If False, then the current distribution is returned as a standard distribution. If None, then the current distribution is returned as it is currently stored.

Returns:

d : distribution

The current distribution.

get_current_node()

Returns the current node.

Returns:

n : node

The current node.

Notes

The current node is None only if it has never been set.

get_start_distribution(logs=None)

Returns the start distribution.

Parameters:

logs : bool, None

If True, then the start distribution is returned as a log distribution. If False, then the start distribution is returned as a standard distribution. If None, then the start distribution is returned as it is currently stored.

Returns:

d : distribution

The start distribution.

get_sync_mode()

Returns the sync mode.

If the sync mode is True (active), then the current distribution is synchronized to the current node.

Returns:

s : bool

The sync mode.

is_irreducible()

Returns True if the machine is irreducible and False if not.

A machine is irreducible if every node can be reached from every other node. That is, the machine must consist of a single strongly connected component. This is strictly a graph property, and any probability flow from the initial distribution is irrelevant.

node_transition_matrix(node_ordering=None, validate=False)

Returns the node transition matrix as a NumPy array.

Parameters:

node_ordering : list, None

An ordering of the nodes in the graph. The ordering determines the order of the rows and columns in the array. If None, then self.nodes() is used.

validate : bool

If True, then we verify that each row sums to 1. Note, the elements in the node transition matrix need not be between 0 and 1.

Returns:

m : NumPy array

The matrix whose elements describe the probability of moving between nodes in the HMM.

ordering : list

The ordering of the rows/columns in the matrix. This is returned only if node_ordering was None.

Raises:

InvalidProbabilitySum :

Raised when a node’s outgoing transitions do not sum to 1.
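The construction can be sketched independently of cmpy: accumulate edge probabilities into a square array indexed by a node ordering (parallel edges between the same pair of nodes sum). The `(u, v, probability)` edge-list input here is an assumption for illustration:

```python
import numpy as np

def node_transition_matrix(edges, node_ordering=None, validate=False):
    """Build the node transition matrix from (u, v, probability) edges."""
    nodes = node_ordering or sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    idx = {n: i for i, n in enumerate(nodes)}
    T = np.zeros((len(nodes), len(nodes)))
    for u, v, p in edges:
        T[idx[u], idx[v]] += p  # parallel edges accumulate
    if validate and not np.allclose(T.sum(axis=1), 1):
        raise ValueError("A node's outgoing transitions do not sum to 1.")
    return T, nodes

# The even process: A -0/1/2-> A, A -1/1/2-> B, B -1/1-> A.
T, order = node_transition_matrix(
    [("A", "A", 0.5), ("A", "B", 0.5), ("B", "A", 1.0)], validate=True)
```

With validation on, a row summing to anything other than 1 raises an error, mirroring the documented InvalidProbabilitySum behavior.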

randomize_current_distribution(logs=None, event_type=<class 'cmpy.infotheory.events.Event'>)

Randomizes the current distribution.

The distribution is drawn uniformly from the space of all distributions.

Parameters:

logs : bool

Specifies if the current distribution should be a log or standard distribution. If None, then the distribution will match whatever type is already set in the HMM. If the HMM has never had a distribution, then the value of logs is obtained from cmpyParams[‘logs’].

Returns:

d : Distribution, LogDistribution

The randomly drawn distribution.

randomize_current_node(dist=None, validate=True)

Randomizes the current node.

Parameters:

dist : Distribution, LogDistribution

A distribution specifying how to draw the current node. If None, then we draw from the current distribution, provided it is not None. If the current distribution is None, then we draw from the stationary distribution.

validate : bool

Specifies whether the provided distribution should be validated before drawing from it.

Returns:

n : node

The updated current node.

recurrent_components(ndist=None, logs=None)

Returns the recurrent components as a distribution.

Parameters:

ndist : Distribution | LogDistribution | node | None

A distribution, a node in the machine, or None. If None, then an attempt is made to grab one from the machine.

logs : bool

Whether to return a Distribution or LogDistribution.

Returns:

dist : distribution

The distribution of the recurrent components.

recurrent_nodes(ndist=None)

Returns the recurrent nodes of the machine.

If the machine consists of a single attracting component, then the recurrent nodes will consist of all the nodes in that attracting component.

If the machine consists of multiple attracting components, then the recurrent nodes will consist of those nodes with positive probability in the averaged distribution.

Parameters:

ndist : Distribution | LogDistribution | node | None

A distribution, a node in the machine, or None. If None, then an attempt is made to grab one from the machine.

Returns:

nodes : list

The recurrent nodes, subject to the initial distribution.

reduce(recurrent=True, copy=True, create_using=None)

Reduces the machine by removing transient nodes.

If the machine consists of a single attracting component, then the resultant machine will contain only the nodes in that attracting component. If the machine consists of multiple attracting components, then there is a choice:

  1. Return the machine containing only those nodes with positive probability in the averaged distribution.
  2. Return the machine containing all attracting nodes.

If you reduce a nonergodic machine, then the start distribution will be set to the averaging distribution of the original machine.

Parameters:

recurrent : bool

If True, then option 1 is followed whenever there are multiple attracting components. If False, then option 2 is followed.

copy : bool

When True, we return a new machine. When False, we modify the machine in place, removing every node except the recurrent nodes.

create_using : callable

A callable which returns an instantiation of a machine class that is populated when copy is True.

Returns:

m : machine

The recurrent portion of the original machine.

relabel_nodes(mapping)

Returns a copy of the machine with nodes relabeled.

Parameters:

mapping : dict

A dictionary mapping old nodes to new nodes.

renormalize_edges(style='as_is')

Renormalizes all the outgoing edge probabilities for each node.

Parameters:

style : str

The type of renormalization to perform. Valid options are:
‘as_is’

Use the current probabilities and simply renormalize them.

‘uniform’:

Probability is uniformly distributed per node.

Examples

>>> from cmpy import machines
>>> m = machines.Even()

Change the probabilities so that they are not normalized. Then renormalize.

>>> m['A']['A'][0]['p'] = 1
>>> m.renormalize_edges()

Now make the machine a topological machine.

>>> m.renormalize_edges(style='uniform')
set_current_distribution(dist, logs=None)

Sets the current distribution.

Parameters:

dist : Distribution, LogDistribution

A cmpy.infotheory distribution keyed by nodes in the HMM.

logs : bool

Specifies how the distribution should be stored. If True, then we store the distribution as a log distribution. If False, we store as a linear distribution. If None, we store it as is.

Returns:

d : Distribution, LogDistribution

The current distribution.

Raises:

CMPyException :

Raised when attempting to set an invalid distribution.

Notes

The current distribution is None only if it has never been set.

set_current_node(n)

Sets the current node.

Parameters:

n : node

The node to declare as the current node.

Raises:

CMPyException :

Raised when attempting to set the current node to a node which does not exist in the HMM.

Notes

The current node is None only if it has never been set.

set_start_distribution(d)

Sets the start distribution.

Parameters:

d : distribution

The distribution that will be set as the initial node distribution.

set_sync_mode(sync)

Sets the sync mode.

If the sync mode is True (active), then the current distribution is synchronized to the current node.

stationary_distribution(logs=None, sparse=None, node_ordering=None, meta=False, event_type=<class 'cmpy.infotheory.events.Event'>)

Returns the stationary distribution.

The stationary distribution is defined to be the distribution which satisfies d.T = d, where d is a distribution. This distribution is guaranteed to be unique if the machine is irreducible.

The stationary distribution is also unique for reducible machines which have only a single attracting component. However, there can be other nonstationary distributions which generate stationary process languages. To see this, consider any eM. One choice of a distribution which generates a stationary process is to place all probability in the start node. Another choice is to appropriately place all probability in the recurrent causal states. Although both distributions generate a stationary process, only the latter corresponds to a stationary distribution in the eigenvector sense.

When there are multiple attracting components, then the stationary distribution is not unique. In fact, the stationary distribution of any attracting component, extended to the entire machine, would qualify as a stationary distribution for the nonergodic machine. Indeed, any mixture of the stationary distributions from each ergodic component would itself be a stationary distribution. Given an initial node distribution, one can calculate a related distribution using averaged_distribution().

Note: the stationary distribution does not depend on the initial node distribution. Thus, periodic structures will have a stationary distribution, even though it is possible to choose an initial node distribution such that the generated process is not stationary. The stationary distribution simply tells us that if the initial node distribution is set equal to the stationary distribution, then the generated process will be stationary.

Parameters:

logs : bool, None

Specifies whether the returned distribution should be a log or standard distribution. If logs is None, then the value is taken from cmpyParams[‘logs’].

sparse : bool, None

Specifies whether nodes with zero probability should be included in the stationary distribution. If unspecified, then the value is taken from cmpyParams[‘sparse’]. Note, this parameter is only relevant if node_ordering is None.

node_ordering : list, None

If specified, then the distribution will be returned as a NumPy array. The list must include every node in the graph.

meta : bool

When True, the node transition matrix and node ordering that were used in the computation are stored in a dictionary, which is returned, under the keys ‘ntm’ and ‘node_ordering’.

Returns:

pi : Distribution, LogDistribution, NumPy array

The stationary distribution. Unless a node ordering is specified, the return value will be a distribution object. When a node ordering is specified, the return value will be a NumPy array.

meta : dict

Dictionary of useful quantities used during the calculation. This is returned only if meta is True.
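Numerically, the stationary distribution is the left eigenvector of the node transition matrix for eigenvalue 1, normalized to sum to 1. A minimal numpy sketch, assuming an irreducible machine so the distribution is unique (cmpy's own solver may differ):

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1.

    Assumes T is irreducible, so the stationary distribution is unique.
    """
    w, v = np.linalg.eig(T.T)          # right eigenvectors of T^T = left of T
    i = np.argmin(np.abs(w - 1))       # pick the eigenvalue closest to 1
    pi = np.real(v[:, i])
    return pi / pi.sum()               # normalize (also fixes overall sign)

# Even process node transition matrix, node ordering (A, B).
T = np.array([[0.5, 0.5],
              [1.0, 0.0]])
pi = stationary_distribution(T)
```

Here pi T = pi gives pi = (2/3, 1/3): the machine spends twice as much time in node A as in node B.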

stationary_entropy()

Returns the entropy of the stationary distribution.

subgraph(nbunch, copy=True, create_using=None)

Returns a node-induced subgraph of the graph.

Parameters:

nbunch : list

A list of nodes to include in the subgraph.

copy : bool

If True, then a new machine is created by making a deepcopy of all node/edge/graph attributes. Otherwise, every node except the nodes in nbunch are deleted.

create_using : [ callable | None ]

A callable which returns a machine instance that will be populated. If None then we use the class of the original machine.

Returns:

m : machine

The node-induced subgraph of the machine.

synchronize(logs=None)

Synchronizes the current distribution to the current node.

Parameters:

logs : bool

Specifies if the current distribution should be a log or standard distribution. If None, then the distribution will match whatever type is already set in the HMM. If the HMM has never had a distribution, then the value of logs is obtained from cmpyParams[‘logs’].

topological_entropy_rate()

Returns the topological entropy rate of the underlying Markov chain.

The topological entropy rate is defined as:

h = lim_{L → ∞} (1/L) log_2 N(L)

where N(L) is the number of allowable words of length L. It is a well-known result that the topological entropy rate equals the logarithm of the largest eigenvalue of the adjacency matrix. For a proof, see:

D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1995.

Notes

This does not compute the topological entropy rate of the HMM.
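Since the topological entropy rate equals the logarithm of the largest eigenvalue of the adjacency matrix, it is a one-liner in numpy. A sketch, using the golden-mean shift (no consecutive 1s) as the example:

```python
import numpy as np

def topological_entropy_rate(A):
    """log2 of the spectral radius of the adjacency matrix A."""
    return float(np.log2(np.max(np.abs(np.linalg.eigvals(A)))))

# Golden-mean shift: from node A both symbols are allowed (A->A and A->B),
# from node B only one (B->A). N(L) grows like the Fibonacci numbers.
A = np.array([[1, 1],
              [1, 0]])
h = topological_entropy_rate(A)  # log2 of the golden ratio, ~0.694
```

The largest eigenvalue of this adjacency matrix is the golden ratio (1+sqrt(5))/2, so h = log2((1+sqrt(5))/2).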

The moorehmm Module

Moore HMM - symbols are output while in the state.

class cmpy.machines.moorehmm.MooreHMM(data=None, name='', **attr)

Bases: cmpy.machines.hmm.HMM

Machine where symbols are output while in the state.

Methods

alphabet([u]) Returns the alphabet of the machine.
distributions_iter([n, sdist]) Evolves and yields the current distribution n times.
alphabet(u=None)

Returns the alphabet of the machine.

Parameters:

u : node, None

If not None, then we return the alphabet as seen from node u. If None, then we return the union of the alphabets as seen from each node in the machine.

Returns:

a : set

The alphabet of the machine.

Raises:

AlphabetException :

When the alphabet cannot be obtained from the machine.

distributions_iter(n=None, sdist=None)

Evolves and yields the current distribution n times.

If the machine has no current distribution, one is chosen randomly. The distribution is evolved using a symbol distribution which describes the probability of observing a particular symbol.

Parameters:

n : int, None

The number of times to evolve the current distribution by the node transition matrix. If n is None, then the current distribution will be evolved indefinitely by the node transition matrix.

Notes

The current distribution is returned. If you want to save it, you should make a copy of it, since it will be updated when the current distribution is evolved again.

edge_attr_aliases = {'p': 'probability', 'value': 'probability'}
edge_attr_required = set(['probability'])
graph_attr_aliases = {}
graph_attr_required = set([])
node_attr_aliases = {'d': 'distribution'}
node_attr_required = set(['distribution'])

The mealytransducer Module

Mealy Transducers

Mealy transducers receive input and output symbols on the edges.

When querying an input that is not defined for a particular node, the corresponding labeled transition matrix is all zeros. Probabilities can then be computed in the same way as for a generator; the probability of error is one minus the resulting subnormalized probability.

class cmpy.machines.mealytransducer.MealyTransducer(data=None, name='', **attr)

Bases: cmpy.machines.transducer.Transducer

Class for Mealy transducers.

Methods

alphabets([node]) Returns the input and output alphabets.
complete([ialphabet, reject, copy]) Returns the MealyTransducer as a complete MealyTransducer.
input_alphabet() Returns the input alphabet of the transducer.
input_machine([build, create_using]) Returns the eM describing the input language.
invertibility() Determines if a transducer is invertible with delay N.
invertibility_test_graph() Constructs the invertibility test graph for a transducer.
is_complete([ialphabet]) Determines if a transducer is complete or not.
joint_machine(generator) Returns the joint generator of the transducer composed with generator.
labeled_transition_matrices([node_ordering, ...]) Returns the labeled transition matrices.
output_alphabet() Returns the output alphabet of the transducer.
output_machine([build, create_using]) Returns the eM describing the output language.
renormalize_edges([style]) Renormalizes all the outgoing edge probabilities for each node.
stationary_distribution(generator[, gndist, ...]) Returns the stationary distribution induced by the generator.
stationary_entropy(generator[, gndist, ...]) Returns the entropy of stationary distribution induced by the generator.
transduce_generator(generator[, simple, ...]) Transduces the generator.
transduce_symbols_iter([ndist, error]) Returns a memoryful iterator for transducing symbols.
transduce_word(iword[, ndist, error, ...]) Transduces iword into an output word.
alphabets(node=False)

Returns the input and output alphabets.

Parameters:

node : bool

If True then the return values are dictionaries keyed by nodes. For each node, the corresponding alphabet is returned.

Returns:

ialphabet : set | dict

The input alphabet. If node is True, then this is a dictionary keyed by nodes in the transducer. The value for each key is the input alphabet from that node.

oalphabet : set | dict

The output alphabet. If node is True, then this is a dictionary keyed by nodes in the transducer. The value for each key is the output alphabet from that node.

complete(ialphabet=None, reject=None, copy=True)

Returns the MealyTransducer as a complete MealyTransducer.

That is, the transducer will be complete over the input symbols.

Parameters:

ialphabet : set

Additional input symbols used to make the transducer complete.

reject : hashable

The name given to the reject node.

copy : bool

If True, then a copy of the machine is made and returned. If False, the machine is updated in-place.

Returns:

m : MealyTransducer

The complete transducer.

input_alphabet()

Returns the input alphabet of the transducer.

input_machine(build=True, create_using=None, **kwargs)

Returns the eM describing the input language.

The transducer’s edges are added to a generator using only the input symbols. Each edge is given the same probability and the edges are renormalized after construction. Then the machine is transformed into its eM. Note that the machine will not necessarily be a topological eM.

All other parameters are passed to cmpy.machines.build_msr().

Parameters:

build : bool

If True, then the eM of the topological machine is constructed. If False, then the machine’s topology will resemble the transducer’s.

create_using : { callable | None }

A callable which returns a generator that is populated with the input machine.

Returns:

teM : RecurrentEpsilonMachine

The topological eM generating the input language.

invertibility()

Determines if a transducer is invertible with delay N.

Returns:

invertible : bool

True when the transducer is invertible with delay N, False if not invertible. Partial transducers result in an exception.

delay : int

When invertible = True, this is the inverse delay of the transducer. Otherwise, this is ‘Undefined’

invertibility_test_graph()

Constructs the invertibility test graph for a transducer.

Returns:

G : networkx.DiGraph()

The invertibility test graph for the transducer.

Examples

>>> T = cmpy.machines.Delay(3)
>>> G = T.invertibility_test_graph()
>>> import cmpy.drawing
>>> cmpy.drawing.draw_graphviz(G)

is_complete(ialphabet=None)

Determines if a transducer is complete or not.

Parameters:

ialphabet : set

Additional input symbols used to make the transducer complete.

Returns:

complete : bool

True if the transducer is complete, otherwise False.

joint_machine(generator)

Returns the joint generator of the transducer composed with generator.

Parameters:

generator : machine

The generator to be composed with the transducer.

Returns:

jm : machine

The joint generator.

Notes

This is a raw operation and does not account for start node distributions at all. This must be done after the fact.

labeled_transition_matrices(node_ordering=None, logs=False, mode='nan')

Returns the labeled transition matrices.

Each element in the matrix is given by:

T_{uv}^{i|o} = \Pr(o,v|u,i)

where u is the “from node”, v is the “to node”, i is the input symbol, and o is the output symbol. The value specifies the probability of seeing symbol o and transitioning to v given the current node u and input symbol i.

There is one matrix for each (input, output) symbol.

Parameters:

node_ordering : list

An ordering of nodes in the machine.

logs : bool | None

If True, then the log of the labeled transition matrices are returned.

mode : ‘nan’ | ‘zero’

When a node does not specify an input symbol, there are a number of ways one might choose to represent this fact in the labeled transition matrices. A natural choice is to use NaNs for the entire row. Choose ‘nan’ in this case. One can also choose to assign zero to the entire row. Choose ‘zero’ in this case.

Returns:

ltm : defaultdict

A dictionary of NumPy arrays keyed by (input, output) symbol pairs. The values are the labeled transition matrices. If an invalid key is specified, an array of zeros is returned.

norder : list

List of nodes, specifying the order of the rows/cols of the labeled transition matrices. This is returned only if node_ordering is None.

Raises:

CMPyException :

If a node in node_ordering does not exist in the machine.
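The structure of the return value can be sketched directly: one matrix per (input, output) symbol pair, accumulated from the edges, with a defaultdict so that invalid keys yield an all-zero array as documented. The `(u, v, input, output, probability)` edge-list input is an assumption for illustration:

```python
from collections import defaultdict
import numpy as np

def labeled_transition_matrices(edges, node_ordering):
    """Build T^{i|o} matrices from (u, v, input, output, probability) edges."""
    idx = {n: k for k, n in enumerate(node_ordering)}
    N = len(node_ordering)
    # Unknown (input, output) keys return an array of zeros, as documented.
    ltm = defaultdict(lambda: np.zeros((N, N)))
    for u, v, i, o, p in edges:
        ltm[(i, o)][idx[u], idx[v]] += p
    return ltm

# A one-node binary identity transducer: the output copies the input.
edges = [("A", "A", "0", "0", 1.0), ("A", "A", "1", "1", 1.0)]
ltm = labeled_transition_matrices(edges, ["A"])
```

Each entry T_{uv}^{i|o} is Pr(o, v | u, i); for the identity transducer, only the matching (i, i) matrices are nonzero.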

output_alphabet()

Returns the output alphabet of the transducer.

output_machine(build=True, create_using=None, **kwargs)

Returns the eM describing the output language.

The transducer’s edges are added to a generator using only the output symbols. Each edge is given the same probability and the edges are renormalized after construction. Then the machine is transformed into its eM. Note that the machine will not necessarily be a topological eM.

All other parameters are passed to cmpy.machines.build_msr().

Parameters:

build : bool

If True, then the eM of the topological machine is constructed. If False, then the machine’s topology will resemble the transducer’s.

create_using : { callable | None }

A callable which returns a generator that is populated with the input machine.

Returns:

teM : RecurrentEpsilonMachine

The topological eM generating the output language.

renormalize_edges(style='as_is')

Renormalizes all the outgoing edge probabilities for each node.

Normalization is done from each node and for each input symbol.

Parameters:

style : str

The type of renormalization to perform. Valid options are:
‘as_is’

Use the current probabilities and simply renormalize them.

‘uniform’:

Probability is uniformly distributed per node.

Examples

>>> from cmpy import machines
>>> m = machines.FlipEveryOther()

Change the probabilities so that they are not normalized. Then renormalize.

>>> m['A']['B'][0]['p'] = 1
>>> m.renormalize_edges()

Now make the machine a topological machine.

>>> m.renormalize_edges(style='uniform')
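The two styles amount to row normalization of the outgoing edge weights. A minimal NumPy sketch of both (not the cmpy implementation):

```python
import numpy as np

# Unnormalized outgoing edge weights: rows are from-nodes, columns are to-nodes.
T = np.array([[1.0, 1.0],
              [2.0, 0.0]])

# 'as_is': keep the current weights and simply rescale each row to sum to 1.
T_asis = T / T.sum(axis=1, keepdims=True)

# 'uniform': ignore the weights and spread probability evenly over each
# node's existing edges.
support = (T > 0)
T_uniform = support / support.sum(axis=1, keepdims=True)
```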
stationary_distribution(generator, gndist=None, ndist=None, logs=None)

Returns the stationary distribution induced by the generator.

If the joint generator has a single attracting component, then it is not necessary to specify the generator and transducer start node distributions.

Parameters:

generator : machine

The generator that drives the transducer.

gndist : distribution

The start node distribution for the generator. If None, then one will be obtained from the machine, if possible.

ndist : distribution

The start node distribution for the transducer. If None, then one will be obtained from the machine, if possible.

logs : bool

If True, then the distribution will be a log distribution.

Returns:

dist : Distribution | LogDistribution

The induced stationary distribution over transducer nodes.

Notes

Unlike with generators, the same algorithm can be used to calculate the stationary distribution (i.e., one attracting component) and the averaged distribution (i.e., multiple attracting components).
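For a machine with a single attracting component, the stationary distribution is the normalized left eigenvector of the node transition matrix at eigenvalue 1. A NumPy sketch using the golden mean process (independent of the cmpy API):

```python
import numpy as np

# Node transition matrix of the golden mean process (states A, B).
T = np.array([[0.5, 0.5],
              [1.0, 0.0]])

# The stationary distribution is the left eigenvector of T at eigenvalue 1,
# normalized to sum to 1.
w, v = np.linalg.eig(T.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()
```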

stationary_entropy(generator, gndist=None, ndist=None, logs=None)

Returns the entropy of stationary distribution induced by the generator.

Parameters:

generator : machine

The generator that drives the transducer.

gndist : distribution

The initial node distribution for the generator. If None, then one will be obtained from the machine, if possible.

ndist : distribution

The initial node distribution for the transducer. If None, then one will be obtained from the machine, if possible.

logs : bool

If True, then the distribution will be a log distribution.

Returns:

ent : float

The stationary entropy.
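The returned value is the Shannon entropy of the induced stationary distribution. A small NumPy sketch of the entropy computation (not the cmpy implementation):

```python
import numpy as np

def shannon_entropy(dist):
    """Entropy (in bits) of a distribution given as a sequence of probabilities."""
    p = np.asarray([x for x in dist if x > 0])   # skip zero-probability terms
    return float(-(p * np.log2(p)).sum())
```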

transduce_generator(generator, simple=False, gndist=None, ndist=None, error=None, logs=None)

Transduces the generator.

Parameters:

generator : machine

The generator which provides input symbols to the transducer.

simple : bool

If True, then the generator is transduced and no attention is given to the other parameters. That is, the simplest, most generic transduction is performed without any error analysis.

gndist : distribution

The initial node distribution for the generator. If None, then one will be obtained from the machine, if possible.

ndist : distribution

The initial node distribution for the transducer. If None, then one will be obtained from the machine, if possible.

error : int

If 0, then an exception will be raised if the output generator cannot be used without eventually emitting an error. If 1 or 2, then the output generator is constructed and output. If the value is None, then the value is taken from cmpyParams[‘transducers.error’].

logs : bool

If True, then the distribution over the output generator nodes will be a log distribution.

Returns:

m : MealyHMM

The transduced generator representing the output.

jd : distribution

The initial node distribution for the output generator. If simple is True, this value is not returned.

ed : distribution

The averaged node distribution for the output generator. If simple is True, this value is not returned.

noerror : bool

If True, then the output generator will never output an error. If simple is True, this value is not returned.

Raises:

CMPyException :

If simple is False and error equals 0, then this exception is raised if the output generator eventually outputs an error.

If the alphabet of the generator has no intersection with the input alphabet of the transducer.

transduce_symbols_iter(ndist=None, error=None)

Returns a memoryful iterator for transducing symbols.

Calling this method returns a custom object whose transduce method converts a single input symbol into an output symbol and the next state (of the transducer). The custom iterator uses a cached copy of the transducer’s labeled transition matrices. If the transducer is modified in any way, then a new memoryful iterator should be constructed.

If the input symbol is not allowed from the current state of the transducer, then a ‘?’ is returned and the current state of the transducer is set to ‘?’. Any future transductions will again produce ‘?’ as output. In other words, you must reset the current node of the transducer to continue.

If the current state is None, then one is chosen according to self._get_ndist().

Parameters:

ndist : distribution

The initial node distribution or the initial node for the transducer. If None, then one will be obtained from the machine, if possible.

error : { int | None}

If 0, then an exception is raised whenever the input is not allowed from the current state of the transducer. If 1 or 2, then an invalid input will return (‘?’, ‘?’) representing an error output symbol and an error state. All future calls to transduce will result in the same output. If the value of error is None, then the value is taken from cmpyParams[‘transducers.error’].

Returns:

it : object

A customized iterator for transducing symbols.

Examples

>>> from cmpy import machines
>>> t = machines.GMtoEven()
>>> t.set_start_node('A')
>>> it = t.transduce_symbols_iter(error=1)
>>> it.transduce('0')
('1', 'B')
>>> it.transduce('1')
('1', 'A')
>>> it.transduce('0')
('1', 'B')
>>> t.transduce_word('010') # compare to transducing all at once
('1', '1', '1')
>>> it.transduce('0')  # continuing, but with bad input
TransducerError: Transducer errored on input ('0',)
transduce_word(iword, ndist=None, error=None, return_dist=False, logs=None)

Transduces iword into an output word.

Parameters:

ndist : { Distribution | LogDistribution }

The initial node distribution of the transducer. If None, then the stationary distribution induced by the minimal topological eM of the input language will be used.

error : { int | None}

If 0, then the conditional distribution will only include non-error outputs. This means that the probabilities will be renormalized, so as to remove the effect of the error output. If 1, then the total probability of error is included in the output distribution. If 2, then there will be multiple error events in the distribution and errors will be shown as they occur. If the value is None, then the value is taken from cmpyParams[‘transducers.error’].

return_dist : bool

If True, we return the distribution over output words given the input word iword.

logs : bool

If True and if return_dist is True, then the returned distribution is a log distribution; otherwise, it’s a standard distribution. If logs is None, then we use the value from cmpyParams[‘logs’].

Returns:

oword : tuple

The output word, transduced from the input word iword.

dist : distribution

The distribution over output words given iword. This is returned only if return_dist is True.

The misc Module

Miscellaneous machine related functions.

cmpy.machines.misc.validate_node_distribution(dist, machine, logs, test_prob=False)

Validates a node distribution.

A valid node distribution is a dictionary where every key is a node in the graph and whose values sum to 1. Note that the dictionary need not contain every node in the graph.

Parameters:

dist : dict

This is the distribution to be validated.

machine : CMPy machine

The machine we are checking against.

logs : bool

This tells us if the distribution is a log distribution.

test_prob : bool

This tells us to make sure each value is between 0 and 1.

Returns:

b : bool

True if the distribution is valid. Otherwise, an exception is raised.

Raises:

InvalidDistribution :

Raised when the distribution is not properly normalized.

CMPyException :

Raised when the distribution contains a node not in the HMM.
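A minimal sketch of such a check in plain Python. This is an illustrative stand-in, not the cmpy function: it takes a set of nodes rather than a machine, and omits the log-distribution handling:

```python
import math

def validate_node_distribution(dist, nodes, test_prob=False):
    """Check that dist is keyed by nodes of the machine and sums to 1."""
    for n in dist:
        if n not in nodes:
            raise KeyError("%r is not a node of the machine" % (n,))
    if test_prob and any(not (0.0 <= p <= 1.0) for p in dist.values()):
        raise ValueError("each probability must lie in [0, 1]")
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("distribution does not sum to 1")
    return True
```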

cmpy.machines.misc.validate_node_ordering(ordering, machine)

Validates a node ordering.

A valid node ordering includes each node in the machine exactly once.

Parameters:

ordering : list

The node ordering that will be checked for validity.

machine : CMPy machine

The machine used to check the validity of the ordering.

Returns:

b : bool

True if the ordering is valid. Otherwise, an exception is raised.

Raises:

InvalidOrdering :

Raised when the ordering is invalid.
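The check itself is a set comparison. An illustrative stand-in in plain Python (not the cmpy function, which takes a machine rather than a set of nodes):

```python
def validate_node_ordering(ordering, nodes):
    """Check that ordering contains each node of the machine exactly once."""
    if len(ordering) != len(nodes) or set(ordering) != set(nodes):
        raise ValueError("ordering must contain each node exactly once")
    return True
```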

The transform Module

Some basic transformations on machines.

cmpy.machines.transform.reassign_edge_probabilities(machine, style='uniform')

Reassigns edge probabilities for every edge in machine.

Parameters:

machine : MealyHMM

The machine whose edge probabilities will be reassigned.

style : str

If ‘uniform’, then edge probabilities are uniformly distributed per node. If ‘prime’, then each node is assigned a unique prime number to act as the starting value. This should make all edge distributions distinct.

cmpy.machines.transform.internal_markov_chain(machine, create_using=None)

Returns the internal Markov chain of the machine.

Parameters:

machine : MealyHMM

A machine.

create_using : callable

A callable used to create the instance to be populated. If None, then we use the MealyHMM class.

Returns:

mc : MealyHMM

The internal Markov chain of machine.

cmpy.machines.transform.edge_machine(machine, iterations=1, style=0, create_using=None)

Returns a MealyHMM corresponding to the edges of machine.

This is also known as a ‘line graph’.

Regardless of the requested style, we construct a PULL machine for the requested number of iterations. At the very end, we relabel the symbols if a PUSH machine is requested. Note that the intermediate stages are the same. That is, constructing a PULL machine from a PUSH’ed machine is the same as constructing a PULL machine from a PULL’ed machine.

Parameters:

machine : MealyHMM

A MealyHMM instance.

iterations : positive int, 1

It is possible to create an edge machine from the resulting edge machine. This parameter specifies how many times to compose the operation.

style : int, {0,1}

To create an EdgeHMM, we must still output symbols on the edges. There are at least two ways to do this:

style=0

After creating all nodes, we PULL the symbol from the from_node.

style=1

After creating all nodes, we PUSH the symbol from the to_node.

Returns:

m : MealyHMM

The edge machine.
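The underlying line-graph construction can be sketched on bare edge lists, topology only, ignoring symbols and probabilities (line_graph is a hypothetical helper, not a cmpy function):

```python
def line_graph(edges):
    """Each edge becomes a node; connect e1 -> e2 whenever e1 ends where e2 starts."""
    nodes = list(edges)
    new_edges = [(e1, e2) for e1 in edges for e2 in edges if e1[1] == e2[0]]
    return nodes, new_edges

# Golden mean topology: A->A, A->B, B->A.
nodes, new_edges = line_graph([('A', 'A'), ('A', 'B'), ('B', 'A')])
```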

The label Module

Functions for constructing node/edge labels.

cmpy.machines.label.edge_label(attrs, input=True, value=True, output=True, supertime=False, join=True)

Returns a string to use when drawing edge labels.

Generally, an edge label is drawn in the following manner:

input | value | output

This format degrades gracefully whenever any of the quantities is omitted. For example, the edge label for a MealyHMM appears as:

value | output

where the value used is the edge probability.

Parameters:

attrs : aliasdict

The attributes of the edge from a CMPy machine.
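The graceful degradation amounts to joining the non-None fields. An illustrative sketch (not the cmpy implementation, which reads these fields from the edge’s attribute dictionary):

```python
def edge_label(input=None, value=None, output=None):
    """Join the non-None fields with ' | ', degrading gracefully."""
    # 'input' intentionally shadows the builtin to mirror the documented keywords.
    parts = [str(x) for x in (input, value, output) if x is not None]
    return ' | '.join(parts)
```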

The convert Module

Convert CMPy graphs to and from other formats.

cmpy.machines.convert.from_string(s, name=None, style=0, renormalize=True)

Returns a CMPy machine from the string s.

Parameters:

s : str

A string representing the machine.

name : {str, None}

The name to give the presentation.

style : int | string

An integer describing the type of machine to create. Valid values are:

0, ‘mealyhmm’ : MealyHMM

Syntax for s is that edges are separated by semicolons and each edge consists of the from-node, to-node, symbol, and transition probability. Each element of the edge should be separated by spaces. For example: “A A 1 .5; A A 0 .5”. If the probability is omitted, then uniform transition probabilities are applied to each edge. For example: “A A 1; A A 0” gives the fair coin.

1, ‘reM’ : RecurrentEpsilonMachine

Same syntax as 0. User must ensure that the machine described is both minimal and unifilar.

2, ‘mealytransducer’ : MealyTransducer

Syntax for s is that edges are separated by semicolons and each edge consists of the from-node, to-node, input-symbol, output-symbol, and transition probability. Each element of the edge should be separated by spaces. For example: “A A 1 0 .5; A A 1 1 .5; A A 2 1 1”.

3, ‘dfa’ : DFA

Syntax for s is that edges are separated by semicolons and each edge consists of the from-node, to-node, and symbol. Each element of the edge should be separated by spaces. For example: “A A 1; A A 0” gives the fair coin.

4, ‘markovchain’ : MarkovChain

Syntax for s is that edges are separated by semicolons and each edge consists of the from-node, to-node, and probability. Each element of the edge should be separated by spaces. For example: “A A .6; A B .4; B A 1”. If the probability is omitted, then uniform transitions are applied to each edge. So, “A A; A B; B A” gives the standard golden mean process.

renormalize : bool

If True, then the edge probabilities are renormalized before returning the machine.

Examples

To create the golden mean process:

>>> m = from_string('A A 1 .5; A B 0 .5; B A 1 1')

To create the golden mean process as a Markov chain:

>>> m = from_string('A A; A B; B A', style='markovchain')
cmpy.machines.convert.from_matrices(matrices, name=None, nodes=None, symbols=None, style=0)

Returns a CMPy machine from matrices.

Parameters:

matrices : {list, dict}

A list of matrices. The exact format depends on the value of style.

name : {str, None}

The name to give the presentation.

nodes : {list, None}

The names to give the nodes, the order corresponds to the order of the rows/cols in the matrices. If None, the nodes will be labeled as integers beginning at 0.

symbols : {list, None}

The names given to the symbols (used only when style == 3). If None, the symbols will be integers beginning at 0.

style : int

An integer describing the type of machine to create. Valid values are:

0, ‘mealyhmm’ : MealyHMM

matrices should be a list of 2-tuples. Each 2-tuple should be of the form (symbol, matrix). matrices can also be a dict keyed by symbols with the labeled transition matrices as values.

1, ‘dfa’ : DFA

matrices should be a list of 2-tuples. Each 2-tuple should be of the form (symbol, matrix). matrices can also be a dict keyed by symbols with the labeled transition matrices as values.

2, ‘dfa_auto’: DFA

matrices should be a list of matrices. Symbols will be integers from 0.

3, ‘moorehmm’ : MooreHMM

matrices should be a 2-tuple consisting of the node transition matrix and the emission matrix.

4, ‘markovchain’ : MarkovChain

matrices should be a single NumPy 2D array that describes the state transition matrix of the Markov chain.

Examples

To create the golden mean process:

>>> T0 = (0, [[0,.5],[0,0]])
>>> T1 = (1, [[.5,0],[1,0]])
>>> m = from_matrices( (T0,T1) )
cmpy.machines.convert.convert_symbols_to_integers(m, as_strings=False)

Converts all output symbols to integers.

cmpy.machines.convert.from_distribution(d)

Builds a Markov process from a joint distribution.

The simplemoorehmm Module

Moore HMM - symbols are output while in the state.

class cmpy.machines.simplemoorehmm.MooreHMM(data=None, name='', **attr)

Bases: cmpy.machines.hmm.HMM

Machine where symbols are output while in the state.

Methods

alphabet(u=None)

Returns the alphabet of the machine.

Parameters:

u : node, None

If not None, then we return the alphabet as seen from node u. If None, then we return the union of the alphabets as seen from each node in the machine.

Returns:

a : set

The alphabet of the machine.

Raises:

AlphabetException :

When the alphabet cannot be obtained from the machine.

distributions_iter(n=None, sdist=None)

Evolves and yields the current distribution n times.

If the machine has no current distribution, one is chosen randomly. The distribution is evolved using a symbol distribution which describes the probability of observing a particular symbol.

Parameters:

n : int, None

The number of times to evolve the current distribution by the node transition matrix. If n is None, then the current distribution will be evolved indefinitely by the node transition matrix.

Notes

The current distribution is returned. If you want to save it, you should make a copy of it, since it will be updated when the current distribution is evolved again.

edge_attr_aliases = {'p': 'probability', 'value': 'probability'}
edge_attr_required = set(['probability'])
graph_attr_aliases = {}
graph_attr_required = set([])
node_attr_aliases = {'d': 'distribution'}
node_attr_required = set(['distribution'])

The base Module

The base machine class.

class cmpy.machines.base.BaseMachine(data=None, name='', **attr)

Bases: networkx.classes.multidigraph.MultiDiGraph

The base machine class for CMPy.

Attributes

edge_attr_required
graph_attr_required
node_attr_required

Methods

add_edge(u, v[, key, attr_dict]) Add an edge between u and v.
add_edges_from(ebunch[, attr_dict]) Add all the edges in ebunch.
add_node(n[, attr_dict]) Add a single node to the graph and update its attributes.
add_nodes_from(nodes, **attr) Adds multiple nodes to the graph.
adjacency_matrix([node_ordering]) Returns the adjacency matrix as a NumPy integer array.
attracting_components() Returns the attracting components of the graph.
attracting_nodes() Returns the attracting nodes of the machine.
draw([filename, format, prefix, suffix, ...]) Draws the machine using the specified layout algorithm.
get_name() Returns the name of the machine.
get_start_node() Returns the start node.
is_irreducible() Returns True if the machine is irreducible and False if not.
set_name(name) Sets the name of the machine.
set_start_node(n) Sets the start node.
to_dot([filename, labelstyle]) Writes the machine to filename in the Graphviz DOT format.
add_edge(u, v, key=None, attr_dict=None, **attr)

Add an edge between u and v.

Since multiple edges between two nodes are allowed, each edge must be assigned a key. If a key is not specified, then one will be provided.

Parameters:

u,v : nodes

Nodes can be, for example, strings or numbers. Nodes must be hashable (and not None) Python objects.

key : hashable identifier, optional (default=lowest unused integer)

Used to distinguish multiedges between a pair of nodes.

attr_dict : dictionary, optional (default= no attributes)

Dictionary of edge attributes. Key/value pairs will update existing data associated with the edge.

attr : keyword arguments, optional

Edge data (or labels or objects) can be assigned using keyword arguments.

Returns:

key : The key uniquely identifying the added edge.

See also

add_edges_from

Notes

To replace/update edge data, use the optional key argument to identify a unique edge. Otherwise a new edge will be created.

add_edges_from(ebunch, attr_dict=None, **attr)

Add all the edges in ebunch.

Parameters:

ebunch : container of edges

Each edge given in the container will be added to the graph. The edges can be:

  • 2-tuples (u,v) or
  • 3-tuples (u,v,d) for an edge attribute dict d, or
  • 4-tuples (u,v,k,d) for an edge identified by key k

attr_dict : dictionary, optional (default= no attributes)

Dictionary of edge attributes. Key/value pairs will update existing data associated with each edge.

attr : keyword arguments, optional

Edge data (or labels or objects) can be assigned using keyword arguments.

Returns:

keys : The keys uniquely identifying the added edges.

See also

add_edge : add a single edge
add_weighted_edges_from : convenient way to add weighted edges

Notes

Adding the same edge twice has no effect but any edge data will be updated when each duplicate edge is added.

add_node(n, attr_dict=None, **attr)

Add a single node to the graph and update its attributes.

If the node already exists, then its attributes are updated.

Parameters:

n : node

Any hashable object except None.

attr_dict : dict (optional)

Dictionary of node attributes. If None, then no attributes are set.

attr : keyword arguments (optional)

Additional keywords that update attr_dict, before setting the node attributes.

Raises:

FrozenMachine :

If the machine is frozen.

See also

add_nodes_from

add_nodes_from(nodes, **attr)

Adds multiple nodes to the graph.

Parameters:

nodes : list

A list of nodes to add to the graph.

attr : keyword arguments (optional)

Additional keyword arguments passed to add_node().

adjacency_matrix(node_ordering=None)

Returns the adjacency matrix as a NumPy integer array.

Parameters:

node_ordering : list, None

An ordering of the nodes in the graph. The ordering determines the order of the rows and columns in the array. If None, then ‘self.nodes()’ is used.

Returns:

m : NumPy array

The adjacency matrix whose elements describe the number of edges between two nodes in the graph.

ordering : list

The ordering used to construct the array. This is returned only if node_ordering was None.

Notes

The return value is not necessarily a 0-1 matrix. The elements in the matrix will equal the number of edges between nodes.
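A minimal sketch of this counting behavior on a bare edge list (adjacency_matrix here is an illustrative stand-in, not the method itself):

```python
import numpy as np

def adjacency_matrix(edges, ordering):
    """Count edges between each ordered pair of nodes; parallel edges accumulate."""
    index = {node: i for i, node in enumerate(ordering)}
    m = np.zeros((len(ordering), len(ordering)), dtype=int)
    for u, v in edges:
        m[index[u], index[v]] += 1
    return m

# Two parallel self-loops on 'A', plus A->B and B->A.
m = adjacency_matrix([('A', 'A'), ('A', 'A'), ('A', 'B'), ('B', 'A')], ['A', 'B'])
```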

attracting_components()

Returns the attracting components of the graph.

This does not construct machines of the attracting components.

Returns:

acc : list

A list of attracting components. Each attracting component is represented by a tuple of nodes in the attracting component.

attracting_nodes()

Returns the attracting nodes of the machine.

An attracting node belongs to some attracting component. An attracting component is a group of nodes which accumulates probability flow.

Note that a node can be attracting without being recurrent. That is, the attracting property of a node is independent of whether or not probability actually flows through it.

Returns:

nodes : [nodes]

Returns a list of all the attracting nodes of the machine, even if they are in different strongly connected components.

draw(filename=None, format=None, prefix=None, suffix=None, layout='dot', args='', labelstyle=1, show=True, env=None)

Draws the machine using the specified layout algorithm.

Parameters:

filename : str, None

The name of the file to save the image to. If None, save to a temporary file. File formats are inferred from the extension of the filename. If the format parameter is provided, it overrides any inferred value.

format : str

An output format. Note that not all may be available on every system depending on how Graphviz was built. If no filename is provided and no format is specified, then we write a ‘png’ image. Other values for format are:

‘canon’, ‘cmap’, ‘cmapx’, ‘cmapx_np’, ‘dia’, ‘dot’, ‘fig’, ‘gd’, ‘gd2’, ‘gif’, ‘hpgl’, ‘imap’, ‘imap_np’, ‘ismap’, ‘jpe’, ‘jpeg’, ‘jpg’, ‘mif’, ‘mp’, ‘pcl’, ‘pdf’, ‘pic’, ‘plain’, ‘plain-ext’, ‘png’, ‘ps’, ‘ps2’, ‘svg’, ‘svgz’, ‘vml’, ‘vmlz’, ‘vrml’, ‘vtx’, ‘wbmp’, ‘xdot’, ‘xlib’

prefix : str | None

If filename is None, we save to a temporary file. The value of prefix will appear after ‘cmpy_’ but before the random string and file extension. If None, then the machine’s name is used.

suffix : str | None

If filename is None, we save to a temporary file. The value of suffix will appear after the prefix and random string but before the file extension.

layout : str

The graphviz layout algorithm. Options are the same as in pygraphviz.AGraph.draw().

args : str

Additional arguments to pass to the Graphviz layout program.

labelstyle : int, None

A specification of the label style. The default is 1. Valid options:

0 No labels are drawn.

1 Full CMPy label in “input | probability | output” format.

If input or output is None, then we adjust accordingly.

2 Probability is removed from the label.

show : bool

If True, then the image is displayed using the default viewer after drawing it.

Notes

If this function is called too quickly in succession, it doesn’t seem to work. So you might want to sleep for a second between calls.

edge_attr_aliases = {}
edge_attr_required = set([])
get_name()

Returns the name of the machine.

get_start_node()

Returns the start node.

graph_attr_aliases = {}
graph_attr_required = set([])
is_irreducible()

Returns True if the machine is irreducible and False if not.

A machine is irreducible if every node can be reached from every other node. That is, the machine must consist of a single strongly connected component. Probabilistically, this is equivalent to asking if the machine consists only of its recurrent component.

node_attr_aliases = {}
node_attr_required = set([])
set_name(name)

Sets the name of the machine.

set_start_node(n)

Sets the start node.

Parameters:

n : node

The node that will be set as the start node.

to_dot(filename=None, labelstyle=1)

Writes the machine to filename in the Graphviz DOT format.

No layout program is run on the machine to determine node positions. End-of-line characters are Unix-style: ‘\n’.

Parameters:

filename : str

The name of the file to save the image to. No matter the extension of filename, the format will always be the DOT format.

labelstyle : int

A specification of the label style. Default option is 1. Valid options:

0 No labels are drawn.

1 Full CMPy label in “input | probability | output” format.

If input or output is None, then we adjust accordingly.

2 Probability is removed from the label.