# Information Documentation¶

This page contains the Information Package documentation.

## The `exceptions` Module¶

Exceptions related to information theory.

exception `cmpy.information.exceptions.``InvalidDistribution`(*args)

Exception thrown when a distribution is not normalized.

exception `cmpy.information.exceptions.``InfiniteRelativeEntropy`(*args)

Exception to use if the relative entropy was calculated to be infinite.

exception `cmpy.information.exceptions.``InvalidProbability`(*args)

Exception thrown when a probability is not in [0,1].

## The `information` Module¶

Information-theoretic quantities.

Most of these functions assume you pass in a dictionary relating words to probabilities. From these dictionaries, relevant quantities are computed.

Additionally, all these functions assume discrete alphabets.

When we calculate a probability and the result is small, there is a design question: Should we include the small probability, if we want the distributions to remain sparse? In an attempt to prevent excessive propagation of tolerances, I decided to ALWAYS populate the outgoing distribution UNLESS the result is EXACTLY zero (due to a key not being present). This means we will not have nearly as many “is_forbidden” checks.

`cmpy.information.information.``conditional_distribution`(**keywords)

Given a joint distribution of X and Y, this function returns the distribution of Y given X. Optionally, this function will return the distribution of X given Y.

Required keywords:
distribution_XY
This is the joint distribution over X and Y. The keys are expected to be tuples of length 2.
Optional keywords:
condition_on_first=True
This specifies that we condition on the first variable. Thus, we return Pr(Y|X). The default value is True. If you want Pr(X|Y), set this keyword to False.
use_logs=True
This specifies that the distribution is actually a log distribution. The default value is True
validate=False
This specifies if we should check that the distribution is properly normalized. The default value is False, meaning that no check will be performed.
`cmpy.information.information.``conditional_entropy`(**keywords)

Computes H(Y|X) using the following formula:

H(Y|X) = H(X,Y) - H(X)

Given a dictionary representing the joint probability of two random variables, X and Y, this function returns entropy of Y conditioned on X. That is, we condition on the first variable. This can be changed so that we condition on the second random variable.

So the default behavior is the follows:
H(Y|X) = H(X,Y) - H(X)

where keys to the distribution are like (x,y).

Required keywords:
distribution_XY
This is the joint distribution describing X and Y. The keys should describe the joint values that X and Y can take on.
Optional keywords:
condition_on_first=True
This specifies that we should condition on the first random variable when examining the keys of the specified distribution. The default value for this keyword is True.
use_logs=True
This specifies that the distribution is actually a log distribution. The default value for this keyword is True.

For example,

```>>> import cmpy
>>> a = {(0,'H'):.2, (0,'T'):.1, (1,'H'):.3, (1,'T'):.4}
>>> cmpy.conditional_entropy(distribution_XY=a)
0.72192809488736231
>>> cmpy.conditional_entropy(distribution_XY=a, condition_on_first=False)
```
`cmpy.information.information.``convert_into_joint_distribution`(**keywords)

While testing algorithms, it is desirable to be able to generate two variable joint distributions. An easy way to do this is to generate word probabilities for a single machine—treating the first half of the word as the first random variable and the second of half of the word as the second random variable.

That is, (‘0’,‘1’,‘0’,‘1’) —> ((‘0’,‘1’),(‘0’,‘1’)) That is, (‘0’,‘1’,‘0’,‘1’,‘0’) —> ((‘0’,‘1’),(‘0’,‘1’,‘0’))

This function does the above conversion. Notice, we do not verify that the distribution is valid.

Required keywords:
distribution
This is the distribution we are splitting.
`cmpy.information.information.``joint_distribution_from_conditional`(**keywords)

Returns Pr(X,Y) given P(Y|X) and P(X).

Parameters: distribution_YgX : dictionary A dictionary keyed by X=x. The values are a dictionary keyed by Y=y, and the values of the inner dictionary represent Pr(Y=y|X=x). distribution_X : dictionary A dictionary keyed by X=x. The values represent Pr(X=x). use_logs : bool When True, the probability values are assumed to be log probabilities. The default value is taken from cmpyParams[‘use_logs’]. distribution_XY : dictionary A dictionary keyed by a tuple of tuples. The first element in the tuple is X=x and the second element in the tuple is Y=y. The value for each key is Pr(X=x, Y=y).
`cmpy.information.information.``distribution_from_conditional`(**keywords)

Given a probability distribution for X and a conditional distribution relating Y to X, this function returns the distribution for Y.

Thus, we compute:

Pr(Y=y) = Sum_x [ Pr(X=x) * Pr(Y=y|X=x) ]

Required keywords:
distribution_X
This is a dictionary that represents the probability of that X=x.
distribution_YgX
This is a dictionary of dictionaries that represents the conditional probability relating Y to X, namely Pr(Y=y|X=x). The keys in the outer dictionary should be the values of X. The keys in the inner dictionary should be the values of Y. The values of the inner dictionary should be the probabilities.
Optional keywords:
use_logs=True
This specifies that the distributions are actually log distributions. The default value is True.
include_forbidden=False
This specifies that forbidden probabilities will be included in the output. The default value is False.
validate=False
This specifies if we should check that the new distribution is properly normalized. The default value is False, meaning that no check will be performed.

Examples

```>>> import cmpy
>>> a = {0:.2, 1:.8}
>>> b = {0:{0:.5, 1:.5}, 1:{1:1}}
>>> cmpy.distribution_from_conditional(distribution_X=a, distribution_YgX=b, use_logs=False)
>>> {0: 0.10000000000000001, 1: 0.90000000000000002}
```
`cmpy.information.information.``entropy`(**keywords)

Given a dictionary representing the probability distribution of a random variable X, this function returns the entropy of X. Notice, this function works for any probability distribution, so it easily computes joint entropies as well.

Required keywords:
distribution
This is the distribution describing X. The keys should describe the values that X can take on.
Optional keywords:
use_logs=True
This specifies that the distribution is a log distribution. The default value is True.
For example, in the first example below: Pr(X=0) = .2 and
Pr(X=1) = .8
```>>> import cmpy
>>> a = {0:.2, 1:.8}
>>> cmpy.entropy(distribution=a, use_logs=False)
0.72192809488736231
```
```>>> b = {(0,0):.3, (0,1):.2, (1,0):.4, (1,1):.1}
>>> cmpy.entropy(distribution=b, use_logs=False)
1.8464393446710154
```

Notice, the second example can be interpreted as the entropy for the random variable X taking on values (0,0), (0,1), (1,0), and (1,1). Alternatively, the second example can be interpreted as the joint entropy for random variables X and Y where each random variable can take on the values 0 and 1.

`cmpy.information.information.``kullback_leibler_distance`(**keywords)

Given two distributions for the same random variable, this function returns the relative entropy (Kullback-Leibler distance) between the two distributions.

Suppose p(x) and q(x) are the two distributions. Then this function returns D(p(x)||q(x)); that is, it returns the entropy of p relative to q. To return the entropy of q relative to p, the keyword relative_to_first should be set to False.

Notice, the pdfs must have the same domain.

Required keywords:
distribution_1
This is the first distribution for the random variable.
distribution_2
This is the second distribution for the random variable.
Optional keywords:
relative_to_first
This specifies that we should compute the entropy relative to the first random variable. The default value for this variable is True.
use_logs=True
This specifies that the distributions are log distributions. The default value is True.
validate=False
This specifies if we should check that the distribution is properly normalized. The default value is False, meaning that no check will be performed.
`cmpy.information.information.``marginal_distribution`(distribution, **keywords)

Given any joint distribution and a list of the indices to sum over, this function returns the resulting distribution. For example, the joint distribution of X and Y might be specified as (x,y). To return the distribution of x, we would instruct this function to sum over index 1. Alternatively, the user may specify a list of indices to keep—all other indices will be summed over.

If we are going from two random variables to one random variable, then the indices go from (x,y) to x. If we are going from many random variables to more than one random variable, then the indices, for example, go from (x,y,z) to (x,y).

Required keywords:
distribution
This is the joint distribution. Keys should be specified as tuples, with each element representing a word in the random variable.
sum_indices
This is a list of indices to sum over. If this keyword is specified, the keep_indices keyword cannot be specified.
keep_indices
This is a list of indices to keep. All other indices will be summed over. If this keyword is specified, the sum_indices keyword cannot be specified.
Optional keywords:
use_logs=True
This specifies that the distribution is actually a log distribution. The default value is True.
validate=False
This specifies if we should check that the distribution is properly normalized. The default value is False, meaning that no check will be performed.

For example,

```>>> import cmpy
>>> a = {((0,),(0,)):.2, ((0,),(1,)):.5, ((1,),(0,)):.1, ((1,),(1,)):.2}
>>> cmpy.marginal_distribution(distribution=a, sum_indices=[1], use_logs=False)
{(0,):.7, (1,):.3}
>>> b = {(0,1,2):.25, (3,4,5):.25, (3,5,5):.5}
>>> cmpy.marginal_distribution(distribution=b, sum_indices=[1], use_logs=False)
{(0,2):.25, (3,5):.75}
>>> cmpy.marginal_distribution(distribution=b, keep_indices=[2], use_logs=False)
{2:.25, 5:.75}
```
`cmpy.information.information.``mutual_information`(**keywords)

Given the joint distribution for two random variables, this function computes the mutual information of those random variables.

Required keywords:
distribution_XY
This is the joint distribution of the two random variables.
Optional keywords:
use_logs=True
This specifies that the distributions are log distributions. The default value is True.
`cmpy.information.information.``perturb_distribution`(**keywords)

Given a distribution for a random variable X, this function returns the same distribution perturbed just slightly. Currently, there are two methods available. Notice, the perturbation will only be done on those words explicitly listed as keys in the distribution.

Required keywords:
distribution
This is the distribution to be perturbed. It should be a dictionary keyed by the words and the values should be the probabilities.
Optional keywords:
use_logs=True
This specifies if the distribution is actually a log distribution. The default value is True.
perturbation_scale=.01
This sets the scale of the perturbation. In general, you will want this to be small (much less than 1). Read about the various methods to understand how the perturbation_scale is used.
perturbation_attempts=50
After creating the distribution, we must check that it is a valid distribution. If it is not, we generate another random perturbation. This process will repeat as many times as specified by this keyword. The default value is 50. After the specified number of attempts, a PerturbationWaitTimeExceeded exception will be raised.
method=3
This specifies how the perturbation will be computed. The default value of this keyword is 1. Notice, no method currently exists that will generate a random perturbation uniformly on the space of all perturbed distributions.

Acceptable values for the ‘method’ keyword are described now:

method=1
The perturbation is done in the following way:
If Pr(w) < .5, then the perturbation is selected from [0,.5] If Pr(w) >= .5, then the perturbation is selected from [-.5,0]

Denote the perturbation by x. After selecting x according to the above logic, we scale x by the perturbation_scale. When perturbed in this fashion, we are guaranteed to have a valid distribution after a single perturbation attempt—Of course, this requires that we renormalize after adding the perturbation.

This method is particularly useful if you have many words with probabilities near 0 or 1. In such cases, a random perturbation will likely bring the probability out of the interval [0,1], and the wait time for a good perturbed distribution is very long.

method=2

The perturbation is uniform in the range [-1, 1] with mean 0. Denote the permutation by x. After selecting x according to the above logic, we scale x by the perturbation_scale.

Notice, there could be substantial wait times if you have probabilties that are near 0 or 1, as a random perturbation will probably take the probability to be less than 0 or greater than 1.

method=3

For each element, the perturbation is uniform in the the range [-1,1] with mean 0. Denote the permutation by x. After selecting x according to the above login, we scale x by the permutation_scale.

Then we add the permutation to existing distribution value. The goal is to keep the probabilties within [0,1]. So we take this new probability and push it through a special function which returns a valid probability. The function essentially bounces probabilities off the boundaries. So if a number is just slightly greater than 1, then it will be used as a number slightly less than one (same slightness). If a number is just slightly less than zero, then it will be used as a number just slightly greater than zero (same slightness). It should be clear that this, as with the other methods, is not a uniform selection over the space of all perturbed distributions. However, the upside is that when perturbed in this fashion, we are guaranteed to have a valid distribution after a single perturbation attempt

`cmpy.information.information.``relative_entropy`(**keywords)

Given two distributions for the same random variable, this function returns the relative entropy (Kullback-Leibler distance) between the two distributions.

Suppose p(x) and q(x) are the two distributions. Then this function returns D(p(x)||q(x)); that is, it returns the entropy of p relative to q. To return the entropy of q relative to p, the keyword relative_to_first should be set to False.

Notice, the pdfs must have the same domain. The following formula is used:

D_KL(p||q) = Sum_x [ p(x) log2 p(x)/q(x) ]

Required keywords:
distribution_1
This is the first distribution for the random variable.
distribution_2
This is the second distribution for the random variable.
Optional keywords:
relative_to_first
This specifies that we should compute the entropy relative to the first random variable. The default value for this variable is True.
use_logs=True
This specifies that the distributions are log distributions. The default value is True.
`cmpy.information.information.``symmetric_relative_entropy`(**keywords)
Required keywords:
distribution_1
This is the first distribution for the random variable.
distribution_2
This is the second distribution for the random variable.
Optional keywords:
use_logs=True
This specifies that the distributions are log distributions. The default value is True.
`cmpy.information.information.``mutual_information_from_conditional`(**keywords)

Computes the mutual information using the following formula:

I(X; Y) = H(Y) - H(Y|X)

Required keywords:
distribution_YgX
This is Pr(Y|X).
distribution_X
This is Pr(X).
Optional keywords:
use_logs=True
This specifies that the distributions are log distributions. The default value is True.
`cmpy.information.information.``conditional_entropy_from_conditional`(**keywords)

Computes H[Y|X] using the following formula:

H[Y|X] = sum_x ( Pr(x) * H[Y|X=x] )

Given Pr(Y|X) and Pr(X), return the entropy of Y conditioned on X.

Required keywords:
distribution_YgX
This is Pr(Y|X). This should be a dictionary of dictionaries.
distribution_X
This is Pr(X). This should be a dictionary.
Optional keywords:
use_logs=True
This specifies that the distribution is actually a log distribution. The default value for this keyword is True.
`cmpy.information.information.``probabilize`(probability, perturbation, use_logs)

Given a number on the real axis, this function returns a number in [0,1]. The mapping is a triangle wave of period 2. So, something like the following:

... -3 <= x <= -2 —-> f(x) = -(x+2) -2 <= x <= -1 —-> f(x) = (x+2) -1 <= x <= 0 —-> f(x) = -x

0 <= x <= 1 —-> f(x) = x 1 <= x <= 2 —-> f(x) = -(x-2) 2 <= x <= 3 —-> f(x) = (x-2)

...

The effect of this is that if a number is just slightly less than 0, it will be bounced back into the interval [0,1] by the same amount. Similarly, if a number is slightly greater than 1, it will be bounced back into the interval [0,1] by the amount it is greater than 1.

The perturbation should NOT be a log value. This function will take the log, when needed. The probability SHOULD be a log probability if use_logs is True and SHOULD be a standard probability if use_logs is False.

If use_logs is False, there is a simple formula we can use:

f(x) = 1 - 2 * abs( nint( .5 * (x-1) ) - .5 * (x-1) )

where nint(x) is the nearest integer of x. To avoid statistical bias, If x % 1 == .5, then we should always round to the nearest even number. In numpy, this function is around(). This formula was stated by Trott 2004, p. 228.

This function is especially useful when perturbing distributions. Done is this fashion, there is no need to worry about probabilities becoming greater than 1 or less than 0. Though, I’m not yet sure how this affects the distribution of the permutations.

`cmpy.information.information.``channel_capacity`(YgX, **keywords)

Computes the channel capacity from a conditional distribution.

Required arguments:
YgX
This is a dictionary of dictionaries where YgX[x][y] = Pr(y|x).
Optional keywords:
use_logs = False
This specifies whether the distribution is a log distribution.
rtol = 1e-10
This specifies how long we should continue iterating. When the tolerance is None, we will continue iterating until the channel capacity converges within floating point precision.
Returns:
C
This is the channel capacity.
X
This is a dictionary where X[x] = Pr(x).

This algorithm implements the alternating maximization procedure of Csiszar and Tusnady. For a reference, consult:

Cover and Thomas, “Elements of Information Theory”.

## The `misc` Module¶

class `cmpy.information.misc.``DistributionDict`

Bases: `dict`

A dictionary-based representation of a distribution.

Attributes

Methods

 `clear`(() -> None.  Remove all items from D.) `copy`(() -> a shallow copy of D) `entropy`() Returns the Shannon entropy of the distribution. `fromkeys`(...) v defaults to None. `get`((k[,d]) -> D[k] if k in D, ...) `has_key`((k) -> True if D has a key k, else False) `is_valid`([test_prob]) Returns True if the distribution is properly normalized. `items`(() -> list of D’s (key, value) pairs, ...) `iteritems`(() -> an iterator over the (key, ...) `iterkeys`(() -> an iterator over the keys of D) `itervalues`(...) `keys`(() -> list of D’s keys) `pop`((k[,d]) -> v, ...) If key is not found, d is returned if given, otherwise KeyError is raised `popitem`(() -> (k, v), ...) 2-tuple; but raise KeyError if D is empty. `renormalize`() Renormalizes the distribution (in place). `setdefault`((k[,d]) -> D.get(k,d), ...) `to_dense`(keys[, validate]) Returns a dense representation of the distribution. `update`(([E, ...) If E present and has a .keys() method, does: for k in E: D[k] = E[k] `values`(() -> list of D’s values) `viewitems`(...) `viewkeys`(...) `viewvalues`(...)
`entropy`()

Returns the Shannon entropy of the distribution.

`is_valid`(test_prob=False)

Returns True if the distribution is properly normalized.

Parameters: test_prob : bool If True, then we check that the probabilities are between 0 and 1. b : bool True if the distribution is properly normalized and when tested, all probabilities are between 0 and 1. Otherwise, False.
`log_distribution` = False
`renormalize`()

Renormalizes the distribution (in place).

`to_dense`(keys, validate=False)

Returns a dense representation of the distribution.

Parameters: keys : list A list of keys, containing the keys in the distribution. Any additional keys will receive the appropriate zero probability. validate : bool If True, validate that the dense representation is valid. It can be invalid if `keys’ does not contain every key in the distribution. dist : NumPy array A dense representation of the distribution. The order of the keys in `keys’ specifies the order of the elements in the array.
`cmpy.information.misc.``convert_distribution`(dist, logs, validate=False)

Returns a new, converted distribution from `dist’.

Parameters: dist : DistributionDict, dict, NumPy array The distribution to be converted. logs : bool Specifies the type of the converted distribution. If `logs’ is True, `dist’ is assumed to be a standard distribution and the converted distribution will be a log distribution. Analogously, when `logs’ is False. validate : bool If True, make sure the converted distribution is valid. d : DistributionDict, NumPy array The converted distribution. If `dist’ is some sort of dictionary, the converted distribution will be a DistributionDict instance. If `dist’ is some sort of list or tuple, then the converted distribution will be a NumPy array.
`cmpy.information.misc.``normalize_distribution`(dist, logs)

Returns a new, normalized distribution from `dist’.

Parameters: dist : dict, NumPy array The distribution that is to be normalized. If a dict-like structure, then the values represent the probabilities of the distribution. Otherwise, it should be a 2D array-like structure compatible with a NumPy array; each row represents a distribution. logs : bool If True, the distribution is treated as a log distributions.
`cmpy.information.misc.``normalize_conditional_distribution`(dist, logs)

Returns a new, normalized conditional distribution from `dist’.

Parameters: dist : dict, NumPy array The distribution that is to be normalized. If a dict-like structure, then it should be a dict of dicts where the inner and outer keys are the words and the inner values are the probabilities for each distribution. Otherwise, it should be a 2D array-like structure compatible with a NumPy array; each row represents a distribution. logs : bool This tells us if the distributions are log distributions. d : DistributionDict, NumPy array If `dist’ is dict-like, then a DistributionDict instance is returned. Otherwise, a 2D NumPy array is returned.
`cmpy.information.misc.``validate_conditional_distribution`(dist, logs, test_prob=False)

Validates that the conditional distribution is properly normalized.

Parameters: dist : dict, NumPy array The distribution that is to be normalized. If a dict-like structure, then it should be a dict of dicts where the inner and outer keys are the words and the inner values are the probabilities for each distribution. Otherwise, it should be a 2D array-like structure compatible with a NumPy array; each row represents a distribution. logs : bool If True, the distributions are treated as log distributions. test_prob : bool This tells us to make sure each value is between 0 and 1.
`cmpy.information.misc.``validate_distribution`(dist, logs, test_prob=False)

Validates that the distribution is properly normalized.

Parameters: dist : dict, NumPy array This is the distribution to be checked. logs : bool This tells us if the distribution is a log distribution. test_prob : bool This tells us to make sure each value is between 0 and 1. b : bool True if the distribution is valid. Otherwise, an exception is raised. InvalidDistribution : Raised when the distribution is not porperly normalized. CMPyException : Raised when the distribution contains invalid probabilities.