Defining a Probability Distribution
CMPy provides several types of probability distribution, each suited to a different need.
Standard Distributions
This is the ordinary probability distribution. It can describe a single random variable:
>>> dist = Distribution({'0': 0.5, '1': 0.5})
Or it can be a joint distribution:
>>> dist = Distribution({'000': 0.25, '011': 0.25, '101': 0.25, '110': 0.25})
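where each character of an event is the outcome of one random variable, so the event ‘000’ corresponds to all three variables taking the value 0.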
Warning
When creating a joint distribution such as this, each joint event must have the same length.
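To make the equal-length requirement concrete, here is a minimal plain-Python sketch. It does not use CMPy; `marginal` is a hypothetical helper written only for illustration. It treats each character position as one random variable and computes that variable's marginal, which is only meaningful when every event has the same length.

```python
from collections import defaultdict

def marginal(joint, index):
    """Marginal distribution of the variable at position `index`.

    `joint` maps equal-length strings to probabilities. This is a
    hypothetical helper for illustration, not part of CMPy.
    """
    lengths = {len(event) for event in joint}
    if len(lengths) != 1:
        raise ValueError("all joint events must have the same length")
    result = defaultdict(float)
    for event, prob in joint.items():
        # Accumulate the probability of each outcome at this position.
        result[event[index]] += prob
    return dict(result)

joint = {'000': 0.25, '011': 0.25, '101': 0.25, '110': 0.25}
first = marginal(joint, 0)   # distribution of the first variable
```

In this sketch, events of differing lengths are rejected because there would be no consistent position-to-variable mapping.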
If you wish to create a distribution where individual events are compound objects (e.g. a string rather than a character) you must specify that the distribution is not a joint distribution by supplying the joint keyword:
>>> dist = Distribution({'alpha': 0.5, 'beta': 0.5}, joint=False)
You may also create distributions with arbitrary event labels if all you care about are the probabilities:
>>> die = Distribution([1.0/6]*6)
>>> die
Distribution:
{0: 0.16666666666666666, 1: 0.16666666666666666, 2: 0.16666666666666666,
3: 0.16666666666666666, 4: 0.16666666666666666, 5: 0.16666666666666666}
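The integer labels in the output above match what you would get by keying each probability by its position in the list. The following plain-Python sketch mirrors that labelling; it is an illustration of the displayed result, not a description of how CMPy builds the distribution internally.

```python
# Key each probability by its list position, giving labels 0..5.
probs = [1.0 / 6] * 6
die = dict(enumerate(probs))

# The probabilities should sum to one, up to floating-point rounding.
total = sum(die.values())
```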
Log Distributions
A log distribution is useful when your distribution is likely to contain very small probabilities, so small that floating point precision may be inadequate. In such cases it is beneficial to store the log of each probability rather than the probability itself. As a contrived example:
>>> dist = LogDistribution({'A': -1, 'B': -2, 'C': -3, 'D': -3})
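The motivation can be seen without CMPy. The sketch below uses only the standard library to show a product of small probabilities underflowing to zero while the same quantity stays representable in log space. It also checks that the example values are consistent with base-2 logarithms; the base is an assumption here, since the text above does not state it.

```python
import math

# Multiplying many small probabilities underflows in floating point:
# (1e-5) ** 100 = 1e-500, far below the smallest positive double.
p = 1e-5
n = 100
product = 1.0
for _ in range(n):
    product *= p

# The same quantity in log space is an ordinary finite number.
log_product = n * math.log2(p)

# Assuming base-2 logs, the example values describe the probabilities
# 1/2, 1/4, 1/8 and 1/8, which sum to one.
log_probs = {'A': -1, 'B': -2, 'C': -3, 'D': -3}
total = sum(2.0 ** lp for lp in log_probs.values())
```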
Symbolic Distributions
If you have access to sympy, you can create symbolic distributions representing families or classes of distributions. For example, the family of biased coins would simply be:
>>> from sympy import Symbol
>>> p = Symbol('p')
>>> dist = SymbolicDistribution({'H': p, 'T': 1-p})
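A symbolic distribution keeps `p` unevaluated, so one object stands for the whole family. As a rough numeric stand-in that needs neither sympy nor CMPy, the sketch below builds individual members of the biased-coin family for specific values of `p`. The helpers `biased_coin` and `entropy` are hypothetical names introduced for this illustration.

```python
import math

def biased_coin(p):
    """One member of the biased-coin family (hypothetical helper, not CMPy)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return {'H': p, 'T': 1.0 - p}

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability events."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

fair = biased_coin(0.5)      # the fair coin
skewed = biased_coin(0.9)    # a coin that favours heads
```

Evaluating `entropy` on several members shows how a property varies across the family, which is the kind of question a symbolic distribution lets you pose in closed form.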
Generalized Distributions
Sometimes you may want a distribution whose event probabilities can lie outside [0, 1]. Generalized distributions allow this:
>>> d = GeneralizedDistribution({'A': -0.5, 'B': 1.5})
Note
Since there is no way to define the entropy of a generalized distribution, none of the information-theoretic methods are available.
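One way to see the problem is that the Shannon entropy formula takes the logarithm of each probability, and the logarithm of a negative number is not a real number. The plain-Python sketch below applies the formula to the example values; it does not use CMPy, and `entropy` is a hypothetical helper written for illustration.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; only valid for probabilities in [0, 1]."""
    return -sum(q * math.log2(q) for q in dist.values() if q != 0)

quasi = {'A': -0.5, 'B': 1.5}
total = sum(quasi.values())      # still normalised: -0.5 + 1.5 == 1.0

try:
    entropy(quasi)
    failed = False
except ValueError:               # math.log2 rejects negative arguments
    failed = True
```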