:e TupleSections
:e BlockArguments
import Control.Monad.Bayes.Class
import Control.Monad.Bayes.Sampler.Strict
import Control.Monad.Bayes.Weighted
import Control.Monad.Bayes.Enumerator
import Control.Monad
import Control.Arrow
import Data.Bifunctor (bimap)
import Numeric.Log
import qualified Data.Text as T
:l ../plotting.hs
Monad-Bayes exports a Bayesian datatype to make working with Bayesian models easier. The idea is that you specify your model as a prior, a generative model, and a likelihood.
data Bayesian m z o = Bayesian
{ prior :: m z, -- prior over latent variable Z; p(z)
generative :: z -> m o, -- distribution over observations given Z=z
likelihood :: z -> o -> Log Double -- p(o|z)
}
The likelihood is the PDF of the generative model: a function from the problem domain to the positive real numbers. The generative model lets you draw samples from the distribution. The likelihood and the generative model are kept separate because you can't always obtain the density from a distribution definition.
Once you've defined a Bayesian model, a number of standard operations, like obtaining the posterior or the posterior predictive, are automated for you.
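For reference, the helper functions used below have roughly the following shapes (paraphrased signatures, not copied verbatim from the library; check Control.Monad.Bayes.Class for the exact constraints):
-- Paraphrased signatures; the constraint names may differ slightly between library versions.
priorPredictive :: Monad m => Bayesian m z o -> m o
posterior :: (MonadMeasure m, Functor f, Foldable f) => Bayesian m z o -> f o -> m z
posteriorPredictive :: (MonadMeasure m, Functor f, Foldable f) => Bayesian m z o -> f o -> m o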
Here's a simple example:
model :: MonadDistribution m => Bayesian m Bool Double
model = Bayesian {
prior = bernoulli 0.5,
generative = \x -> normal (if x then 2 else (-2)) 1,
likelihood = \x y -> normalPdf (if x then 2 else (-2)) 1 y
}
sampler $
plot . fmap (first (T.pack . show)) . toEmpirical <$>
replicateM 100000 (prior model)
The prior predictive distribution is obtained by applying the generative model to the prior distribution.
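Conceptually this is just a monadic bind; a minimal sketch, where the primed name is ours (the library exports priorPredictive itself):
-- Illustrative re-derivation of the prior predictive from the record fields.
priorPredictive' :: Monad m => Bayesian m z o -> m o
priorPredictive' bm = prior bm >>= generative bm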
As in the Sampling.ipynb example, we need to introduce a constant weight by mapping each sample to a (value, score) pair with (,constWeight) <$>.
constWeight = 1 :: Log Double
sampler $ plot . histogramToList . histogram 100 <$> replicateM 100000 ((,constWeight) <$> priorPredictive model)
The posterior distribution is computed using the likelihood and the observations $os = \{o_{1}, \dots, o_{n}\}$. For each value of the latent variable drawn from the prior, the posterior of model introduces a factor equal to the joint likelihood of the observations; in the considered example these factors are $\prod_{i=1}^{n} \mathcal{N}(o_{i} \mid 2, 1)$ for True and $\prod_{i=1}^{n} \mathcal{N}(o_{i} \mid -2, 1)$ for False. Note that the factors do not have to sum up to 1. The empirical distribution is obtained by normalizing the factors so that they sum to 1.
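A minimal sketch of how such a posterior can be expressed with factor, assuming independent observations (the primed name is ours; the library exports posterior itself):
posterior' :: (MonadMeasure m, Functor f, Foldable f) => Bayesian m z o -> f o -> m z
posterior' bm os = do
  z <- prior bm                                         -- draw the latent value from the prior
  factor $ Prelude.product $ fmap (likelihood bm z) os  -- score it with the joint likelihood of the observations
  pure z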
In our example, we explore how observed data change the prior given the likelihood function. Observations should lie in the domain of the likelihood function.
Taking points placed symmetrically and evenly around zero, we do not change the posterior relative to the prior:
observations1 = [-3..3]
sampler $ toEmpiricalWeighted <$> replicateM 100000 (weighted $ posterior model observations1)
[(True,0.5005199999999994),(False,0.49948000000000126)]
On the other hand, if the observations are distributed unevenly, the adjustment to the posterior is clearly visible:
observations2 = [-2.3, -2, 0, 0.3, 2, 2.3]
sampler $ toEmpiricalWeighted <$> replicateM 100000 (weighted $ posterior model observations2)
[(True,0.7669770523099223),(False,0.23302294769007775)]
sampler $
plot . fmap (bimap (T.pack . show) (ln . exp)) . toEmpiricalWeighted <$>
replicateM 1000 (weighted $ posterior model observations2)
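Because the latent variable is just a Bool, the same posterior can also be cross-checked exactly with the Enumerator interpreter imported above (this extra check is ours; it should return the normalized probabilities of True and False):
-- Exact posterior by enumerating the two latent values (no sampling noise).
enumerator $ posterior model observations2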
Change the observation lists and see how the posterior is affected. Change the standard deviation of the likelihood and see how the posterior changes.
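For instance, here is an illustrative variant (wideModel is a name we introduce, not part of the original model) with the likelihood's standard deviation widened from 1 to 3; rerunning the posterior with it should pull the weights of True and False closer together:
-- Hypothetical variant for experimentation: same prior, wider likelihood.
wideModel :: MonadDistribution m => Bayesian m Bool Double
wideModel = Bayesian {
    prior = bernoulli 0.5,
    generative = \x -> normal (if x then 2 else (-2)) 3,
    likelihood = \x y -> normalPdf (if x then 2 else (-2)) 3 y
    }
-- e.g. sampler $ toEmpiricalWeighted <$> replicateM 100000 (weighted $ posterior wideModel observations2)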
Analogously to the prior predictive, the posterior predictive is generated by applying the generative model to the posterior.
sampler $ plot . histogramToList . histogram 100 <$> replicateM 100000 (weighted $ posteriorPredictive model observations2)
The classical example of linear regression:
range :: [Double]
range = [-10,-9.9..10]
regression :: MonadDistribution m => Bayesian m (Double, Double, Double) [Double]
regression = Bayesian {
prior = do
slope <- normal 0 2
intercept <- normal 0 2
noise <- gamma 4 4
return (slope, intercept, noise),
generative = \(slope, intercept, noise) -> do
forM range \x -> normal (x*slope + intercept) (sqrt noise),
likelihood = \(slope, intercept, noise) ys -> Prelude.product [normalPdf (x*slope + intercept) (sqrt noise) y | (x,y) <- zip range ys] -- joint likelihood: product of per-point densities
}
The prior distribution has three independent parameters; generative combines them into a single distribution over lists of observations.
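To see what generative is given, you can draw a single parameter triple from the prior (the exact values will vary from run to run):
-- One random (slope, intercept, noise) triple drawn from the prior.
params <- sampler (prior regression)
print params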
ys <- sampler $ priorPredictive regression
plot (fmap (second (T.pack . show)) (zip (zip range ys) (Prelude.repeat "N/A")))
The posterior distribution is computed based on the observed samples. Let's see what happens if the model only ever sees samples with the same value.
To explore the space of possible parameters we use a Markov chain Monte Carlo (MCMC) inference algorithm.
import Control.Monad.Bayes.Inference.MCMC
observations = [replicate 100 1]
ys <- sampleIOfixed $ mcmc MCMCConfig {numMCMCSteps = 5000, numBurnIn = 300, proposal = SingleSiteMH}
$ posteriorPredictive regression observations
ys is a list of lists containing the values generated for the arguments in range. Let's see what one set of generated samples looks like:
plot (fmap (second (T.pack . show)) (zip (zip range $ head ys) (Prelude.repeat "N/A")))
Given the observations, we expect the model's average slope to be 0 and its average intercept to be 1.
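The code below estimates both with the ordinary least-squares formulas (spelled out here for clarity): the slope estimate is $\hat\beta = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and the intercept estimate is $\hat\alpha = \bar{y} - \hat\beta\,\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of range and of a generated sample, respectively.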
avg l = Prelude.sum l / fromIntegral (length l)
avX = avg range
avSqr = Prelude.sum $ map (\a -> (a-avX)**2) range  -- sum of squared deviations of x
-- Least-squares estimates of slope and intercept for one generated sample.
slopeValue l = covXY / avSqr
  where
    avY = avg l
    covXY = Prelude.sum $ (\(x,y) -> (x - avX) * (y - avY)) <$> zip range l
interceptValue l = avg l - slopeValue l * avX
nObservations = length $ head observations
print $ "Average slope for " <> show nObservations <> " observations: " <> (show . avg $ slopeValue <$> ys)
print $ "Average intercept for " <> show nObservations <> " observations: " <> (show . avg $ interceptValue <$> ys)
"Average slope for 100 observations: 0.6482984353867459"
"Average intercept for 100 observations: 1.2220350681014394"
Slope and intercept averages differ a bit from the expected 0 and 1. Let's see how they behave given more observations of the same kind.
observations1 = [replicate 200 1]
ys1 <- sampleIOfixed $ mcmc MCMCConfig {numMCMCSteps = 5000, numBurnIn = 300, proposal = SingleSiteMH}
$ posteriorPredictive regression observations1
nObservations = length $ head observations1
print $ "Average slope for " <> show nObservations <> " observations: " <> (show . avg $ slopeValue <$> ys1)
print $ "Average intercept for " <> show nObservations <> " observations: " <> (show . avg $ interceptValue <$> ys1)
"Average slope for 200 observations: 0.20039201335018098"
"Average intercept for 200 observations: 1.1534663432408423"
To better understand how the Bayesian model works here, you may experiment further, for example by changing the prior over the noise only and observing how the generated samples and the estimates change.
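A sketch of such an experiment (regressionNoisePrior and the gamma 2 1 prior are arbitrary choices of ours, not part of the original model):
-- Hypothetical variant: identical to regression except for the noise prior.
regressionNoisePrior :: MonadDistribution m => Bayesian m (Double, Double, Double) [Double]
regressionNoisePrior = regression
  { prior = do
      slope <- normal 0 2
      intercept <- normal 0 2
      noise <- gamma 2 1  -- arbitrary smaller-noise prior for the experiment
      return (slope, intercept, noise)
  }
-- e.g. sampleIOfixed $ mcmc MCMCConfig {numMCMCSteps = 5000, numBurnIn = 300, proposal = SingleSiteMH}
--        $ posteriorPredictive regressionNoisePrior observations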