:e ImportQualifiedPost
:e FlexibleContexts
:e BlockArguments
:e TupleSections
:e OverloadedStrings
:e LambdaCase
import qualified Data.Text as T
import Control.Arrow (first,second)
import Control.Monad
import Graphics.Vega.VegaLite hiding (density)
import qualified Graphics.Vega.VegaLite as VL
import IHaskell.Display.Hvega (vlShow)
import Control.Monad.Bayes.Class
import Control.Monad.Bayes.Enumerator
import Control.Monad.Bayes.Weighted
import Control.Monad.Bayes.Sampler.Strict
import Control.Monad.Bayes.Traced.Static
import Control.Monad.Bayes.Inference.MCMC
:l ../plotting.hs
Following Wikipedia:
Markov chain Monte Carlo (MCMC) is a powerful class of methods to sample from probability distributions known only up to an (unknown) normalization constant.
A Markov chain is a stochastic process in which the next state depends only on the previous state (the Markov property, for a discrete-time process). Transitions between states are governed by a transition kernel $T$. In the limit of a long simulation, the probability distribution over the states of the Markov process converges to the stationary distribution $\pi_{\star}$, provided reasonable assumptions hold: a stationary distribution exists, the chain is aperiodic, and every state is reachable.
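In symbols, $\pi_{\star}$ is stationary for a kernel $T$ when one step of the chain leaves it unchanged: $$ \pi_{\star}(x') = \int \pi_{\star}(x)\, T(x'|x)\, dx $$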
If we want to sample from a complicated distribution $\pi$, we can construct a Markov process whose stationary distribution is $\pi_{\star} = \pi$. The question is how to construct the transition kernel so that $\pi$ is the stationary distribution.
Metropolis-Hastings (MH) is the classical and most widely used MCMC algorithm, and it comes in many flavours. Here we present a general outline.
The algorithm generates the next point of the Markov chain, namely $x_{i+1}$, in the state space. MH algorithms consist of a proposal step and an acceptance/rejection step:
Proposal: draw the next candidate. We construct the transition kernel:
$$ T(x_{i+1}|x_{i}) = q(x_{i+1}|x_{i})\times p_{acc}(x_{i+1}|x_{i}) $$ where $q$ is the proposal distribution for the next state of the chain and $p_{acc}$ is the probability of the candidate being accepted.
The proposal distribution $q$ should be simple to draw from, for example a uniform distribution around the current point, $\mathcal{U}(x_i - \delta, x_{i}+\delta)$, where $\delta$ is a small positive number.
Acceptance/rejection: assess whether the candidate could have been drawn from the target distribution $\pi$. Assuming a symmetric proposal distribution and imposing the constraint of microscopic reversibility (detailed balance) on the transition kernel:
$$ \pi(x_{i})T(x_{i+1}|x_{i}) = \pi(x_{i+1})T(x_{i}|x_{i+1}) $$
we obtain the acceptance probability: $$ p_{acc}(x_{i+1}|x_{i}) = \min\left\{1, \frac{\pi(x_{i+1})}{\pi(x_{i})}\right\} $$
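To make the outline concrete, here is a minimal sketch of a single MH step for a one-dimensional target, written with monad-bayes primitives; mhStep, piDensity and delta are illustrative names introduced here, not part of the library:
-- One Metropolis-Hastings step with a symmetric random-walk proposal.
-- piDensity is the (unnormalized) target density, delta the proposal width.
mhStep :: MonadDistribution m => (Double -> Double) -> Double -> Double -> m Double
mhStep piDensity delta x = do
  x' <- uniform (x - delta) (x + delta)          -- proposal q(x'|x), symmetric
  let pAcc = min 1 (piDensity x' / piDensity x)  -- acceptance probability
  accept <- bernoulli pAcc
  return (if accept then x' else x)              -- keep the old state on rejection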
For details on MCMC see the series of blog posts starting with https://www.tweag.io/blog/2019-10-25-mcmc-intro1/ or an online book https://bookdown.org/rdpeng/advstatcomp/markov-chain-monte-carlo.html.
monad-bayes implementation
There are several versions of the Metropolis-Hastings algorithm for MCMC defined in monad-bayes. The standard version, found in Control.Monad.Bayes.Inference.MCMC, is Traced MH. The implementation of a single step of the MH algorithm is mhTransWithBool.
We'll start with the example of a simple regression. The function regressionData defines a distribution over linear regression values given a list of arguments.
paramPriorRegression = do
slope <- normal 0 2
intercept <- normal 0 2
noise <- gamma 4 4
return (slope, intercept, noise)
regressionData :: (MonadDistribution m, Traversable t) => t Double -> m (t (Double, Double))
regressionData xs = do
(slope, intercept, noise) <- paramPriorRegression
forM xs \x -> do
y <- normal (x*slope + intercept) (sqrt noise)
return (x, y)
range = [-10,-9.9..10] :: [Double]
regressionSamples <- sampleIOfixed $ regressionData range
plot (zip regressionSamples (Prelude.repeat $ T.pack "N/A"))
The function regression computes the parameters of the model, namely (slope, intercept, noise), and their score.
regression :: (MonadMeasure m) => [Double] -> [Double] -> m (Double, Double, Double)
regression xs ys = do
params@(slope, intercept, noise) <- paramPriorRegression
forM_ (zip xs ys) \(x, y) -> factor $ normalPdf (slope * x + intercept) (sqrt noise) y
return (slope, intercept, noise)
mcmcConfig = MCMCConfig {numMCMCSteps = 1000, numBurnIn = 0, proposal = SingleSiteMH}
mhRunsRegression <- sampleIOfixed . unweighted . mcmc mcmcConfig $ uncurry regression (unzip regressionSamples)
The MCMC solver puts new points at the beginning of the list. The last numBurnIn points in the list (that is, the first ones from the start of the simulation) are removed, since the distributions they were drawn from are not considered to have converged to the stationary distribution $\pi$.
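For example, to discard the first 200 points as burn-in we could have used a configuration like the following (same record fields as mcmcConfig above):
mcmcConfigBurned = MCMCConfig {numMCMCSteps = 1000, numBurnIn = 200, proposal = SingleSiteMH}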
Let's see how the intercept and slope evolve during the simulation:
print "Slope"
l = length mhRunsRegression
xs = reverse $ fromIntegral <$> [1..l] ::[Double]
plot (xs, (\(x,_,_) -> x) <$> mhRunsRegression)
"Slope"
print "Intercept"
l = length mhRunsRegression
xs = reverse $ fromIntegral <$> [1..l] ::[Double]
plot (xs, (\(_,x,_) -> x) <$> mhRunsRegression)
"Intercept"
This is a sample from the MCMC walk. Since this is an easy inference problem, it wasn't hard to generate good samples.
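As a quick sanity check we can summarise the chain, for instance with the posterior means of the slope and intercept (sampleMean is a small helper defined here, not a library function):
sampleMean zs = sum zs / fromIntegral (length zs)
print (sampleMean ((\(s,_,_) -> s) <$> mhRunsRegression), sampleMean ((\(_,i,_) -> i) <$> mhRunsRegression))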
We can also view the posterior predictive, as follows:
posteriorPredictive :: MonadMeasure m => [Double] -> [Double] -> m [Double]
posteriorPredictive xs ys = do
(slope, intercept, noise) <- regression xs ys
forM xs \x -> do
let y' = x * slope + intercept
normal y' (sqrt noise)
predictive <- head <$> (sampleIOfixed . unweighted . mcmc mcmcConfig $ uncurry posteriorPredictive (unzip regressionSamples))
plot (fmap (second (T.pack . show)) (zip (zip range predictive) (Prelude.repeat "N/A")))
Inspired by the tutorials on the probabilistic programming language Gen (https://www.gen.dev/tutorials/iterative-inference/tutorial), we'll make the inference problem harder by using the example of a regression with outliers. The idea is that each datapoint $(x,y)$ has $y$ either linearly dependent on $x$, or randomly sampled (an outlier). So the goal of inference is to jointly work out what the linear relationship is and which points flout it.
paramPrior = do
slope <- normal 0 2
intercept <- normal 0 2
noise <- gamma 4 4
prob_outlier <- uniform 0 0.5
return (slope, intercept, noise, prob_outlier)
forward (slope, intercept, noise, probOutlier) x = do
isOutlier <- bernoulli probOutlier
let meanParams = if isOutlier
then (0, 20)
else (x*slope + intercept, sqrt noise)
return (meanParams, isOutlier)
regressionWithOutliersData :: (MonadDistribution m, Traversable t) => t Double -> m (t ((Double, Double), Bool))
regressionWithOutliersData xs = do
params <- paramPrior
forM xs \x -> do
((mu, std), isOutlier) <- forward params x
y <- normal mu std
return ((x, y), isOutlier)
This is our model. It describes a process for getting $y$ from $x$. Specifically, you start by drawing values for the slope $s$, intercept $b$ and noise $n$. Then for each input $x$, you flip a coin. If it lands one way, you draw a $y$ value from a normal with mean $s x + b$ and standard deviation $\sqrt{n}$, and otherwise you draw from a centered normal with large variance.
Given a list of $x$ values, this gives a distribution over lists of $y$ values, from which we can sample:
range = [-10,-9.9..10] :: [Double]
samples <- sampleIOfixed $ regressionWithOutliersData range
plot (fmap (second (T.pack . show)) samples)
regressionWithOutliers :: (MonadMeasure m) =>
[Double] -> [Double] -> m ((Double, Double, Double, Double), [Bool])
regressionWithOutliers xs ys = do
params <- paramPrior
outliers <- forM (zip xs ys) \(x, y) -> do
((mu, std), isOutlier) <- forward params x
factor $ normalPdf mu std y
return isOutlier
return (params, outliers)
mhRuns <- sampleIOfixed . unweighted . mcmc mcmcConfig $ regressionWithOutliers range (snd . fst <$> samples)
-- For each data point, count across all MCMC samples how often it was
-- classified as an outlier, and turn the counts into a probability.
outlierProb s = (\(x, y) -> x / (x + y)) <$>
    foldr
        (\(_, lb) li -> [if b then (num1 + 1, num2) else (num1, num2 + 1) | (b, (num1, num2)) <- zip lb li])
        (Prelude.repeat (0, 0))
        s
plot $ take 5000 (zip (fst <$> samples) (outlierProb mhRuns))
Running MCMC gives us a list of samples. The results make sense: points that are very near the line are probably not outliers and ones very far are.
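One hypothetical way to turn these probabilities into hard labels is to threshold them at 0.5 and compare against the labels that generated the data (stored in samples); the cutoff is arbitrary:
-- threshold the estimated outlier probabilities (0.5 is an arbitrary cutoff)
predictedOutliers = (> 0.5) <$> outlierProb mhRuns
-- fraction of points where the prediction matches the generating label
accuracy = fromIntegral (length (filter id (zipWith (==) predictedOutliers (snd <$> samples)))) / fromIntegral (length samples)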
It would be nice to make our approach more sample efficient though. The SingleSiteMH proposal is very naive. Two ways to improve on this would be to use customizable proposals, as in Gen, or to use Hamiltonian Monte Carlo, as in Stan.