:e ImportQualifiedPost
:e FlexibleContexts
:e BlockArguments
:e TupleSections
:e OverloadedStrings
:e LambdaCase
:e PackageImports
import Control.Monad.Bayes.Class
import Control.Monad.Bayes.Enumerator
import Control.Monad.Bayes.Weighted
import Control.Monad.Bayes.Sampler.Strict
import Control.Monad.Bayes.Density.Free
import Control.Monad.Bayes.Population
import Control.Monad.Bayes.Sequential.Coroutine
import Control.Monad.Bayes.Inference.SMC
import qualified Graphics.Vega.VegaLite as VL
import IHaskell.Display.Hvega (vlShow)
import Control.Applicative
import qualified Data.Text as T
import Pipes (Producer, (>->))
import qualified Pipes as P
import Pipes.Prelude (unfoldr)
import qualified Pipes.Prelude as P
import Data.Ord
import Data.List
import Control.Monad
import Control.Arrow (first)
import Numeric.Log
import Data.Vector qualified as V
import "matrix" Data.Matrix hiding ((!))
import Data.Vector.Generic as VG (Vector, map, mapM, null, sum, (!))
:l ../plotting.hs
This tutorial discusses particle filters, and sequential Monte Carlo more generally.
These techniques are relevant when performing inference on models where there is a large series of factor statements, some of which can be performed earlier than others. This situation often arises in time series models where the factor statements are the result of incoming observations, but the technique works for any probabilistic program.
As a motivating example, consider the following program:
ex :: MonadMeasure m => Int -> m [Bool]
ex n = replicateM n do
  x <- bernoulli 0.5
  condition x
  return x
What distribution does this represent? It is the distribution over lists of Booleans of length $n$ which, for e.g. $n=4$, places all of the weight on the sequence [True,True,True,True]. The function condition scores a branch with 1 if x is True and 0 otherwise, and so effectively filters out all False values.
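Intuitively, condition is a hard factor. A minimal sketch of that intuition (conditionSketch is a name we introduce; monad-bayes provides condition directly, so this is purely illustrative):
-- illustration only: keep the branch (weight 1) when the predicate holds,
-- kill it (weight 0) otherwise
conditionSketch :: MonadMeasure m => Bool -> m ()
conditionSketch b = score (if b then 1 else 0)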
The function normalForm enumerates all possible combinations of the discrete variables (here, sequences of $n$ Booleans), filters out those with weight 0, and aggregates the weights of equal values.
normalForm $ ex 4
[([True,True,True,True],6.25e-2)]
explicit $ ex 2
[([True,True],0.25),([True,False],0.0),([False,True],0.0),([False,False],0.0)]
This is inefficient
However, the naive approach to inferring this distribution exactly will not work. Why? Because it first constructs all $2^n$ possible solutions and then throws away all but one. This has complexity exponential in $n$; for e.g. $n=100$ it is hopeless.
Now if we look at the structure of the program, it's clear that this is unnecessary. Each time a condition statement is made, we should throw away all possibilities with $0$ probability mass. If we do this, the size of the set of possible solutions never explodes.
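Here is a small standalone sketch of that idea in plain Haskell (nothing from monad-bayes is used, and prunedEnum is a name we introduce): enumerate weighted lists of coin flips step by step, dropping zero-weight branches after every step, so the frontier of candidates never grows.
-- plain-Haskell sketch: step-by-step enumeration that prunes zero-weight
-- branches, so the set of candidate solutions stays small
prunedEnum :: Int -> [([Bool], Double)]
prunedEnum n = iterate step [([], 1)] !! n
  where
    step partials =
      filter ((> 0) . snd)                        -- prune impossible branches now
        [ (bs ++ [b], w * if b then 0.5 else 0)   -- bernoulli 0.5, then condition b
        | (bs, w) <- partials
        , b <- [True, False] ]
prunedEnum 100 returns a single entry, (replicate 100 True, 0.5^100), essentially instantly.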
We can perform this sequential enumeration with monad-bayes, as follows:
enumerate $ sis removeZeros 100 $ ex 100
[([True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True],1.0)]
sis (Sequential Importance Sampling) is an inference method which performs a step of inference at each factor statement in the program. In the present case, we have used it in conjunction with exact inference, but the idea generalizes naturally to approximate inference methods.
To motivate sis with approximate inference methods, let's examine a problem for which exact inference is no longer feasible: a non-linear state space model. Here we will make use of the pipes library, which offers an elegant way to represent distributions over discrete random walks as infinite streams. See the examples/Streaming notebook for more info.
A Producer represents a stream of data, and a Pipe maps one stream to another. Both are parametrized by a monad m, which in our case will be MonadDistribution m => m.
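As a warm-up, and independently of monad-bayes, here is a Producer of numbers pushed through a Pipe that doubles them, collected with P.toList (streamExample is just an illustrative name):
-- a Producer of numbers, a Pipe doubling each one, collected into a list
streamExample :: [Int]
streamExample = P.toList (P.each [1 .. 5] >-> P.map (* 2))
-- streamExample == [2,4,6,8,10]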
For example, a 2D random walk with a Gaussian transition kernel can be written as follows:
-- how to move from one latent state to the next
-- for multivariate normal distribution
transition :: MonadDistribution m => V.Vector Double -> m (V.Vector Double)
transition vector = mvNormal (fmap (+0.5) vector) (identity (V.length vector))
-- For 2D case: a Markovian random walk starting at (0,0),
-- with `transition` as the kernel
-- a Producer is an infinite stream of values
prior :: MonadDistribution m => Producer (V.Vector Double) m r
prior = P.unfoldr (\s -> Right . (s,) <$> transition s) (V.fromList [0,0])
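If P.unfoldr is unfamiliar, here is a deterministic analogue (counter is an illustrative name): the seed is the current state, and each step emits the state and returns Right with the next seed, so the stream never terminates.
-- deterministic analogue of `prior`: an infinite stream counting up from 0
counter :: Monad m => Producer Int m r
counter = P.unfoldr (\s -> pure (Right (s, s + 1))) 0
-- P.toList (counter >-> P.take 5) == [0,1,2,3,4]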
It's straightforward to take the first n steps of this infinite stream and convert it into a distribution over finite lists, from which we can then sample:
-- convert the stream to a list, taking only the first 100 steps
toList :: Monad m => P.Producer a m () -> m [a]
toList prod = P.toListM (prod >-> P.take 100)
randomWalkPrior <- sampleIOfixed $ toList prior
plot
  (zip
    (fmap (\v -> (v V.! 0, v V.! 1)) $ randomWalkPrior)
    (replicate 100 (T.pack "Latent")))
We can also produce a stream of observations from prior, each a noisy perturbation of the datapoint in question, by using a Pipe:
-- observation model: perturb a latent point with Gaussian noise
-- of covariance [[20,0],[0,20]], centred at the point itself
observationModel :: MonadDistribution m => V.Vector Double -> m (V.Vector Double)
observationModel = (`mvNormal` fromLists [[20,0],[0,20]])
observations :: MonadDistribution m => P.Pipe (V.Vector Double) (V.Vector Double) m ()
observations = P.mapM observationModel
randomWalkObserved <- sampleIOfixed $ toList $ P.each randomWalkPrior >-> observations
plot
  (zip
    (fmap (\v -> (v V.! 0, v V.! 1)) $ randomWalkPrior <> randomWalkObserved)
    (replicate 100 (T.pack "Prior") <> replicate 100 (T.pack "Observed")))
But most interestingly, we can update our prior based on a stream of incoming observations. conditioning below takes two Producers, the stream corresponding to the prior (see prior above) and the stream corresponding to the observations (P.each randomWalkObserved, where P.each lifts a list to a stream), along with a likelihood, and returns the posterior, also expressed as a stream.
-- take the original random walk as a prior and condition on the observations
-- to obtain a posterior random walk
conditioning :: MonadMeasure m =>
  P.Producer (V.Vector Double) m () ->
  P.Producer (V.Vector Double) m () ->
  ((V.Vector Double, V.Vector Double) -> m ()) ->
  P.Producer (V.Vector Double) m ()
conditioning prior observations observationModel =
  P.zip prior observations
    >-> P.chain observationModel
    >-> P.map fst
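To see what P.zip and P.chain are doing here, a toy example away from the model (chainDemo is an illustrative name): zip pairs the two streams element-wise, chain performs an effect on each element and passes it through unchanged, and map fst keeps only the first component of each pair.
-- toy illustration of the plumbing above, with printing as the effect
chainDemo :: IO [Int]
chainDemo =
  P.toListM $
    P.zip (P.each [1, 2, 3]) (P.each ([10, 20, 30] :: [Int]))
      >-> P.chain print  -- run an effect on every pair, pass the pair along
      >-> P.map fst      -- keep only the first component
-- chainDemo prints each pair and returns [1,2,3]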
posterior :: MonadMeasure m => m [V.Vector Double]
posterior = toList $ conditioning
  prior
  (P.each randomWalkObserved)
  (\(v, v') -> do
    prediction <- observationModel v
    let (x, y, x', y') = (prediction V.! 0, prediction V.! 1, v' V.! 0, v' V.! 1)
    factor $ normalPdf x 2 x' * normalPdf y 2 y')
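The factor in the lambda above is just a product of two univariate Gaussian densities with standard deviation 2, comparing the predicted observation with the actual one coordinate by coordinate. Factored out as a standalone function, it would look roughly like this (likelihood is a name we introduce):
-- the observation likelihood used above, written as a standalone function:
-- score the observed point v' against a prediction, independently per coordinate
likelihood :: V.Vector Double -> V.Vector Double -> Log Double
likelihood prediction v' =
  normalPdf (prediction V.! 0) 2 (v' V.! 0) * normalPdf (prediction V.! 1) 2 (v' V.! 1)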
We can then use an inference method of our liking, but in this tutorial we would like to see how SMC behaves. Following Ścibior et al.:
Within the context of SMC, recall that a sample in a population is called a particle. The algorithm starts by initialising a population of size n, then repeatedly runs the program to the next score, resamples the population, runs to the next score, and so on.
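To make the resampling step concrete, here is a standalone sketch of one multinomial resampling step (resampleSketch is our own name, and this is not the monad-bayes implementation, which works with Log-domain weights inside its Population type): draw as many indices as there are particles, with probability proportional to the weights, and give every survivor an equal share of the total weight.
-- standalone sketch of multinomial resampling over a weighted population
resampleSketch :: MonadDistribution m => [(a, Double)] -> m [(a, Double)]
resampleSketch pop = do
  let n     = length pop
      total = Prelude.sum [w | (_, w) <- pop]
      probs = V.fromList [w / total | (_, w) <- pop]
  idxs <- replicateM n (categorical probs)         -- indices drawn by weight
  pure [(fst (pop !! i), total / fromIntegral n) | i <- idxs]
Note that Prelude.sum is qualified only because this notebook also imports sum from Data.Vector.Generic.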
particles <- sampleIOfixed $ runPopulation $
  smc
    SMCConfig {numSteps = 100, numParticles = 1000, resampler = resampleMultinomial}
    posterior
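particles is a list of (path, weight) pairs. One quick diagnostic on such a weighted population is the effective sample size, (Σ w)² / Σ w²; below is a sketch of it (ess is our own name), rescaling in log-space first to avoid underflow. It always lies between 1 and the number of particles.
-- effective sample size of the weighted particles: (sum w)^2 / sum w^2
ess :: [([V.Vector Double], Log Double)] -> Double
ess ps = (Prelude.sum ws ^ 2) / Prelude.sum [w * w | w <- ws]
  where
    lws = [ln w | (_, w) <- ps]
    m   = maximum lws
    ws  = [exp (lw - m) | lw <- lws]   -- rescale in log-space to avoid underflow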
The distribution generated is over whole paths. We can compare realizations of the prior and of the inferred posterior (here 5 paths) with the observed path.
-- concatenate the paths of the first 5 particles (5 * 100 = 500 points)
inferredVectors = fst =<< take 5 particles
plot
  (zip
    (fmap (\v -> (v V.! 0, v V.! 1)) $ randomWalkPrior <> randomWalkObserved <> inferredVectors)
    (replicate 100 (T.pack "Prior") <> replicate 100 (T.pack "Observed") <> replicate 500 (T.pack "Inferred")))
We can see here that SMC has done a pretty good job. The intuition for why SMC does well is that it searches the space of solutions with some backtracking (because it has a whole population of guesses to rely on), and moreover, this search results in unbiased sampling from the distribution of interest. So if it makes a series of guesses that leads it astray, it can throw away this possibility at a later stage.