Image credit: Made with Midjourney
6. Probabilistic models: from simple to complex
In this chapter, we take a deeper look at probabilistic models, which we have already encountered throughout the book. We show how to construct a variety of models, in particular by using the notion of conditional independence. We also describe some standard methods for estimating parameters and hidden states, as well as for sampling. Finally, we discuss and implement some applications, including Kalman filtering and Gibbs sampling. Here is a more detailed overview of the main sections of the chapter.
“Background: introduction to parametric families and maximum likelihood estimation” This section introduces parametric families of probability distributions, focusing on exponential families, which include many common distributions such as the Bernoulli, categorical, multinomial, multivariate Gaussian, and Dirichlet. It then discusses parameter estimation, specifically maximum likelihood estimation, which chooses the parameter that maximizes the probability of observing the data, and derives the maximum likelihood estimator for exponential families. The section proves that, under certain conditions, the maximum likelihood estimator converges to the true parameter as the number of samples grows, a property known as statistical consistency. Finally, it presents generalized linear models, which extend linear regression using exponential families, and revisits linear and logistic regression from this perspective.
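As a brief preview of the notation developed there, an exponential family and the moment-matching condition characterizing its maximum likelihood estimator can be summarized as follows (a standard formulation; the section's own notation may differ slightly):

$$
p(x \mid \theta) = h(x)\, \exp\!\big( \theta^\top T(x) - A(\theta) \big),
\qquad
\nabla A\big(\hat{\theta}_{\mathrm{MLE}}\big) = \frac{1}{n} \sum_{i=1}^{n} T(x_i),
$$

where \(T\) is the sufficient statistic, \(A\) the log-partition function, and \(x_1, \dots, x_n\) the observed samples.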
“Modeling more complex dependencies 1: using conditional independence” This section discusses techniques for constructing joint probability distributions from simpler building blocks, focusing on imposing conditional independence relations. It introduces the basic configurations of conditional independence for three random variables: the fork \((Y \leftarrow X \rightarrow Z)\), the chain \((X \rightarrow Y \rightarrow Z)\), and the collider \((X \rightarrow Z \leftarrow Y)\). The section then presents the Naive Bayes model as an example of applying conditional independence to document classification, where the presence or absence of words in a document is assumed to be conditionally independent given the document’s topic. Finally, it demonstrates fitting a Naive Bayes model using maximum likelihood estimation and Laplace smoothing.
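To give a rough idea of the kind of computation involved, here is a minimal NumPy sketch of fitting a Bernoulli Naive Bayes classifier with Laplace smoothing; the function and variable names are illustrative and not taken from the chapter's own code.

```python
import numpy as np

def fit_bernoulli_naive_bayes(X, y, alpha=1.0):
    """Minimal sketch: Bernoulli Naive Bayes with Laplace smoothing.

    X : (n_docs, n_words) binary matrix of word presence/absence
    y : (n_docs,) integer topic labels in {0, ..., K-1}
    alpha : Laplace smoothing pseudo-count
    """
    K = y.max() + 1
    priors = np.array([(y == k).mean() for k in range(K)])  # class priors
    theta = np.zeros((K, X.shape[1]))
    for k in range(K):
        Xk = X[y == k]
        # Smoothed estimate of P(word present | topic k)
        theta[k] = (Xk.sum(axis=0) + alpha) / (len(Xk) + 2 * alpha)
    return priors, theta

def predict(X, priors, theta):
    # Log joint under the conditional-independence assumption:
    # log P(y=k) + sum_j [ x_j log theta_kj + (1 - x_j) log(1 - theta_kj) ]
    log_joint = (np.log(priors)
                 + X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T)
    return log_joint.argmax(axis=1)
```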
“Modeling more complex dependencies 2: marginalizing out an unobserved variable” This section discusses modeling dependencies in joint distributions by marginalizing out an unobserved random variable. It introduces the concept of mixtures as convex combinations of distributions. The section then considers the specific case of mixtures of multivariate Bernoullis and the Expectation-Maximization (EM) algorithm for parameter estimation in this context, using the principle of majorization-minimization. Finally, the mixture of multivariate Bernoullis model is applied to clustering handwritten digits from the MNIST dataset.
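The following is a minimal illustrative sketch of EM for a mixture of multivariate Bernoullis in NumPy; the names, initialization, and stopping rule are assumptions rather than the chapter's implementation.

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iters=50, seed=None):
    """Minimal EM sketch for a mixture of multivariate Bernoullis.

    X : (n, d) binary data matrix
    K : number of mixture components
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                    # mixing weights
    mu = rng.uniform(0.25, 0.75, size=(K, d))   # per-component Bernoulli means

    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] proportional to P(z_i = k, x_i)
        log_p = (np.log(pi)
                 + X @ np.log(mu).T
                 + (1 - X) @ np.log(1 - mu).T)
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: update weights and means from the responsibilities
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        mu = np.clip(mu, 1e-6, 1 - 1e-6)   # avoid log(0) in the next E-step
    return pi, mu
```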
“Application: linear-Gaussian models and Kalman filtering” This section discusses the application of linear-Gaussian models and Kalman filtering to object tracking. It begins by presenting the properties of block matrices and the Schur complement, which are used to derive the marginal and conditional distributions of multivariate Gaussians. The section then introduces the Kalman filter, a recursive algorithm for inferring the unobserved states of a linear-Gaussian system, in which the state evolves according to a linear-Gaussian model and a noisy observation is made at each time step. The section concludes by applying the Kalman filter to a location tracking example, in which the true path of an object is estimated from noisy observations.
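To give a flavor of the recursion, here is a minimal sketch of a single Kalman predict/update step, written with the standard linear-Gaussian filtering equations; the variable names are illustrative rather than the chapter's own.

```python
import numpy as np

def kalman_step(m, P, y, A, Q, C, R):
    """One predict/update step of a Kalman filter (minimal sketch).

    State model:       x_t = A x_{t-1} + noise,  noise ~ N(0, Q)
    Observation model: y_t = C x_t     + noise,  noise ~ N(0, R)
    (m, P) is the Gaussian posterior mean/covariance of the previous state.
    """
    # Predict: push the previous posterior through the linear dynamics
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: condition the prediction on the new observation y
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred)
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new
```

Applying this step to each observation in a sequence produces the filtered state estimates used in a tracking example of this kind.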