In this session, I'm going to talk about LDA and DMR in more detail. Let's first talk about LDA. LDA assumes a generative process. Each document is a random mixture of corpus-wide topics, and each word is drawn from one of those topics. This assumes topics exist outside of the document collection. Each topic is a distribution over a fixed vocabulary. As the arrow shows, it is a generative process. First, it chooses a distribution over topics, which is drawn from a Dirichlet distribution, where yellow, pink, green and blue have some probabilities. Then, repeatedly, it draws a color, that is, a topic assignment, from that distribution, shown by the highlighting on each word. Next, it looks up which topic that color refers to. Finally, it chooses the word from that topic's distribution.
Let me talk about what happens behind the words. The figure shows the conditional distribution of all latent variables given the observations, which are, in this case, the words of the documents. However, we actually only observe the documents, and therefore must infer the underlying topic structure. The goal is to infer the underlying topic structure given the observed documents. What are the topics that generated these documents under these assumptions? What are the distributions over terms that define these topics? For each document, what is the distribution over topics associated with the document? For each word, which topic generated it? Given the observations, which are the words in the documents, the conditional distribution of all of those latent variables is what answers those questions.
The figure shows the graphical model representation of LDA. The boxes are plates representing replicates, as I mentioned a slide ago. The outer plate represents documents, while the inner plate represents the repeated choice of topics and words within a document. Nodes are random variables. Edges denote possible dependence. Observed variables are shaded. Plates denote replicated structure. Alpha is the Dirichlet parameter.
Theta_d is the per-document topic proportions. Z_d,n is the per-word topic assignment. W_d,n is the observed word, which is the nth word in the dth document. Beta_k is the topics, where each beta is a distribution over terms. Eta is the topic hyperparameter.
The Dirichlet distribution is named after Peter Gustav Lejeune Dirichlet. It is a family of continuous multivariate probability distributions parameterized by a vector alpha of positive real numbers. It is the multivariate generalization of the beta distribution. Dirichlet distributions are often used as prior distributions in Bayesian statistics. In LDA, a Dirichlet distribution over a fixed set of K topics is used to choose a topic mixture for the document. The Dirichlet is conjugate to the multinomial: given a multinomial observation, the posterior distribution of theta is again a Dirichlet. The parameter alpha controls the mean shape and sparsity of theta. Alpha is a K-vector with components alpha_i, each of which is greater than 0. The topic proportions are a K-dimensional Dirichlet, and the topics are a V-dimensional Dirichlet, where V is the vocabulary size.
The objective of LDA inference is to compute the conditional distribution, called the posterior, of the topic structure given the observed documents. In the posterior, LDA puts topically related words together. It accomplishes this by maximizing the word probabilities, which it does by dividing the words among the topics. Inference is based on the joint distribution over hidden and observed variables. In a mixture model, this is enough to find clusters of co-occurring words in the same topic. In LDA, a document is additionally penalized, through the Dirichlet hyperparameter, for using too many topics. Loosely, this can be thought of as softening the strict definition of co-occurrence in a mixture model. This flexibility leads to sets of terms that more tightly co-occur. Exact inference in LDA is not tractable. Thus, we need approximate posterior inference methods, such as mean-field variational inference and collapsed Gibbs sampling. The Gibbs sampling algorithm is the one mainly used in MALLET.
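To tie the notation to the generative process described earlier, here is a minimal sketch in Python that samples a toy corpus. It assumes NumPy, symmetric Dirichlet priors, and a fixed document length; the function name `generate_corpus` and all default values are hypothetical choices for illustration, not anything from MALLET or the slides.

```python
import numpy as np

def generate_corpus(num_docs=5, doc_len=20, num_topics=4, vocab_size=50,
                    alpha=0.5, eta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative only)."""
    rng = np.random.default_rng(seed)

    # beta[k]: topic k, a distribution over the V vocabulary terms,
    # drawn from a V-dimensional symmetric Dirichlet with hyperparameter eta.
    beta = rng.dirichlet(np.full(vocab_size, eta), size=num_topics)

    theta = np.zeros((num_docs, num_topics))      # per-document topic proportions
    z = np.zeros((num_docs, doc_len), dtype=int)  # per-word topic assignments
    w = np.zeros((num_docs, doc_len), dtype=int)  # observed word ids

    for d in range(num_docs):
        # theta[d]: drawn from a K-dimensional symmetric Dirichlet(alpha).
        theta[d] = rng.dirichlet(np.full(num_topics, alpha))
        for n in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z[d, n] = rng.choice(num_topics, p=theta[d])
            w[d, n] = rng.choice(vocab_size, p=beta[z[d, n]])

    return beta, theta, z, w

beta, theta, z, w = generate_corpus()
```

In a real corpus only `w` is observed; `beta`, `theta`, and `z` are the hidden topic structure that posterior inference tries to recover. Lowering `alpha` in this sketch tends to concentrate each `theta[d]` on fewer topics, which is the sparsity effect mentioned above.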
Gibbs sampling is a typical Markov chain Monte Carlo method, and it was originally proposed for image restoration. It defines a Markov chain whose stationary distribution is the posterior of interest. It collects independent samples from the stationary distribution and then approximates the posterior with them. The chain is run by iteratively sampling from the conditional distribution of each hidden variable given the observations and the current state of the other hidden variables. Once the chain has burned in, it collects samples and uses them to approximate the posterior.
DMR stands for Dirichlet-multinomial regression, and it was proposed by David Mimno and Andrew McCallum in 2008. It is a variation of LDA that incorporates arbitrary types of observed features. DMR is used to model the influence of document metadata. Unlike in standard LDA-like models, the prior distribution over topics, which is alpha, is a function of observed document features, and it is therefore specific to each document. A multinomial distribution can be viewed as a set of independent Poisson random variables conditioned on the sum of those variables being equal to a constant N. Likewise, the Dirichlet-multinomial can be construed as a set of independent compound Poisson distributions under the same condition. As a result, the Dirichlet-multinomial distribution can be written as a multinomial with an extra overdispersion parameter.
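To make the idea of a document-specific alpha concrete, here is a minimal sketch in Python. It assumes the log-linear form alpha_d,k = exp(x_d · lambda_k), which is how I recall the DMR formulation; the transcript itself only says that alpha is a function of the observed features. The feature layout (an intercept plus two made-up metadata indicators), the weights in `lam`, and the helper name `dmr_alpha` are all hypothetical.

```python
import numpy as np

def dmr_alpha(features, lam):
    """Per-document Dirichlet parameters from document features.

    features: (D, F) array, one row of observed metadata features per document.
    lam:      (K, F) array, one weight vector per topic (assumed already learned).
    Returns a (D, K) array with alpha[d, k] = exp(features[d] . lam[k]).
    """
    return np.exp(features @ lam.T)

# Two documents: column 0 is an intercept, columns 1 and 2 are
# hypothetical indicators for two metadata values.
features = np.array([[1.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0]])

# Hypothetical weights: topic 0 is favored by the first indicator,
# topic 1 by the second.
lam = np.array([[0.0,  2.0, -2.0],
                [0.0, -2.0,  2.0]])

alpha = dmr_alpha(features, lam)

# Prior mean of the topic proportions for each document under Dirichlet(alpha[d]).
prior_mean_theta = alpha / alpha.sum(axis=1, keepdims=True)
```

With these made-up numbers, the first document's prior leans toward topic 0 and the second's toward topic 1, so documents with different metadata start from different expectations about their topic mixtures before any words are seen.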