Word embedding is the process of turning text into real-valued vectors. It represents each term in the corpus as a point in a multidimensional space. In this example, words are represented in a two-dimensional space for ease of interpretation, but in many real applications this space can contain hundreds of dimensions. The goal of word embedding is twofold. First, word embeddings enable us to visualize terms in a real-valued vector space; terms that cluster together carry similar meaning and information in the corpus. Second, to use text features within machine learning algorithms, the text must be converted to numeric values, and word embedding provides an alternative to the singular value decomposition for creating inputs that represent the information contained in text data.

There are two conceptual approaches to word embedding, with various methods for each. The first approach learns word representations globally for the entire corpus, where all terms in the corpus determine the embedding. This is usually done using matrix factorization and is similar in spirit to singular value decomposition of the term-by-document matrix. The second approach learns word representations locally, using shallow window-based methods to generate the embedding. In this context, the window selects words near a word of interest and uses them to help determine the embedding.

The GloVe algorithm is an unsupervised global method for generating word embeddings. GloVe stands for Global Vectors for Word Representation, and the embeddings are derived from aggregated global word-word co-occurrence statistics from a corpus. The core idea is that word vectors are trained so that the dot product of two word vectors equals the logarithm of their probability of co-occurrence. In the table, the term ice co-occurs with the terms solid and water more than it does with gas and fashion, as expected. The term steam co-occurs with gas and water more than it does with solid and fashion.
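The co-occurrence ratios described above can be sketched in a few lines of Python. The counts below are invented for illustration (the actual GloVe paper reports probabilities estimated from a large corpus, not these numbers); the point is only to show how the ratio P(k | ice) / P(k | steam) separates discriminating context words from non-discriminating ones.

```python
# Hypothetical co-occurrence counts -- illustrative values only,
# not taken from any real corpus or from the GloVe paper.
counts = {
    "ice":   {"solid": 190, "gas": 7,   "water": 220, "fashion": 2},
    "steam": {"solid": 5,   "gas": 160, "water": 230, "fashion": 2},
}

def cooccur_prob(word, context, counts):
    """P(context | word): the co-occurrence count normalized by the word's total."""
    total = sum(counts[word].values())
    return counts[word][context] / total

for context in ["solid", "gas", "water", "fashion"]:
    p_ice = cooccur_prob("ice", context, counts)
    p_steam = cooccur_prob("steam", context, counts)
    # Ratios far above 1 indicate a word related to ice; far below 1, to steam;
    # near 1, a word that does not discriminate between the two.
    print(f"{context:8s} ratio = {p_ice / p_steam:.2f}")
```

With these invented counts, the ratio for solid comes out much greater than 1, the ratio for gas much less than 1, and the ratios for water and fashion near 1, mirroring the pattern discussed above.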
By taking ratios, you can see that ice is strongly associated with solid and steam is strongly associated with gas. The ratios for water and fashion are near 1, which indicates that the term discriminates equally (or not at all) between ice and steam: both words co-occur with water far more than with fashion, but because each ratio is near 1, neither water nor fashion distinguishes ice from steam.

The results of this algorithm are trained word vectors, one for each term in the corpus, designed so that the differences between word vectors capture as much as possible of the meaning specified by the juxtaposition of the two terms. The idea is to perform well on analogy tasks, so that the result of a vector operation like "king − queen" is close in value to the result of "man − woman" or "uncle − aunt." After the vectors are trained, they can be used to compare the similarity of words through dot products, they can be visualized to understand relationships among terms in a multidimensional space, or they can be fed as inputs into predictive modeling algorithms.
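The analogy arithmetic above can be illustrated with a minimal sketch. The 2-D vectors below are hand-picked toy values, not trained GloVe embeddings; real embeddings would have hundreds of dimensions, and the specific coordinates are assumptions made purely so the "king − queen ≈ man − woman" pattern is visible.

```python
import numpy as np

# Toy 2-D "embeddings" -- hand-picked illustrative values, not trained vectors.
vec = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.3, 0.8]),
    "woman": np.array([0.3, 0.2]),
}

def cosine(a, b):
    """Cosine similarity: the dot product normalized by vector lengths."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vector differences should encode the shared "gender" direction.
diff_royal = vec["king"] - vec["queen"]
diff_common = vec["man"] - vec["woman"]
print(cosine(diff_royal, diff_common))   # near 1: the differences align

# The same dot-product machinery compares whole words directly.
print(cosine(vec["king"], vec["queen"]))
```

The same `cosine` helper serves both uses mentioned above: comparing word-to-word similarity and checking that difference vectors for analogous pairs point in the same direction.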