# Information Theoretic Models

Information theoretic models are theories designed to explain an entire behavior or situation, with the aim of eventually being able to predict that behavior. An example of an information theoretic model is entropy.

**Entropy :-**

Entropy is a measure of uncertainty. Entropy corresponds to disorder: maximum entropy means maximum disorder, or minimum information.
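As a minimal sketch of this idea, the snippet below computes the entropy of a few discrete distributions. It assumes Shannon's discrete formula H(p) = -Σ p_i log2 p_i, measured in bits; the function name `shannon_entropy` and the example distributions are illustrative choices, not part of the original text.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution (assumed formula:
    H(p) = -sum p_i * log2 p_i). Zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distributions over four outcomes (made-up numbers).
uniform = [0.25, 0.25, 0.25, 0.25]   # maximum disorder
skewed = [0.7, 0.1, 0.1, 0.1]        # some outcomes are more predictable
certain = [1.0, 0.0, 0.0, 0.0]       # no uncertainty at all

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)
h_certain = shannon_entropy(certain)

print("uniform:", h_uniform)
print("skewed: ", h_skewed)
print("certain:", h_certain)
```

The uniform distribution gives the largest value and the certain outcome gives zero, which matches the reading of maximum entropy as maximum disorder.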

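Conceptually,

$$\text{Boltzmann's entropy} = -\log\frac{f(x)}{g(x)}$$

and the K-L distance is the expectation under $f$ of the negative of Boltzmann's entropy:

$$\text{K-L} = E_f\left(-\text{Boltzmann's entropy}\right) = E_f\left(\log\frac{f(x)}{g(x)}\right) = \int f(x)\,\log\frac{f(x)}{g(x)}\,dx$$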

Thus, minimizing the K-L distance is equivalent to maximizing the entropy; hence the name maximum entropy principle.
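One way to see this equivalence is a short worked case. The derivation below is a sketch under an assumption the text does not state: the outcomes are discrete (n of them) and the reference distribution g is uniform, so g_i = 1/n. The symbol H(f) for the Shannon entropy of f is introduced here for illustration.

```latex
\begin{aligned}
D(f \,\|\, g) &= \sum_{i=1}^{n} f_i \log\frac{f_i}{g_i}
               = \sum_{i=1}^{n} f_i \log f_i + \log n \sum_{i=1}^{n} f_i \\
              &= \log n - H(f),
  \qquad H(f) = -\sum_{i=1}^{n} f_i \log f_i .
\end{aligned}
```

Because log n does not depend on f, the distribution f that minimizes the K-L distance to the uniform reference is exactly the one that maximizes H(f).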

**Maximum entropy principle :-**

The maximum entropy principle, proposed by Jaynes in 1957, states that the experimenter should choose the distribution that maximizes the entropy given the constraints imposed by the data. The principle is based on the premise that, when estimating a probability distribution, one should select the distribution that leaves the largest remaining uncertainty, that is, the maximum entropy consistent with the constraints. In that way, no additional assumptions or biases are introduced into the calculations.
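A classic illustration is a die whose only known property is its average roll. The sketch below assumes the standard result that, under a mean constraint, the maximum entropy distribution has the exponential form p_i ∝ exp(λ·i), and it finds λ by bisection. The function name `maxent_die`, the search bounds, and the target means are illustrative assumptions.

```python
import math

def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum entropy distribution over `faces` with a prescribed mean.

    Assumes the solution has the form p_i proportional to exp(lam * face_i)
    and solves for lam by bisection (the mean increases with lam).
    """
    def dist(lam):
        weights = [math.exp(lam * x) for x in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(x * p for x, p in zip(faces, dist(lam)))

    lo, hi = -50.0, 50.0          # assumed search bracket for lam
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

fair_case = maxent_die(3.5)    # constraint matches a fair die
loaded_case = maxent_die(4.5)  # constraint forces weight toward high faces

print([round(p, 4) for p in fair_case])
print([round(p, 4) for p in loaded_case])
```

When the constraint is the fair-die mean of 3.5, the result is uniform: nothing beyond the constraint is assumed. With a mean of 4.5 the probabilities tilt toward the high faces, but only as much as the constraint requires.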

**Mutual information :-**

Information theoretic metrics can measure the value of combining qualitative and quantitative data. Information gain measures the change in uncertainty, mutual information measures triangulation, and conditional entropy measures complementation and paradox. The convergence of qualitative and quantitative data can be measured in terms of mutual information, and the value of the new perspectives obtained can be measured in terms of conditional entropy. The mutual information of two random variables is a quantity that measures the mutual dependence of the two variables.
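The quantities named above can be computed directly from a joint probability table. The following sketch assumes two discrete variables X (rows) and Y (columns) and uses the standard definitions I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))] and H(X|Y) = -Σ p(x,y) log2[p(x,y) / p(y)]. The function names and the example tables are made up for illustration.

```python
import math

def marginals(joint):
    """Row sums give p(x), column sums give p(y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information_bits(joint):
    px, py = marginals(joint)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

def conditional_entropy_bits(joint):
    """H(X|Y): uncertainty left in X once Y is known."""
    _, py = marginals(joint)
    return -sum(
        pxy * math.log2(pxy / py[j])
        for row in joint
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Illustrative joint tables for two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]  # knowing Y says nothing about X
identical = [[0.5, 0.0], [0.0, 0.5]]        # Y determines X completely
partial = [[0.4, 0.1], [0.1, 0.4]]          # Y is informative but noisy

for name, table in [("independent", independent),
                    ("identical", identical),
                    ("partial", partial)]:
    print(name,
          "I(X;Y) =", mutual_information_bits(table),
          "H(X|Y) =", conditional_entropy_bits(table))
```

In every case the mutual information equals H(X) minus H(X|Y), so it can be read as the part of the uncertainty in X that observing Y removes.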

**Kullback-Leibler divergence :-**

Kullback-Leibler divergence can be used as an objective measure to determine the complexity of a single image in an entire archive. The Kullback-Leibler (KL) divergence is a statistical measure that quantifies, in bits, how close a probability distribution $p = \{p_i\}$ is to a model (or candidate) distribution $q = \{q_i\}$:

$$D_{KL}(p \,\|\, q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$$

$D_{KL}$ is non-negative ($\ge 0$), not symmetric in p and q, zero if the distributions match exactly, and potentially infinite. A common technical interpretation is that the KL divergence is the “coding penalty” associated with selecting a distribution q to approximate the true distribution p.
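These properties are easy to check numerically. The sketch below implements the sum Σ p_i log2(p_i / q_i) for discrete distributions; the function name `kl_divergence_bits` and the example distributions are illustrative, and the convention that a term with p_i = 0 contributes nothing is an assumption made explicit in the code.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(p || q) in bits for discrete distributions of equal length."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # assumed convention: 0 * log(0 / q) = 0
        if qi == 0:
            return math.inf   # q rules out an outcome that p allows
        total += pi * math.log2(pi / qi)
    return total

# Made-up example distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]
degenerate = [1.0, 0.0]

forward = kl_divergence_bits(p, q)            # penalty for coding p with q
reverse = kl_divergence_bits(q, p)            # differs: not symmetric
same = kl_divergence_bits(p, p)               # zero for identical distributions
unbounded = kl_divergence_bits(p, degenerate) # infinite

print(forward, reverse, same, unbounded)
```

The forward and reverse values differ, which is why the KL divergence is not a true distance even though it is often called one.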

**Maximum mutual information principle :-**

The maximum mutual information principle is a well-established computational technique used in optimization, in particular for computing and optimizing logic functions represented as decision trees and decision diagrams. The principle operates by maximizing the mutual information between X and Y, exploiting the redundancy in X compared with Y.
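The text does not spell out the procedure, so the following is only a sketch of one common reading: when building a decision tree for a logic function, pick as the next splitting variable the input that shares the most mutual information with the function's output. The example function `f`, the assumption that all input combinations are equally likely, and every name in the code are illustrative.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(counts):
    """Entropy in bits of the empirical distribution given by `counts`."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(rows, var_index):
    """I(f; x_var) = H(f) - H(f | x_var) over equally likely truth-table rows."""
    h_f = entropy_bits(list(Counter(out for _, out in rows).values()))
    h_cond = 0.0
    for value in (0, 1):
        subset = [out for inputs, out in rows if inputs[var_index] == value]
        if subset:
            weight = len(subset) / len(rows)
            h_cond += weight * entropy_bits(list(Counter(subset).values()))
    return h_f - h_cond

def f(a, b, c):
    """Hypothetical logic function used only for this example."""
    return int(a and (b or c))

# Full truth table: ((a, b, c), f(a, b, c)) for all eight input combinations.
rows = [(inputs, f(*inputs)) for inputs in product((0, 1), repeat=3)]

gains = [information_gain(rows, i) for i in range(3)]
best_variable = max(range(3), key=lambda i: gains[i])

print("gains for a, b, c:", gains)
print("split first on variable index:", best_variable)
```

For this function the variable `a` carries the most information about the output, so a tree that tests `a` first resolves the most uncertainty at the root.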

**Infomax and redundancy reduction:**

The Infomax principle can be stated as follows: the transformation of a random vector X observed in the input layer of a neural network to a random vector Y produced in the output layer should be chosen so that the activities of the neurons in the output layer jointly maximize information about the activities in the input layer. The objective function to be maximized is the mutual information between X and Y.
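A deliberately tiny instance of this objective is sketched below: a single binary output unit Y that thresholds a discrete input X, with the threshold chosen to maximize I(X;Y). This is a toy setting assumed for illustration, not a neural network training rule; the input probabilities and all names are made up.

```python
import math

def mutual_information_bits(joint):
    """I(X;Y) in bits from a dict mapping (x, y) to probability."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Assumed distribution of the input X over the values 0..5.
input_probs = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

def info_for_threshold(t):
    """Mutual information between X and the output Y = 1 if X >= t else 0."""
    joint = {(x, int(x >= t)): p for x, p in enumerate(input_probs)}
    return mutual_information_bits(joint)

thresholds = range(1, len(input_probs))
best_threshold = max(thresholds, key=info_for_threshold)

for t in thresholds:
    print("threshold", t, "I(X;Y) =", info_for_threshold(t))
print("infomax choice:", best_threshold)
```

Because the output here is a deterministic function of the input, I(X;Y) reduces to the entropy of Y, so the selected threshold is the one that splits the input probability mass most evenly.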

The redundancy reduction principle has been proposed as the goal of unsupervised learning. The goal of redundancy reduction is to factorize the input probability distribution without losing information.

**Maximum likelihood estimation:**

Maximum likelihood estimation (MLE) is a standard approach to parameter estimation and inference in statistics. Its optimal properties as an estimator are: sufficiency (complete information about the parameter of interest), efficiency (lowest possible variance of parameter estimates, achieved asymptotically), consistency (the true parameter value that generated the data is recovered asymptotically), and parameterization invariance (the same MLE solution is obtained independent of the parameterization used).
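As a small worked example, the sketch below estimates the success probability of a Bernoulli variable by maximizing the log-likelihood over a grid and compares it with the closed-form answer (the sample proportion). It also repeats the search in a log-odds parameterization to illustrate parameterization invariance. The data are made up, and the grid resolution and all names are illustrative assumptions.

```python
import math

def bernoulli_log_likelihood(p, data):
    """Log-likelihood of success probability p for a list of 0/1 outcomes."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Made-up sample: 7 successes in 10 trials.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Closed-form MLE for a Bernoulli parameter: the sample proportion.
closed_form = sum(data) / len(data)

# Numerical MLE: search a grid of candidate probabilities in (0, 1).
p_grid = [i / 1000 for i in range(1, 1000)]
grid_mle = max(p_grid, key=lambda p: bernoulli_log_likelihood(p, data))

# Same search in the log-odds parameterization theta = log(p / (1 - p)).
def sigmoid(theta):
    return 1.0 / (1.0 + math.exp(-theta))

theta_grid = [i / 1000 for i in range(-5000, 5001)]
theta_mle = max(
    theta_grid,
    key=lambda t: bernoulli_log_likelihood(sigmoid(t), data),
)
p_from_theta = sigmoid(theta_mle)

print("closed form:", closed_form)
print("grid search over p:", grid_mle)
print("grid search over log-odds, mapped back to p:", p_from_theta)
```

Up to the grid resolution, both searches land on the same probability, which is what parameterization invariance promises.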

**Maximum entropy method:**

The maximum entropy method (MEM) is usually stated in a deceptively simple way: from among all the probability distributions compatible with the empirical data, choose the one with the largest entropy. MEM is an information-theory based technique that was first developed in the field of radio astronomy to enhance the information obtained from noisy data. It has since been introduced in charge density reconstruction, where it can yield a high-resolution density distribution from a limited number of diffraction data.

**Related Questions and Answers**

**Q1. What is mutual information?**

Ans.- Mutual information is the difference between the entropy and the conditional entropy. It is a measure of uncertainty about the output of the system that is resolved by observing the system input.

**Q2. What is entropy?**

Ans.- Entropy is defined as a measure of the average amount of information conveyed per message.

**Q3. What is maximum likelihood method?**

Ans.- The method of maximum likelihood may be applied to any estimation problem, provided the joint probability distribution of the available set of observed data can be derived. The method then yields almost all known estimates as special cases.

**Q4. What are MLE properties?**

Ans.- Maximum likelihood estimation (MLE) has the following properties: sufficiency, consistency, efficiency, and parameterization invariance.
