The maximum entropy formalism is the work of E.T. Jaynes. It has seen applications in everything from spectral analysis to black hole entropy.

According to Jaynes (in *Where do we stand on maximum entropy?*):

> In principle, every different kind of testable information will generate a different kind of mathematical problem. But there is one important class of problems for which the general solution was given once and for all, by Gibbs.

This mathematical form consists of the identification of a discrete probability distribution, $\{p_i\}$, for discrete variables $x_1, \dots, x_n$ and for which the distribution is constrained by the specification of the mean values of certain functions $f_k(x)$:

$$\langle f_k \rangle = \sum_{i=1}^{n} p_i \, f_k(x_i) = F_k, \qquad k = 1, \dots, m,$$

where the sequence of mean values, $F_1, \dots, F_m$, are given in the problem statement. Assuming that $m < n$, the problem is under-determined; that is, there are too few equations to define all of the unknown $p_i$. [This is even more true when we take the problem over into the identification of a continuous probability distribution over a (possibly infinite) range of real numbers.]

The solution for the $p_i$ is found by maximizing the entropy of the distribution:

$$S = -\sum_{i=1}^{n} p_i \ln p_i,$$

subject to the constraints on the mean values listed above, together with the constraint that

$$\sum_{i=1}^{n} p_i = 1.$$

The method of solution introduces Lagrange multipliers. The constraint that the total probability be unity leads to the definition of the “partition function”, $Z$, as

$$Z(\lambda_1, \dots, \lambda_m) = \sum_{i=1}^{n} \exp\!\left( -\sum_{k=1}^{m} \lambda_k f_k(x_i) \right),$$

where the $\lambda_k$ are the sequence of Lagrange multipliers corresponding to the $m$ mean-value constraints.

The problem has the formal solution that the $p_i$ are

$$p_i = \frac{1}{Z} \exp\!\left( -\sum_{k=1}^{m} \lambda_k f_k(x_i) \right).$$

The Lagrange multipliers themselves are given by the equations

$$F_k = -\frac{\partial}{\partial \lambda_k} \ln Z, \qquad k = 1, \dots, m.$$

The resulting entropy maximum depends only on the given data $F_1, \dots, F_m$ and is

$$S_{\max}(F_1, \dots, F_m) = \ln Z + \sum_{k=1}^{m} \lambda_k F_k.$$

If this function can be explicitly found, then the $\lambda_k$ are just

$$\lambda_k = \frac{\partial S_{\max}}{\partial F_k}.$$

Given the set of probabilities defined by this maximum entropy solution, the best prediction that we can make of any other quantity, $q(x)$, related to the problem statement is just its expectation

$$\langle q \rangle = \sum_{i=1}^{n} p_i \, q(x_i).$$

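As a concrete illustration, the sketch below solves Jaynes's well-known dice example numerically: a six-sided die constrained to have mean 4.5, with the single function $f(x) = x$. The example and all names in it are my own, not from the post; the single multiplier is found by root-finding on $\langle f \rangle(\lambda) = F$.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical example: six-sided die (x_i = 1..6), one constraint
# <f> = <x> = 4.5 with f(x) = x.
x = np.arange(1, 7, dtype=float)
F = 4.5

def mean_given_lambda(lam):
    """<f> = -d(ln Z)/d(lambda) for the one-constraint problem."""
    w = np.exp(-lam * x)              # unnormalized weights exp(-lambda*f)
    return (w * x).sum() / w.sum()

# Solve <f>(lambda) = F for the multiplier (negative here, since F > 3.5).
lam = brentq(lambda l: mean_given_lambda(l) - F, -5.0, 5.0)

Z = np.exp(-lam * x).sum()            # partition function
p = np.exp(-lam * x) / Z              # formal solution for the p_i
S_max = np.log(Z) + lam * F           # maximum entropy: ln Z + lambda*F

print(p)        # maxent probabilities, tilted toward the high faces
```

The resulting distribution reproduces the constrained mean exactly, and `S_max` agrees with the direct evaluation of $-\sum_i p_i \ln p_i$.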
Likewise, many statements about the variances, covariances, and reciprocities between such additional quantities can be taken from the identity

$$\frac{\partial \langle q \rangle}{\partial \lambda_k} = -\bigl( \langle q f_k \rangle - \langle q \rangle \langle f_k \rangle \bigr).$$

There are two useful special cases, obtained by taking $q = f_k$ and $q = f_j$, as follows:

$$\frac{\partial \langle f_k \rangle}{\partial \lambda_k} = -\bigl( \langle f_k^2 \rangle - \langle f_k \rangle^2 \bigr) = -\frac{\partial^2 \ln Z}{\partial \lambda_k^2},$$

and likewise

$$\frac{\partial \langle f_j \rangle}{\partial \lambda_k} = -\bigl( \langle f_j f_k \rangle - \langle f_j \rangle \langle f_k \rangle \bigr) = -\frac{\partial^2 \ln Z}{\partial \lambda_j \, \partial \lambda_k} = \frac{\partial \langle f_k \rangle}{\partial \lambda_j}.$$

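The variance identity is easy to check numerically. The following toy setup (my own, not from the post) compares the directly computed variance of $f(x) = x$ over six states against a central finite difference of $\ln Z$:

```python
import numpy as np

# Toy check of var(f_k) = d^2(ln Z)/d(lambda_k)^2 for one constraint.
x = np.arange(1, 7, dtype=float)      # states with f(x_i) = x_i
lam = 0.3                             # an arbitrary multiplier value

def lnZ(l):
    return np.log(np.exp(-l * x).sum())

p = np.exp(-lam * x - lnZ(lam))       # maxent distribution at this lambda
var_direct = (p * x**2).sum() - ((p * x).sum())**2

h = 1e-4                              # central-difference step
var_from_lnZ = (lnZ(lam + h) - 2 * lnZ(lam) + lnZ(lam - h)) / h**2

print(var_direct, var_from_lnZ)       # agree to high precision
```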
The various functions $f_k$ may depend upon certain parameters $\alpha_1, \alpha_2, \dots$; thus

$$f_k = f_k(x; \alpha_1, \alpha_2, \dots).$$

In any given problem, these parameters may have a given physical meaning such as volume, electric field intensity, dipole moment, or whatever. In this instance, we can consider variations in the maximum entropy problem such that

$$\delta S_{\max} = \sum_{k=1}^{m} \lambda_k \, \delta Q_k,$$

where

$$\delta Q_k \equiv \delta \langle f_k \rangle - \langle \delta f_k \rangle.$$

The notation here has been chosen for a specific historical reason. In thermodynamics, $\delta Q$ is heat when the constraint on $f_1$ is the mean energy, and the heat is considered not to be a function of state, so that $\delta Q$ is not an exact differential of the state of a thermodynamic system, unlike, for example, the energy or entropy. Nonetheless, the temperature is considered to be an integrating factor for the heat, so that $\delta Q / T$ is an exact differential of a state function; viz., the entropy, $dS = \delta Q / T$.

In classical thermodynamics, the commonly used label for the first Lagrange multiplier is $\beta = 1/kT$, where $k$ is Boltzmann’s constant; in our current notation, $\lambda_1 = \beta$. In the more general application of the formalism, we see an echo of the distinction between exact differentials of state functions and the use of the Lagrange multipliers as integrating factors for more general functions that arise in the problem. In Gibbs’s original exposition of the method, the additional constraints were the particle numbers of the various components of a heterogeneous equilibrium mixture, and the other Lagrange multipliers, generally written as $\mu_j$, were the chemical potentials of these various species.

## Fluctuations and Variances

When considering fluctuations in the functions $f_k$ given in the statement of the problem, it is worth pointing out a distinction between the degree of accuracy of an estimate of a mean value and the physical fluctuations in the function. These two are often conflated inappropriately.

Consider any function $f$ of a physical system. If we have a probability distribution for $f$, then the best prediction that we can make for $f$, in the sense of the minimum expected square error, is

$$\langle f \rangle = \sum_i p_i f_i.$$

The reliability of this estimator is related to the variance in this mean value; viz.,

$$(\Delta f)^2 \equiv \langle f^2 \rangle - \langle f \rangle^2.$$

Assuming that the system under consideration is at equilibrium, the mean and variance are stable estimators, independent of the time, $t$. The estimator is reliable only to the extent that $|\Delta f / \langle f \rangle| \ll 1$.

To be clear, $\Delta f$ is a measure of uncertainty in the value of $\langle f \rangle$ as a predictor. Often, this number is also taken to be an estimator of the measurable root mean square fluctuations in $f$. Statistically speaking, there is a distinction between the standard deviation of the expectation and the expectation of the standard deviation. Said another way, knowledge of the mean of $f$ to an accuracy of, say, ±5% does not imply that $f$ fluctuates by ±5%.

Let us say that we can find a time average for $f$ through actual measurement as

$$\bar{f} = \frac{1}{T} \int_0^T f(t)\, dt.$$

Assuming that there is independent knowledge of the probability distribution for $f(t)$, then the best estimator for $\bar{f}$ is

$$\langle \bar{f} \rangle = \frac{1}{T} \int_0^T \langle f(t) \rangle \, dt.$$

So, to the extent that $\langle f(t) \rangle$ is independent of the time, then

$$\langle \bar{f} \rangle = \langle f \rangle,$$

which is to say that while any specific time average of a physical variable may not equal the true mean, the expectation of the time average and the true mean are equal.

But even this equality does not tell whether any given value of $\bar{f}$ is a reliable or accurate one. For this, again, we compute the variance:

$$(\Delta \bar{f})^2 = \langle \bar{f}^2 \rangle - \langle \bar{f} \rangle^2 = \frac{1}{T^2} \int_0^T \!\! \int_0^T \bigl[ \langle f(t) f(t') \rangle - \langle f \rangle^2 \bigr] \, dt \, dt'.$$

And again, the value of $\bar{f}$ is a good estimator if and only if $|\Delta \bar{f} / \langle f \rangle| \ll 1$.

In equilibrium systems, we can introduce the covariance function of $f$, which is a function only of the difference in time values; viz.,

$$\phi(t - t') \equiv \langle f(t) f(t') \rangle - \langle f \rangle^2.$$

So the variance in the time average can be written in terms of the covariance function as

$$(\Delta \bar{f})^2 = \frac{2}{T} \int_0^T \left( 1 - \frac{\tau}{T} \right) \phi(\tau) \, d\tau.$$

Let us now define the correlation time $\tau_c$ as

$$\tau_c \equiv \frac{1}{\phi(0)} \int_0^\infty \phi(\tau) \, d\tau.$$

If the integrals in the definition converge and $\tau_c$ is finite, then in the limit $T \gg \tau_c$ we have that

$$(\Delta \bar{f})^2 \simeq \frac{2 \tau_c}{T} \, \phi(0) = \frac{2 \tau_c}{T} (\Delta f)^2,$$

which implies that $\Delta \bar{f}$ will tend to 0 in proportion to $1/\sqrt{T}$. This may be seen as a feature of time-sampling over roughly $T / 2\tau_c$ independent time windows, each of duration about $2\tau_c$. On the other hand, if $\tau_c$ is not finite and correlations persist beyond any practical measurement time, then there can be no accurate value for $\bar{f}$, since the variance will not decay to zero for large sample times.

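This scaling can be illustrated by simulation. The sketch below (my own toy setup, not from the post) uses a discrete AR(1) process as a stand-in for $f(t)$; its covariance is $\phi(\tau) = a^{|\tau|}$, giving a discrete analogue of the correlation time $\tau_c \approx (1+a)/(2(1-a))$, and the observed variance of $\bar{f}$ over many runs tracks $2 \tau_c (\Delta f)^2 / T$:

```python
import numpy as np

rng = np.random.default_rng(42)

# AR(1) surrogate for f(t): x[n] = a*x[n-1] + noise, stationary
# variance 1, covariance phi(tau) = a^|tau|.  Discrete correlation
# time (sum analogue of the integral): tau_c ~ (1 + a) / (2*(1 - a)).
a, T, n_runs = 0.8, 2000, 400
tau_c = (1 + a) / (2 * (1 - a))

means = np.empty(n_runs)
for r in range(n_runs):
    eps = rng.normal(size=T) * np.sqrt(1 - a**2)   # keeps var(x) = 1
    x = np.empty(T)
    x[0] = rng.normal()                            # start in equilibrium
    for n in range(1, T):
        x[n] = a * x[n - 1] + eps[n]
    means[r] = x.mean()                            # one time average

predicted = 2 * tau_c / T       # 2*tau_c*(Delta f)^2 / T, with var 1
observed = means.var()
print(observed, predicted)      # agree up to sampling noise
```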
Now, this entire argument can be repeated for measurable fluctuations in $f$; viz., $\delta f(t) \equiv f(t) - \bar{f}$, with mean square

$$\overline{(\delta f)^2} = \frac{1}{T} \int_0^T \bigl[ f(t) - \bar{f} \bigr]^2 \, dt = \overline{f^2} - \bar{f}^2.$$

Once more, we turn to the expectation of the fluctuation as

$$\left\langle \overline{(\delta f)^2} \right\rangle = \left\langle \overline{f^2} \right\rangle - \left\langle \bar{f}^2 \right\rangle = (\Delta f)^2 - (\Delta \bar{f})^2.$$

From this it is apparent that the expected measurable fluctuation is not the same as $(\Delta f)^2$ unless $(\Delta \bar{f})^2$ is negligible compared to $(\Delta f)^2$, which depends on certain characteristics of the system and a long averaging time.

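The identity above can also be checked by simulation. The sketch below (again my own AR(1) surrogate with stationary variance $(\Delta f)^2 = 1$, not from the post) compares the average of the measured mean-square fluctuation across many runs against $(\Delta f)^2 - (\Delta \bar{f})^2$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Check <overline{(delta f)^2}> = (Delta f)^2 - (Delta fbar)^2 on an
# AR(1) surrogate with stationary variance (Delta f)^2 = 1.
a, T, n_runs = 0.8, 500, 1000
fbar = np.empty(n_runs)
msf = np.empty(n_runs)                 # time-averaged (delta f)^2 per run
for r in range(n_runs):
    eps = rng.normal(size=T) * np.sqrt(1 - a**2)
    x = np.empty(T)
    x[0] = rng.normal()
    for n in range(1, T):
        x[n] = a * x[n - 1] + eps[n]
    fbar[r] = x.mean()
    msf[r] = ((x - x.mean())**2).mean()

lhs = msf.mean()                       # <overline{(delta f)^2}>
rhs = 1.0 - fbar.var()                 # (Delta f)^2 - (Delta fbar)^2
print(lhs, rhs)                        # agree to within sampling noise
```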
To see whether or not the estimator $\overline{(\delta f)^2}$ is reliable, we have to go one more step up the hierarchy and consider its variance:

$$\operatorname{var}\!\left[ \overline{(\delta f)^2} \right] = \left\langle \left[ \overline{(\delta f)^2} \right]^2 \right\rangle - \left\langle \overline{(\delta f)^2} \right\rangle^2.$$

This variance can be expressed as

$$\operatorname{var}\!\left[ \overline{(\delta f)^2} \right] = \frac{2}{T} \int_0^T \left( 1 - \frac{\tau}{T} \right) \psi(\tau) \, d\tau,$$

where

$$\psi(\tau) \equiv \bigl\langle [\delta f(t)]^2 \, [\delta f(t + \tau)]^2 \bigr\rangle - \bigl\langle (\delta f)^2 \bigr\rangle^2,$$

which employs symmetry in the domain of integration over $(t, t')$.

Now, going back to where we started; viz., the assumed equivalence between $\Delta f$, the estimator of the standard deviation of the mean of $f$, and the measurable fluctuations of $f$, we find that this equivalence depends in fact on some fairly significant assumptions about the nature of the system under consideration.

There are many situations in which we find that $(\Delta f)^2$ fails to converge. This does not imply that the measurable fluctuations, $\overline{(\delta f)^2}$, are infinite. Rather, it simply implies that the information available in the problem fails to make any reliable prediction of the mean, $\langle f \rangle$.

## Summary

In this post, I have simply set forth the basic elements of the formalism as given by Jaynes. None of this is original work in any way. I am only setting this down in one place for reference. Since the Wikipedia article on the maximum entropy method does not contain any reference to the variance, covariance, and fluctuation aspects of the formalism, I thought it was worth including these explicitly.

The formalism can be extended to the continuous case. I’ll consider that in another post.
