IAN STEWART: ENTROPY VS. DISORDER AND GRAVITICS VS. THERMODYNAMICS
By Mark Perakh
Posted on November 11, 2004
Ian Stewart is a distinguished British mathematician and writer. This essay is a brief commentary on certain aspects of his article “The Second Law of Gravitics and the Fourth Law of Thermodynamics,” found in the collection From Complexity to Life, edited by Niels Henrik Gregersen. In two previously published essays I commented on the articles by Paul Davies and Charles H. Bennett in the same collection, so this essay is the third installment in a planned review of the entire collection (or at least of most of its chapters).
Overall, Ian Stewart’s paper is thought-provoking and contains many points I agree with. I have, however, a few reservations regarding some of Stewart’s notions.
One such notion concerns the relationship between entropy and “disorder.” For example, Stewart renders his thesis as follows (page 122):
“… order/disorder metaphor is fraught with tacit assumptions (that are often wrong) and is inconsistent, ill defined, and by no mean unique.”
He also writes (page 144):
“Thermodynamic entropy is metaphorically identified with disorder. However, the assumptions involved in such terminology are exceedingly dubious.”
On pages 122-123, Stewart discusses the example of a gas that expands from one corner of a container to fill the entire available volume, and writes that the conventional interpretation of that process concludes (page 123) that “loss of order corresponds to gain of entropy, so entropy equals disorder.” In Stewart’s view, however, such an interpretation is far from universal. As I understand him, the distinction between “order” and “disorder” has no rigorous definition, only an intuitive meaning, and this is one of the reasons why defining entropy as a measure of disorder is not well substantiated.
To my mind, this conclusion of Stewart’s is not quite convincing.
Indeed, if there is no rigorous and commonly accepted definition of “disorder” except for an intuitive understanding (as Stewart seems to think) then whatever definition of entropy were chosen, it couldn’t contradict the non-existent definition of disorder. If this were the case, what would prevent us from choosing entropy to be a measure of “disorder” by carefully defining the connection between the two concepts? Adopting entropy as a measure of disorder is not prohibited by any known laws of physics or theorems of information theory.
In fact, however, there are definitions of disorder. Therefore, in order to decide whether defining entropy as a measure of disorder is substantiated, we have to see if such a definition is compatible with the existing definitions of disorder.
In my view, the definition of entropy as a measure of disorder can quite well fit some of the existing concepts of disorder. In particular, a consistent and fruitful definition of disorder is given in the algorithmic theory of complexity/information by Solomonoff, Kolmogorov, and Chaitin (SKC), whose seminal concepts have been explained, for example, in the elegant article by Gregory Chaitin reprinted in the same collection.
The SKC theory ties together complexity, randomness, and information (although “Kolmogorov information” differs from Shannon information). It is easiest to apply it to informational systems, e.g. to texts.
Let us represent the system under investigation as a text. A sufficiently complete description of the system in plain words and/or mathematical formulae can serve as the source of such a text. This description can always be transcribed into binary symbols, i.e. as a string of zeroes and ones. (A simple example is rewriting a text composed in any conventional alphabet in Morse code, with each dot written as a 0 and each dash as a 1.)
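The transcription step can be sketched in a few lines of Python. As an assumption made purely for brevity, the sketch uses the UTF-8 byte values of the characters instead of Morse code, and the function names are hypothetical; any reversible encoding would serve the argument equally well.

```python
def to_bit_string(text: str) -> str:
    """Transcribe a text into a string of 0s and 1s (8 bits per UTF-8 byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bit_string(bits: str) -> str:
    """Invert to_bit_string, showing that the transcription loses nothing."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


bits = to_bit_string("Hi")
print(bits)                   # 0100100001101001
print(from_bit_string(bits))  # Hi
```

The round trip is the point of the example: the binary string carries exactly the same description as the original text.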
In terms of algorithmic information theory, a truly random (i.e. disordered) text cannot be compressed without losing some of the information it carries. In terms of classical information theory, a random text has zero redundancy. Redundancy is a single-valued function of the text’s Shannon entropy (the latter measured in bits per character):
$$R = 1 - \frac{S_{\mathrm{act}}}{S_{\mathrm{max}}}$$
where $R$ is the redundancy (a dimensionless quantity, $0 \le R < 1$) and $S_{\mathrm{max}}$ is the maximum possible specific entropy of a text (i.e. the entropy per character of a perfectly random text, whose redundancy is $R=0$), which equals $S_{\mathrm{max}} = \log_2 N$. Here $N$ is the number of symbols available in the alphabet. $S_{\mathrm{act}}$ is the specific entropy of the actual specimen of text which, unlike the perfectly random text, possesses a certain redundancy $R>0$.
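A minimal numerical sketch of the redundancy formula follows. It rests on a loudly stated assumption: the actual specific entropy is estimated from single-character frequencies only, which ignores correlations between neighboring characters and therefore tends to overstate the entropy (and understate the redundancy) of real texts. The function names and the sample strings are hypothetical.

```python
import math
from collections import Counter


def entropy_per_char(text: str) -> float:
    """Estimate of the specific entropy in bits per character.

    Assumption: only single-character frequencies are used, so
    correlations between characters are ignored.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(text: str, alphabet_size: int) -> float:
    """R = 1 - S_act / S_max, with S_max = log2(N)."""
    s_max = math.log2(alphabet_size)
    return 1.0 - entropy_per_char(text) / s_max


# Four symbols used equally often: S_act = S_max = 2 bits, so R = 0.
uniform = "abcd" * 25
# Same alphabet, but 'a' dominates, so S_act < S_max and R > 0.
skewed = "a" * 85 + "b" * 5 + "c" * 5 + "d" * 5

print(redundancy(uniform, alphabet_size=4))
print(redundancy(skewed, alphabet_size=4))
```

Note that the first sample string is perfectly periodic and yet receives zero redundancy under this estimate. That is an artifact of the single-character assumption, not of the formula itself.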
To my mind, the definition of entropy as a measure of disorder (i.e. of degree of randomness) is fully compatible with the concept of disorder (randomness) as defined in the algorithmic theory of information.
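The link between disorder and incompressibility can be illustrated, though only crudely. Kolmogorov complexity is not computable, so the sketch below substitutes the output size of a general-purpose compressor (Python's zlib), which gives no more than an upper bound on it. The sample data and the function name are hypothetical. Strictly speaking, the seeded pseudo-random bytes are highly compressible in the Kolmogorov sense, since a short program generates them; zlib simply cannot detect that regularity, so they stand in here for a disordered text.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Pseudo-random bytes: a stand-in for a "disordered" text.
random_text = bytes(rng.getrandbits(8) for _ in range(n))
# A strictly periodic string: a highly ordered text.
ordered_text = b"ab" * (n // 2)

print(f"random : {compression_ratio(random_text):.3f}")
print(f"ordered: {compression_ratio(ordered_text):.3f}")
```

The random sample should come out at a ratio of about 1 (it cannot be shortened), while the periodic one shrinks to a tiny fraction of its original length.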
Is the same true for thermodynamic entropy? First, recall that Shannon’s formula for the entropy of a text (also referred to as Shannon’s uncertainty) is the same as the Boltzmann-Gibbs formula for thermodynamic entropy except for a constant. This formula is
$$H = -K \sum_i p_i \log p_i$$
$K$ is a constant which in Shannon’s version is dimensionless and taken to be $K=1$, while in the Gibbs-Boltzmann version $K$ is the Boltzmann constant, measured in joules per kelvin. Also, the Gibbs-Boltzmann version traditionally uses natural logarithms, while Shannon’s version most often uses logarithms to base 2, so that $H$ is measured in bits per character (natural logarithms are occasionally used as well, in which case the units are called nats or nits). Finally, in Shannon’s version $p_i$ is the probability of an “event” (usually the occurrence of a given character at a specific location in a string of characters), while in the Gibbs-Boltzmann version $p_i$ is the probability of an accessible thermodynamic microstate.
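That the two versions differ only by a constant factor can be checked numerically. The sketch below evaluates one and the same sum with the two conventions just described; the probability values are hypothetical and the function name is an assumption made for illustration.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact value under the 2019 SI definition


def entropy(probs, K=1.0, log=math.log2):
    """Evaluate -K * sum(p_i * log(p_i)), skipping zero-probability terms."""
    return -K * sum(p * log(p) for p in probs if p > 0)


# Hypothetical distribution over four events (or four microstates).
probs = [0.5, 0.25, 0.125, 0.125]

h_bits = entropy(probs)                                         # Shannon: K = 1, log base 2
h_nats = entropy(probs, log=math.log)                           # K = 1, natural log
s_gibbs = entropy(probs, K=BOLTZMANN_J_PER_K, log=math.log)     # joules per kelvin

print(h_bits)   # 1.75
print(h_nats)
print(s_gibbs)
```

For any distribution the three numbers are tied by fixed factors: h_nats equals h_bits multiplied by ln 2, and s_gibbs equals h_nats multiplied by the Boltzmann constant.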
The differences just mentioned between Shannon’s entropy and thermodynamic entropy do not affect the principal interpretation of entropy as a measure of disorder. (In statistical physics entropy is treated as an essentially dimensionless quantity, which erases its most salient difference from Shannon’s uncertainty.)
In order to interpret thermodynamic entropy as a measure of disorder, it is useful to review the behavior of “parameters” of the system. Generally speaking, each system is affected by an immense number of parameters, but only a fraction of them exert a substantial influence on the system’s behavior. Choosing the set of a system’s essential parameters means selecting those few parameters which are assumed to principally affect the system’s behavior, and ignoring a vast number of other parameters assumed to play an insignificant role in that behavior. The selected “crucial” parameters must be given a quantitative representation. For example, for a system which is a certain amount of ideal gas of mass m, the essential parameters can be reasonably limited to only three – pressure, volume, temperature – P,V,T. Then the system’s degree of order may be defined as the degree to which the parameters have gradients.
In a perfectly “disordered” (randomized) system there are no gradients of the appropriate parameters. Of course, order is a matter of degree: a system can be more or less ordered (or disordered). There can never be certainty that a system is perfectly disordered (random). However, it is feasible in principle to assert that a system is not random; to this end it is sufficient to point to any non-zero gradient of its parameters (for example, for the ideal gas of mass m, the gradients in question may be such as |dP/dx|>0, |dV/dx|>0, |dT/dx|>0, |dP/dy|>0, |dV/dy|>0, |dT/dy|>0, |dP/dz|>0, |dV/dz|>0, and/or |dT/dz|>0, etc.).
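The gradient criterion lends itself to a simple sketch. Under the assumption that a parameter (here temperature) has been sampled in a row of equally spaced cells, the code below estimates gradients by finite differences and reports whether any of them exceeds a tolerance reserved for local fluctuations. The readings, the tolerance, and the function names are all hypothetical.

```python
def finite_differences(values, spacing=1.0):
    """Forward differences (v[i+1] - v[i]) / spacing between neighboring cells."""
    return [(b - a) / spacing for a, b in zip(values, values[1:])]


def shows_order(values, spacing=1.0, tolerance=0.0):
    """True if any gradient magnitude exceeds the tolerance set aside for fluctuations."""
    return any(abs(g) > tolerance for g in finite_differences(values, spacing))


# Hypothetical temperature readings (kelvin) in six cells along the x axis.
uniform_T = [300.0, 300.0, 300.0, 300.0, 300.0, 300.0]
graded_T = [300.0, 310.0, 320.0, 330.0, 340.0, 350.0]
noisy_T = [300.0, 300.2, 299.9, 300.1, 300.0, 299.8]

print(shows_order(uniform_T))                # False
print(shows_order(graded_T))                 # True
print(shows_order(noisy_T, tolerance=0.5))   # False
```

The asymmetry of the test mirrors the asymmetry in the argument: a True result establishes that the system is not random, whereas a False result only means that no gradient was detected in the parameters sampled.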
I believe this approach in no way contradicts Solomonoff-Kolmogorov-Chaitin’s concepts of randomness and complexity.
In view of the above, I submit that it is possible to assert that the thermodynamic entropy (treated statistically) is essentially the same concept as informational entropy (i.e. as Shannon’s uncertainty) and can reasonably be chosen as a measure of disorder as well.
Now turn to another point in Stewart’s paper. On page 142 we read, “In information theory (Shannon & Weaver 1964) there is a quantity called entropy that is the negative of information.”
Since I don’t have Shannon & Weaver’s book of 1964 at hand, I’ll refer to Shannon’s original work of 1948, titled A Mathematical Theory of Communication. If we look it up, we’ll discover that entropy, in Shannon’s original interpretation, could hardly be consistently interpreted as the negative of information. In fact, Shannon defined entropy as the average information per character in a string, and it is this definition which reflects the contents of the above formula for H. I don’t see how a quantity which is the average information can simultaneously be construed, as a general interpretation, as the negative of information.
Perhaps the confusion may stem from the view of entropy as a property of a system. In fact the only consistent interpretation of entropy is viewing it as a measure of disorder but not as a property of either the system or of its components. As a measure, entropy of, say, gas, is not a property of the molecules but only a measure of the degree of randomness of their distribution over the container’s volume (and/or of the distribution of molecules’ momenta). Likewise, Shannon entropy of a string is not a property of characters of which the string is composed but the measure of randomness of the characters’ distribution along the string.
A measuring tape can be used to measure, say, my height. However, nobody would think that the tape is my height, or that the tape is a part of my organism. Likewise, entropy can be used to measure the loss of information, but that does not make it the negative of information by definition. Entropy (i.e., average information in Shannon’s conceptual system) can be used to measure the decrease of uncertainty at the receiving end of a communication chain, but that does not make it the negative of information either. It can equally be used to measure the amount of information carried away from the source of a sent message (which is not necessarily accompanied by a loss of information at the source; sending information out need not cause a loss of information at the source). Entropy and information have no “plus” or “minus” sign inherently attached; they can be construed as either positive or negative depending on the vantage point. What is negative for John may be positive for Mary.
Indeed, continuing his discourse Stewart denies the quoted interpretation of Shannon’s entropy as the negative information. For example, on page 143 we read,
“…in information theory, the information ‘in’ a message is not negative information-theoretic entropy. Indeed, the entropy of the source remains unchanged, no matter how many messages it generates.”
I agree with that statement. Note that it is at odds with the concepts affirmed in the papers by Paul Davies [8, 9] in the same collection, as is discussed in my review of Davies’s first paper. (I intend to write up a discussion of Davies’s second paper, in which I will argue, in particular, against Davies’s thesis that entropy is the negative of information; I plan to discuss there specifically Davies’s example of the changes of information and entropy in the process of achieving equilibrium between portions of a gas that initially had different temperatures.)
I also believe that accepting the notion of entropy as a measure rather than a property makes attempts to define a meaningful “law of conservation of information” hopeless. Such a law would be meaningless, since information is not a commodity which can be conserved, and the above quotation from page 143 of Stewart’s article agrees with this statement. (Another contributor to the collection in question, William Dembski, has for several years been promoting his version of the “law of conservation of information,” which he claims to be the “4th law of thermodynamics.” Besides the general fault of such attempts, namely that conservation of information is a concept void of a meaningful interpretation, the alleged law, in Dembski’s formulation, contradicts the 2nd law of thermodynamics, as discussed elsewhere.)
Returning to the role of thermodynamic entropy as a measure of disorder, let us note that the entropy of texts (including molecular chains such as DNA and the like), which reflects the degree of a text’s incompressibility in the Kolmogorov-Chaitin sense, can also be naturally tied to gradients of appropriate parameters.
For example, as was shown, semantically meaningful texts, regardless of language, authorship, etc., display what we called Letter Serial Correlation (LSC), which is absent in gibberish texts. In particular, LSC in semantically meaningful strings results in the existence of what we called the Average Domain of Minimal Letter Variability, which is not found in gibberish strings. Along a semantically meaningful string there is a gradient of a parameter which we called the Letter Serial Correlation sum, Sm. No such gradients are observed in “disordered” texts, i.e. in randomized gibberish, where Sm displays haphazard variations along the string. In highly ordered but meaningless strings of symbols, gradients of Sm often display regular, wave-like variations along the string. In semantically meaningful strings, gradients of Sm also vary along the text, but the variations of Sm along the string display a peculiar, easily recognizable shape, qualitatively identical for all meaningful texts regardless of language but differing quantitatively depending on the language and the specific text.
Accordingly, the entropy of texts is the highest for chaotic gibberish “texts,” where Sm varies haphazardly so that no gradients emerge (except for accidental local mini-gradients). It is minimal in highly ordered meaningless texts, where the gradient of Sm varies along the string in a regular, wave-like manner. For semantically meaningful texts entropy has intermediate values. Therefore it also seems reasonable to define entropy as a measure of disorder for texts from the standpoint of gradients. (There seems to be a certain limitation to this interpretation, pertaining to ideally ordered meaningless strings; it is, however, due to the peculiar choice of Sm as the parameter in question rather than to the overall gist of the argument. This limitation can be removed by choosing, instead of Sm, another, slightly different parameter; this point will not be discussed further here because it is extraneous to this discourse.)
For thermodynamic systems, to consistently apply the above definition of entropy as a measure of disorder we have to look at the behavior of appropriate thermodynamic parameters as well, and if we discover gradients, we define the system as having a degree of order. If a process results in steeper gradients, either constant or variable, entropy decreases (as well as the degree of disorder).
The 2nd law of thermodynamics requires an increase of entropy only in closed (isolated) systems, i.e. only as a result of spontaneous processes. If a process is imposed on a system, entropy can behave in different ways, and the 2nd law does not prohibit its decrease (while also not asserting that a decrease is inevitable).
Stewart considers an example of gas gathering in a corner of a container. The gathering of gas molecules in a corner of a container, while not entirely impossible, is highly unlikely to occur spontaneously. We conclude that a spontaneous process most likely results in gas spreading all over the container’s volume, and is accompanied by entropy increase and a reduction (up to the almost complete elimination) of gradients of density, temperature, and pressure (except for local fluctuations). Nothing prohibits interpreting this as an increase of disorder, and such a definition does not contradict the definition of entropy as a measure of disorder. If, though, molecules are driven into a corner by external force, entropy of gas decreases and we may legitimately assert that the order within the system increases as well; such an assertion will be in tune with the interpretation of disorder both from the standpoint of Kolmogorov-Chaitin’s incompressibility of a string representing the system and from the standpoint of gradients as measures of order.
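The contrast between the gas gathered in a corner and the gas spread over the container can be put in numbers with a toy model. The sketch below is an illustration under stated assumptions, not a thermodynamic calculation: the "gas" is a set of points in a unit square, momenta are ignored, and the entropy is the Shannon entropy of the distribution of points over a coarse grid of cells. The grid size, the particle count, and the function name are hypothetical.

```python
import math
import random
from collections import Counter


def cell_entropy(points, cells_per_side):
    """Shannon entropy (bits) of the distribution of points over a square grid
    of cells covering the unit square. Positions only; momenta are ignored."""
    counts = Counter(
        (min(int(x * cells_per_side), cells_per_side - 1),
         min(int(y * cells_per_side), cells_per_side - 1))
        for x, y in points
    )
    n = len(points)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# All particles confined to one corner (a sixteenth of the container).
corner = [(rng.uniform(0, 0.25), rng.uniform(0, 0.25)) for _ in range(n)]
# Particles spread over the whole container.
spread = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n)]

print(cell_entropy(corner, cells_per_side=4))   # all points fall in a single cell
print(cell_entropy(spread, cells_per_side=4))   # close to log2(16) = 4 bits
```

On this coarse grid the cornered gas occupies a single cell, so its entropy is zero and the density gradient between that cell and its empty neighbors is as steep as it can be. The spread-out gas approaches the maximum of 4 bits, with only small fluctuations from cell to cell.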
Stewart suggests that a Fourth law of thermodynamics may indeed exist, but in his view it is not the putative law of conservation of information (which, I believe, belongs in crank science) but rather a more plausible “2nd law of gravitics.”
I enjoyed Stewart’s discourse where he suggested that putative new law, although I have some reservations. One very minor reproach I could offer is about Stewart’s reference (page 116) to “standard three laws of thermodynamics.” It is an odd reference because there are not three but four “standard” laws of thermodynamics (the zeroth, the first, the second, and the third). An additional law of thermodynamics, if ever established, would indeed be named the 4th law, but not because there are only three such laws so far, but because of the peculiar way the existing four laws have been historically named.
The “2nd law of gravitics” which, according to Stewart, is so far just a “place-holder” for the future law supplementing the existing laws of thermodynamics and which is expected to reside outside of thermodynamics, is formulated by Stewart (page 146):
As time passes (forwards according to the thermodynamic arrow) gravitic systems tend to become more and more clumped.
To my mind, the above statement sounds more like an observation, i.e., a statement of fact, than like what is usually given the status of a law of science. Of course, in a broad sense any statement of fact may be claimed to be a law, and to my mind this is a legitimate usage of terms. Say, “at a pressure of about $10^5$ pascals, ice melts at about 273.15 kelvin” is a statement of fact derived from observation and may legitimately be viewed as a scientific “law.” However, if we survey the conventional practice, the honorific title of “law” is more often reserved for more profound statements, especially if such a statement is offered as something on a par with the 2nd law of thermodynamics. This is, of course, a minor point, all the more so because Stewart adds:
“Observations support this law – but what causes the clumping?”
Stewart suggests that clumping results from the “scaling symmetry” of the law of gravitation. This topic is beyond the scope of my commentary, so all I can say is that I tend to accept the notions of “scaling symmetry” and “oligarchic growth” of clusters of matter, but all this, in my view, is as yet insufficient to justify the introduction of a new and very profound “law” which would complement the four known laws of thermodynamics. While Stewart’s ideas regarding a possible law serving as a substitute for the 4th law of thermodynamics are, in my view, of interest and deserve further analysis, so far I remain unconvinced that the above “2nd law of gravitics” in that formulation has indeed accomplished the task. Of course I have no objection to viewing it as a “place-holder” for a possible future law.