**STUDY OF SERIAL LETTER CORRELATION (LSC) IN SOME ENGLISH, HEBREW, ARAMAIC, AND RUSSIAN TEXTS**

**2. EXPERIMENTAL RESULTS - RANDOMIZED TEXTS**

**by Mark Perakh and Brendan McKay**

*Posted on February 9, 1999*

**CONTENTS**

**A. Behavior of expected serial correlation sums and densities in randomized texts**

**Identifying and filtering out artifacts**

**b. Behavior of expected densities**

**B. Behavior of measured correlation sums and densities in texts randomized by permuting letters of a meaningful text.**

**b. Experimental results with permuted texts - sums.**

**c. Additional discussion of randomness. Crystal vs liquid analogy**

**d. Behavior of Letter Serial correlation densities in permuted texts**

**This is the second part of the report on the study of the Letter Serial Correlation (LSC) effect. The first part (see http://members.cox.net/marperak/Texts/Serialcor1.htm ) described in detail the calculation of the expected correlation sums and expected correlation densities, as well as the measurement of the actual correlation sums and densities. This part describes the experimental results obtained for randomized texts. The third part (see http://members.cox.net/marperak/Texts/Serialcor3.htm ) presents the experimental results obtained with real, semantically meaningful texts. The fourth part (see http://members.cox.net/marperak/Texts/Serialcor4.htm ) offers a discussion and interpretation of the experimental data. All four parts constitute one article, and therefore the figures and tables are numbered continuously throughout all parts.**

A. Behavior of expected serial correlation functions in randomized texts

While we expect that the expected sum S_{e} will behave similarly in all randomized texts, the specific values of S_{e} must depend, according to eq. (13C), on a) the total length L of the tested text; b) the values of M_{x}, the numbers of occurrences of each letter in that text; and c) the number k of chunks in the particular test (i.e. on the size n=L/k of a chunk).

Furthermore, if the text has been randomized, then the variations of letter frequencies X between adjacent chunks in such a randomized text must be random as well, and therefore, as the chunk's size increases, these variations must become smaller relative to the chunk's overall size. (Recall that quantitatively the expected sum for randomized texts is practically the same as for perfectly random texts.) This must cause a decrease of the expected sum S_{e} when k decreases and, hence, n increases. Ultimately, S_{e} must tend to drop toward zero for very large chunks. However, while the described general behavior of S_{e} could be guessed with a reasonable degree of certainty, and also follows from formula (13C), possible quantitative peculiarities of S_{e}'s behavior cannot be excluded. Therefore we have calculated values of S_{e} for randomized texts of different lengths, in Hebrew, Aramaic, English, and Russian. An example of the dependence of S_{e} on the chunk's size n is shown in Fig. 1.

The overall behavior of the expected sum is essentially identical for various text lengths and even for various languages. This is compatible with the fact that the expected sum is calculated for a randomized collection of symbols, without any relation to their graphical appearance, to their meaning, or to a semantic context, and therefore it must depend only on the number of symbols in the set, on the overall length of the "text," and on the symbols' frequency distributions.

We have obtained many more graphs of the type presented in Fig. 1, all displaying a similar principal shape of the S_{e} vs n curve.

The shape of the curve in Fig. 1 is the result of two factors. One factor is an illusion, as it is simply caused by the non-proportional graduation of the horizontal axis in Fig. 1. Replacing the scale on the abscissa by a proportional one would stretch the right side of the graph, so the steep drop of the curve at n>100 would be largely eliminated. However, even with a proportional horizontal scale the graph would not convert into a straight line with the intercept of A and slope of B, as formula (13C) suggests, and, again contrary to the prediction of formula (13C), S_{e} will reach zero value not at n_{0}=L, but at a smaller n_{f}<n_{0}. The reason for that is the use of variable, truncated values of L* instead of the full text's length L, as was explained earlier in Part 1 of this paper.

If we look again at Table 1 in Part 1 of this paper, we notice that there are stretches of values of n for which L* is the same. For example, for all three values n=10, n=20, and n=30, the value of L* is 78060. Similarly, for n of 100, 200, and 300, L*=78000 for all three of those n. Such stretches of constant L* are interspersed with values of n for which L* undergoes an abrupt change. For example, for n=2, L*=78064, but for n=3, L*=78063. Wherever there is a stretch of n values with an identical value of L*, the corresponding segment of the S_{e}-n curve is a perfect straight line, as formula (13C) suggests (if the horizontal scale is made proportional). This is illustrated in Fig. 1a, showing the segment of the S_{e}-n graph for n between 10 and 30, the horizontal scale being this time proportional. The straight line in this graph, with a slope of 1.85, fits perfectly the prediction of equation (13C).

On the other hand, wherever the value of L* changes abruptly between two adjacent values of n, the slope of the S_{e}-n graph undergoes a steep turn. An example is shown in Fig. 1b, for n between 1 and 3. The small change of L* from 78064 for n=2 to 78063 for n=3 caused the drastic increase of the slope from 1 to 4. As the chunk's size grows, the slope's increases accumulate, so overall the graph takes the shape of an incomplete polygon dropping toward zero with a slope increasing to the right.
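The plateaus and jumps of L* quoted above can be reproduced with a few lines of Python. This is only a sketch under an assumption: the function `truncated_length` is a hypothetical helper that takes L* to be the largest multiple of the chunk size n that fits into the full length L (whole chunks only). That rule is not stated in this part of the report, but it matches the values quoted from Table 1 for n = 2, 3, 10, 20, 30, 100, 200 and 300.

```python
def truncated_length(total_length: int, chunk_size: int) -> int:
    """Return L*, assuming the text is cut down to a whole number of chunks.

    ASSUMPTION: L* = n * floor(L / n). This rule is inferred from the
    values quoted in the text, not taken from Part 1 of the report.
    """
    return chunk_size * (total_length // chunk_size)


GENESIS_HEBREW_LENGTH = 78064  # length L quoted in the text

for n in (2, 3, 10, 20, 30, 100, 200, 300):
    print(n, truncated_length(GENESIS_HEBREW_LENGTH, n))
```

Under this assumed rule, L* stays constant wherever consecutive chunk sizes all divide the same truncated length, which is where the S_{e}-n graph is expected to run as a straight segment.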

Another result of the text's truncation, also tied to the slope's variations, is the appearance of wriggles on the S_{e}-n graph. As can be seen in Table 1 (in Part 1 of this paper), the value of L*, while displaying an overall gradual decrease when n increases, at some values of n shows local increases. For example, for n=70, there was L*=78050, but for n=100 it became L*=78064. In Fig. 2 the graph of the slope of the S_{e} vs n curve, as a function of n, is shown as an example, deliberately using a large scale, illustrating the effect of the text's truncation that causes the above-mentioned wriggles. This example was obtained on the entire text of the Torah in Hebrew. Similar graphs were observed for other texts as well.

It seems appropriate to make now a temporary excursion into the next section of this paper, to discuss the relation of the wriggles in question to the shape of the graphs of the measured sum, S_{m} vs n, as this problem is one of filtering out artifacts generated by the text's truncation.

**Identifying and filtering out artifacts generated by texts' truncation**

The curves for both S_{e} vs n and S_{m} vs n are obtained, one by calculation and the other by measurement, for the same set of actual lengths L*. Therefore the wriggles caused by the text's truncation distort both curves at the same values of n, even though the sizes of the wriggles may be different for the S_{e}-n and S_{m}-n curves.

The wriggles on S_{m}-n graphs can be erroneously believed to be genuine manifestations of the texts' properties, while actually being artifacts stemming from the texts' truncation.

Let us discuss those possible artifacts. Three alternative situations can be envisioned, to wit:

1) At a certain value of n, there are similar local irregularities (either maxima or minima) on both the S_{e}-n and S_{m}-n graphs. Almost certainly, this indicates that the irregularities on the S_{m}-n graph are just artifacts caused by the text's truncation, as described earlier, rather than genuine manifestations of the texts' properties.

2) At a certain value of n, there is either a local maximum or a local minimum on the S_{m}-n graph, but there is no corresponding irregularity on the S_{e}-n graph. Almost certainly this indicates the presence of a genuine characteristic point on the S_{m}-n graph which reflects some intrinsic facet of the measured sum's behavior.

3) A rare situation when there is an irregularity at a certain value of n on the S_{e}-n graph, but no corresponding deviation from a smooth run on the S_{m}-n graph. This may indicate that there is a genuine (probably not strongly pronounced) maximum or minimum on the S_{m}-n graph, and that, at the same n, there accidentally exists a quirk on the S_{m}-n graph, caused by the text's truncation, whose deviation from the smooth S_{m}-n curve is in the direction opposite to that of the genuine irregularity and thus masks the genuine maximum or minimum on the S_{m}-n graph.

Hence, to filter out the artifacts in question, both the S_{m}-n and S_{e}-n graphs must be viewed simultaneously.

The first two of the three described situations have actually been observed in real texts. Often the simplest way to distinguish between an artifact in question and a real characteristic point is to plot the ratio R=S_{m}/S_{e}. If the wriggles caused by truncation happen to be of comparable size for both S_{e} and S_{m}, these wriggles will be largely suppressed on the R vs n curve, thus indicating the presence of artifacts. In Fig. 3 the slope of the ratio R=S_{m}/S_{e} of the measured sum S_{m} (see equation A) to the calculated expected sum S_{e} (see formula (13C)) is shown as a function of n, for the same text as in Fig. 2. We see that the slope of R changes rather smoothly with n, thus illustrating the suppression of wriggles caused by the text's truncation when viewing graphs for the ratio R rather than for S_{e} or S_{m} separately.

The ultimate judgement as to which irregularities observed on the S_{m}-n graph are genuine characteristic points reflecting the text's properties, and which are artifacts stemming from the text's truncation, can be made by reviewing several similar graphs for a number of texts, preferably with different values of actual lengths L*, and choosing the alternative that is most consistent with all the information available on the behavior of the texts in question. Except for a few rare situations where the evidence seemed to be somewhat ambiguous, usually the distinction between artifacts and genuine manifestations of the text's properties was rather apparent.

Now we return to the subject of this section. In Fig. 4 three curves are shown displaying the expected trivial dependence of S_{e} on the overall length of the text.

The uppermost curve in Fig. 4 shows the expected sum for an English text whose length was 151836 characters, which is the length of the English translation of the Book of Genesis. The lowermost curve on that graph relates to a Hebrew text 78064 letters long, which is the size of the Book of Genesis in Hebrew. The curve slightly above the one for the Hebrew text was obtained for an English text whose original length was 151836 letters, but which was stripped of all vowels, so that its overall length decreased to 99493 letters. This graph clearly demonstrates the expected natural effect of the text's overall length on the expected sum.

b) Behavior of expected densities

Fig. 1, which shows the curve for the expected correlation sum for the Book of Genesis, is again reproduced here, and next to it Fig. 1c is placed, demonstrating the dependence of the Letter Serial Correlation density d_{e} on the chunk's size n, for the same text.

Comparing the two graphs shows the clear difference between the behavior of the extensive quantity, the expected sum S_{e}, and the intensive quantity, the expected density d_{e}, for the same text. The theoretical equation for the expected density (formula 17) is that of a hyperbolic curve, which implies a linear dependence of the logarithm of the density on the logarithm of n.

Regression analysis of the plot in Fig. 1d reveals that the graph is very close to a perfect straight line (the correlation coefficient for the least squares fit is close to 0.999), and the equation best approximating the d_{e} vs n function is for this text as follows:

d_{e} = 145121 × n^{-1.014}

Comparing this equation with the theoretical equation (17) shows that the text's truncation caused a change of the power from the theoretical value of -1 to -1.014, which means that in equation (20) q=1.014 (and, of course, the curve described by the above equation is shifted vertically relative to that given by eq. 20 by a distance of T, which shift is inconsequential for our discussion). Otherwise, the Letter Serial Correlation density behaves quite close to the theoretical expectation.

Very similar results were obtained for all explored texts.
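The log-log regression described above can be sketched in plain Python. Two things here are assumptions rather than statements of the report: the density is taken to be d_{e} = S_{e}/n, and the input values are the expected sums S_{e} listed for the Hebrew text of Genesis in Table 2 of this part. The helper `fit_power_law` is a hypothetical name. Under these assumptions the fitted exponent should come out close to the quoted q = 1.014.

```python
import math

# Chunk sizes n and expected sums S_e, copied from Table 2 of this part.
CHUNK_SIZES = [1, 2, 3, 5, 7, 10, 20, 30, 50, 70, 100, 200, 300, 500, 700,
               1000, 2000, 3000, 5000, 7000, 10000]
EXPECTED_SUMS = [145121, 145120, 145116, 145106, 145110, 145097, 145079,
                 145060, 145004, 144967, 144819, 144633, 144447, 144075,
                 143145, 143165, 141286, 139427, 130116, 130128, 111522]


def fit_power_law(ns, ds):
    """Least-squares line through (log10 n, log10 d).

    Returns (q, r): the exponent q in d ~ n**(-q) and the
    correlation coefficient r of the log-log fit.
    """
    xs = [math.log10(n) for n in ns]
    ys = [math.log10(d) for d in ds]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return -slope, r


# ASSUMPTION: the density is the sum divided by the chunk size.
densities = [s / n for s, n in zip(EXPECTED_SUMS, CHUNK_SIZES)]
q, r = fit_power_law(CHUNK_SIZES, densities)
print(f"q = {q:.4f}, correlation coefficient = {r:.5f}")
```

The correlation coefficient is negative here because the density falls as n grows; its magnitude is what corresponds to the value quoted in the text.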

B. Behavior of the measured correlation sum in texts randomized by permuting letters of a meaningful text

a. General discussion of randomness in permuted texts.

We will refer to the texts obtained by permuting the letters of an original meaningful text as randomized texts. It must be realized though that permuting letters of a text by no means guarantees that the permuted version will have a high degree of randomness. If a meaningful text comprises L letters, and the pertinent alphabet consists of z letters, there potentially exist P_{ni}=L!/(n_{1}!n_{2}!n_{3}!...n_{z}!) equally probable distinguishable permutations of that text, where n_{1}, n_{2}, ..., n_{z} are the numbers of occurrences of each letter of the alphabet in the text in question. For example, if the text is in English and comprises, say, about 150000 letters, which is the approximate size of the English translation of the Book of Genesis, the number of its potentially possible, equally probable distinguishable permutations is (150000)!/(n_{a}!n_{b}!n_{c}!...n_{z}!), where z=26. A repeated process of random permutations can produce, with equal probability, distinguishable and non-distinguishable permutations. The number of all possible, equally probable permutations, including non-distinguishable versions, is even larger, as it is P_{i}=L!. It is a very large number indeed. Among those numerous permutations there potentially exist versions both more random than the original meaningful text and less random than the original text. For example, there is necessarily among the possible permutations one version where all letters A from the original text are gathered one after the other at the beginning of the permuted text, followed by all letters B from the original text bunched sequentially, then all letters C from the original text arranged sequentially right after the last B, and so on, throughout the alphabet. Such a version would possess a very high degree of order, i.e. a very low value of entropy. The creation of such a version as a result of a random permutation of the original text has the same probability (which is 1/P, i.e. is very small indeed) as the creation of any other of the multitude of possible permutations.

The above statement can be illustrated by the following simple example.

Consider the following sentence:

THESE ARE EXAMPLES OF MULTIPLE ELS CREATED WITHIN A RANDOM CONGLOMERATE OF VARIOUS LETTERS.

(ELS is a commonly used abbreviation for Equidistant Letter Sequences [1-4].) This text consists of 77 letters (we ignore spaces). There are 77! (which is about 1.45×10^{113}) possible permutations of that text. Among those vastly numerous permutations are the following three distinguishable permutations (by swapping positions of identical letters, many permutations indistinguishable from each of those exemplified below can be constructed):

1) TMITAMIHPPENEOELLDDRUSEEWOASESEIMTLAOLTCEERFSHOOTEMCINFTEURNGVEXLEALARATARORS.

2) SRORATARALAELXEVGNRUETFNICMETOOHSFREECTLOALTMIESESAOWEESURDDLLEOENEPPHIMATIMT.

3) TMHPIELPASELTNESEEACAOEDROERFLWANRFEMSINGAVOEUCTDLTAUEXLRHOOERSTEATEIMMOILTRS.

Each of the above strings of letters contains exactly the same 77 letters as the original message, these letters being shuffled, so that all three strings are permutations of the text of the original message. At a glance, all three above permutations of the original message look like gibberish, that is, like fully random conglomerates of letters. Actually, however, each of the above permutations possesses the same degree of order as the text of the original meaningful message. These three permutations are actually encrypted (in a rather simple way) texts with the same (hidden) semantic contents as the original text. To decode the above encryptions, let us mentally concatenate the ends of each of those strings of letters to the beginnings of the same strings. First look at string #1. Starting from its first letter T, count seven subsequent letters. In position 8 there is the letter H. Skip again seven positions, and there is the letter E. Continue the procedure, and, when the end of the string has been reached, go back to the beginning of the string, and continue skipping seven-letter intervals. Following this rule, we decode, letter by letter, the original text, which has been encrypted using ELS (Equidistant Letter Sequences) with a skip of 7.

In string #2, start with the last letter and count letters from right to left (as string #2 is actually string #1 written in the reverse order). Again, we find that the original text has been encrypted in this string, but this time with a skip of -7.
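The decoding of strings #1 and #2 can be checked mechanically. The sketch below is one reading of the rule described above, not the authors' own procedure: `decode_els` is a hypothetical helper that reads every 7th letter and, each time the end of the string is reached, starts the next pass one position later. This works here because the length of the string (77) is a multiple of the skip (7).

```python
STRING_1 = ("TMITAMIHPPENEOELLDDRUSEEWOASESEIMTLAOLTCEERFSHOOTEMCINFTEURNGVEXLEA"
            "LARATARORS")


def decode_els(ciphertext: str, skip: int) -> str:
    """Read every `skip`-th letter, restarting one position later per pass.

    ASSUMPTION: this is an interpretation of the verbal rule in the text;
    it relies on len(ciphertext) being a multiple of `skip`.
    """
    return "".join(ciphertext[offset::skip] for offset in range(skip))


message = decode_els(STRING_1, 7)
print(message)

# String #2 is string #1 written backwards, so reading it from right to
# left (a skip of -7) is the same as decoding its reversal with a skip of 7.
string_2 = STRING_1[::-1]
print(decode_els(string_2[::-1], 7) == message)
```

String #3 uses gradually increasing skips and is not covered by this helper.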

Finally, string #3 is also an encrypted version of the original text, but this time, instead of ELS (Equidistant Letter Sequences), GISLS (Gradually Increasing Skip Letter Sequences) have been employed. The skip between the first and the second letter of the message is 2, between the second and the third letter it is 3, etc., and when we reach the end of the string, the skip having increased to 11, we continue by skipping 12 positions and going back to the beginning of the string, and continue counting skips, again starting with a skip of 2, following it by a skip of 3, etc. Hence, even though all three encrypted versions are among the permutations of the original text, they possess the same degree of order, and hence the same value of entropy, as the original text.

It is also easy to construct an example of a permutation of the above 77-letter text which would possess a higher degree of order than the original message. The message contains the following frequencies of letters: A-7, B-0, C-2, D-2, E-13, F-2, G-1, H-2, I-4, J-0, K-0, L-6, M-4, N-3, O-6, P-2, Q-0, R-6, S-5, T-7, U-2, V-1, W-1, X-1, Y-0, Z-0, a total of 77 letters. Let us arrange them in the following order:

4) AAAAAAACCDDEEEEEEEEEEEEEFFGHHIIIILLLLLLMMMMNNNOOOOOOPPRRRRRRSSSSSTTTTTTTUUVWX.

The above string consists of the same 77 letters as the original message, and therefore it is one of the possible permutations of the original text. The degree of order in that permutation is higher than in the original text since all letters are now arranged in a strict order. The appearance of this particular permutation has the same individual probability as that of any other of the multiple possible permutations of the original text.

The above consideration must not be construed as the statement that the appearance of highly ordered permutations is very likely. Actually, the likelihood of such an appearance is exceedingly small, even though it is not any smaller than for any other permutation. Furthermore, among the multitude of possible permutations, the number of versions with a high degree of randomness is much larger than that of versions with a high degree of order. That number is an exponential function of the version's entropy. Therefore, while the probability of creation (via permutation of the original text) of any specific version is the same for all versions regardless of the version's entropy, versions with high entropy, i.e. with a high degree of randomness, will be created via random permutations much more often than versions with a high degree of order, simply because there are many more possible versions with high entropy. The probability that a random permutation results in some version with high entropy is much larger than the probability that it results in some version with low entropy. In other words, among the multitude of possible permutations of a text, there are many more versions with high entropy than there are versions with low entropy. By far the most likely result of a set of random permutations is a set of versions greatly randomized as compared with the original, well-ordered meaningful text.
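The counting argument of this subsection can be illustrated on the 77-letter example. The following sketch tallies the letters of the sentence, builds the fully ordered permutation by sorting, and evaluates P_{i}=L! and P_{ni}=L!/(n_{1}!n_{2}!...n_{z}!) exactly with integer arithmetic. The variable names are illustrative only.

```python
import math
from collections import Counter

SENTENCE = ("THESE ARE EXAMPLES OF MULTIPLE ELS CREATED WITHIN A RANDOM "
            "CONGLOMERATE OF VARIOUS LETTERS")

letters = SENTENCE.replace(" ", "")   # spaces are ignored, as in the text
counts = Counter(letters)             # occurrences of each letter

# The most ordered permutation: all letters in alphabetical order.
ordered = "".join(sorted(letters))

# P_i = L!: all permutations, including indistinguishable ones.
all_permutations = math.factorial(len(letters))

# P_ni = L! / (n_1! n_2! ... n_z!): distinguishable permutations only.
distinguishable = all_permutations
for occurrences in counts.values():
    distinguishable //= math.factorial(occurrences)

print(len(letters), dict(sorted(counts.items())))
print(ordered)
print(f"P_i  = {all_permutations:.2e}")
print(f"P_ni = {distinguishable:.2e}")
```

Every specific distinguishable permutation, including the ordered one, has the same probability 1/P_{ni} of being produced by a uniformly random shuffle.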

b. Experimental results with permuted texts - sums.

In view of the above consideration, let us look at the results of the comparison of serial correlation sums, measured for randomly permuted versions of original texts, with expected sums for the same texts, calculated as described in Part 1. This comparison will be our next step in establishing a foundation for the interpretation of the behavior of serial correlation sums for real meaningful texts.

Some selected results in question are shown in Figs. 5 through 8. In Fig. 5 both the calculated expected correlation sum (red curve) and the measured correlation sum (blue curve) are shown for one randomly permuted version of the Hebrew text of the Book of Genesis, whose length is 78064 letters. This picture exemplifies the typical behavior of the measured sum for randomly permuted texts. As long as the chunks are relatively small, both curves, that for S_{m} and that for S_{e}, run rather close to each other, so that the ratio S_{m}/S_{e} is quite close to 1. Starting at some value of the chunk's size n, the measured sum experiences increased fluctuations around the diminishing value of the expected correlation sum. To locate the threshold value of n, let us look at a zoomed-in graph of the ratio R=S_{m}/S_{e}, shown in Fig. 6 for the same text. We can see that the fluctuations of S_{m} around S_{e} (i.e. deviations of R from 1) increase quite drastically starting at n=20.


Similar behavior was observed for other permuted texts. One such example is shown in Fig. 7 for the English translation of the Book of Genesis, whose length is 151,836 letters. To locate the position of the threshold at which the fluctuations of S_{m} (about the dropping value of S_{e}) substantially increase, look at the zoomed-in curve for the ratio R=S_{m}/S_{e} vs n, shown in Fig. 8. The threshold in question seems to be in this case at about n=70.

The observation of data like those shown in Figs. 5 through 8 reveals that the substantial increase of fluctuations of the measured sum S_{m} about the calculated value of the expected sum S_{e} starts at a threshold value of the chunk's size n which, within the precision inherent in these graphs, in all cases is either at or a little above z, the number of letters in the pertinent alphabet.

As this section is of a preliminary character, and is mainly designed to establish reference points for the study of the LSC effect in real, non-randomized texts, we will not attempt here a detailed interpretation of the mechanism connecting the mentioned threshold to the number of letters in an alphabet, although this question may be of interest in its own right. We will rather limit ourselves to a statement of a factual observation, namely that at n=z or at a little larger values of n, when the chunk's size becomes larger than z, the constraints imposed by the limited size of chunks are lifted, and the letters of the text take advantage of the now available freedom of fluctuations.
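The comparison of measured and expected sums for a permuted text can be imitated with a short simulation. The definitions below are assumptions, since formulas for S_{m} and S_{e} are given only in Part 1: the measured sum is taken as the sum, over all letters and all pairs of adjacent chunks, of the squared differences of letter counts, and the expected sum is the standard result for sampling without replacement, which need not coincide term by term with formula (13C). The function names and the toy text are hypothetical.

```python
import random
from collections import Counter


def measured_sum(text: str, n: int) -> int:
    """S_m (assumed form): squared count differences between adjacent chunks."""
    k = len(text) // n                      # whole chunks only
    chunks = [Counter(text[i * n:(i + 1) * n]) for i in range(k)]
    alphabet = set(text)
    return sum((a[x] - b[x]) ** 2
               for a, b in zip(chunks, chunks[1:])
               for x in alphabet)


def expected_sum(text: str, n: int) -> float:
    """S_e (assumed form) for a uniformly random permutation of `text`.

    Uses E[(X_i - X_j)^2] = 2 n M (L - M) / (L (L - 1)) for the counts of
    a letter occurring M times in two disjoint chunks of size n.
    """
    total = len(text)
    k = total // n
    counts = Counter(text)
    pair_term = sum(m * (total - m) for m in counts.values())
    return 2 * n * (k - 1) * pair_term / (total * (total - 1))


def permuted_ratio(text: str, n: int, seed: int = 0) -> float:
    """R = S_m / S_e for one random permutation of the letters of `text`."""
    letters = list(text)
    random.Random(seed).shuffle(letters)
    shuffled = "".join(letters)
    return measured_sum(shuffled, n) / expected_sum(shuffled, n)


toy_text = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG" * 600   # 21000 letters
for n in (1, 10, 100, 1000):
    print(n, round(permuted_ratio(toy_text, n), 3))
```

With small chunks there are many chunk pairs and R stays near 1; with large chunks only a few pairs contribute, so larger scatter of R is to be expected, in line with the fluctuations described above.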

c. Additional discussion of randomness in permuted texts. Crystal vs liquid analogy

Let us discuss a little more the question of the degree of randomness of the permuted and non-permuted texts. There is an analogy here with the question of the degree of order in a solid crystal vs a liquid. We will use this analogy later in this paper to analyze LSC in texts.

The term "crystal" in physics means a solid body whose constituents (atoms, ions, or molecules) are arranged in space in an orderly fashion. On the other hand, amorphous bodies (also referred to as liquids, even if they seem to be solid, as, for example, glass) consist of elements (molecules, ions, atoms) whose distribution within the volume of the body is largely chaotic. Physicists distinguish between the long range and the short range orders in crystals. Long range order manifests itself in a repeated spatial configuration of particles throughout the entire macroscopic dimensions of the crystal. Short range order extends only over a certain number of "steps" if one imagines moving through the crystal. When the number of steps exceeds a certain value, usually not more than about ten to fifteen steps, each step being the size of the interatomic distance, the configuration of particles forming the short range order pattern changes. If the temperature of the crystal rises above the melting point, the crystal transforms into a liquid. In the melt, the long range order is destroyed. However, short range order may be preserved to a certain extent if the temperature is not much higher than the melting point. Investigation of liquids indicates the presence of a certain degree of such short range order. The ordered clusters of particles that are present in the liquid as islands of order within the sea of disorder may have various origins. Some of these ordered clusters may be inherited from the parent crystal, which may be due, for example, to gradients of temperature and density within the melting crystal in the vicinity of the melting point. More of those clusters, however, are generated by thermal fluctuations of the particles' spatial distribution in the liquid itself. Such ordered clusters appear at various locations, then disappear, appear at other locations, etc. The result is that there is a certain degree of order in liquids, even though overall they are amorphous bodies.

Similarly, most of the texts obtained by permutations of a meaningful text preserve a certain degree of order. Within the sea of disorder created by shuffling letters of the original text, there may exist (and indeed do exist more often than not) islands of ordered configuration of letters. Some of them may be inherited from the original text by sheer chance, but more of them emerge stochastically as a result of the random permutation. It is desirable to have some, at least quite approximate, measure of the degree of randomness of a text. One such measure may be introduced as follows. Look at Table 2. It contains, as an example, the results of an actual experiment, this one performed on an arbitrarily chosen particular permuted version of the Hebrew text of the Book of Genesis.

Table 2. Genesis, Hebrew, permuted version

n - chunk's size; S_{m} - measured sum; S_{e} - expected sum; R=S_{m}/S_{e}

| n | S_{m} | S_{e} | R |
|---|---|---|---|
| 1 | 145390 | 145121 | 1.002 |
| 2 | 145110 | 145120 | 1.000 |
| 3 | 145030 | 145116 | 0.999 |
| 5 | 145382 | 145106 | 1.002 |
| 7 | 143870 | 145110 | 0.991 |
| 10 | 145948 | 145097 | 1.006 |
| 20 | 148192 | 145079 | 1.021 |
| 30 | 145272 | 145060 | 1.001 |
| 50 | 143714 | 145004 | 0.991 |
| 70 | 144946 | 144967 | 1.000 |
| 100 | 146646 | 144819 | 1.013 |
| 200 | 147214 | 144633 | 1.018 |
| 300 | 144440 | 144447 | 1.000 |
| 500 | 149454 | 144075 | 1.037 |
| 700 | 137454 | 143145 | 0.960 |
| 1000 | 157866 | 143165 | 1.103 |
| 2000 | 134550 | 141286 | 0.952 |
| 3000 | 157404 | 139427 | 1.129 |
| 5000 | 117252 | 130116 | 0.901 |
| 7000 | 148346 | 130128 | 1.140 |
| 10000 | 115072 | 111522 | 1.032 |

The leftmost column lists the sizes of chunks explored (recall that the chunk's size is n=L/k, where L is the total length of the text, which for the Hebrew text of Genesis is 78064 letters, and k is the number of chunks into which the text is divided for each measurement). The two columns in the middle show the values of the measured S_{m} and expected S_{e} serial correlation sums. Finally, the rightmost column shows the ratio R=S_{m}/S_{e}.

Since the expected sum S_{e} is calculated based on the assumption of randomness of the text, the larger the degree of randomness of the text, the closer S_{m} must be to S_{e}. In other words, the deviation of the ratio R from the value of 1 may serve as an indication of the degree to which the tested text is close to being perfectly randomized. Measuring R provides some clue as to whether randomization has destroyed the type of order represented by "serial correlation." Since "serial correlation" does not exist in a vacuum, but is a part of the text's overall complex structure, R being different from 1 also indicates the presence of some types of order different from "serial correlation."

To estimate the degree of randomness, we first calculate the mean value of R over the entire rightmost column. For the above table, it turned out to be R_{m}=1.014. Then we calculate the standard deviation of R for the same set of values. For the above table it happens to be std(R)=0.053. We repeat the described procedure for a number of permuted versions of the text in question. Fig. 9 shows the results obtained for six such trials, including 5 Hebrew and 1 English texts of Genesis.

In Fig. 9, the lowermost (green) curve displays the values of std(R), the standard deviation of R, for those six arbitrarily chosen permuted versions of the Book of Genesis. The blue curve in the middle of the triplet of curves at the top of the graph represents the mean values of the ratio R, calculated for the same arbitrarily chosen permuted texts. The uppermost (red) curve displays the sum [R_{m} + std(R)], while the lower (black) curve in the triplet shows the value of [R_{m} - std(R)]. The first conclusion from surveying Fig. 9 is that all six permuted versions of the text happened to be well randomized, as the mean of the ratio, R_{m}, for all of them is reasonably close to 1. On the other hand, it is obvious that each of those six versions possessed a certain degree of order, as in no case was R_{m}=1 observed.

To estimate the degree of randomness, we may suggest the following coefficient, which will be denoted D_{r} (which stands for "degree of randomness"):

D_{r} = 1 - [std(R)/R_{m}]    (14)

For the text represented by Table 2, the value of this coefficient happened to be D_{r}=0.948.
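The coefficient can be recomputed from the tabulated ratios. The sketch below uses the rounded values of R from the rightmost column of the table for the permuted Hebrew text of Genesis. One choice here is an assumption: the standard deviation is taken in its population form (division by the number of values), which appears to reproduce the quoted std(R)=0.053. The function name is illustrative.

```python
import statistics

# Ratios R = S_m / S_e, copied from the rightmost column of the table.
RATIOS = [1.002, 1.000, 0.999, 1.002, 0.991, 1.006, 1.021, 1.001, 0.991,
          1.000, 1.013, 1.018, 1.000, 1.037, 0.960, 1.103, 0.952, 1.129,
          0.901, 1.140, 1.032]


def degree_of_randomness(ratios):
    """Return (R_m, std(R), D_r) with D_r = 1 - std(R) / R_m.

    ASSUMPTION: std(R) is the population standard deviation.
    """
    r_mean = statistics.mean(ratios)
    r_std = statistics.pstdev(ratios)
    return r_mean, r_std, 1 - r_std / r_mean


r_mean, r_std, d_r = degree_of_randomness(RATIOS)
print(f"R_m = {r_mean:.3f}, std(R) = {r_std:.3f}, D_r = {d_r:.3f}")
```

Because the input ratios are rounded to three decimals, the last digit of the result may differ slightly from the value quoted in the text.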

We have no illusions in regard to the limitations of that coefficient. Indeed: a) This coefficient is just one of many possible quantities which can be used for the estimation of randomness; b) This coefficient is a rather crude measure of randomness, as it is based on measuring the destruction of only one type of order, namely that of serial correlation. Even though all types of order present in the text must be interconnected and overlap each other, the destruction of the serial correlation is not necessarily accompanied by an equal destruction of other types of order, which may be weakened but still preserved to some extent, different from that of the serial correlation. We do not know which types of order contribute to the overall degree of order, or to what extent, serial correlation being only one of many possible overlapping types of order.

It may nevertheless be advantageous to apply D_{r} to compare the degrees of randomization produced by various means (for example, letter permutation, verse permutation, word-within-verse permutation, etc.). While using D_{r}, we should remember the crude nature of that measure, but its advantages are that it is simple, easily calculated, and transparent as a first approximation measure.

The ultimate judgement in regard to the desirability of using D_{r} as a measure of a text's randomness can be made only by actually using it and observing its behavior. We will see that in some situations the coefficient in question turns out to be reasonably useful as a tool sensitive to variations in texts.

For example, for 15 randomly permuted versions of the text of the Book of Genesis, of which five were in Hebrew and ten in English, the mean value of the coefficient in question turned out to be D_{r}=0.94, while the minimum value among the fifteen permutations explored happened to be D_{r}=0.93, and the maximum D_{r}=0.961. This can be interpreted to mean that, by a rough estimate, the process of permutation succeeded in producing texts that were, on average, 94% randomized, the randomness in those fifteen texts varying between 93% and 96.1%. Applying the coefficient in question to non-permuted meaningful texts may enable us to estimate their degree of order, presumably much larger than in permuted versions, at least as far as the letter serial correlation is concerned, and hence possibly reveal some inherent distinctions between different texts, as will be demonstrated later in this paper.

d) Behavior of Letter Serial correlation densities in permuted texts.

As we will see in this section, Letter Serial Correlation densities, which, unlike the sums, are intensive quantities, behave quite differently from the corresponding total sums. Look at the graphs in Fig. 7 and 7a. Fig. 7 was shown before and is reproduced here once again. Both graphs show the data for the same permuted text. While in Fig. 7 the sum S_{m} was plotted vs the chunk's size n, in Fig. 7a the correlation density d_{m} is plotted vs n.

Comparing the graphs in Fig. 7a with the graphs shown previously in the section for expected densities (Fig. 1c) shows that the fluctuations of S_{m} about the level of S_{e} in permuted texts are largely eliminated in the densities' behavior. Again, the regression analysis of the graphs in Fig. 7a reveals that the log d_{e} vs log n dependence for permuted texts is very closely represented by a straight line. An example of such a dependence is shown in Fig. 7b, where both the log d_{e} vs log n and log d_{m} vs log n dependencies are shown. The graphs for the expected and the measured densities are practically indistinguishable. This similarity of the structures of the expected and of the actually measured permuted texts was not revealed by viewing the graphs of total sums, but becomes obvious when viewing the graphs for densities.

The equations representing the graphs in Fig. 7b (which are in this case almost identical for both the d_{e} vs n and d_{m} vs n dependencies), as obtained by means of a regression analysis, are as follows:

d_{e} = 282523 × n^{-1.0068}

d_{m} = 282494 × n^{-1.007}

with correlation coefficients of 0.99998 for d_{e} and 0.9998 for d_{m}. These results show that in the texts in this example, equation (20) for d_{e} is in effect, with q=1.0068 (instead of q=1 as required by the theoretical eq. 17; also, the curves in Fig. 7b are shifted vertically by T as compared with the curves described by eq. 20, which shift does not affect the curves' slopes and is of no consequence for our discussion). For the measured density, the value of q=1.007 and the correlation coefficient of 0.9998 indicate a very good degree of randomization of the permuted text in question. These data will serve as reference levels for the analysis of real, meaningful, non-permuted texts, which are described in the following part of this report (see http://members.cox.net/marperak/Texts/Serialcor3.htm ).

In part 4 of this report a general discussion and interpretation of all the experimental data will be offered.