Assorted comments on some uses and misuses of probability theory
First posted on June 22, 1999; updated in September 2001; the section "Probability Estimate Is Often Tricky" updated in October 2006.
By Mark Perakh
What is the basis of the casinos’ unbreakable sequence of immense profitability? There are two. One is a human psychological frailty ensuring an uninterrupted supply of fools hoping to catch Lady Luck’s attention. The other is a science.
The science in question is mathematical statistics. Of course, the casinos’ operators are by and large scientifically illiterate. They don’t need to know mathematical statistics any more than an old lady driving her motorized jalopy needs to know the physical chemistry of oil’s oxidation in the cylinders of her engine. All she needs is some primitive skill in pushing certain pedals and rotating the steering wheel. However, it is the physical chemistry of the oil’s oxidation which makes her driving possible. Likewise, even though the casinos’ operators normally have hardly any knowledge of mathematical statistics, it is mathematical statistics which makes their business so immensely profitable.
Another field where mathematical statistics is the basis of success is the insurance industry, where professional statisticians are routinely employed to determine the values of premiums and payoffs necessary to maintain the insurer’s profitability.
In both cases, that of casinos and that of insurance, the success is based on the proper (even if sometimes not quite conscious) use of mathematical statistics.
There are, however, situations where mathematical statistics is used in a way that contradicts its own main concepts. When that happens, the result is often a claim that may look statistically sound but is actually meaningless and often misleading.
At the core of mathematical statistics is probability theory. Besides mathematical statistics, probability theory is also the foundation of statistical physics. It deals with the quantity called probability. While the concept of probability may seem to be rather simple for laymen, probability theory reveals that that quantity is multi-faceted and its use must follow certain precautions. When those precautions are not adhered to, the result is often a meaningless conclusion.
While an incorrect application of mathematical statistics may involve any part of that science, a large portion of the errors in question occur already at the stage when its seminal quantity, probability, is miscalculated or misinterpreted.
One example of an incorrect application of the probability concept is the attempt by the proponents of the so-called Bible code to calculate the probability of occurrence of certain letter sequences in various texts. Another example is the often-proposed calculation of the probability of the spontaneous emergence of life on earth. There are, of course, many other examples of improper uses of the probability calculation.
There are many good textbooks on probability theory. Usually they make use of a rather sophisticated mathematical apparatus. This article is not meant to be one more discussion of probabilities on a rigorously mathematical level. In this article I will discuss the concept of probability mostly without resorting to mathematical formulas or to the axiomatic foundation of probability theory. I will rather try to clarify the concept in question by considering examples of various situations in which different facets of probability manifest themselves and can be viewed in as simple a way as possible. Of course, since probability theory is essentially a mathematical discipline, it is only possible to discuss probability, without resorting to some mathematical apparatus, to a very limited extent. Hence, this paper will stop at the point where the further discussion without mathematical tools would become too crude.
Calculation of probabilities is sometimes a tricky task even for qualified mathematicians, not to mention laymen. Here are two examples of rather simple probabilistic problems whose solution often escaped even some experienced scientists.
The first problem is as follows. Imagine that you watch buses arriving at a certain stop. After watching them for a long time, you have determined that the interval between the arrivals of any two consecutive buses is, on the average, one minute. The question you ask is: How long should you expect to wait for the next bus if you start waiting at an arbitrary moment of time? Many people asked to answer that question would confidently assert that the average waiting time is thirty seconds. This answer would be correct if all the buses arrived at exactly the same interval of one minute. However, the situation is different in that one minute is just the average interval between any two consecutive bus arrivals. This number, one minute, is the mean of a distribution over a range from zero to a maximum which is larger than one minute. Therefore, the average waiting time, which in the case of a constant inter-arrival interval equals half the inter-arrival interval, in the case of a varying interval is always larger than half of the average inter-arrival interval. While this result can be proven in a rigorously mathematical way [ ] it can easily be understood intuitively. If the inter-arrival intervals vary, some of them being shorter than others, then obviously, on the average, more passengers are expected to start waiting for the next bus within longer inter-arrival intervals than within shorter ones. Therefore the average waiting time happens to be longer than half of the average inter-arrival interval (which is what it was in the case of constant intervals). For certain inter-arrival distributions (for example, for the so-called Poisson process, studied in mathematical statistics, which corresponds to perfectly random arrivals), the average waiting time exactly equals the average inter-arrival interval (i.e., in our example, one minute). For an arbitrary distribution of arrivals the average waiting time is never smaller than half of the average inter-arrival interval, and may even be larger than the entire average inter-arrival interval.
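Readers who prefer to check the bus result numerically can do so with a short program. The following Python sketch is my illustration only; the function names are hypothetical, and it relies on the standard renewal-theory formula for the mean wait, E[X^2] / (2 E[X]), where X is the inter-arrival interval. It compares that formula with a direct simulation in which passengers arrive at uniformly random moments.

```python
import bisect
import itertools
import random


def mean_wait_formula(intervals):
    """Mean wait given by E[X^2] / (2 E[X]) for a sample of inter-arrival intervals."""
    return sum(x * x for x in intervals) / (2 * sum(intervals))


def mean_wait_simulated(intervals, n_passengers, rng):
    """Drop passengers at uniformly random moments and average their waits."""
    arrivals = list(itertools.accumulate(intervals))  # bus arrival times
    total = 0.0
    for _ in range(n_passengers):
        t = rng.uniform(0.0, arrivals[-1])
        # First bus arriving at or after moment t.
        next_bus = arrivals[bisect.bisect_left(arrivals, t)]
        total += next_bus - t
    return total / n_passengers


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
constant = [1.0] * n                                # buses exactly one minute apart
poisson = [rng.expovariate(1.0) for _ in range(n)]  # random arrivals, mean one minute

for name, intervals in (("constant", constant), ("Poisson", poisson)):
    print(name,
          round(mean_wait_formula(intervals), 3),
          round(mean_wait_simulated(intervals, 100_000, rng), 3))
```

With constant intervals both numbers should come out close to half a minute, while for the Poisson case both should be close to a full minute, in agreement with the statement above.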
The second problem had been used in a popular TV game show conducted by Monty Hall, wherein the players were offered a choice among three closed doors. Behind one of the doors there was a valuable prize, while behind the two other doors there was nothing. Obviously, whichever door a player chose, the probability of winning the prize would be 1/3. However, after the player chose a certain door, the compere, who knew where the prize was, would open one of the two doors not chosen by the player, and show that there was nothing behind it. At that point, the player would be given a choice, either to stick to the door he had already chosen, or to choose instead the remaining closed door. The problem a player faced was to estimate whether or not changing his original choice would provide a better chance of winning. Most people, including some trained mathematicians, answered that the probability of winning is exactly the same regardless of whether the participant sticks to the originally chosen door or switches to the other still unopened door. Indeed, at first glance it seems that the chance of the prize being behind either of the two still unopened doors is the same. Such a conclusion would be correct only if the compere chose at random which door to open. In Monty Hall’s actual game, however, he knew precisely where the prize was hidden and chose which door to open not at random, but with certainty that the door he opened hid no prize. In this case changing the choice from the door originally chosen by the player to the other still unopened door actually doubles the probability of winning.
To see why this is so, note that at the beginning of the game, there was only one “winning” door and two “losing” doors. Hence, when a player chose arbitrarily one of the doors, the probability of his choosing the “winning” door was 1/3 while the probability of his choosing a “losing” door was 2/3, i.e., twice as large. Now, if the player luckily chose the “winning” door, he would win if he did not change his choice. This situation happens, on the average, in 1/3 of the games, if the game is played many times. If, though, the player happened to choose a “losing” door, he had to change his choice in order to win. This situation happens, on the average, in 2/3 of the games if they are played many times. Hence, to double his chance of winning, the player should change his original choice.
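The counting argument above can be reproduced mechanically. The Python sketch below is only an illustration (the function name is hypothetical). It enumerates every equally likely combination of prize location and first pick, assuming a compere who knows where the prize is and never opens the winning door, and adds up the exact chances of winning by staying and by switching.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")


def win_probabilities():
    """Exact chances of winning by staying and by switching (informed compere)."""
    stay = switch = Fraction(0)
    weight = Fraction(1, len(DOORS) ** 2)  # each (prize, pick) pair is equally likely
    for prize in DOORS:
        for pick in DOORS:
            # The compere may open any door that is neither the pick nor the prize.
            openable = [d for d in DOORS if d not in (pick, prize)]
            for opened in openable:
                share = weight / len(openable)
                # The one door left closed besides the player's pick.
                remaining = next(d for d in DOORS if d not in (pick, opened))
                if pick == prize:
                    stay += share
                if remaining == prize:
                    switch += share
    return stay, switch


stay, switch = win_probabilities()
print(stay, switch)  # 1/3 2/3
```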
I can suggest a simple semi-formal proof of the above conclusion.
Denote the doors A, B, and C, and let P(X) be the probability that X is the winning door. Obviously P(A)=P(B)=P(C)=1/3 and P(A)+P(B)+P(C)=1. Suppose the player chose door A. Then P(A)=1/3 and P(~A)=P(B)+P(C)=2/3. Assume the compere opened door B and showed that it did not hide the prize (as the compere already knew with 100% certainty). Now we see that P(B)=0, hence P(A)+P(C)=1. Since P(A)=1/3, P(C)=2/3. QED.
Instead of 3, any number N of doors can be used. The calculation (left to readers) shows that in such a case, switching from the originally chosen door to a specified one of the N-2 doors that were not originally chosen and are still closed increases the probability of winning by a factor of (N-1)/(N-2) (while the probability that the originally chosen door loses, that is, that some unspecified one of the doors not originally chosen wins, is N-1 times the probability that the originally chosen door wins).
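For readers who would rather see the N-door numbers than derive them, here is a small Python sketch. It is my illustration, with a hypothetical function name, and it assumes that the compere knowingly opens exactly one empty door among the N-1 doors the player did not choose.

```python
from fractions import Fraction


def n_door_probabilities(n):
    """Return (stay, switch to one specified closed door) for n doors."""
    if n < 3:
        raise ValueError("need at least three doors")
    stay = Fraction(1, n)
    # The other n-1 doors jointly carry probability (n-1)/n. After one empty
    # door among them is opened, that total is shared equally by the n-2
    # doors that are still closed.
    switch = Fraction(n - 1, n) / (n - 2)
    return stay, switch


for n in (3, 4, 10):
    stay, switch = n_door_probabilities(n)
    print(n, stay, switch, switch / stay)
```

The last printed column is the ratio (N-1)/(N-2), which equals 2 for three doors and approaches 1 as the number of doors grows.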
Comment 1: I refer to the above proof as “semi-formal” because it is simplified for the sake of readers not well versed in probability theory; a more detailed proof would use conditional probabilities; the result, however, would not change. The most rigorous but substantially more complicated proof can be performed using the so called Bayesian approach.
Comment 2: the above simple proof is based on the fact that the compere knew precisely where the prize was and which door, B or C, was empty. In the absence of such knowledge, that is were Monty Hall to choose at random which door (B or C) to open, the above proof would become invalid; indeed, in such a case it does not matter whether the player sticks to the originally chosen door or switches to an alternative door – the chance of winning in both cases will be the same.
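The difference made by the compere's knowledge can also be checked by enumeration. In the Python sketch below (an illustration with a hypothetical function name) the compere opens one of the two unchosen doors at random. Games in which the prize is accidentally revealed are discarded, which is my assumption about how such a game would be scored.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")


def random_host_probabilities():
    """Chances of winning by staying or switching, given an empty door was shown."""
    stay = switch = empty_shown = Fraction(0)
    share = Fraction(1, 3 * 3 * 2)  # prize x pick x opened door, all equally likely
    for prize in DOORS:
        for pick in DOORS:
            others = [d for d in DOORS if d != pick]
            for opened in others:
                if opened == prize:
                    continue  # prize revealed by accident: game discarded
                empty_shown += share
                remaining = next(d for d in others if d != opened)
                if pick == prize:
                    stay += share
                if remaining == prize:
                    switch += share
    # Condition on the games in which an empty door was actually shown.
    return stay / empty_shown, switch / empty_shown


print(random_host_probabilities())
```

Here both values come out as 1/2, so switching brings no advantage when the compere opens a door at random.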
If many people, including trained mathematicians, are sometimes confused by the above rather simple probabilistic situations, the misuse of probabilities in many more complex cases happens quite often, thus showing the extreme caution necessary if probabilities are used to arrive at important conclusions.
The above section was substantially improved due to discussions with several colleagues. Specifically, the part related to the “buses” problem was clarified thanks to comments by Brendan McKay and Peter Olofsson; the part related to the Monty Hall game was likewise edited thanks to comments by Brendan McKay, Peter Olofsson, Jason Rosenhouse, and Douglas Theobald.
Consider a game with a coin. Each time we toss a coin it can result in either tails or heads facing up. If we toss a die, it can result in any of six numbers facing up, namely 1, 2, 3, 4, 5 and 6. If we want to choose one card out of fifty cards scattered on a table, face down, and turn its face up, it can result in any one of fifty cards facing up.
Let us now introduce certain terms. Each time we toss a coin or a die, or turn over a card, this will be referred to as a trial. In the case of a coin, the trial can have either of two possible outcomes, tails (T) or heads (H). In the game with dice, each trial can result in any of six possible outcomes, 1, 2, 3, 4, 5, or 6. In the case of 50 cards, each trial can result in any one of fifty possible outcomes, such as five of spades, or seven of diamonds, etc.
Now assume we conduct the game in sets of several trials. For example, one of the players tosses a die five times in a row, resulting in a set of five outcomes, for example 5, 3, 2, 4, and 4. Then his competitor also tosses the die five times resulting in some other combination of five outcomes. The player whose five trials result in a larger sum of numbers wins. The set of 5 (or 10, or 100, or 10,000 etc) trials constitutes a test. The combination of 5 (or 10, or 100, or 10,000 etc) outcomes obtained in a test constitutes an event. Obviously, if each test comprises only a single trial, terms trial and test as well as terms outcome and event become interchangeable.
For the further discussion we have to introduce the concept of an "honest coin" (also referred to as a "fair coin"). It means we postulate that the coin is perfectly round and its density is uniform all over its volume, and that in no trial do the players consciously attempt to favor either of the two possible outcomes. If our postulate conforms to reality, what is our estimate of the probability that the outcome of an arbitrary trial will be, for example T (or H)?
First, it seems convenient to assign to something that is certain, a probability of 1 (or, alternatively, 100%). It is further convenient to assign to something that is impossible a probability of zero. Then the probability of an event that is not certain, will always be between 0 and 1 (or between 0% and 100%).
Now we can reasonably estimate the actual value of some probabilities in the following way. For example, if we toss a fair coin, the outcomes H and T actually differ only in the names we give them. Thus, in a long sequence of coin tosses, H and T can be expected to happen almost equally often. In other words, outcomes H and T can be reasonably assumed to have the same probability. This is only possible if each has probability of ½ (or 50%).
Since we use the concept of probability, which by definition is not certainty, it means that we do not predict the precise outcome of a particular trial. We expect, though, that in a large number of trials the number of occurrences of H will be roughly equal to that of T. For example, if we conduct a million trials, we expect that in approximately one half of the trials (i.e., in close to 500,000 trials) the outcome will be T and in about the same number of trials it will be H.
Was our postulate of an honest coin correct? Obviously it could not be absolutely correct. No coin is perfect. Each coin has certain imprecision of shape and mass distribution, which may make the T outcome slightly more likely than the H outcome, or vice versa. A player may inadvertently favor a certain direction, which may be due to some anatomical peculiarities of his/her arm. There may be occasional wind affecting the coin’s fall, etc. However, for our theoretical discussion we will ignore the listed possible factors and assume an honest coin. Later we will return to the discussion of the above possible deviations from a perfectly honest coin.
We see that our postulate of an honest coin led to another postulate, that of the equal probability of possible outcomes of trials. In the case of a coin there were two different possible outcomes, T and H, equally probable. In some other situation there can be any number of possible outcomes. In some situations those possible outcomes can be assumed to be all equally probable, while in some other situations the postulate of their equal probability may not hold. In each specific situation it is necessary to clearly establish whether or not the postulate of equal probability of all possible outcomes is reasonably acceptable or whether it must be dismissed. Ignoring this requirement has been the source of many erroneous considerations of probabilities. We will discuss specific examples of such errors later on.
Now consider one more important feature of probability. Suppose we have conducted a trial and the outcome was T. Suppose we proceed to conduct one more trial, tossing the coin once more. Can we predict the outcome of the second trial given the known outcome of the first trial? Obviously, if we accept the postulate of an honest coin and the postulate of equal probability of outcomes, the outcome of the first trial has no effect on the second trial. Hence, the postulates of an honest coin and of equal probability of outcomes lead us to a third postulate, that of independence of tests. The postulate of independence of tests is based on the assumption that in each test the conditions are exactly the same, which means that after each test the initial conditions of the first test are exactly restored. The applicability of the postulate of tests’ independence must be ascertained before any conclusions can be made in regard to the probabilities’ estimation. If the independence of tests cannot be ascertained, the probability must be calculated differently from the situation when the tests are independent.
We will discuss situations with independent and not independent tests in more detail later on.
A discussion analogous to that of the honest coin can be also applied to those cases where the number of possible outcomes of a trial is larger than two, be this number three, six, or ten million. For example, if, instead of a coin, we deal with a die, the postulate of an honest coin has to be replaced with the similar postulate of an honest die, while the postulates of equal probability of all possible outcomes (of which there now are six instead of two) and of independent tests have to be verified as well before calculating the probability.
The postulate of an honest coin or its analogs are conventionally implied when probabilities are calculated. Except for some infrequent situations, this postulate is usually reasonably valid. However, some writers who calculate probabilities do not verify the validity of the postulates of equal probability and of independence of tests. This is not an uncommon source of erroneous estimation of probabilities. Pertinent examples will be discussed later on.
Suppose we conduct our coin game in consecutive sets of 10 trials each. Each set of 10 trials constitutes a test. In each ten-trial test the result is a set of 10 outcomes, constituting an event. For example, suppose that in the first test the event comprised the following 10 outcomes: H, H, T, H, T, T, H, T, H, and H. Hence, the event in question included 6 heads and 4 tails. Suppose that the next event comprised the following outcomes: T, H, T, T, H, H, H, T, T, and T. This time the event included 6 tails and 4 heads. In neither of the two events was the number of T equal to the number of H, and, moreover, the ratio of H to T was different in the two tests. Does this mean that, first, our estimate of the probability of, say, H, as ½ was wrong, and, second, that our postulate of equal probabilities of H and T was wrong? Of course not.
We realize that the probability does not predict the exact outcome of each trial and hence does not predict particular events. What is, then, the meaning of probability?
If we accept the three postulates introduced earlier (honest coin, equal probability of outcomes and independence of tests) then we can define probability in the following manner. Let us suppose that the probability of a certain event A is expressed as 1/N, where N is a positive number. For example, if the event in question is a particular combination of the outcomes of two tosses of a coin, the probability of each such event is 1/4, so N=4. It means that in a large number X of tests event A will occur, on the average, once in every N tests. For this prediction to hold, X must be much larger than N. The larger the ratio X/N, the closer the observed frequency of event A will be to the probability value, i.e., to one occurrence in every N tests.
For example, as we concluded earlier, in a test comprising two consecutive tosses of a coin the probability of each of the four possible events is the same ¼, so N=4. It means that if we repeat the described test X times, where X is much larger than 4 (say, one million times) each of the four possible events, namely HH, HT, TT, and TH will happen, on the average once in every four tests.
We have now actually introduced (not quite rigorously) one more postulate, sometimes referred to as the law of large numbers. The gist of that law is that the value of probability can be some accidental number unless it is determined over a large number of tests. The value of probability does not predict the outcome of any particular test, but in a certain sense we can say that it "predicts" the results of a very large number of tests in terms of the values averaged over all the tests.
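The behavior described here is easy to observe with a simulated coin. The following Python sketch is only an illustration: the function name is hypothetical, and the pseudo-random generator stands in for an "honest coin." It prints the fraction of heads for tests of increasing length.

```python
import random


def heads_fraction(n_tosses, rng):
    """Fraction of heads in n_tosses simulated tosses of an honest coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n, rng))
```

For ten tosses the fraction can easily land far from ½, while for a hundred thousand tosses it is expected to lie very close to ½.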
If any one of the four postulates does not hold (the number of tests is not much larger than N, the "coin" or "die" etc. is not "honest," the outcomes are not equally probable, or the tests are not independent), the value of probability calculated as 1/N has no meaningful interpretation.
Ignoring the last statement is often the source of unfounded conclusions from probability calculations.
Later we will also discuss situations when some of the above postulates do not hold (in particular, the postulate of independence) but nevertheless the probabilities of events comprising several trials each can be reasonably estimated.
The above descriptive definition of probability is sometimes referred to as the "classical" one.
There are in probability theory also some other definitions of probability. They overcome certain logical shortcomings of the classical definition and generalize it. In this paper we will not use explicitly (even though they may be sometimes implied) those more rigorous definitions since the above offered classical definition is sufficient for our purpose.
Let us now discuss the calculation of the probability of an event. Remember that an event is defined as the combination of outcomes in a set of trials. For example, what is the probability that in a set of two trials with a coin the event will be "T, H," i.e., that the outcome of the first trial will be T and that of the second trial will be H? We know that the probability of T in the first trial is ½. This conclusion stemmed from the fact that there were two equally probable outcomes. The probability of ½ was estimated by dividing 1 by the number (which was 2) of all possible equally probable outcomes. If the trials are conducted twice in a row, how many possible equally probable events can be imagined? Here is the obvious list of all such events: 1) T, T; 2) T, H; 3) H, H; 4) H, T. These four possible results, all equally probable, cover all possible events. Obviously, the probability of each of those four events is the same, ¼. We see that the probability of the event comprising the outcomes of two consecutive trials equals the product of the probabilities of each of the sequential outcomes (½ times ½ equals ¼). This is one more postulate, based on the independence of tests: the rule of multiplication of probabilities. The probability of an event is the product of the probabilities of the outcomes of all sequential trials constituting that event. As we will see later, this rule has certain limitations.
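The multiplication rule for independent trials can be spelled out in a few lines of code. The Python sketch below is an illustration only, with hypothetical names; it lists every event of a given number of coin tosses and computes its probability as the product of the probabilities of the individual outcomes.

```python
from fractions import Fraction
from itertools import product

P_OUTCOME = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # honest coin


def event_probabilities(n_trials):
    """Probability of every event (ordered outcome sequence) of n independent tosses."""
    table = {}
    for event in product(P_OUTCOME, repeat=n_trials):
        p = Fraction(1)
        for outcome in event:
            p *= P_OUTCOME[outcome]  # multiplication rule, valid for independent trials
        table["".join(event)] = p
    return table


for event, p in event_probabilities(2).items():
    print(event, p)
```

For two tosses this lists HH, HT, TH and TT, each with probability 1/4, and the four probabilities add up to 1.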
(In textbooks on probability theory the independence of tests is often treated in the opposite way, namely by establishing that if the probability of a combination of events equals the product of the probabilities of the individual events, then these individual events are independent.)
Let us discuss certain aspects of probability calculations which have been a pivotal point in the dispute between "creationists" (who assert that life could not have emerged spontaneously but only via a divine act by the Creator) and "evolutionists" (who adhere to a theory asserting that life emerged as a result of random interactions between chemical compounds in the primeval atmosphere of our planet or of some other planet).
In particular, the creationists maintain that the probability of life’s spontaneous emergence was so negligibly low that it must be dismissed as improbable.
To support their view, the creationists use a number of various arguments. Lest I be misunderstood, I would like to point out that I am not discussing here whether the creationists or evolutionists are correct in their assertions in regard to the origin of life. This question is very complex and multi-faceted and the probabilistic argument often employed by the creationists is only one aspect of their view. What I will show is that the probabilistic argument itself, as commonly used by many creationists, is unfounded and cannot be viewed as a proof of their views, regardless of whether those views are correct or incorrect.
The probabilistic argument often used by the creationists is as follows. Imagine tossing a die with six facets. Repeat it 100 times. There are many possible combinations of the six numbers (we would say there are many possible events, each comprising 100 outcomes of individual trials). The probability of each event is exceedingly small (about one over 10^77) and is the same for each combination of numbers, including, say, a combination of 100 "fours," that is 4,4,4,4,4,4,4,4,4,4 … etc. ("four" repeated 100 times in a row). However, say the creationists, the probability that the set of 100 numbers will be some random combination is much larger than the probability of 100 "fours," which is a unique, or "special" event. Likewise, the spontaneous emergence of life is a special event whose probability is exceedingly small, hence it could not happen spontaneously. Without discussing the ultimate conclusion about the origin of life, let us discuss only the example with the die.
Indeed, the probability that the event will be some random collection of numbers is much larger than the probability of "all fours." It does not mean anything. The larger probability of random sets of numbers is simply due to the fact that it is a combined probability of many events, while for "all fours" it is the probability of only one particular event. From the standpoint of the probability value, there is nothing special about the "all fours" event; it is an event which is exactly as probable as any other individual combination of numbers, be it "all sixes," "half threes + half fives" or any arbitrary disordered set of 100 numbers made up of six symbols, like 2,5,3,6,1,3,3,2…. etc. The probability that 100 trials result in any particular set of numbers is always less than the combined probability of all the rest of the possible sets of numbers, exactly to the same extent as it is for "all fours." For example, the probability that 100 consecutive trials will result in the following disordered set of numbers: 2, 4, 1, 5, 2, 6, 2, 3, 3, 4, 4, 6, 1…etc., which is not a "special" event, is less than the combined probability of all the other (about 10^77) possible combinations of outcomes, including the "all fours" event, to the same extent as this is true for the "special" event of "all fours" itself.
The crucial fault of the creationists’ probabilistic argument is their next step. They proceed to assert that the "special" event, whose probability is extremely small, simply did not happen. However, this argument can be equally applied to any competing event whose probability is equally extremely small. In the case of a set of 100 trials, every one of about 10^77 possible events has the same exceedingly small probability. Nevertheless, one of them must necessarily take place. If we accept the probabilistic argument of the creationists, we will have to conclude that none of the 10^77 possible events could have happened, which is an obvious absurdity.
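The numbers involved in the die example can be computed exactly, since Python handles arbitrarily large integers. The sketch below is my illustration, with hypothetical variable names. It counts the possible 100-toss events, gives the probability of any single specified event, and confirms that the probabilities of all events together add up to certainty.

```python
from fractions import Fraction

FACES = 6
TOSSES = 100

n_events = FACES ** TOSSES      # number of distinct 100-toss sequences
p_each = Fraction(1, n_events)  # probability of any one specified sequence

print(len(str(n_events)))       # number of decimal digits in the count
print(float(p_each))            # the same tiny value for "all fours" or any other sequence
print(n_events * p_each)        # 1: some sequence is certain to occur
```

The count has 78 decimal digits, that is, it is of the order of 10^77, and the probability of each individual sequence is correspondingly minute, yet every test necessarily produces one of these sequences.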
Of course, nothing in probability theory forbids any event to be "special" in some sense, and spontaneous emergence of life qualifies very well for the title of a "special" event. Being "special" from our human viewpoint in no way makes this or any other event stand alone from the standpoint of probability estimation. Therefore probabilistic arguments are simply irrelevant when the spontaneous emergence of life is discussed.
Let us look once more, by way of a very simple example, at the argument based on the very small probability of a "special" event versus "non-special" ones. Consider a case when the events under discussion are sets of three consecutive tosses of a coin. The possible events are as follows: HHH, HHT, HTT, HTH, TTT, TTH, THH, THT. Let us say that for some reason we view events HHH and TTT as "special" while the rest of the possible events are not "special." If we adopt the probabilistic arguments of creationists, we can assert that the probability of a "special" event, say, HHH (which is in this case 1/8) is less than the probability of event "Not HHH" (which is 7/8). This assertion is true. However, it does not at all mean that event HHH is indeed special from the standpoint of probability. Indeed, we can assert by the same token that the probability of any other of the eight possible events, for example of event HTH (which is also 1/8) is less than the probability of event "Not HTH" (which is 7/8). There are no probabilistic reasons to see event HHH as happening by miracle. Its probability is not less than that of any of the other seven possible events. This conclusion is equally applicable to situations in which not eight but billions of billions of alternative events are possible.
The "all fours" type of argument has no bearing whatsoever on the question of the spontaneous emergence of life.
I will return to the discussion of supposedly "special" vs. "non-special" events in a subsequent section of this essay.
Now I will discuss situations in which the probabilities calculated before the first trial cannot be directly multiplied to calculate the probability of an event.
Again, consider an example. Imagine a box containing six balls identical in all respects except for their colors. Let one ball be white, two balls, red, and three balls, green. We randomly pull out one ball. (The term "randomly" in this context is equivalent to the previously introduced concepts of an "honest coin" and an "honest die"). What is the probability that the randomly chosen ball is of a certain color? Since all balls are otherwise identical and are chosen randomly, each of the six balls has the same probability of 1/6 to be chosen in the first trial. However, since the number of balls of different colors varies, the probability that a certain color is chosen, is different for white, red, and green. Since there is only one white ball available, the probability that the chosen ball will be white is 1/6. Since there are two red balls available, the probability of a red ball to be chosen is 2/6=1/3. Finally, since there are three green balls available, the probability that the chosen ball happens to be green is 3/6=1/2.
Assume first that the ball chosen in the first trial happens to be red. Now, unlike in the previous example, let us proceed to the second trial without replacing the red ball. Hence, after the first trial there remain only five balls in the box, one white, one red and three green. Since all these five balls are identical except for their color, each of them has the same probability of 1/5 to be randomly chosen in the second trial. What is the probability that the ball chosen in the second trial is of a certain color? Since there is still only one white ball available, the probability of that ball to be randomly chosen is 1/5. There is now only one red ball available, so the probability of a red ball to be randomly chosen is also 1/5. Finally, for a green ball the probability is 3/5. So if in the first trial a red ball was randomly chosen, the probabilities of balls of different colors to be randomly chosen in the second trial are 1/5 (W), 1/5 (R), and 3/5 (G).
Assume now that in the first trial not a red, but a green ball was randomly chosen. Again, adhering to the "no replacement" procedure, we proceed to the second trial without replacing the green ball in the box. Now there remain again only five balls available, one white, two red and two green. What are the probabilities that in the second trial balls of specific colors will be randomly chosen? Each of the five balls available has the same probability to be randomly chosen, 1/5. Since, though, there are only one white ball, two red and two green balls available, the probability that the ball randomly chosen in the second trial happens to be white is 1/5, while for both red and green balls it is 2/5.
Hence, if the ball chosen in the first trial happened to be red, then the probabilities to be chosen in the second trial would be 1/5 (W), 1/5 (R) and 3/5 (G). If, though, the ball chosen in the first trial happened to be green, then the probabilities in the second trial would change to 1/5 (W), 2/5 (R) and 2/5 (G).
The conclusion: in the case of trials without replacement, the probabilities of outcomes in the second trial depend on the actual outcome of the first trial, hence in this case the tests are not independent.
When the tests are not independent, the probabilities calculated separately for each of the sequential trials cannot be directly multiplied. Indeed, the probabilities calculated before the first trial were as follows: 1/6 (W), 2/6=1/3 (R) and 3/6=1/2 (G). If we multiplied these probabilities as in the case of independent tests, we would obtain for the event (RR), for example, 1/3 times 1/3, which equals 1/9. Actually, though, the probability of that event is 1/3 times 1/5, which is 1/15. Of course, probability theory provides an excellent way to deal with the "no replacement" situation, using the concept of so-called "conditional probabilities." However, some writers utilizing probability calculations seem to be unaware of the distinction between independent and non-independent tests. Ignoring that distinction has been a source of crude errors.
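The box example can be checked by brute-force enumeration. The following Python sketch is only an illustration; the names `box` and `prob_two_draws` are hypothetical helpers, not part of any cited method. It lists every ordered pair of distinct balls (that is, two draws without replacement) and compares the exact probability of the event (RR) with the naive product of the first-trial probabilities.

```python
from fractions import Fraction
from itertools import permutations

# The box from the example: one white, two red, three green balls.
box = ["W", "R", "R", "G", "G", "G"]

def prob_two_draws(first, second):
    """Exact probability of drawing the colors (first, second) without replacement."""
    pairs = list(permutations(range(len(box)), 2))  # 6 * 5 = 30 ordered pairs
    hits = sum(1 for i, j in pairs if box[i] == first and box[j] == second)
    return Fraction(hits, len(pairs))

p_red_first = Fraction(box.count("R"), len(box))  # 2/6 = 1/3
naive_rr = p_red_first * p_red_first              # wrongly treats the draws as independent
exact_rr = prob_two_draws("R", "R")               # accounts for the missing red ball

print(naive_rr, exact_rr)  # prints: 1/9 1/15
```

Working with exact fractions rather than floating-point numbers keeps the comparison with the values quoted in the text free of rounding.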
One example of such erroneous calculations of probabilities is how some proponents of the so-called Bible code estimate the probability of the appearance in a text of certain letter sequences.
The letter sequences in question are the so-called ELS which stands for "equidistant letter sequences." For example, in the preceding sentence the word "question" includes the letter "s" as the fourth letter from the left. Skip the preceding letter "e" and there is the letter "u." Skip again the preceding letter "q" and the space between the words (which is to be ignored), and there is the letter "n." The three letters, s, u, and n, separated by "skips" of 2, constitute the word "sun" if read from right to left. This is an ELS with a negative "skip" of –2. There are many such ELS, both read from right to left and from left to right in any text.
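To make the definition concrete, here is a minimal Python sketch of an ELS search. It assumes, as in the example above, that spaces and punctuation are ignored and that letter case does not matter; the function name `find_els` is hypothetical and is not taken from any Bible-code software.

```python
def find_els(text, word, skip):
    """Return the start positions (in the letters-only text) where `word`
    appears as an equidistant letter sequence with the given nonzero skip.
    A negative skip means the word is read from right to left."""
    if skip == 0:
        raise ValueError("skip must be nonzero")
    letters = [c.lower() for c in text if c.isalpha()]  # drop spaces, punctuation
    word = word.lower()
    hits = []
    for start in range(len(letters)):
        last = start + skip * (len(word) - 1)  # position of the word's last letter
        if not 0 <= last < len(letters):
            continue
        if all(letters[start + i * skip] == word[i] for i in range(len(word))):
            hits.append(start)
    return hits

# The fragment used in the example above: "s" of "question", then "u", then "n" of "in".
print(find_els("The letter sequences in question are", "sun", -2))  # prints: [23]
```

Position 23 is the zero-based index of the letter "s" in "question" once spaces are removed.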
There are people who are busy looking for arrays of ELS in the Hebrew Bible believing these arrays had been inserted into the text of the Bible by the divine Creator and constitute a meaningful "code." As one of the arguments in favor of their beliefs, the proponents of the "code" attempt to show that the probability of such arrays of ELS happening in a text by sheer chance is exceedingly small and therefore the presence of those arrays of ELS must be attributed to the divine design.
There are a few publications in which attempts have been made to apply an allegedly sound statistical test to the question of the Bible code. In particular, D. Witztum, E. Rips, and Y. Rosenberg (WRR) described such an attempt in a paper published in 1994 in "Statistical Science" (v. 9, No 3, 429 – 438). The methodology by WRR has been thoroughly analyzed in a number of critical publications and shown to be deficient. This methodology goes further than the application of probability theory, making use of some tools of mathematical statistics, and therefore is not discussed here since this paper is only about probability calculations. However, besides the paper by WRR and some other similar publications, there are many publications where no real statistical analysis is attempted but only "simple" calculations of probabilities are employed. There are common errors in those publications, one being the multiplication of probabilities in cases when the tests are not independent. (There are also many web publications in which a supposedly deeper statistical approach is utilized to prove the existence of the Bible code. These calculations purport to determine the probability of appearance in the text not just of individual ELS, but of whole clusters of such. Such analysis usually starts with the same erroneous calculation of probabilities of individual words as examined in the following paragraphs.)
Usually the calculations in question start by choosing a word whose possible appearance as an ELS in the given text is being explored. When such a word has been selected, its first letter becomes thus determined. The next step is estimating the probability of the letter in question to appear at arbitrary locations in a text. The procedure is repeated for every letter of the chosen word. After having allegedly determined the probabilities of occurrence of each letter of a word constituting an ELS, the proponents of the "code" then multiply the calculated probabilities, thus supposedly finding the probability of the occurrence of the given ELS.
Such multiplication is illegitimate. Indeed, a given text comprises a certain set of letters. When the first letter of an ELS has been chosen (and the probability of its occurrence anywhere in the text has been calculated) this makes all the sites in the text occupied by that letter inaccessible to any other letter. Let us assume that the first letter of the word in question is X, and it occurs x times in the entire text, whose total length is N letters. The proponents of the code calculate the probability of X occurring at any arbitrary site as x/N. This calculation would be correct only for a random collection of N letters, among which letter X occurs x times. For a meaningful text this calculation is wrong. However, since we wish at this time to address only the question of the tests' independence, let us accept the described calculation for the sake of discussion. As soon as letter X has been selected, and the probability of its occurrence at any location in the text allegedly determined, the number of sites accessible for the second letter in the chosen word decreases from N to N-x. Hence, even if we accept the described calculation, the probability of the second letter (let us denote it Y) appearing at an arbitrary still accessible site is now y/(N-x), where y is the number of occurrences of letter Y in the entire text. It is well known that the frequencies of various letters in meaningful texts are different. For example, in English the most frequent letter is e, whose frequency (about 12.3%) is about 180 times larger than that of the least frequent letter, which is z (about 0.07%).
Hence, depending on which letter is the first one in the chosen word, i.e., on what the value of x is, the probability of the occurrence of the second letter, estimated as y/(N-x), will differ.
Therefore we have in the described case a typical situation "without replacement" where the outcome of the second trial (the probability of Y) depends on the outcome of the preceding trial (which in its turn depends on the choice of X). Therefore the multiplication of calculated probabilities performed by the code proponents as the second (as well as the third, the fourth, etc) step of their estimation of ELS probability is illegitimate and produces meaningless numbers of alleged probabilities.
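The dependence on the first letter can be shown with a few made-up numbers. The sketch below simply evaluates the estimate y/(N-x) used in the preceding paragraphs; the text length and letter counts are hypothetical values chosen for illustration, not measurements of any real text, and the function name `second_letter_estimate` is likewise an assumption.

```python
from fractions import Fraction

def second_letter_estimate(y, N, x):
    """The estimate y/(N-x) discussed above: the chance that the second letter Y
    occupies an arbitrary site not already taken by the first letter X."""
    return Fraction(y, N - x)

# Hypothetical counts: a text of N letters in which letter Y occurs y times.
N = 10_000
y = 400

frequent_first = second_letter_estimate(y, N, x=1_230)  # first letter as common as "e"
rare_first = second_letter_estimate(y, N, x=7)          # first letter as rare as "z"

print(float(frequent_first), float(rare_first))
```

The two estimates differ even though y and N are unchanged, which is the sense in which the second "trial" is not independent of the first.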
The probabilities of various individual letters appearing at an arbitrary site in a text are not very small (mostly between about 1/8 and 1/100). If a word consists of, say, six letters, the multiplication of six such fractions results in a very small number which is then considered to be the probability of an ELS but is actually far from the correct value of the probability in question.
Using y/(N-x) instead of y/N, and thus correcting one of the errors of such calculations, would not suffice to make the estimation of the probability of an ELS reliable. The correct probability of an ELS could be calculated based on certain assumption in regard to the text’s structure, which distinguishes meaningful texts from random conglomerates of letters. There is no mathematical model of meaningful texts available, and therefore the estimations of the ELS probability, even if calculated accounting for interdependence of tests, would have little practical meaning until such a mathematical model is developed.
Finally, the amply demonstrated presence of immense numbers of various ELS in both biblical and any other texts, in Hebrew as well as in other languages, is the simplest and also the most convincing proof that the allegedly very small probabilities of ELS appearance, as calculated by the proponents of the "code," are indeed of no evidential merit whatsoever.
So far I have discussed the quantitative aspects of probability. I will now discuss probability from a different angle, namely analyzing its cognitive aspects. This discussion will be twofold. One side of the cognitive meaning of probability is that it essentially reflects the amount of information available about the possible events. The other side of the probability’s cognitive aspect is the question of what the significance of this or that value of probability essentially is.
I will start with the question of the relationship between the calculated probability and the level of information available about the subject of the probability analysis. I will proceed by considering certain examples illustrating that feature of probability.
Imagine that you want to meet your friend who works for a company with offices in a multistory building in the downtown. Close to 5 pm you are on the opposite side of the street, waiting for your friend to come out of the building. Let us imagine that you would like to estimate the probability that the first person coming out will be male. You have never been inside that building so you have no knowledge of the composition of the people working in that building. Your estimate will necessarily be that the probability of the first person coming out being male is ½, and the same probability for female. Let us further imagine that your friend who works in that building knows that among the people working there about 2/3 are female and about 1/3 are male. Obviously his estimate will be that the probability of the first person coming out to be male is 1/3 rather than ½. Obviously, the objective likelihood of a male coming out first does not depend on who makes the estimate. It is 1/3. The different estimates of probability are due to something that has no relation to the subject of the probability estimation. They are due to the different level of information about the subject possessed by you and your friend. Because of a very limited knowledge about the subject, you have to assume that two possible events – a male or a female coming first, are equally probable. Your friend knew more, in particular he knew that the probability of a female coming out first was larger than of a male coming out first.
This example illustrates an important property of the calculated probability. It reflects the level of knowledge about a subject. If we possess the full knowledge about the subject we know exactly, in advance, the outcome of a test, so instead of probability we deal with certainty.
A common situation in which we have full knowledge of the situation is when an event has actually occurred. In such a situation the question of the probability of the event is meaningless. After the first person had actually come out of the building, the question of the probability of that event becomes moot. Of course we still can calculate the probability of that event, but doing so we necessarily deal with an imaginary situation assuming the event has not yet actually occurred.
Being the reflection of the level of knowledge about a subject is the ubiquitous and most essential feature of the probability from the viewpoint of its cognitive essence.
What about the examples with a coin or a die, where we thought we possessed the full knowledge of all possible outcomes and all those possible outcomes definitely seemed to be equally probable?
We did not possess such knowledge! Our assumption of the equal probability of either heads or tails, or of the equal probability of each of the six possible outcomes of a trial with a die, was due to our limited knowledge about the actual properties of the coin or of the die. No coin and no die are perfect. Therefore, in the tests with a coin, either heads or tails may have a slightly better chance of occurring. Likewise, in the test with a die, some of the six facets of the die may have a slightly better chance to face upward. In tests conducted by K. Pearson with a coin (1921), after it was tossed 24,000 times, heads occurred in 12,012 trials and tails in 11,988 trials. Generally speaking, a slight difference between the numbers of heads and tails is expected in a large sequence of truly random tests. On the other hand, we cannot exclude that the described result was due, at least partially, to a certain imperfection in the coin used, or in the procedure employed.
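How slight is "slight"? A rough check, assuming a perfectly fair coin and independent tosses (the standard binomial model, which is an assumption added here and not part of Pearson's report), compares the observed excess of heads with the typical random fluctuation for 24,000 tosses.

```python
import math

tosses = 24_000
heads = 12_012

expected = tosses * 0.5                  # 12,000 heads expected for a fair coin
std_dev = math.sqrt(tosses * 0.5 * 0.5)  # binomial standard deviation
z = (heads - expected) / std_dev         # excess of heads in units of std_dev

print(round(std_dev, 1), round(z, 2))    # prints: 77.5 0.15
```

An excess of 12 heads is a small fraction of one standard deviation, so the result is fully compatible with a fair coin; as the text notes, that does not prove the coin was perfect.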
Since we have no knowledge of the particular subtle imperfections of a given coin or die, we have to postulate the equal probability of all possible outcomes.
In the tests with a die or a coin, we at least know all possible outcomes. There are many situations in which we have no such knowledge. If that is the case, we have to assume the existence of some supposedly possible events which actually are impossible, but we simply cannot rule them out.
For example, assume we wish to estimate the probability that upon
entering a property at
Quite often the very small calculated probabilities of certain events are due to the lack of information and hence to an exaggerated number of supposedly possible events many of which are actually impossible. One example of such a greatly underestimated probability of an event is the alleged estimation of the probability of life’s spontaneous emergence. The calculations in question are based on a number of arbitrary assumptions and deal with a situation whose details are largely unknown. Therefore, in such calculations the number of possible events is greatly exaggerated, and all of them are assumed to be equally probable, which leads to extremely small values of calculated probability. Actually, many of the allegedly possible paths of chemical interactions may be impossible, and those possible are by no means equally probable. Therefore (and for some other reasons as well) the extremely small probability of life’s spontaneous emergence must be viewed with the utmost skepticism.
Of course, it is equally easy to give an example of a case in which insufficient knowledge of the situation results not in an increased but rather in a decreased number of supposedly possible outcomes of a test. Imagine that you made an appointment over the phone to meet John Doe at the entrance to his residence. You have never before seen his residence. When you arrive at his address you discover that he lives in a large apartment house which seems to have two entrances at the opposite corners of the building. You have to watch both entrances. Your estimate of the probability that John would exit from the eastern door is ½, as it is also that he would exit from the western door. The estimated number, ½, results from your assumption of equal probability of John’s choosing either of the exits and from your knowledge that there are two exits. However, what if you don’t know that the building has also one more exit in the rear? If you knew that fact, your estimated probability would drop to 1/3 for each of the doors. Insufficient knowledge (you knew only about two possible outcomes) led you to an increased estimated probability compared with that calculated with a more complete knowledge of the situation, accounting for all three possible outcomes.
The two described situations, one in which the number of possible outcomes is assumed to be larger than it actually is, and the other in which the number of supposedly possible outcomes is less than the actual number, result in two different errors of judgment: the first leads to an underestimated probability for the event in question, the second to an exaggerated one.
Now let us discuss the other side of the probability’s cognitive aspect. What is the real meaning of probability’s calculated value if it happens to be very small?
Consider first the situation when all possible outcomes of trials are supposedly equally probable. Assume the probability of an event A was calculated as 1/N where N is a very large number so the probability of the event is very low. Often, such a result is interpreted as an indication that the event in question should be considered, to all intents and purposes, as practically impossible. However, such an interpretation, which may be psychologically attractive, has no basis in probability theory. The actual meaning of that value of 1/N is just that – the event in question is one of N equally probable events. If event A has not occurred it simply means that some other event B has occurred instead. But event B had the same very low probability of occurring as event A. So why could the low-probability event B actually occur but event A which had the same probability as B, could not occur?
An extremely low value for a calculated probability has no cognitive meaning in itself. Whichever one of N possible events has actually occurred, it necessarily had the same very low probability as the others, but has occurred nevertheless. Therefore the assertion of impossibility of such events as the spontaneous emergence of life, based on its calculated very low probability, has no merit.
If the possible events are actually not equally probable, which is a more realistic approach, a very low calculated probability of an event has even less of a cognitive meaning, since its calculation ignored the possible existence of preferential chains of outcomes which could ensure a much higher probability for the event in question.
The above discourse may produce in the minds of some readers an impression that my thesis was to show that the concept of probability is really not very useful since its cognitive content is very limited. This was by no means my intention. When properly applied and if not expected to produce unrealistic predictions, the concept of probability may be a very potent tool for shedding light on many problems in science and engineering. When applied improperly and if expected to be a magic bullet to produce predictions, it often becomes misleading and a basis for a number of unfounded and sometimes ludicrous conclusions. The real power of the properly calculated and interpreted probability is, however, not in the calculations of probability of this or that event, where it is indeed of a limited value, but in its use as an integrated tool within the much more sophisticated framework of either mathematical statistics or statistical physics.
The scientific theories often seem to contradict common sense. When this is the case, it is the alleged common sense that is deceptive, while the assertions of science are correct. The whole science of quantum mechanics, which is one of the most magnificent achievements of the human mind, seems to be contrary to the "common sense" based on the everyday experience of men.
One good example of the above contradiction is related to the motion of spacecraft in orbit about a planet. If there are two spacecraft moving in the same orbit, one behind the other, what should the pilot of the craft that is behind do if he wishes to overtake the one ahead? "Common sense" tells us that the pilot in question has to increase the speed of his craft along the orbital path. Indeed, that is what we do when we wish to overtake a car that is ahead of us on a road. However, in the case of an orbital flight the "common sense" is wrong. To overtake a spacecraft that is ahead in the orbit, the pilot of the craft that lags behind must decrease rather than increase his speed. This theoretical conclusion of the science of mechanics has been decisively confirmed in multiple flights of spacecraft and artificial satellites, despite its seemingly contradicting the normal experience of car drivers, pedestrians, runners, and horsemen, and "common sense" based on that experience. Likewise, many conclusions of probability theory may seem to contradict common sense, but nevertheless probability theory is correct while "common sense" in those cases is wrong.
Consider an experiment with a die, where events in question are sets of 10 trials each. Recall that we assume an "honest die" and in addition the independence of outcomes. If we toss the die once, each of the six possible outcomes has the same chance of happening, the probability of each of the six numbers to face up being the same 1/6. Assume that in the first trial the outcome was, say, 3. Then we toss the die the second time. It is the same die, tossed in the same way, with the same six equally probable outcomes. To get an outcome of 3 is as probable as any of the five other outcomes. The tests are independent, so the outcome of each subsequent trial does not depend on the outcomes of any of the preceding trials.
Now toss the die in sets of 10 trials each. Assume that the first event is as follows: A (3, 5, 6, 2, 6, 5, 6, 4, 1, 1). We are not surprised in the least since we know that there are 6^10 (which is 60,466,176) possible, equally probable events. Event A is just one of them and does not stand alone in any respect among those over sixty million events, so it could have happened in any set of 10 trials as well as any other of those sixty million variations of numbers. Let us assume that in the second set of 10 trials the event is B (6, 5, 4, 2, 6, 2, 3, 2, 1, 6). Again, we have no reason to be surprised by such a result since it is just another of those millions of possible events and there is no reason whatsoever for it not to happen. So far the probability theory seems to agree with common sense.
Assume now that in the third set of 10 trials the event is C (4, 4, 4, 4, 4, 4, 4, 4, 4, 4). I am confident that in such a case everybody would be amazed and the immediate explanation of that seemingly "improbable" event would be the suspicion that either the die has been tampered with or that it was tossed using some sleight of hand.
While cheating cannot be excluded, the event with all ten "fours" does not necessarily require the assumption of cheating.
Indeed, what was the probability of event A? It was one in over sixty million. Despite the exceedingly small probability of A, its occurrence did not surprise anybody. What was the probability of event B? Again only one in over sixty million but we were not amazed at all. What was the probability of event C? The same one in over sixty million, but this time we are amazed.
From the standpoint of probability theory there is no difference whatsoever between any of the sixty million possible events, including events A, B and C, and all other (60,466,176 – 3 = 60,466,173) possible variations of a ten-number combination.
Is the ten-time repeat of "four" extremely unlikely? Yes, it is. Indeed, its probability was only 1 in over sixty million! However, we should remember that any other combination of ten numbers is as unlikely (or as likely) to occur as the "all fours" combination. The occurrence of 10 identical outcomes in a row is very unlikely, but not less likely than the occurrence of any other possible set of ten numbers.
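The arithmetic behind these statements is easy to verify. The short Python sketch below, assuming an honest die and independent tosses as postulated above, computes the number of possible ten-toss events and the probability of each of the three specific events A, B and C; the helper name `prob_sequence` is hypothetical.

```python
from fractions import Fraction

total_events = 6 ** 10  # number of distinct ordered outcomes of ten tosses

A = (3, 5, 6, 2, 6, 5, 6, 4, 1, 1)
B = (6, 5, 4, 2, 6, 2, 3, 2, 1, 6)
C = (4, 4, 4, 4, 4, 4, 4, 4, 4, 4)

def prob_sequence(seq):
    """Probability of one specific ordered sequence with an honest die:
    each toss independently matches its prescribed face with probability 1/6."""
    return Fraction(1, 6) ** len(seq)

print(total_events)                                            # prints: 60466176
print(prob_sequence(A) == prob_sequence(B) == prob_sequence(C))  # prints: True
```

The function never looks at which faces the sequence contains, only at its length, which is exactly why "all fours" gets the same probability as any other sequence.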
The theory of probability asserts that if we repeat this ten-die-tossing test, say, a billion billion billion times, then each of the roughly sixty million possible combinations of ten numbers will happen approximately once in every 60,466,176 ten-tossing tests. This is true equally for the "all-fours" combination and for any other of the over sixty million competing combinations.
Why does the "all-fours" event seem amazing? Only for psychological reasons. It seems easier to assume cheating on the part of the dice-tossing player than the never before seen occurrence of "all fours" in ten trials. What is not realized is that the overwhelming majority of events other than "all fours" was never seen, either. There are so many possible combinations of ten numbers, composed of six different unique numbers, that each of them occurs extremely rarely. The set of 10 identical numbers seems psychologically to be "special" among combinations of different numbers. For probability theory, though, the set of "all fours" is not special in any respect.
Of course, if the actual event is highly favorable to one of the players, it justifies a suspicion of cheating. The reason for that is our experience which tells us that cheating is rather highly probable when a monetary or other award is in the offing. However, the probability of cheating is actually irrelevant to our discussion. Indeed, the probability of cheating is just a peculiar feature of the example with a game of dice. This example is used, however, to illustrate the question of the spontaneous emergence of life where no analog of cheating is present. Therefore, the proper analogy is one in which cheating is excluded. When the possibility of cheating is excluded, only the mathematical probability of any of the over sixty million possible events has to be considered. In such a case, every one of those over sixty million events is equally probable. Therefore the extremely low probability of any of those events, including an ordered sequence of "all fours," is of no cognitive significance. However special this ordered sequence may be from a certain viewpoint, it is not special at all from the standpoint of probability. The same must be said about the probability of spontaneous emergence of life. However small it is, it is not less than the probability of any of the competing events and therefore its extremely small probability in no way means it could not have happened.
Probability theory is a part of science and has been overwhelmingly confirmed to be a good theory of great power. There is no doubt that the viewpoint of probability theory is correct. The psychological reaction to ten identical outcomes in a set is as wrong as is the suggestion to a pilot of a spacecraft lagging behind to increase his speed if he wishes to overtake a craft ahead in orbit.
Another example of erroneous attitude to an "improbable" event, based on psychological reasons, is the case of multiple wins in a lottery.
Consider a simple raffle in which there are only 100 tickets on sale. To determine the winner, numbers from 1 to 100 are written on small identical pieces of paper, the pieces are rolled up and placed in a rotating cylinder. After the cylinder has been rotated several times, a child whose eyes are covered with a piece of cloth pulls one of the pieces out of the cylinder. This procedure seems to ensure as complete an absence of bias as humanly possible.
Obviously each of the tickets has the same probability of winning, namely 1/100. Let us assume John Doe is the lucky one. We congratulate him but nobody is surprised by John’s win. Out of the hundred tickets one must necessarily win, so why shouldn’t John be the winner?
Assume now that the raffle had not 100 but 10,000 tickets sold. In this case the probability of winning was the same for each ticket, namely 1/10,000. Assume Jim Jones won in that lottery. Are we surprised? Of course not. One ticket out of 10,000 had to win, so why shouldn’t it be that of Jim?
The same discussion is applicable to any big lottery where there are hundreds of thousands or even millions of tickets. Regardless of the number of tickets available, one of them, either sold or unsold, must necessarily win, so why shouldn’t it be that of Jim or John?
Now let us return to the small raffle with only 100 tickets sold. Recall that John Doe won it. Assume now that, encouraged by his win, John decides to play once again. John has already won once; the other 99 players have not yet won at all. What is the probability of winning in the second run? For every one of the 100 players, including John, it is again the same 1/100. Does John’s previous win provide him with any advantages or disadvantages compared to the other 99 players? None whatsoever. All one hundred players are in the same position, including John.
Assume now that John wins again. It is as probable as any of the other 99 players winning this time, so why shouldn’t it be John? However, if John wins the second time in a row, everybody is amazed by his luck. Why the amazement?
Let us calculate the probability of a double win, based on the assumption that no cheating was possible. The probability of winning in the first run was 1/100. The probability of winning in the second run was again 1/100. The events are independent, therefore the probability of winning twice in a row is 1/100 times 1/100 which is 1 in 10,000. It is exactly the same probability as it was in the raffle with 10,000 tickets played in one run. When Jim won that raffle, we were not surprised at all, despite the probability of his win being only 1 in 10,000, nor should we have been. So why should we be amazed at John’s double win whose probability was exactly the same 1 in 10,000?
Let us clarify the difference between the cases of a large raffle played only once and a small raffle played several times in a row.
If a raffle is played only once and N tickets have been distributed, covering all N possible versions of numbers, of which each one has the same chance to win, then the probability that a particular player wins is p(P)=1/N while the probability that someone out of N players (whoever he or she might be) wins is p(S)=1 (i.e. 100%).
If though the raffle is played k times, and each time n players participate, where n^k=N, the probability that a particular player wins k times in a row is again the same 1/N. Indeed, in each game the probability of winning for a particular player now is 1/n. The games are independent of each other. Hence the probability of winning k times equals the product of probabilities of winning in each game, i.e. it is (1/n)^k=1/N.
However, the probability that someone (whoever he/she happens to be) wins k times in a row is now not 1, but at most n/N, that maximum value corresponding to the situation in which the same n players play in all k games. Indeed, for each particular player the probability of winning k times in a row is 1/N. Since there are n players, each with the same chance to win k times, and since no two of them can both win all k games, the probability of someone in that group winning k times in a row is n times 1/N, i.e. n/N. In other words, in a big raffle played only once somebody necessarily wins (p=1). On the other hand, in a small raffle played k times, it is likely that nobody wins k times in a row, as the probability of such a multiple win is small.
Here is a numerical example. Let the big raffle be such that N=1,000,000. If all N tickets are distributed, the probability that John Doe wins is one in a million. However, the probability that somebody (whoever he/she happens to be) wins is 1 (i.e. 100%).
If the raffle is small, such that only n=100 tickets are distributed, the probability of any particular player winning in a given game is 1/100. If k=3 games are played, the probability that a particular John Doe wins 3 times in a row is (1/100)^3 which is again one in a million, exactly as it was in a one-game raffle with N=1,000,000 tickets.
However, the probability that someone wins three times in a row, whoever he or she happens to be, is now not 100% but at most n/N=100/1,000,000 (or less, if the composition of the group of players changes from game to game), which is 0.0001, i.e. 10,000 times less than in a one-game raffle with one million tickets. Hence, such a raffle may be played time after time after time, without anybody winning k times in a row. Actually such a multiple win must be very rare.
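The two probabilities in this example can be computed side by side. The Python sketch below assumes a fair raffle in which the same n players hold one ticket each in all k games (the case that gives the maximum value n/N); the function names are hypothetical. For a small raffle it also confirms the formula by listing every possible sequence of winners.

```python
from fractions import Fraction
from itertools import product

def repeat_win_probabilities(n, k):
    """Return (P that one named player wins all k games,
               P that some player wins all k games)."""
    particular = Fraction(1, n) ** k  # (1/n)^k = 1/N
    someone = n * particular          # n mutually exclusive cases: n/N
    return particular, someone

def someone_by_enumeration(n, k):
    """Same 'someone' probability, found by listing all n^k sequences of winners."""
    outcomes = list(product(range(n), repeat=k))
    same_winner = sum(1 for o in outcomes if len(set(o)) == 1)
    return Fraction(same_winner, len(outcomes))

particular, someone = repeat_win_probabilities(100, 3)
print(particular, someone)  # prints: 1/1000000 1/10000

# Brute-force check on a raffle small enough to enumerate (4 players, 3 games).
print(someone_by_enumeration(4, 3) == repeat_win_probabilities(4, 3)[1])  # prints: True
```

The enumeration is limited to tiny raffles because the number of winner sequences grows as n^k; the closed-form function covers the n=100, k=3 case of the text.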
When John Doe wins three times in a row, we are amazed not because the probability of that event was one in a million (which is the same as for a single win in a big one-game raffle) but because the probability of anyone winning three times in a row is ten thousand times less than it is in a one-game big raffle.
Hence, while in the big raffle played just once the fact that somebody won is 100% probable (i.e. is a certain event), in the case of a small raffle played three times a triple win is a rare event of low probability (in our example 1 in 10,000).
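The arithmetic of this numerical example can be checked with a few lines of Python. This is only a sketch under the assumptions stated above: a fair game, all tickets distributed, and the same n players taking part in all k games. The function names are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def big_raffle(num_tickets):
    """One-game raffle with all tickets distributed (fair game assumed)."""
    p_particular = Fraction(1, num_tickets)  # a named player, e.g. John Doe
    p_someone = Fraction(1)                  # somebody necessarily wins
    return p_particular, p_someone


def small_raffle_in_a_row(n, k):
    """Raffle with n tickets played k times by the same n players (assumption)."""
    p_particular = Fraction(1, n) ** k       # a named player wins all k games
    # The events "player i wins all k games" exclude one another,
    # so their probabilities simply add up over the n players.
    p_someone = n * p_particular
    return p_particular, p_someone


big_particular, big_someone = big_raffle(1_000_000)
small_particular, small_someone = small_raffle_in_a_row(100, 3)

print("John Doe, big raffle:        ", big_particular)
print("John Doe, triple win:        ", small_particular)
print("Somebody, big raffle:        ", big_someone)
print("Somebody, triple win:        ", small_someone)
print("Ratio of the last two values:", big_someone / small_someone)
```

Exact fractions are used instead of floating-point numbers so that the two "one in a million" values can be compared for equality without rounding.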
However, if we adhere to the postulate of a fair game, a triple win is not a special event despite its low probability. It is as probable as any other combination of three winning tickets, namely in our example one in a million. To suspect fraud means to abolish the postulate of a fair game. Indeed, if we know that fraud is possible, intuitively, we compare the probability of an honest triple win with the probability of fraud. Our estimate is that the probability of an honest triple win (in our case 1 in 10,000) is less than the probability of fraud (which in some cases may be quite high).
The above discussion related only to a raffle-type lottery. If the lottery is what sometimes is referred to as the Irish-type lottery, the situation is slightly different. In this type of lottery, the players themselves choose a set of numbers for their tickets. For example, I believe that in the
From the above we can conclude that when a particular player wins more than once in consecutive games, we are amazed not because the probability of winning for that particular player is very low, but because the probability of anybody (whoever he/she happens to be) winning consecutively in more than one game is much less than the probability of someone winning only once in an even much larger lottery. We intuitively estimate the difference between the two situations. However, the important point is that what impresses us is not the sheer small probability of someone winning against enormous odds. This probability is equally small in the case of winning only once in a big lottery, but in that case we are not amazed. This illustrates the psychological aspect of probability.
Let us briefly discuss the meaning of the term "special event." When stating that none of the N possible, equally probable events was in any way special, I only meant to say that it was not special from the standpoint of its probability. Any event, while not special in the above sense, may be very special in some other sense.
Consider an example. Let us imagine a die whose six facets bear, instead of numbers from 1 to 6, six letters A, B, C, D, E, and F. Let us imagine further that we toss the die in sets of six trials each. In such a case there are 6^6 = 46,656 possible, equally probable events. Among those events are the following three: ABCDEF, AAAAAA, and FDCABE. Each of these three events has the same probability of 1 in 46,656. Hence, from the standpoint of probability none of these three events is special in any sense.
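The count of 46,656 can be verified by listing every possible set of six tosses. The following is a minimal sketch assuming a fair die, so that all sequences are equally probable; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

FACES = "ABCDEF"  # the six letters on the facets of the die
TOSSES = 6        # number of trials in one set

# Every ordered sequence of six letters is one possible event.
outcomes = {"".join(seq) for seq in product(FACES, repeat=TOSSES)}
total = len(outcomes)

# With a fair die each sequence has the same probability.
p_each = Fraction(1, total)

print("Number of possible events:", total)
for event in ("ABCDEF", "AAAAAA", "FDCABE"):
    # Each of the three events is just one member of the same list.
    print(event, "is a possible event:", event in outcomes, "| probability:", p_each)
```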
However, each of these three events may be special in a sense other than probabilistic. Indeed, for somebody interested in alphabets the first of the three events may seem to be very special since the six outcomes are in alphabetical order. Of course, alphabetical order in itself has no intrinsic special meaning, thus a person whose language is, for example, Chinese, would hardly see anything special in that particular order of symbols. The second event, with its six identical outcomes, may seem miraculous to a person inclined to see miracles everywhere and to attach some special significance to coincidences, many of which happen all the time. The third event seems to be not special but rather just one of the large number of possible events. However, imagine a person whose first name is
Whatever special significance this or that person may be inclined to attribute to any of the possible, equally probable events, none of them is special from the standpoint of probability. This conclusion is equally valid regardless of the value of the probability, however small it happens to be.
In particular, the extremely small probability of the spontaneous emergence of intelligent life, as calculated (usually not quite correctly) by the opponents of the hypothesis of life’s spontaneous emergence, by no means indicates that the spontaneous emergence of life must be ruled out. (There are many non-probabilistic arguments both in favor of creationism and against it which we will not discuss in this essay). The spontaneous emergence of life was an extremely unlikely event, but all other alternatives were extremely unlikely as well. One out of N possible events did occur, and there is nothing special in that from the standpoint of probability, even though it may be very special from your or my personal viewpoint.
In this appendix, I will calculate the probability of more than one player simultaneously winning the Irish type lottery.
Let N be the number of possible combinations of numbers to be chosen by players (of which one combination, chosen randomly, will be the winning set). Let T <= N be the number of tickets sold.
Now calculate P(L), the probability that exactly L players select the winning combination.
The number of choices of L tickets out of T is given by the binomial coefficient:
bin(T,L) = T!/(L!(T-L)!)
For those L tickets to be the only winners, they must all select the winning combination. The probability of that is (1/N)^L. All the other T-L players must select a non-winning combination, the probability of that being (1-1/N)^(T-L).
We multiply those three quantities, which yields the formula
P(L) = bin(T,L) (1/N)^L (1-1/N)^(T-L).
This formula can be simplified, preserving a good precision. Since usually N and T are very large and L is very small, we can use the following approximations:
T!/(T-L)! approx= T^L ; (1-1/N)^(T-L) approx= exp(-T/N).
Now the formula becomes
P(L) approx= (T/N)^L exp(-T/N) / L!
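The two approximations can be spelled out step by step. The following is a sketch of the standard argument in the notation used above, assuming that L is much smaller than both T and N and that N is very large:

```latex
\begin{align*}
\frac{T!}{(T-L)!} &= T(T-1)\cdots(T-L+1) \approx T^{L}
  && (L \ll T),\\
\left(1-\frac{1}{N}\right)^{T-L}
  &= \exp\!\left[(T-L)\ln\!\left(1-\frac{1}{N}\right)\right]
  \approx \exp\!\left(-\frac{T-L}{N}\right)
  \approx \exp\!\left(-\frac{T}{N}\right)
  && (N \gg 1,\ L \ll N),\\
P(L) &= \frac{T!}{L!\,(T-L)!}\left(\frac{1}{N}\right)^{L}\left(1-\frac{1}{N}\right)^{T-L}
  \approx \frac{1}{L!}\left(\frac{T}{N}\right)^{L}\exp\!\left(-\frac{T}{N}\right).
\end{align*}
```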
This approximate (but quite accurate) formula is the Poisson distribution with mean T/N. In the case when T=N (i.e. when all available tickets are sold) we have a simpler formula:
P(L) approx= exp(-1) / L!.
(A complication in practice may be that when one person buys more than one ticket he/she certainly makes sure that all the combinations of numbers he/she chooses are different. However, the approximate formula will still be very accurate unless someone is buying a large fraction of all tickets, which is unlikely).
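To see how close the approximate formula is to the exact one, both can be evaluated numerically. The sketch below assumes, purely for illustration, N = T = 1,000,000 and that every ticket is chosen independently and at random; the function names are hypothetical.

```python
from math import comb, exp, factorial


def p_exact(L, T, N):
    """Exact formula: bin(T,L) (1/N)^L (1-1/N)^(T-L)."""
    return comb(T, L) * (1 / N) ** L * (1 - 1 / N) ** (T - L)


def p_poisson(L, T, N):
    """Approximate formula: (T/N)^L exp(-T/N) / L!"""
    mean = T / N
    return mean ** L * exp(-mean) / factorial(L)


# Illustrative values (an assumption, not data from a real lottery).
N = 1_000_000  # number of possible combinations
T = 1_000_000  # number of tickets sold; T = N gives the simpler formula

for L in range(5):
    exact = p_exact(L, T, N)
    approx = p_poisson(L, T, N)
    print(f"L={L}: exact={exact:.6f}  approx={approx:.6f}  diff={abs(exact - approx):.1e}")
```

For L=1 both formulas give a value close to 0.368.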
The probability that exactly one player wins in this type of lottery (that is, L=1, with T=N) is therefore not 100% but P(1) = 1/e = 0.368, or close to 37%, which is still thousands of times more than the probability of the same player winning consecutively in more than one drawing.
Mark Perakh's main page: http://members.cox.net/marperak .