A Response To Roy Reinhold's April 6, 2000
"Statistics in Bible Codes Programs"

Author: Keith York

This article is property of the author and may not be reprinted or distributed without permission

April 19, 2000

 

[6/18/2000 postscript: Please read "Why Cluster Analysis Is Flawed", which pertains to this article.]

As noted in the "Articles in the works" sidebar on the News index page, I have been working on an article that goes into greater detail on the issue of statistical analysis.  My plan had been to develop a new protocol for statistical analysis of Bible code arrays.  I see that Roy Reinhold has beaten me to the punch with a 3-part paper, "Statistics in Bible Code Programs", posted on April 6.  This short article consists of my analysis of his paper and approach.

First of all, Roy acknowledges the need for statistical analysis of code arrays.  As he states, a matrix probability "gives the viewer a mathematical gauge to determine how important the matrix is".  Secondly, he puts his statistical approach on a solid footing by introducing the concept of expected occurrences: the number of times a particular ELS (equidistant letter sequence) may be expected to occur by chance in a search text within a given skip distance range, -d to +d.  Ed Sherman and Dave Swaney at http://www.biblecodedigest.com use expected occurrences in their statistical analyses.  In my own protocol, which I have been working on but have not yet published, I have been using the actual number of occurrences as the initial basis of analysis.  Sometimes actual occurrences will be higher than expected occurrences, and sometimes lower.  However, when averaged over a large number of analyses, the two methods should give roughly similar results.  The main disadvantage of using the actual number of occurrences is that each skip distance range has to be searched to determine that number.  The main advantage of using the expected number of occurrences is that the software (if one is using CodeFinder, which is my own preference in codes programs) automatically calculates the negative logarithm of the number of expected occurrences for any ELS, search text, and skip distance range entered.
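To make the idea of expected occurrences concrete, here is a minimal sketch under a simple textbook model: letters are treated as independent draws at their observed frequencies in the search text, and every possible starting position at every skip from 1 to d (forward and backward) is counted. This model, the function name `expected_occurrences`, and its parameters are assumptions made for illustration; neither Roy's paper as described here nor CodeFinder is known to compute the figure in exactly this way.

```python
from collections import Counter


def expected_occurrences(term, text, max_skip):
    """Estimate how often `term` should appear as an ELS purely by chance.

    Assumption (not from the source): letters are independent, each with
    the frequency observed in `text`. Skips run from -max_skip to +max_skip,
    excluding 0.
    """
    n = len(text)
    k = len(term)
    freq = Counter(text)

    # Chance that one fixed placement spells the term letter by letter.
    p_match = 1.0
    for letter in term:
        p_match *= freq[letter] / n

    total = 0.0
    for skip in range(1, max_skip + 1):
        # Number of starting positions that keep the whole ELS inside the text.
        starts = n - (k - 1) * skip
        if starts <= 0:
            break
        # Count both the forward (+skip) and backward (-skip) readings.
        total += 2 * starts * p_match
    return total
```

Comparing this estimate with the actual count found by a search would show the kind of difference noted above, where actual occurrences are sometimes higher and sometimes lower than expected.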

In Part 2 of his paper, Roy defines this negative logarithm of expected occurrences (equivalently, the logarithm of [1 divided by the number of expected occurrences]) as a text R-value.  (The R-value is a statistical measure named after its inventor Dr. Alex Rotenberg.)  He then introduces a concept called a matrix R-value, which equals "log (1/Ematrix) (where E is the expected occurrences in the matrix)".  This simple innovation, which Kevin Acres has programmed into the latest beta version of CodeFinder, allows one to do an amazingly quick and easy calculation of a matrix probability.  As Roy states, one can "sum the matrix R-values for those terms with positive matrix R-values to arrive at an overall R-value for the matrix.  Why would we sum only the positive R-values?  The answer is in the table above, where we see that with negative matrix R-values, we have expected occurrences of greater than 1.000.  This means that we almost certainly expect to find a term with an expected occurrence of 1.000 within the matrix.  Those negative matrix R-value terms do not add anything to the probability of the matrix".  Once one has added the positive matrix R-values together to get an overall R-value, one can find the overall probability for the array as being 1 in [antilog (overall R-value)], where the logarithms are taken to base 10.  For example, if the overall R-value is 5, then antilog 5 = 100,000 (i.e., 10 to the 5th power), and the overall probability for the array is 1 in 100,000.
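The calculation described above can be sketched in a few lines.  The function names below are hypothetical and are not taken from CodeFinder; the sketch simply assumes base-10 logarithms, as the "antilog 5 = 100,000" example implies, and takes the expected occurrences of each term in the matrix as given.

```python
import math


def r_value(expected):
    """R-value: negative base-10 logarithm of the expected occurrences."""
    return -math.log10(expected)


def overall_r_value(expected_in_matrix):
    """Sum only the positive matrix R-values, as in the quoted method.

    Terms with expected occurrences of 1 or more have R-values of zero or
    below and are left out of the sum.
    """
    return sum(r for r in map(r_value, expected_in_matrix) if r > 0)


def odds_against(overall_r):
    """Antilog of the overall R-value: the N in 'a probability of 1 in N'."""
    return 10 ** overall_r
```

For instance, a hypothetical matrix whose terms have expected occurrences of 0.01, 0.001, and 5.0 would contribute R-values of 2 and 3 (the third term is negative and is skipped), giving an overall R-value of 5 and a probability of 1 in 100,000.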

The attraction of the above methodology lies in its ease and speed of use.  Once a person has generated a matrix report from CodeFinder, he can perform the above calculation in less than a minute.  In my own protocol, I also sought to develop a method that would be easy to understand and use.  I have to say that the calculation Roy and Kevin have developed is much easier to understand and use than what I have been working on.

Having demonstrated this approach, in Part 3 Roy acknowledges some of the limitations of the method and offers some thoughts on other possible methods.  He does a good job in discussing the problems involved.  As he states, the method used by Doron Witztum, Eliyahu Rips, and Yoav Rosenberg in the "Great Rabbis" experiment uses a measure that was "applied only to word pairs, but is much tougher to apply to many terms in a cluster within a matrix."  This difficulty has been the impetus behind the development of a new statistical method by many individuals, myself included.  As Roy states, developing a single method that would accurately capture all the important facets of the codes in an easy-to-use probability calculation is a daunting challenge, and he admits that what he has found is only a step toward that goal.

When I first contacted Roy by e-mail concerning his paper, I expressed two concerns about his method and example calculation.  First of all, he correctly states that terms with negative matrix R-values can be expected to occur one or more times in the matrix simply by chance.  Examining his Sid Roth life array, I noticed that only 6 of the 73 ELS's had positive matrix R-values.  If the other 67 can be expected to occur in the matrix simply by chance, then why should they be included in the array in the first place?  Secondly, I pointed out some factors (which I will not go into here) that led me to believe that the overall probability for the Sid Roth array was worse than he had calculated.  Roy responded that the majority of those 67 terms are found in 8 large clusters within the larger array.  He even states in Part 3 that if a more detailed statistical analysis were feasible, one that related the terms within clusters to each other and the clusters as a whole to the central term, the overall probability of that array would be better.  He then provided some initial calculations which took the cluster analyses into account, changing some of the negative matrix R-values to positive ones.  He proceeded to show that when this is done, the overall probability for the array is even better than what he had presented, even when the factors that I brought up are taken into account (and he stated that he will take these factors into account in the future).  He chose not to make the results of the better calculation public until he had ironed out all the details of how to perform that type of analysis.  I commend him for this decision.  Thus, rather than his results being overstated, as I initially thought, he showed that they were actually understated.
[4/22/2000 postscript: Roy posted a Part 4 dated April 20 in which he incorporates the two factors I suggested: (1) dividing the overall result by either the expected or actual number of occurrences of the central term in the -d to +d skip distance range in the search text, and (2) dividing the overall result by the row split number.  Part 4 also incorporates the calculations he sent me by e-mail.] 
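
The two adjustments mentioned in the postscript can be illustrated with a short sketch.  It rests on an assumption that the text does not spell out: that "the overall result" means the N in "a probability of 1 in N", so that each division makes the reported odds less extreme.  The function and parameter names are hypothetical and do not come from Roy's Part 4 or from CodeFinder.

```python
import math


def adjusted_odds(overall_r, central_term_occurrences, row_split_number):
    """Divide the '1 in N' figure by both adjustment factors.

    Assumption (not confirmed by the source): the 'overall result' is
    N = 10 ** overall_r. Both factors must be positive.
    """
    if central_term_occurrences <= 0 or row_split_number <= 0:
        raise ValueError("adjustment factors must be positive")
    odds = 10 ** overall_r
    return odds / (central_term_occurrences * row_split_number)


def adjusted_r_value(overall_r, central_term_occurrences, row_split_number):
    """Same adjustment expressed on the R-value (base-10 logarithm) scale."""
    return (overall_r
            - math.log10(central_term_occurrences)
            - math.log10(row_split_number))
```

Under these assumptions, an overall R-value of 5 with a central term occurring 4 times in the skip range and a row split number of 5 would move the result from 1 in 100,000 to 1 in 5,000.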

Having now read Roy's paper, I will have to examine the method of statistical analysis I have been working on to see if it is possible to incorporate matrix R-values into the protocol.  If it is, I hope to build on his and Kevin Acres' work.  In the meantime, I recommend Roy's 3-part paper to the reader as a good discussion of the issue of statistical analysis of the Bible codes.
