The Duplication Of Bible Code Arrays Using A Control Text: A Formal Experiment

Author: Keith York

Introduction:

The paper "Statistics and the Bible Codes" makes the following statement. "Should statistics be used to determine if the codes are significant? MBBK and WRR will continue to argue about this for a long time. Nobody has proven either way yet. It is our contention that statistics is not the best method. There is a much easier method, based on the following statement which nobody can disagree with: If the codes found in the Bible are by chance, then similar codes should be found in any text similar in length. Anyone can be just as scientific as MBBK and WRR and use the Scientific Method. Set up a 'control' to compare the Hebrew scriptures to. Make a list of 15-20 terms that describe a certain event. Search for all of those in the Tanach (Hebrew Old Testament), and then use a control text similar in length, like 'Moby Dick'. See what you'll find. You'll probably find every term in both texts, but you'll only find a lot of them together in the Tanach. Of course, this is only for people with Bible Codes software. For control experiments, CodeFinder is the best program." (Note: WRR refers to researchers Witztum, Rips, and Rosenberg. MBBK refers to researchers McKay, Bar-Natan, Bar-Hillel, and Kalai.)

The above statement provided the inspiration to do just that in a formal experiment. Rather than create a list of terms just for this experiment, though, it was decided to use an already-published set of arrays, namely the arrays presented in the articles in the August 1999 News section of thebiblecodes.com (http://thebiblecodes.com/news/august). Each matrix is identified by a matrix number (T#), date published on the site and subject. They are as follows: T1, 08/04/1999, 'Mark Barton'; T2, 08/08/1999, 'Rachel Joy Scott'; T3, 08/12/1999, 'Eclipse'; T4, 08/12/1999, 'Hitler and Germany'; T5, 08/23/1999, 'Earthquake in Turkey'; T6, 08/24/1999, 'Earthquakes'; and T7, 08/31/1999, 'Lady Diana'. Rather than reprint those matrices here, the reader is referred to the link to print and read the originals. These are the test arrays (i.e., arrays found in the Tanach which are being tested to see if similar results can be produced from a control text), hence the T in the labels. The corresponding control arrays (i.e., similar matrices produced from a control text) are labeled C1 through C7.

The control text used for the experiment is the TorahControlText included with CodeFinder version 1.20 (first included in version 1.19). The CodeFinder Settings include two relevant parameters whose values had to be decided before each control text search. The first is Rows and Columns. Each test array has r rows and c columns. If r > c, then the value of Rows and Columns for the control text search was set at 2r. If c > r, then it was set at 2c. (As an example, say that the test array is 38 columns by 46 rows. In this case, the control text setting would be to search for matrices that are 92 columns by 92 rows.) The second is the skip distance search range. Determine the central term of the test array and its absolute skip distance (which will be called n). The skip distance range for the control text is set at -10n to 10n. For instance, if the central term has a skip distance of -817, its absolute skip distance is n = 817. Thus the control text setting would be to search from -8170 to 8170 skip distance. One then attempts to construct a control array containing as many of the terms in the test array as possible while adhering to the above restrictions.
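The two settings just described reduce to simple arithmetic, which can be sketched in Python. This is only an illustration of the rules as stated: the function name control_search_settings is hypothetical and is not part of CodeFinder, and the case r = c (which the text does not address) is assumed here to double the common value.

```python
def control_search_settings(rows, cols, central_skip):
    """Return (matrix_size, skip_min, skip_max) for the control text search.

    Hypothetical helper: it mirrors the rules stated in the text and does
    not call CodeFinder in any way.
    """
    # Rows and Columns setting: twice the larger dimension of the test array.
    # Assumption: when rows == cols, the common value is doubled.
    matrix_size = 2 * max(rows, cols)
    # Skip distance range: -10n to 10n, where n is the absolute skip
    # distance of the test array's central term.
    n = abs(central_skip)
    return matrix_size, -10 * n, 10 * n


# Example from the text: 38 columns by 46 rows, central term at skip -817.
print(control_search_settings(rows=46, cols=38, central_skip=-817))
# prints (92, -8170, 8170)
```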

The array construction technique is now briefly described. First, a "good" control array is searched for containing ONLY the terms in the test array which are at least 5 letters in length. One seeks to optimize the following characteristics: (1) a short skip distance of the central term; (2) a small area (rows X columns); (3) geometrical compactness of the terms themselves [small (delta x)-squared + (delta y)-squared]. This is in accordance with accepted principles of codes research. After this initial array is found, then shorter terms (those with 3 or 4 letters) are searched for one by one. In this phase of the array construction, one seeks to keep the total area of the control array as small as possible while secondarily keeping these terms geometrically compact as well. Note that 2-letter ELS's are not searched for. Since they are so frequent in their occurrence, their presence is assumed in any decent-sized array. (Note: Since T2, the 'Rachel Joy Scott' array had only two terms that were 5 letters in length, the initial search was performed with those two terms plus four terms that were each 4 letters in length.)

When the control array is finished, it is compared to the test array using the following ratios. (R is for "ratio".)

R(F) = F(C)/F(T), where F(T) is the total number of terms in the test array and F(C) is the total number of these terms found in the control array. Note that F is for "fraction".

R(A) = A(C)/A(T), where A(T) is the total area in rows X columns of the test array and A(C) is the total area in rows X columns of the control array. Note that A is for "area".

R(S) = S(C)/S'(T), where S(T) is the absolute skip distance of the central term in the test array. S'(T) [read "S-prime of T"] is S(T) as modified by the length of the test text compared to the TorahControlText. If the test array occurs completely within the Torah, then the Torah is the relevant text; otherwise it is the Tanach. Since the Tanach is 3.927 times the length of the Torah, the absolute skip distance of the central term in a Tanach test array is multiplied by 3.927. If the Torah is the test text, then S'(T) = S(T). S(C) is the skip distance rank of the central term in the control array. S'(T) is used rather than the absolute skip distance for the following reason. The Tanach is 3.927 times the length of the Torah (which is the same length as the TorahControlText). For a given skip distance range -n to +n, there are likely to be approximately 3.927 times as many occurrences in the Tanach as in the Torah. S'(T) corrects for the differing lengths of the Torah and Tanach.
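The length correction for S'(T) can be sketched in Python as follows. The function names and the example skip distances are hypothetical and chosen only for illustration. The factor 3.927 is the one given in the text, and the value passed as s_control is taken to be S(C) exactly as defined above.

```python
TANACH_TO_TORAH_LENGTH_RATIO = 3.927  # length ratio stated in the text


def adjusted_test_skip(test_skip, within_torah):
    """Return S'(T): S(T) scaled by 3.927 unless the array lies in the Torah."""
    s_t = abs(test_skip)  # S(T) is the absolute skip distance
    if within_torah:
        return s_t  # Torah is the test text, so S'(T) = S(T)
    return s_t * TANACH_TO_TORAH_LENGTH_RATIO


def skip_ratio(s_control, test_skip, within_torah):
    """Return R(S) = S(C) / S'(T)."""
    return s_control / adjusted_test_skip(test_skip, within_torah)


# Hypothetical values: a Tanach test array whose central term has skip -100,
# compared against a control value S(C) of 200.
print(adjusted_test_skip(-100, within_torah=False))
print(skip_ratio(200, -100, within_torah=False))
```

Values of R(S) below 1 then favor the control array, as the discussion of the individual ratios explains.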

R(Z) = Z(C)/Z(T), where Z(T) is the total sum of (delta z)-squared = (delta x)-squared + (delta y)-squared for all terms in the test array that are at least five letters in length and are also found in the control array. Z(C) is the total sum of (delta z)-squared = (delta x)-squared + (delta y)-squared for all terms in the control array that are at least five letters in length and are also found in the test array. Note that Z is for "z-squared". (This is simply an application of the Pythagorean theorem, which states that for a right triangle the square of the hypotenuse is the sum of the squares of the two sides. In the case of a Bible code array delta z is the distance between two letters in an ELS.)
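A minimal Python sketch of the Z sums follows. It assumes that the (delta x, delta y) offsets between adjacent letters of each qualifying term have already been read off the arrays. The function name z_squared_sum and all offsets in the example are hypothetical, not data from the experiment.

```python
def z_squared_sum(deltas):
    """Sum (delta x)^2 + (delta y)^2 over a list of (dx, dy) pairs.

    Each pair is taken to describe one term of at least five letters
    that appears in both the test array and the control array.
    """
    return sum(dx * dx + dy * dy for dx, dy in deltas)


# Hypothetical offsets for two shared terms in each array.
test_deltas = [(1, 2), (0, 3)]      # 5 + 9 = 14
control_deltas = [(3, 4), (2, 2)]   # 25 + 8 = 33

z_t = z_squared_sum(test_deltas)
z_c = z_squared_sum(control_deltas)
r_z = z_c / z_t
print(z_t, z_c)
# prints 14 33
```

In this made-up case R(Z) is greater than 1, meaning the control terms are more spread out than the test terms.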

Q = R(A)*R(S)*R(Z)/R(F). Note that Q is for "quantitative comparison".

Q is the final numerical score comparing the test array with the control array. If Q < 1, then the test array is judged to be a random word pattern, since an array using the same terms can be found in the TorahControlText that is at least as "good" as the test array. If Q is much greater than 1, then the test array is judged to be a valid encoded array. If Q is moderately greater than 1, then the validity of the test array is questionable. The reasoning behind this is explained below.
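The calculation of Q from the eight underlying quantities can be sketched in Python as follows. The function name q_score and the numbers in the example are hypothetical and are not taken from any of the arrays in this experiment. No cutoff for "much greater than 1" is coded because the text does not define one.

```python
def q_score(f_c, f_t, a_c, a_t, s_c, s_prime_t, z_c, z_t):
    """Return Q = R(A) * R(S) * R(Z) / R(F).

    Raises ZeroDivisionError if f_c is 0, i.e. if none of the test
    array's terms were found in the control array.
    """
    r_f = f_c / f_t        # fraction of test terms found in the control
    r_a = a_c / a_t        # area ratio (rows X columns)
    r_s = s_c / s_prime_t  # skip ratio, using the length-adjusted S'(T)
    r_z = z_c / z_t        # compactness ratio of the longer terms
    return r_a * r_s * r_z / r_f


# Hypothetical inputs: 8 of 10 terms found, control area twice the test
# area, control skip value half of S'(T), control Z sum 1.5 times larger.
q = q_score(f_c=8, f_t=10, a_c=2000, a_t=1000,
            s_c=50, s_prime_t=100, z_c=30, z_t=20)
print(round(q, 3))
print("random word pattern" if q < 1 else "Q exceeds 1")
```

With these made-up inputs the larger area and looser geometry of the control array outweigh its shorter skip, and the missing terms push Q higher still.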

If R(A) < 1, then the area of the control array is smaller than the area of the test array. Valid Bible code arrays tend to have their terms clustered into a smaller area than could be expected to happen by chance. Thus the control array being smaller in area than the test array is evidence in favor of the test array being a random word pattern rather than a valid encoded array.

If R(S) < 1, then the central term of the control array is more near-minimal than the central term of the test array. Valid Bible code arrays tend toward near-minimality for their central term's skip distance. (Note that near-minimality will vary from word to word depending upon the number of letters in an ELS and the letter frequency of those letters.) Thus the central term of the control array being more near-minimal than the central term of the test array is evidence in favor of the test array being a random word pattern rather than a valid encoded array.

If R(Z) < 1, then the longer terms (i.e., those with at least 5 letters) of the control array are on the whole more geometrically compact than those same terms in the test array. Valid Bible code arrays tend toward having their more important terms more geometrically compact than one would expect by chance. Thus the longer terms of the control array being more geometrically compact than in the test array is evidence in favor of the test array being a random word pattern rather than a valid encoded array.

R(F) < 1 has the opposite effect to that described above, which is why it is in the denominator of the calculation for Q. If R(F) < 1, then not all of the terms found in the test array can be duplicated in the control array given the constraints on possible area and skip distance search range described above. The fewer terms that can be found in the control array, the smaller R(F) will be and thus the larger Q will be.

Since each of the above describes a trend within the Bible code phenomenon as seen in the arrays presented at thebiblecodes.com, only by utilizing a calculation that takes into account all of these factors can we determine whether any given array is a valid encoded array according to this paradigm.

Note that "according to this paradigm" is a very important qualifier. The original work by Witztum, Rips, and Rosenberg (WRR) only looked at word pairings where (1) each ELS was near-minimal in skip distance, and (2) each ELS was geometrically compact and the two were in close proximity to each other in a two-dimensional array. This is a valid way of looking at the Bible codes. Some impressive arrays utilizing this approach are showcased in chapters 10 and 11 (pp. 155-189) of Jeffrey Satinover's Cracking The Bible Code, 1997, William Morrow and Company, Inc. Another approach to the Bible codes is presented at http://www.integrityonline30.com/theprophetspage with an accompanying statistical method. The details of this approach to the Bible codes can be read at that site. I believe that each of the three is a valid paradigm. Note, however, that an array derived using one of these three paradigms should be analyzed according to its corresponding method of statistical analysis. Applying a method of statistical analysis to a code array for which it was not designed may produce bogus results.

Results:

The results for the seven test arrays (T1 through T7) and the corresponding control arrays (C1 through C7) are tabulated below. As a reminder, the publication dates and subjects are listed again here.

T1 08/04/1999 'Mark Barton'; T2 08/08/1999 'Rachel Joy Scott'; T3 08/12/1999 'Eclipse'; T4 08/12/1999 'Hitler and Germany'; T5 08/23/1999 'Earthquake in Turkey'; T6 08/24/1999 'Earthquakes'; T7 08/31/1999 'Lady Diana'.

Notes:

(1) The 08/12/1999 article "Code Finding about the Sun Eclipse" contained two arrays. Since no other article from August 1999 contained more than one array, only the first of the two arrays in this article was analyzed.

(2) T5 'Earthquake in Turkey' has three terms which were initially used in the control array search: 'Earthquake' (EODA ZSY), 'in Turkey' (EJXYIB), and 'Izmit' (IJOGA). Note that NO control array was found with all three of these terms in a matrix of 118 rows X 118 columns. Thus the term 'Turkey' (EJXYI), a 5-letter ELS was substituted for the 6-letter ELS 'in Turkey' to create the control array. 'Turkey' in C5 did not count toward F(C), the total number of terms from T5 found in the control array. However, the (delta z)-squared value of 'in Turkey' was applied to Z(T) and the (delta z)-squared value of 'Turkey' was applied to Z(C).

(3) T6 'Earthquakes' did not have a clear central term. Possible candidates included 'Venus' at 130 skip distance, 'occultation' at -42 skip distance, and '5 Tishri' at 89 skip distance. 'Venus' was chosen because it was closest to vertical of the three and, being at the highest skip distance, it was the most conservative choice.

Q is calculated for each test array below and reported to three significant digits. As a reminder, Q = [A(C)/A(T)]*[S(C)/S'(T)]*[Z(C)/Z(T)]/[F(C)/F(T)].

T1 'Mark Barton': Q = 0.114

T2 'Rachel Joy Scott': Q = 0.0257

T3 'Eclipse': Q = 40.9

T4 'Hitler and Germany': Q = 36.3

T5 'Earthquake in Turkey': Q = 5.93

T6 'Earthquakes': Q = 68.6

T7 'Lady Diana': 0.406

Conclusions:

By the Q test described above, four of the test arrays were found to be valid encoded arrays and three were found to be random word patterns.

The best-scoring array was found to be T6 from the article "Code Findings about Recent Earthquakes" published August 24, 1999. In the article (written by the Webmaster of thebiblecodes.com) it is mentioned that Kevin Acres, who discovered the array, felt that a significant earthquake would occur on the date of 5 Tishri (corresponding to September 15, 1999) since that date was included as an ELS in the array. Interestingly, an earthquake did occur on that date near Tarija, Bolivia. Even more interestingly, a subsequent search by Kevin Acres found an ELS for 'Tarija' in the array at its minimum skip distance occurrence in the whole Torah. (Details are in the September 22, 1999 article "Was the Sept. 15 Earthquake Predicted by the Bible Codes?" at http://thebiblecodes.com/news/september.)

The second-best-scoring test array (T3) is the most compact at 340 characters. Its control array required 5.39 times that area (1833 characters) to enclose all eight of its terms. The third-best-scoring (T4) is also quite compact and is significant for being in the book of Isaiah rather than in the Torah. Much of the early statistical work done in the Bible codes focused solely on the Torah, but here is an analysis of a code finding outside the Torah that is shown to be statistically significant. The fourth-best-scoring test array (T5) had a moderate Q score compared to the other three, yet I still believe it is a valid encoded array for the reasons described in Note 2 of the Results section above.

T6 and the other two most significant arrays by the Q test have two things in common: (1) They are the three most compact in area, and (2) they have the fewest terms. This is in contrast to the three arrays where Q < 1. Those three have the most terms of the seven test arrays and are also quite large in area. Though I would not want to over-generalize from a limited data set, these findings suggest to me that the goal of Bible codes research should not be to find huge arrays with dozens of terms describing events in minute detail. Rather the goal should be to find compact arrays with sufficient detail (fewer than 20 terms in most cases) to show that the event was foreseen by God and encoded in the Bible. Many of the terms that are often included in a huge array with dozens of terms are short words that can be expected to occur in any decent-sized array. If these terms sufficiently enlarge the area of the array to an extent where a control array of comparable size can be found, then the purpose of the array is defeated. In other words, some huge arrays of dozens of terms may include much more compact arrays which have been cluttered by several extraneous terms.

The final conclusion from this experiment is that statistical analysis is a valuable aspect of Bible codes research. This is not to say that no Bible code array should ever be published unless statistical analysis upon it has been performed. Statistical analysis is by its very nature a tedious exercise. A well-described methodology can serve as a potential check upon published arrays (for any reader who desires to do so) even if not every published array is subjected to such analysis. Furthermore, the results presented in this paper may give readers and researchers a better "feel" for what is a valid encoded array and what is not, which would hopefully be of service to others in the future doing codes research. Finally, it is hoped that the methodology described in this paper can serve as a supplement to the methodology developed by Witztum, Rips, and Rosenberg (WRR). WRR have developed a quite rigorous method of statistical analysis, but the math involved and the software required to generate multiple copies of randomized Torahs places the use of their methodology outside the realm of all but a few specialists. Thus when sites such as this one and others have been criticized for omitting statistical analysis, it can be replied that it is because no method of analysis has been yet widely introduced that can be used by more than a handful of people. Again, that which is described in this paper is not intended as a replacement for their methodology, for as it was noted, a number of valid arrays have been found according to their paradigm of only looking in the Torah at only near-minimal skip distance ELS's. However, we at this site believe that their view of the Bible codes is only one aspect of a larger phenomenon. 
By showing that there exist arrays which cannot be adequately reproduced in all important factors (such as compact array area, near-minimality of the central term's skip distance, and geometrical compactness of the longer terms) in a control text, we hope to reinforce that point.

Appendix:

The raw data of this experiment (the actual control arrays with their lists of terms and skip distances) is included in the following links. Note that the lists of Hebrew terms do not include English translations since these words were entered into CodeFinder directly as Hebrew letters. All of the test arrays' original articles (except for T2) include English translations of the Hebrew terms, so the reader is directed to those articles for the translations. For T2, a list of English translations for the Hebrew terms is included. (Thanks to J.W. Embry for providing me with this list.) All control arrays and reports were compiled using CodeFinder version 1.20, a software program highly recommended for those seeking to do serious Bible codes research.

The original article containing test array T1 'Mark Barton' can be seen here.

To see control array C1 click here. To see the report for C1 click here.

The original article containing test array T2 'Rachel Joy Scott' can be seen here.

To see the English translations of Hebrew words in T2 click here.

To see control array C2 click here. To see the report for C2 click here.

The original article containing test array T3 'Eclipse' can be seen here.

To see control array C3 click here. To see the report for C3 click here.

The original article containing test array T4 'Hitler and Germany' can be seen here.

To see control array C4 click here. To see the report for C4 click here.

The original article containing test array T5 'Earthquake in Turkey' can be seen here.

To see control array C5 click here. To see the report for C5 click here.

The original article containing test array T6 'Earthquakes' can be seen here.

To see control array C6 click here. To see the report for C6 click here.

The original article containing test array T7 'Lady Diana' can be seen here.

To see control array C7 click here. To see the report for C7 click here.

