A Protocol For The Statistical Analysis Of Bible Code Arrays
Part 3

Author: Keith York

This article is property of the author and may not be reprinted or distributed without permission

April 22, 2000

 

Part 1 of A Protocol For The Statistical Analysis Of Bible Code Arrays set forth the need for a new method of statistical analysis of Bible code arrays and then described a new protocol for this purpose.  Part 2 illustrated the protocol by performing a statistical analysis on two different arrays.  Part 3 now offers some guidelines for critical judgment, both when using this protocol and when examining any Bible code findings.

The Word List

The place to start when judging a Bible code array is the word list, or listing of terms given in the array's report.  A Bible code array is supposed to be a list of related words about a given subject (person, event, or topic).  Ask yourself how well the words relate together in describing the given subject.  Take, for example, the array concerning last year's impeachment proceedings against President Clinton.  The terms found answer the fundamental questions.  Who? Clinton.  What? Impeachment (remember that impeachment is a process, rather than a result).  Where? USA Senate.  When? (Hebrew year) 5759.

Could there have been other relevant terms that one might have potentially found in the array?  Sure.  'President' is one.  'Acquittal' is another.  They were not found.  Should the fact that they were not found cause you to reject an analysis of the words that were found?  No, as long as the words that were found are strongly and definitely related to the subject at hand.  Some contend that a valid statistical analysis can only be done on an a priori word list, such as name of rabbi paired with date of rabbi's birth or death.  Though important studies can be done on a priori word lists, it is my belief (as well as that of others in the field) that restricting study to only a priori word lists would cause much of what God has encoded in the Bible to go unfound and unanalyzed.  However, since many arrays that are presented are not the results of formal experiments involving a priori word list choices, critical judgment and common sense are needed in examining these findings.  This is what I hope to convey in this section of this article.

A similar examination of the second array in Part 2 yields a comparable conclusion.  The words 'Germany', 'Hitler', 'Nazi', 'death', and 'in 5705' are all strongly and definitely related to the subject at hand.  Let's say, though, that you are surfing the Web and find a published array that has the following terms: 'Adolf', 'mustache', 'brown', 'froth', and 'suicide'.  In this purely hypothetical example, the writer offers the following explanation of the "significance" of the array.  Adolf Hitler was well known for his mustache, he usually dressed in brown, he worked himself up so much during his speeches that he frothed at the mouth, and he committed suicide at the end of World War Two.  Should you take that (hypothetical) array and explanation seriously?  No, even if the writer uses this protocol to "prove" that the odds against this array occurring by chance are 100,000 to 1.  The words are only weakly and tenuously related to one another.

Having said all this, it must be pointed out that judging how strongly and definitely (or weakly and tenuously) the words in a group are related to each other is a somewhat subjective exercise.  One cannot come up with a number or formula to calculate how strongly and definitely related words are to each other, but common sense tells you that the first list in the preceding paragraph is much better than the second list.  Thus guideline #1 for critical judgment in examining Bible code findings is: Use common sense and critical judgment when examining the word list for a Bible code array.  Are the words strongly and definitely related to the subject of the array, or are they weakly and tenuously related?  Do the words seem to be natural choices in describing the subject or do they seem unnatural, forced, vague, or of only minor relevance?

Examine Both The Whole And The Parts

A well-written statistical analysis will give the reader enough data to verify the author's calculations.  When checking those calculations, keep in mind that not only is the final probability (or odds) of the whole array important, but so is the probability of each element.

Remember that any R-value is log (1/Expected), using base-10 logarithms.  In other words, if E = expected number of occurrences, R-value = log (1/E).  By taking the antilog of an R-value, one can quickly calculate the expected number of occurrences of a term.  Say that R = 0.300.  Antilog (0.300) = 1.995 = 1/E.  Therefore, E = 1/1.995 = 0.5012.  Now for any value E, one can calculate from the Poisson distribution that the probability that there will be zero occurrences is e^-E, where e is Euler's number (approx. 2.7182818) and "^-E" means to the -E power.  In the above example, e^-0.5012 = 0.6058, meaning there is a 60.58% probability that there will be zero occurrences.  Subtracting this from 100% means that there is a 39.42% probability of there being one or more occurrences.  (Remember that even though E may be a fractional number, actual occurrences can only be integers.  An ELS may occur zero or one times, for instance, but it will not actually occur 0.5012 times.)  This means that an ELS with a matrix R-value of 0.300 has a 39.42% probability of occurring in the matrix simply by mere chance.  (Likewise, if R(A') = 0.300, there is a 39.42% probability that it will occur in the rectangle A' simply by mere chance.)
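The arithmetic in the paragraph above can be checked with a few lines of Python.  This is only a sketch of the conversion as described here; the function names are my own labels and are not part of the protocol.

```python
import math


def expected_occurrences(r_value):
    """Expected number of occurrences E, from R = log10(1/E)."""
    return 10.0 ** (-r_value)


def prob_at_least_one(r_value):
    """Poisson probability of one or more occurrences: 1 - e^(-E)."""
    return 1.0 - math.exp(-expected_occurrences(r_value))


r = 0.300
e_value = expected_occurrences(r)
p_zero = math.exp(-e_value)
p_one_or_more = prob_at_least_one(r)

# Prints: E = 0.5012, P(zero) = 0.6058, P(one or more) = 0.3942
print(f"E = {e_value:.4f}, P(zero) = {p_zero:.4f}, "
      f"P(one or more) = {p_one_or_more:.4f}")
```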

If a report for an array includes matrix R-values or R(A') values that are negative numbers, this is the same as saying that the expected number of occurrences E is greater than 1, so those ELS's are more likely than not to occur at least once in the matrix simply by mere chance.  (In this case, you should examine the list of words whose R > 0 and ask yourself if only these words were considered with the central term, would the word list be judged to be strongly and definitely related?)  However, it can also be the case that ELS's having positive matrix R-values or R(A') values have a substantial probability of occurring in the matrix simply by mere chance.  This can happen for positive but low values of R, as shown below.

R = 0.200 is equivalent to a 46.79% chance of occurring at least once in a matrix
R = 0.400 is equivalent to a 32.84% chance of occurring at least once in a matrix
R = 0.600 is equivalent to a 22.21% chance of occurring at least once in a matrix
R = 0.800 is equivalent to a 14.66% chance of occurring at least once in a matrix
R = 1.000 is equivalent to a   9.52% chance of occurring at least once in a matrix
R = 1.500 is equivalent to a   3.11% chance of occurring at least once in a matrix
R = 2.000 is equivalent to a   1.00% chance of occurring at least once in a matrix
R = 2.500 is equivalent to a   0.32% chance of occurring at least once in a matrix
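For readers who want to extend the list above to other R-values, here is a short Python sketch.  It assumes base-10 logarithms and the Poisson formula described earlier; the function name chance_percent is purely illustrative.

```python
import math


def chance_percent(r_value):
    """Percent chance of at least one occurrence for a given R-value."""
    expected = 10.0 ** (-r_value)          # E, from R = log10(1/E)
    return 100.0 * (1.0 - math.exp(-expected))


# Same R-values as the list above; add any others of interest.
for r in (0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 2.5):
    print(f"R = {r:.3f} is equivalent to a {chance_percent(r):6.2f}% chance "
          "of occurring at least once in a matrix")
```

Note that R = 2.000 works out to 0.995%, which sits right on the boundary between 0.99% and 1.00% at two decimal places.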

What can be seen is that the larger R is, the less likely an ELS is to be found in the array by mere chance.  Not only does this affect the overall odds for a matrix, it also affects how willing one should be to accept a particular ELS as being encoded rather than a random word pattern.  The reader can always set a cutoff criterion for R-values.  In other words, unless an ELS's R-value exceeds the cutoff value, the likelihood of it being a random word pattern is too great for the reader to feel comfortable accepting the ELS as a valid code.  (This cutoff would not affect R0 for the central term.  Remember that it is a search text R-value and not a matrix R-value.  Also, if the central term were deleted, one could not run any analyses on the other terms.)
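One way to choose a cutoff is to start from the largest chance probability one is willing to tolerate and work backward to R.  The Python sketch below simply inverts the Poisson relationship given earlier (solve 1 - e^-E = p for E, then R = log (1/E)).  The function name and the sample probabilities are my own illustrations, not part of the protocol.

```python
import math


def cutoff_for_chance(max_probability):
    """R-value at which P(at least one chance occurrence) equals max_probability."""
    expected = -math.log(1.0 - max_probability)   # E solving 1 - e^(-E) = p
    return -math.log10(expected)                  # R = log10(1/E)


# Sample tolerances chosen only for illustration.
for p in (0.25, 0.10, 0.05, 0.01):
    print(f"accept up to {p:.0%} chance -> cutoff R = {cutoff_for_chance(p):.3f}")
```

For example, a reader willing to accept up to a 10% chance of a random occurrence would use a cutoff of roughly R = 0.98.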

At what level should that cutoff be set?  That's up to the reader.  The default cutoff of the protocol is R = 0, where any ELS's with R(A') > 0 are considered to be true codes and the overall odds are calculated accordingly.  If the cutoff value is set at some higher number r (typically somewhere between 0 and 1), then R(sum) would be calculated with only those ELS's whose R(A') > r and the overall odds might be different.  In this case, you should examine the word list of those ELS's with R(A') > r and ask yourself if only these words were considered with the central term, would the word list be judged to be strongly and definitely related?  If the answer is yes, calculate the overall odds and see how it compares to the overall odds from a lower cutoff value.  Thus guideline #2 for critical judgment in examining Bible code findings is: Look not only at the overall probability for the whole array, but also at the probability for individual ELS's.  If R-values are low, then consider the possibility that those ELS's may be there by mere chance.  If need be, re-examine the remaining word list using guideline #1 and recalculate the overall odds using the chosen cutoff value for R.
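Applying a cutoff amounts to filtering the list of ELS's and re-adding the survivors.  The Python sketch below assumes that R(sum) is the sum of the R(A') values of the retained ELS's.  The term names and R(A') values are invented for illustration and do not come from any real array.

```python
def r_sum_with_cutoff(r_values, cutoff=0.0):
    """Keep only ELS's whose R(A') exceeds the cutoff and total their R(A')."""
    kept = {term: r for term, r in r_values.items() if r > cutoff}
    return kept, sum(kept.values())


# Hypothetical R(A') values, made up for this example only.
example = {"term_a": 2.40, "term_b": 1.10, "term_c": 0.35, "term_d": -0.20}

for cutoff in (0.0, 1.0, 2.0):
    kept, total = r_sum_with_cutoff(example, cutoff)
    print(f"cutoff {cutoff:.1f}: keep {sorted(kept)}, R(sum) = {total:.3f}")
```

After each filtering step, the surviving word list still has to be judged by guideline #1 before the recalculated odds mean anything.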

How does guideline #2 affect the two test cases of Part 2?  In the first array, each R(A') was greater than 1.  Thus as long as the cutoff value of R was less than one, the overall array and odds would be unchanged.  The lowest R(A') of the second array was 0.963, which is equivalent to a 10.32% probability of occurring at least once by mere chance.  As long as that probability of mere chance is acceptable to the reader, the overall array and odds would be unchanged.

Let's say though that the reader is very skeptical of the codes and sets a cutoff value of R = 2.  In the first array, the message would become ambiguous: 'Clinton', '5759', and 'Senate'.  It could refer to the impeachment hearings, but since 'impeachment' has been considered suspect, the three remaining terms no longer make a good array together.  In the second array, the three terms remaining would be 'Germany', 'Hitler', and 'Nazi'.  These are still strongly and definitely related to each other.  What are the overall odds for these three ELS's in this array?

R(M) = R(sum) + R0 = 2.974 + 2.108 - 0.824 = 4.258.

Antilog [R(M)] / [(Sdif)(Smax)] = 18,113 / [(1)(1)] = 18,113, or 18,100 to three significant digits.
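The calculation above can be reproduced as follows.  This Python sketch takes the three addends and the values Sdif = 1 and Smax = 1 directly from the lines above; the variable names are my own, and the definitions of Sdif and Smax are those given in Part 1.

```python
# The three addends of R(M) = R(sum) + R0 as shown in the text.
r_addends = [2.974, 2.108, -0.824]
r_m = sum(r_addends)

# Sdif and Smax both equal 1 for this array, per the calculation above.
s_dif, s_max = 1, 1

# Antilog means 10 raised to the power R(M).
odds = 10.0 ** r_m / (s_dif * s_max)

# Prints: R(M) = 4.258, odds = 18,113 to 1
print(f"R(M) = {r_m:.3f}, odds = {odds:,.0f} to 1")
```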

In other words, the odds are quite a bit lower than 1,670,000 to one, but still quite significant.  Given that 'Germany', 'Hitler', and 'Nazi' are so strongly and definitely related to one another, who could argue that an array with odds of 18,100 to one was just due to chance?  One could set a cutoff value of R > 3 and cause the array to "fail", but this would be equivalent to saying that you are rejecting all ELS's unless their probabilities of occurring at least once in an array were 0.10% or less.  To me, this would not be a reasonable cutoff criterion, but rather evidence of a bias against the existence of the codes.

In conclusion, it is acceptable for the reader to set a cutoff point (possibly between 0 and 1) for R, and elements of some arrays will survive even high cutoff points.  However, setting a cutoff point that is too high is not acceptable, but rather evidence of a bias against the existence of the codes.

Clusters Within Arrays

The protocol as I have written it analyzes each term in relation to the central term of an array.  Thus it is not set up to allow analyses of clusters within arrays.  This is not to say that such clusters do not exist, only that my method is unsuitable for analyzing them.  Roy Reinhold has developed methods for statistically analyzing clusters within arrays and has invited Ed Sherman and Dave Swaney (two Bible code proponents) and Randall Ingermanson (a Bible code opponent) to examine his techniques and calculations.  I look forward to hearing what they decide.
