Part 3: Future Statistics Solutions for Code Programs
by Roy A. Reinhold April 6, 2000

Calculating the probability of clusters in a matrix while taking proximity or distance into account is not an easy problem. Doron Witztum and Dr. Eliyahu Rips applied the distance between the words of a word pair in their original published Great Rabbis experiment. That method, however, applies only to word pairs and is much tougher to extend to the many terms of a cluster within a matrix. Witztum measured the distance between related terms as slant distance, and used it along with frequency (expected occurrence) to calculate the probability of the word pair.

Dr. Robert Haralick, a professor at the University of Washington, has proposed a method for calculating the probability of a cluster by creating a smaller boundary box within the larger matrix that would enclose all the terms in the cluster. This method would be fairly straightforward to apply in codes software programs, but it may have some shortcomings. For example, let's say we have two large terms in a matrix that cross. One term is vertical and takes up 2/3 of the height of the matrix, while the other is horizontal and takes up 2/3 of its width. A boundary box enclosing these two terms would therefore be quite large. At the same time, the slant distance between them is "0.0" since they cross. It would seem to me that the most important aspect of the two related terms is that they cross, and the boundary box enclosing them might not reflect that fact.
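The crossing-terms example can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the 30 x 30 matrix size, the letter positions, and above all the choice to model slant distance as the smallest straight-line distance between any letter of one term and any letter of the other. It is not Dr. Haralick's published method or the measure used by any particular program.

```python
from math import hypot

def bounding_box_area(terms):
    """Area (in cells) of the smallest rectangle enclosing every letter of every term."""
    cells = [cell for term in terms for cell in term]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)

def min_distance(term_a, term_b):
    """Assumed stand-in for slant distance: closest approach between two terms."""
    return min(hypot(ra - rb, ca - cb) for ra, ca in term_a for rb, cb in term_b)

# Hypothetical 30 x 30 matrix; each term is a list of (row, column) letter cells.
ROWS, COLS = 30, 30
vertical = [(r, 15) for r in range(5, 25)]    # spans 20 of 30 rows (2/3 of the height)
horizontal = [(12, c) for c in range(5, 25)]  # spans 20 of 30 columns (2/3 of the width)

box_area = bounding_box_area([vertical, horizontal])   # 20 x 20 = 400 cells
box_fraction = box_area / (ROWS * COLS)                # about 44% of the matrix
crossing_distance = min_distance(vertical, horizontal) # 0.0, the terms share cell (12, 15)

print(f"boundary box: {box_area} cells ({box_fraction:.0%} of the matrix)")
print(f"closest approach between the terms: {crossing_distance}")
```

Under these assumptions the box covers nearly half the matrix even though the two terms touch, which is the mismatch described above.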

Another proposed method for calculating the probability of all the terms in the matrix would measure slant distance from the main term or center term of the matrix. In application, since all terms are related to the center term, the center term and each other term in the matrix would be calculated as a word pair. The apparent shortcoming of this method is that, although a cluster is related to the center term, its most important aspect is the close proximity of its own terms, which cross or run closely parallel. You can see these relationships between terms in a cluster in the Sid Roth life matrix on this website. That matrix has 73 terms and 8 large clusters, and some of the clusters have many terms and are quite significant. The significance of a cluster lies in the quantity of terms it contains, and in the fact that those terms cross or are closely parallel within a small area. However, a cluster may be greatly significant and still not be close to the center term of the overall matrix. For this reason, calculating cluster probability from the main/center term is probably not a reliable way to reach an accurate overall probability for the matrix.

You might guess, then, that we could designate a pivot term for each cluster and calculate the probability of each of the other terms in the cluster as a word pair with the pivot term. That might be a good method for calculating the probability of clusters, and it is certainly within the capabilities of software programmers to incorporate it into commercial code software programs. A shortcoming is that if the software user designates the pivot term and identifies all the other terms in the cluster to the software, then the results are very subjective. The user may make a poor choice of pivot term, giving less than optimal probability calculations.
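How much the choice of pivot matters can be seen in a minimal sketch. The three terms, their letter positions, and the use of closest-approach distance as a stand-in for slant distance are all assumptions for illustration; a real program would also fold in frequency, which is left out here.

```python
from math import hypot

def min_distance(term_a, term_b):
    """Assumed stand-in for slant distance: closest approach between two terms."""
    return min(hypot(ra - rb, ca - cb) for ra, ca in term_a for rb, cb in term_b)

def total_distance_from_pivot(cluster, pivot):
    """Sum of distances from the pivot term to every other term in the cluster."""
    return sum(
        min_distance(cluster[pivot], cells)
        for name, cells in cluster.items()
        if name != pivot
    )

# Hypothetical cluster: each term is a list of (row, column) letter cells.
cluster = {
    "A": [(0, 0), (0, 1), (0, 2)],  # horizontal term at the top
    "B": [(1, 1), (2, 1), (3, 1)],  # vertical term just below A
    "C": [(6, 0), (6, 1), (6, 2)],  # horizontal term farther down
}

# Try each term as the pivot and compare the summed distances.
totals = {name: total_distance_from_pivot(cluster, name) for name in cluster}
for name, total in totals.items():
    print(f"pivot {name}: total distance {total}")
```

With these made-up positions the middle term B gives a total of 4.0, while A gives 7.0 and C gives 9.0, so the same cluster would score quite differently depending on which pivot the user happened to pick.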

Another pie-in-the-sky solution would be to identify all the terms in a single cluster to the software program, and then let the software calculate an optimal artificial point that gives the least slant distance to all terms in the cluster. The software would then calculate the probability for each term based on its slant distance from the optimal artificial point, take into account frequency (expected occurrence), and arrive at an accurate solution for the probability of the cluster. At least this method would account for the close proximity of terms in a cluster and avoid the extremes that could arise if a boundary box method were used to enclose the terms in a matrix cluster.
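One way to read "optimal artificial point" is as the point that minimizes the summed straight-line distance to the terms, which is known as the geometric median. The sketch below finds it with Weiszfeld's iterative algorithm. Reducing each term to a single (row, column) center, the sample coordinates, and the choice of the geometric median itself are assumptions made for illustration, not a method proposed in the sources discussed here.

```python
from math import hypot

def total_distance(point, targets):
    """Sum of straight-line distances from a point to every target."""
    return sum(hypot(point[0] - r, point[1] - c) for r, c in targets)

def centroid(targets):
    """Plain average of the target coordinates, used as the starting guess."""
    n = len(targets)
    return (sum(r for r, _ in targets) / n, sum(c for _, c in targets) / n)

def geometric_median(targets, iterations=200, eps=1e-9):
    """Weiszfeld's algorithm: repeatedly re-average the targets, weighting
    each one by the inverse of its distance from the current estimate."""
    row, col = centroid(targets)
    for _ in range(iterations):
        weight_sum = row_sum = col_sum = 0.0
        for r, c in targets:
            # Clamp the distance so an estimate that lands on a target
            # does not cause a division by zero.
            weight = 1.0 / max(hypot(row - r, col - c), eps)
            weight_sum += weight
            row_sum += weight * r
            col_sum += weight * c
        new_row, new_col = row_sum / weight_sum, col_sum / weight_sum
        moved = hypot(new_row - row, new_col - col)
        row, col = new_row, new_col
        if moved < eps:
            break
    return row, col

# Hypothetical term centers for one cluster, as (row, column) pairs.
term_centers = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0), (10.0, 10.0)]
optimal_point = geometric_median(term_centers)
print("optimal artificial point:", optimal_point)
print("summed distance from it:", total_distance(optimal_point, term_centers))
print("summed distance from centroid:", total_distance(centroid(term_centers), term_centers))
```

The distances from this point to each term could then be combined with frequency in whatever way the probability calculation requires; that step is left open here, as it is in the text.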

In building a complete method to accurately calculate the probability of a matrix with many terms, we must recognize that some terms will be assigned to individual clusters, while others are in the matrix because of their overall relation to the main/center term of the matrix. For the latter terms, we could determine their worth to the matrix based solely on frequency and not incorporate distance or proximity. If that were the case, then an overall method for reducing the probability of a matrix to a single number might look like this:

matrix probability = (probability from sum of positive matrix R-values) x (overall probability calculated for all clusters after calculating the probability for each cluster)

Of course, we would not want to double count, so if the probability of a term is calculated from its membership in a cluster, then we would not include that term when summing positive R-values in the matrix. This proposed piecemeal approach calculates the probability for each term in the matrix, with some terms calculated by frequency using R-value and other terms calculated using the cluster method. One problem arises when a single term in the matrix is part of 2 or more clusters. How do we account for a term like that?
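The bookkeeping in the proposed formula, including the rule against double counting, can be sketched as follows. The sketch rests on two assumptions that are not established above: that an R-value is the base-10 logarithm of the inverse of a term's expected occurrences (so that summed R-values convert to a probability of 10 raised to the negative sum), and that cluster probabilities can simply be multiplied together as if independent. The term names, R-values, and cluster probability are made up.

```python
def matrix_probability(r_values, clusters):
    """Combine frequency-only terms and cluster probabilities into one number.

    r_values: dict mapping term name -> R-value (assumed log10 scale).
    clusters: dict mapping cluster name -> (set of term names, cluster probability).
    """
    # Every term that belongs to at least one cluster is scored by its cluster only.
    clustered_terms = set()
    for terms, _ in clusters.values():
        clustered_terms |= terms

    # Sum only positive R-values of terms not already counted in a cluster.
    r_sum = sum(
        r for term, r in r_values.items()
        if r > 0 and term not in clustered_terms
    )
    frequency_probability = 10.0 ** (-r_sum)  # assumed R-value-to-probability conversion

    # Assumed: cluster probabilities are independent and multiply.
    cluster_probability = 1.0
    for _, probability in clusters.values():
        cluster_probability *= probability

    return frequency_probability * cluster_probability

# Hypothetical inputs.
r_values = {"alpha": 1.0, "beta": 2.0, "gamma": -0.5, "delta": 1.5}
clusters = {"cluster_1": ({"delta", "epsilon"}, 0.01)}

probability = matrix_probability(r_values, clusters)
print(f"matrix probability: about 1 in {round(1 / probability):,}")
```

Here "gamma" is skipped because its R-value is negative and "delta" is skipped because it is already counted in a cluster. The sketch does not answer the question of a term that sits in two or more clusters: such a term is excluded from the R-value sum once, but it still contributes to each cluster's probability.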

If the above method were possible and it accurately calculated the probability, then it could be incorporated in commercial code programs. The result would be a single accurate measure of probability for a matrix that anyone could understand and relate to their own worldview. Average people want to be able to look at a Bible code matrix and see an overall theme, they want to examine the terms in the matrix, and they want a single number for the probability of the matrix, stated as, for example, a 1 in 2000 chance of occurrence.

As a reminder for current users of commercial code software programs, an accurate method for calculating the probability of a large matrix with many terms has not yet been invented. When one has been determined, commercial code software programs will certainly build it in.

Go to Part 4: the full statistical calculation for the Sid Roth Life matrix
