Having looked at the possible identities of the annotators, it's time to look at the distribution of the marks they made in the Folger's Geneva Bible. Oxfordian hopes rest on the discovery of pronounced similarities between what is marked and what is referenced in the Bible. Since the dissertation committee at Amherst administered their benison to Stritmatter's thesis, most Oxfordians (though not all, by any means) have been taking the connection between the annotators and the playwright for granted.

It is not, on the face of it, an unreasonable hope.

In the age of big data, comparing one large dataset to another ought to be able to reveal any strong indications that the annotators and the playwright have close ties. There is no problem with the location of the marks. The Folger has them all available online.

The problem lies in defining what counts as a reference. References cannot be extracted programmatically from online texts, because the Bible verses do not resemble the corresponding lines in the plays closely enough for computerised matching. Instead, we have worked with a list compiled by a single scholar, Naseeb Shaheen. The only calculated field in the dataset flags a match where a reference falls within the range of a marked passage. Even there, a little manual help was required.
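The calculated field described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the dataset: the `Verse` and `MarkedPassage` types, the `classify` function and the one-verse `near` tolerance for a "close but no match" result are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verse:
    book: str
    chapter: int
    verse: int


@dataclass(frozen=True)
class MarkedPassage:
    # Assumption: a mark covers a contiguous run of verses in one chapter.
    book: str
    chapter: int
    first_verse: int
    last_verse: int


def classify(ref, marks, near=1):
    """Return 'match', 'close' or 'unmatched' for one reference.

    'match': the referenced verse lies inside a marked range.
    'close': it lies within `near` verses of a marked range
             (hypothetical tolerance, chosen for illustration only).
    """
    result = "unmatched"
    for m in marks:
        if (m.book, m.chapter) != (ref.book, ref.chapter):
            continue
        if m.first_verse <= ref.verse <= m.last_verse:
            return "match"
        if m.first_verse - near <= ref.verse <= m.last_verse + near:
            result = "close"
    return result


# Illustrative input: a single mark on 1 Samuel 24:11.
marks = [MarkedPassage("1 Samuel", 24, 11, 11)]
print(classify(Verse("1 Samuel", 24, 11), marks))  # match
print(classify(Verse("1 Samuel", 24, 12), marks))  # close
print(classify(Verse("1 Samuel", 25, 39), marks))  # unmatched
```

A rule this simple still needs manual help wherever a mark straddles a chapter boundary or a reference is to a whole passage rather than a verse, which is presumably where the hand corrections came in.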

Depressingly for Oxfordians, however the data is set up for comparison, the results are the same. The data shows nothing suggestive, let alone convincing, to justify confidence in the playwright's ownership of the Bible.

The Section Mismatch

Looking first at the sections, an immediate and large mismatch is apparent. The Geneva Bible has three sections: the Old Testament, the Apocrypha and the New Testament.

Sections

Almost half the playwright's references are from the New Testament in which the annotators make only 12% of their marks. The playwright doesn't much care for The Apocrypha. In fact, it would not be unreasonable to assume the playwright's bible did not contain the Apocrypha. With only 1 in 33 references coming from the disputed Books, he may well have heard all of his Apocryphal verses in church. Yet the total of Apocryphal annotations is only just shy of 1 in 4.

The New Testament is another instant mismatch. Not only is the playwright eight times less likely to reference the Apocrypha than the annotators are to mark it, he is four times more likely to reference the New Testament.

It's an immediate wet blanket flung over the idea that we are looking at Shakespeare's marks. Like finding an ex-Army gumboot when you're hoping for Cinderella's slipper. If we're talking doughnuts, and for a second or two we are, the second doughnut chart shows the relative interest in the different books of the Bible. It's a patternless quilt, a printless foot on the sand. There isn't the smallest correspondence apart from their mutual interest in Revelation, represented by the pale pink sectors in the doughnut. Even that will turn out to be something of a disappointment for the faithful. As we shall see.

Verse and worse

Having fallen at the first hurdle, the thesis continues to fall at all of the others. Looking at the frequency of marks and references at the book level tends to exacerbate the lack of correspondence between their interests. Drill down, and the mismatch becomes greater and greater. And more and more obvious.

These two charts present different views of the same data: a two-way table comparing the number of references and the number of marks in each of the books of the Bible. The first is a simple histogram showing numeric values. The second shows each column with the number of marks expressed as a percentage of the total for each book. This has the advantage of showing the relationship more clearly in the less well-marked books, but the disadvantage that each column is not to scale with its neighbours.
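The normalisation behind the second chart can be sketched as below. This is an assumed reading of how the percentage columns are built, not the charting code itself; `percent_split` is a hypothetical helper name. The only figures used are the 80 marks and 72 references reported for 1 Samuel.

```python
def percent_split(marks, references):
    """Express marks and references as percentages of their combined total."""
    total = marks + references
    if total == 0:
        # A book with neither marks nor references has no column to draw.
        return (0.0, 0.0)
    return (100 * marks / total, 100 * references / total)


# 1 Samuel: 80 marks and 72 references.
marks_pct, refs_pct = percent_split(80, 72)
print(f"{marks_pct:.2f}% marks, {refs_pct:.2f}% references")
```

Because every column is rescaled to 100%, a book with a handful of entries looks as tall as one with hundreds, which is exactly the loss of scale noted above.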

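All Marks (chart 1: number of marks and references per book)

All Marks (chart 2: marks as a percentage of each book's total)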

Again, right from the off, there is a very large mismatch. The marked and referenced books are not suggestive of coincidence. The tall purple columns, representing the books containing the largest number of references, bear little relation to the taller red columns recording the largest number of marks. There are some columns where marks and references are almost equal. Though you might expect it, these equally matched books do not necessarily show where the annotators' and the playwright's interests coincide. They turn out to be better indicators of where Stritmatter has been hardest at work with his additions.

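| Book | Unmatched references | Close but no match | Match to Bible mark | Match added by Stritmatter |
| --- | --- | --- | --- | --- |
| Genesis | 191 | 1 | 0 | 0 |
| Exodus | 103 | 0 | 5 | 5 |
| Leviticus | 34 | 0 | 1 | 1 |
| Numbers | 21 | 0 | 0 | 0 |
| Deuteronomy | 104 | 3 | 1 | 7 |
| Numbers | 1 | 0 | 0 | 0 |
| Joshua | 19 | 0 | 0 | 0 |
| Judges | 23 | 0 | 0 | 0 |
| Ruth | 2 | 0 | 0 | 0 |
| Total | 498 | 4 | 7 | 13 |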

The first nine books include three of the playwright's favourites, yet are almost mutually exclusive with the annotators' interest. Another gumboot fished out of the canal. And a wheel from a supermarket trolley—whilst Shakespeare and the annotators make a very poor match, the Bard's interests actually line up quite well with another Bankside playwright, Christopher Marlowe.

Nine books with almost 500 references, and they contain only 20 matches. 96% of what interests the playwright is ignored by the annotators. Almost two thirds of these matches (13 of 20) have been added by Stritmatter. Without the additions, the ratio of marked to unmarked references is 1:71. Only Numbers is favoured by the annotators; the other eight books are largely ignored. As anyone who has read it might expect, Numbers is ignored by the playwright.
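The arithmetic behind these figures can be checked directly from the table above. The sketch below simply re-adds the published rows (including the table's two "Numbers" rows, kept as printed); the variable names are ours.

```python
# (book, unmatched references, close but no match, match to mark, match added)
ROWS = [
    ("Genesis", 191, 1, 0, 0),
    ("Exodus", 103, 0, 5, 5),
    ("Leviticus", 34, 0, 1, 1),
    ("Numbers", 21, 0, 0, 0),
    ("Deuteronomy", 104, 3, 1, 7),
    ("Numbers", 1, 0, 0, 0),
    ("Joshua", 19, 0, 0, 0),
    ("Judges", 23, 0, 0, 0),
    ("Ruth", 2, 0, 0, 0),
]

# Sum each numeric column across the nine rows.
unmatched, close, matched, added = (sum(col) for col in zip(*(r[1:] for r in ROWS)))
all_matches = matched + added
grand_total = unmatched + close + all_matches

print(unmatched, close, matched, added)              # 498 4 7 13
print(round(100 * (1 - all_matches / grand_total)))  # 96 (% not matched)
print(added / all_matches)                           # 0.65 (share added by Stritmatter)
print(round(unmatched / matched))                    # 71 (unmatched per genuine match)
```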

This is a very, very bad start to the quest for a convincing identification based on probability.

The first book that draws approximately equal interest is the tenth, 1 Samuel.

In a book of equal interest, we should expect to find a corresponding interest demonstrated by the selection of similar chapter and verse.

Here's the chart again.

1 Samuel

We can see that down at the lowest granular level, in this book, the idea of overlapping interests is still untenable.

1 Samuel has the 10th highest word count in the Bible, at 20837. It has 31 chapters and 810 verses, an average of 26.13 verses per chapter. What makes it remarkable in this study is the very high number of annotators' marks. It is one of only two books with more marks (80) than references (72). To these, Stritmatter has added another 15, choosing 6 for his Diagnostics (although only three fit his parameters), while he ignores two more, including the most referenced, 25.39, part of a tight cluster of 13 referenced verses.

This table shows the 11 verses in 1 Samuel which have more than one Shaheen reference.

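| Verse | Folger Bible mark | Shaheen Bible reference | Stritmatter additional reference | Direct Diagnostic | Indirect Diagnostic |
| --- | --- | --- | --- | --- | --- |
| 25:39 | 0 | 8 | 0 | 0 | 0 |
| 10:1 | 0 | 6 | 0 | 1 | 0 |
| 24:11 | 1 | 6 | 0 | 1 | 0 |
| 26:11 | 1 | 5 | 0 | 0 | 0 |
| 16:7 | 1 | 3 | 0 | 1 | 0 |
| 7:9 | 1 | 2 | 0 | 0 | 0 |
| 14:39 | 1 | 2 | 0 | 0 | 0 |
| 14:45 | 0 | 2 | 0 | 0 | 0 |
| 17:4 | 0 | 2 | 0 | 0 | 0 |
| 17:7 | 0 | 2 | 0 | 0 | 0 |
| 26:9 | 0 | 2 | 0 | 0 | 0 |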

1 Samuel's total of 80 is the largest number of annotators' marks in a single book, the next being Ecclesiasticus with 55. The book's 72 Shakespeare references represent the 15th highest number. Yet despite being retrofitted with the second highest number of Stritmatter additions, the individual marks and references in 1 Samuel still tend towards exclusive rather than overlapping. Eleven verses have more than one reference, of which only five are marked, and only three of the eleven count as Stritmatter's Direct Diagnostics.

With two datasets of this size, in the densest single set, we ought to be looking for a very strong overlap here to indicate the existence of non-random forces.

Of the 72 references, only 7 (1:10) are matches before the additions. After the additions there is a still far from convincing 22 (1:4). Four verses are marked by two different annotators yet despite their extra emphasis, they are all ignored by the playwright.

Under optimal conditions for Stritmatter's thesis, with every allowance made in his favour, the best possible overlap produces only 1 in 4 hits.

There are books which have fewer marks and one or two with slightly better overlaps than 1 Samuel, but the majority exhibit even stronger mutually exclusive patterns or no matches at all.

The clusters are also revealing here. There are four tight clusters of marks (possibly four of our red-underlining vicar's sermons?). Only two of these contain matches. There are six clusters of references, none of which contain marks. One of Stritmatter's strongest claims is that the more frequently a passage is referenced in the plays, the more likely it is to be marked. This chart, made from the annotators' most popular book, demonstrates almost no support for the claim. In fact, since we can easily make our own 'Diagnostics' now, we have looked at the 100 most referenced passages (and the 1000 and 2500 most referenced) and the tables produce yet more bad news.

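| Most popular references | Unmarked | Marked | Marked % | Stritmatter additions | Additions % | Total | Total % |
| --- | --- | --- | --- | --- | --- | --- | --- |
| First 100 | 594 | 23 | 3.87% | 13 | 2.19% | 36 | 6.06% |
| First 1000 | 2323 | 50 | 2.15% | 67 | 2.88% | 117 | 5.04% |
| First 2500 | 3760 | 81 | 2.15% | 99 | 2.63% | 180 | 4.79% |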

 

The fog is unplaited. Contrary to Stritmatter's claim, 96.13% of the playwright's 100 most referenced verses are not marked at all before his additions are made. After the additions, the ratio of matched to unmatched references still refuses to suggest anything approaching a convincing overlap. The very strongest Oxfordian goggles and a large amount of context shearing are necessary before even the feeblest of links can be created between the two datasets.
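The percentages in the table above can be reproduced with a short calculation. The sketch assumes, as the printed figures imply, that each percentage is taken against the first numeric column of its row; the names `TOP_N` and `shares` are ours.

```python
# band: (base count from the first numeric column, marked, Stritmatter additions)
TOP_N = {
    "First 100": (594, 23, 13),
    "First 1000": (2323, 50, 67),
    "First 2500": (3760, 81, 99),
}


def shares(base, marked, added):
    """Return (marked %, additions %, total %) rounded to two decimals."""
    return tuple(round(100 * n / base, 2) for n in (marked, added, marked + added))


for band, row in TOP_N.items():
    marked_pct, added_pct, total_pct = shares(*row)
    print(f"{band}: marked {marked_pct}%, added {added_pct}%, total {total_pct}%")

# Share of the top-100 band left unmarked before any additions.
print(round(100 - shares(*TOP_N["First 100"])[0], 2))  # 96.13
```

The marked share barely moves as the band widens from 100 to 2500 verses, which is the opposite of what a frequency-driven pattern of marking would produce.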


Venn Diagram

Scaled Venn diagrams are usually excellent ways of showing relative overlaps but apart from the two books which received most of Stritmatter's additions, the diagrams for most books resemble tiny asteroid strikes on gas giant planets.


Far from supporting Stritmatter's central premise, the data supports the opposite statement—that the playwright's references are 25 times more likely to be ignored by the annotators than honoured with a mark.

The books most popular with the playwright are Matthew (442 references), Psalms (441), Luke (300) and Genesis (192). Genesis has 32000 words, 50 chapters and 1533 verses. The whole of Genesis contains just a single mark, on a verse ignored by the playwright. Luke contains only one mark, very close to a reference (3.16 and 3.17), but the referenced verse itself is not marked. In the playwright's four most popular books, containing 1366 references, there are only 9 matches, a ratio of 152:1. In the playwright's 10 most referenced books, the totals are 2205 and 38, representing a somewhat improved but still highly unhelpful 58:1.

What tables tell us

A few simple tables provide everything necessary to dispense with the idea that there exists any strong relationship between marks and references. Entirely. 

The first is a summary table, showing matches, book by book, ordered by the number of matches. The books without matches have their own table. Many of these are large books with many references, like Genesis. Yet there are no convincing overlaps to be found anywhere. Nothing amounts to a fraction of the level of similarity between Marlowe's references and Shakespeare's. It would be ironic if all this tabulation disposed of one candidate only to strengthen the claims of another.

The final table shows the 50 verses of the Bible which are most popular with the playwright.

Nothing is left of Stritmatter's claims of coincidence, frequency or favour.

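| Books with the highest number of matches | Unmatched marks and references | Close but no match | Match | Stritmatter added match | Total | % matches with additions | % matches without additions |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 Samuel | 130 | 9 | 18 | 16 | 173 | 20% | 10% |
| Ecclesiasticus | 127 | 1 | 1 | 21 | 150 | 15% | 1% |
| 2 Samuel | 95 | 1 | 11 | 6 | 113 | 15% | 10% |
| Revelation | 141 | 4 | 8 | 7 | 160 | 9% | 5% |
| Matthew | 430 | 1 | 11 | 0 | 442 | 2% | 2% |
| Exodus | 103 | 0 | 5 | 5 | 113 | 9% | 4% |
| Ezekiel | 77 | 0 | 4 | 6 | 87 | 11% | 5% |
| Wisdom | 37 | 0 | 2 | 8 | 47 | 21% | 4% |
| 1 Kings | 61 | 6 | 7 | 2 | 76 | 12% | 9% |
| Deuteronomy | 104 | 3 | 1 | 7 | 115 | 7% | 1% |
| Hosea | 22 | 0 | 1 | 7 | 30 | 27% | 3% |
| Isaiah | 138 | 0 | 5 | 2 | 145 | 5% | 3% |
| Mark | 119 | 0 | 6 | 0 | 125 | 5% | 5% |
| 2 Corinthians | 53 | 0 | 1 | 5 | 59 | 10% | 2% |
| Psalms | 435 | 1 | 5 | 0 | 441 | 1% | 1% |
| 1 Peter | 42 | 0 | 2 | 3 | 47 | 11% | 4% |
| Job | 155 | 1 | 4 | 0 | 160 | 3% | 3% |
| Romans | 119 | 0 | 3 | 1 | 123 | 3% | 2% |
| 2 Esdras | 20 | 0 | 0 | 4 | 24 | 17% | 0% |
| 1 Chronicles | 9 | 0 | 4 | 0 | 13 | 31% | 31% |
| Judith | 5 | 0 | 1 | 2 | 8 | 38% | 13% |
| Jeremiah | 57 | 0 | 2 | 0 | 59 | 3% | 3% |
| 2 Chronicles | 40 | 0 | 0 | 2 | 42 | 5% | 0% |
| Leviticus | 34 | 0 | 1 | 1 | 36 | 6% | 3% |
| Philippians | 14 | 1 | 0 | 2 | 17 | 12% | 0% |
| 1 Thessalonians | 14 | 0 | 1 | 1 | 16 | 13% | 6% |
| Micah | 12 | 0 | 2 | 0 | 14 | 14% | 14% |
| Luke | 299 | 0 | 0 | 1 | 300 | 0% | 0% |
| Hebrews | 51 | 1 | 0 | 1 | 53 | 2% | 0% |
| Ecclesiastes | 37 | 0 | 0 | 1 | 38 | 3% | 0% |
| Joel | 11 | 0 | 1 | 0 | 12 | 8% | 8% |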
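The two percentage columns in the table above follow from the four counts in each row. The sketch below is our reconstruction, with `match_percentages` a hypothetical helper; rounding halves upwards is an assumption inferred from rows such as Job and Judith, where it reproduces the printed figures.

```python
def round_half_up_percent(part, total):
    """Integer percentage of part/total, with exact halves rounded upwards."""
    return (200 * part + total) // (2 * total)


def match_percentages(unmatched, close, match, added):
    """Return (total, % matches with additions, % matches without additions)."""
    total = unmatched + close + match + added
    with_additions = round_half_up_percent(match + added, total)
    without_additions = round_half_up_percent(match, total)
    return total, with_additions, without_additions


print(match_percentages(130, 9, 18, 16))  # 1 Samuel -> (173, 20, 10)
print(match_percentages(22, 0, 1, 7))     # Hosea    -> (30, 27, 3)
print(match_percentages(155, 1, 4, 0))    # Job      -> (160, 3, 3)
```

Small books such as Hosea and Judith show how quickly a handful of added matches inflates the "with additions" column.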

 

The next table lists the books which contain no matches at all: more than half of the Bible. In the table above, Matthew, Psalms and Luke represent nearly 1200 references, almost exactly one third of Shaheen's list. Yet they contain only 17 matches between them.

 

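| Books with no matches | Unmatched marks and references | Close but no match | Match | Stritmatter added match | Total | % matches with additions | % matches without additions |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Genesis | 191 | 1 | 0 | 0 | 192 | 0% | 0% |
| Proverbs | 98 | 0 | 0 | 0 | 98 | 0% | 0% |
| 1 Corinthians | 92 | 1 | 0 | 0 | 93 | 0% | 0% |
| Acts | 78 | 0 | 0 | 0 | 78 | 0% | 0% |
| Ephesians | 71 | 0 | 0 | 0 | 71 | 0% | 0% |
| John | 64 | 0 | 0 | 0 | 64 | 0% | 0% |
| Daniel | 30 | 0 | 0 | 0 | 30 | 0% | 0% |
| James | 27 | 0 | 0 | 0 | 27 | 0% | 0% |
| 2 Peter | 25 | 0 | 0 | 0 | 25 | 0% | 0% |
| Judges | 23 | 0 | 0 | 0 | 23 | 0% | 0% |
| Numbers | 22 | 0 | 0 | 0 | 22 | 0% | 0% |
| Colossians | 21 | 0 | 0 | 0 | 21 | 0% | 0% |
| Galatians | 20 | 0 | 0 | 0 | 20 | 0% | 0% |
| Joshua | 19 | 0 | 0 | 0 | 19 | 0% | 0% |
| 1 Timothy | 18 | 0 | 0 | 0 | 18 | 0% | 0% |
| Lamentations | 18 | 0 | 0 | 0 | 18 | 0% | 0% |
| Zechariah | 17 | 0 | 0 | 0 | 17 | 0% | 0% |
| 2 Maccabees | 16 | 0 | 0 | 0 | 16 | 0% | 0% |
| 2 Kings | 13 | 0 | 0 | 0 | 13 | 0% | 0% |
| Amos | 11 | 0 | 0 | 0 | 11 | 0% | 0% |
| 2 Timothy | 9 | 0 | 0 | 0 | 9 | 0% | 0% |
| Baruch | 9 | 0 | 0 | 0 | 9 | 0% | 0% |
| Jonah | 9 | 0 | 0 | 0 | 9 | 0% | 0% |
| Jude | 9 | 0 | 0 | 0 | 9 | 0% | 0% |
| Nehemiah | 8 | 1 | 0 | 0 | 9 | 0% | 0% |
| Titus | 9 | 0 | 0 | 0 | 9 | 0% | 0% |
| 1 Maccabees | 8 | 0 | 0 | 0 | 8 | 0% | 0% |
| Esther | 8 | 0 | 0 | 0 | 8 | 0% | 0% |
| Song of Solomon | 8 | 0 | 0 | 0 | 8 | 0% | 0% |
| Tobit | 8 | 0 | 0 | 0 | 8 | 0% | 0% |
| 1 John | 6 | 0 | 0 | 0 | 6 | 0% | 0% |
| 1 Esdras | 5 | 0 | 0 | 0 | 5 | 0% | 0% |
| Susanna | 5 | 0 | 0 | 0 | 5 | 0% | 0% |
| 2 Thessalonians | 4 | 0 | 0 | 0 | 4 | 0% | 0% |
| Malachi | 4 | 0 | 0 | 0 | 4 | 0% | 0% |
| Nahum | 4 | 0 | 0 | 0 | 4 | 0% | 0% |
| Habakkuk | 3 | 0 | 0 | 0 | 3 | 0% | 0% |
| Zephaniah | 3 | 0 | 0 | 0 | 3 | 0% | 0% |
| Ezra | 2 | 0 | 0 | 0 | 2 | 0% | 0% |
| Haggai | 2 | 0 | 0 | 0 | 2 | 0% | 0% |
| Ruth | 2 | 0 | 0 | 0 | 2 | 0% | 0% |
| Bel and the Dragon | 1 | 0 | 0 | 0 | 1 | 0% | 0% |
| Obadiah | 1 | 0 | 0 | 0 | 1 | 0% | 0% |
| Philemon | 1 | 0 | 0 | 0 | 1 | 0% | 0% |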

The Top 50

The next table represents the playwright's favourite 50 Bible references. None of the Top 20 Bible references are marked in the Folger Geneva. This is extraordinary if the playwright was using it as inspiration or even as an aide memoire. Only three in the Top 50 are marked.

Bible references that are made twice or more in the canon number 681. Only 31 of those 681 references are marked. A ratio of 1:22.

Perhaps we should point out that the 'Diagnostics' Stritmatter chooses for himself do not fairly represent the most popular references, though this was their stated purpose in his analysis. He has ignored over half of our Top Fifty 'Diagnostics' in his calculations and we would be very glad to get to grips with the reason why. Extend the table to the Top 100 references and only 34 qualify as Direct Diagnostics when all of them should make the cut (apart from those which contain only Stritmatter creations). None of the eight Indirect Diagnostics which appear in the Top Fifty are marked.

Again, Stritmatter's purpose in including them, beyond adding a slight sheen to his visibility in the upper reaches of the overall dataset, is opaque, unless the process of strategic omission when he comes to drawing conclusions is deliberate. Which, of course, it is.

His claim that the more a passage is referenced by the playwright, the more likely it is to be marked by the annotators is certainly not supported by the data. None of his claims are supported by the data.

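| Rank | Verse | Annotators' mark | References by playwright | Stritmatter Direct Diagnostic | Stritmatter Indirect Diagnostic |
| --- | --- | --- | --- | --- | --- |
| 1 | Genesis 3.19 | 0 | 13 | 0 | 1 |
| 2 | Matthew 18.1 | 0 | 12 | 0 | 0 |
| 3 | Matthew 5.44 | 0 | 11 | 1 | 0 |
| 4 | 1 Peter 3.7 | 0 | 10 | 1 | 0 |
| 5 | Genesis 3.24 | 0 | 11 | 0 | 0 |
| 6 | Ephesians 6.12 | 0 | 9 | 1 | 0 |
| 7 | Hebrews 1.14 | 0 | 9 | 1 | 0 |
| 8 | Job 21.26 | 0 | 9 | 1 | 0 |
| 9 | Joshua 2.19 | 0 | 10 | 0 | 0 |
| 10 | Psalms 7.17(7.16) | 0 | 10 | 0 | 0 |
| 11 | 2 Corinthians 11.14 | 0 | 8 | 0 | 1 |
| 12 | Psalms 34.7 | 0 | 9 | 0 | 0 |
| 13 | Revelation 20.1 | 0 | 9 | 0 | 0 |
| 14 | Romans 13.4 | 0 | 9 | 0 | 0 |
| 15 | 1 Samuel 25.39 | 0 | 8 | 0 | 0 |
| 16 | 2 Samuel 1.16 | 0 | 8 | 0 | 0 |
| 17 | Ecclesiasticus 13.1 | 0 | 7 | 0 | 1 |
| 18 | Isaiah 14.12 | 0 | 7 | 1 | 0 |
| 19 | James 5.12 | 0 | 8 | 0 | 0 |
| 20 | Luke 14.27 | 0 | 8 | 0 | 0 |
| 21 | Mark 10.21 | 1 | 6 | 1 | 0 |
| 22 | Matthew 4.1 | 0 | 8 | 0 | 0 |
| 23 | 1 Samuel 10.1 | 0 | 6 | 1 | 0 |
| 24 | Isaiah 51.8 | 0 | 7 | 0 | 0 |
| 25 | Matthew 16.17 | 0 | 6 | 1 | 0 |
| 26 | Matthew 23.23 | 0 | 6 | 1 | 0 |
| 27 | Matthew 5.37 | 0 | 7 | 0 | 0 |
| 28 | Psalms 140.3 | 0 | 6 | 0 | 1 |
| 29 | Revelation 12.9 | 0 | 6 | 0 | 1 |
| 30 | 2 Samuel 22.5 | 0 | 6 | 0 | 0 |
| 31 | Ecclesiastes 3.2 | 0 | 6 | 0 | 0 |
| 32 | Genesis 18.16 | 0 | 5 | 1 | 0 |
| 33 | Genesis 2.24 | 0 | 5 | 0 | 1 |
| 34 | Genesis 4.1 | 0 | 6 | 0 | 0 |
| 35 | Job 14.1 | 1 | 4 | 1 | 0 |
| 36 | Job 24.2 | 0 | 6 | 0 | 0 |
| 37 | Jude 6 | 0 | 6 | 0 | 0 |
| 38 | Luke 9.23 | 0 | 6 | 0 | 0 |
| 39 | Mark 5.9 | 0 | 5 | 1 | 0 |
| 40 | Mark 8.34 | 0 | 6 | 0 | 0 |
| 41 | Matthew 10.38 | 0 | 6 | 0 | 0 |
| 42 | Matthew 12.24 | 0 | 5 | 1 | 0 |
| 43 | Matthew 16.24 | 0 | 6 | 0 | 0 |
| 44 | Matthew 16.25 | 0 | 5 | 1 | 0 |
| 45 | Matthew 27.24 | 0 | 6 | 0 | 0 |
| 46 | Matthew 5.9 | 0 | 5 | 1 | 0 |
| 47 | Matthew 7.15 | 0 | 5 | 0 | 1 |
| 48 | Psalms 18.3 | 0 | 5 | 0 | 1 |
| 49 | Revelation 21.8 | 0 | 5 | 1 | 0 |
| 50 | 1 Chronicles 21.1 | 1 | 4 | 0 | 0 |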
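The headline counts for the Top 50 can be read straight off the table above. The sketch below records only the ranks that carry an annotator's mark or an Indirect Diagnostic, as listed in the table; the set and function names are ours, and the 681 and 31 figures are the repeated-reference totals quoted earlier.

```python
# Ranks in the Top 50 table whose verse carries an annotator's mark.
MARKED_RANKS = {21, 35, 50}
# Ranks flagged as Stritmatter Indirect Diagnostics in the same table.
INDIRECT_RANKS = {1, 11, 17, 28, 29, 33, 47, 48}


def marked_in_top(n):
    """Count how many of the n most referenced verses are marked."""
    return sum(1 for rank in MARKED_RANKS if rank <= n)


print(marked_in_top(20))              # 0
print(marked_in_top(50))              # 3
print(MARKED_RANKS & INDIRECT_RANKS)  # set(): no Indirect Diagnostic is marked

# Verses referenced twice or more: 681, of which 31 are marked.
print(round(681 / 31))                # 22, i.e. a ratio of 1:22
```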

What individual books tell us

The idea that the annotators and the playwright are the same person disintegrates completely when further individual books are examined in detail. The chart for 1 Samuel showed that even in books popular with annotators and the playwright, the overlaps between marks and references within the book are not in any way helpful to Stritmatter's thesis.

Although the majority of the books have no links or matches at all, here are four more charts from individual books.

Revelation was chosen because this book looked like the most promising place for matches on the doughnut chart above, where the similar size of the two pink sections possibly suggested a coincidence of interest.

Psalms is a favourite book of both Shakespeare and Marlowe, so the marks should overlap strongly with the references in the book most popular with playwrights.

Ecclesiasticus is the second favourite of the annotators. The 55 marks contained in this book should produce many matches with the references. They do not. However, Ecclesiasticus most clearly shows Stritmatter's thumb in the scales in the orange marks on the chart, where he adds 'Indirect Connections' to improve his odds. These are connections to connections, mere fanciful makeweights.

Genesis is the favourite Old Testament book of both Marlowe and Shakespeare, possibly indicating the popularity of The Creation with creative artists. The story of creation inspired thousands of works of art even before the Bible was widely accessible in vernacular versions. Yet the Folger copy contains only a single, small mark under one of the verse numbers (and it's one of the Bible's naughty bits). This alone should have started Stritmatter's alarm bells ringing before questing after this false grail.

He was, however, already enslaved by the Oxfordian Idée Fixe, unshakeably convinced of his conclusion before he even started his analysis. Unable to make any headway with his dataset, he was forced to improve it with additions and contortions, which were then subjected to even more distortion by his followers.

 

Three charts

Genesis

Why would a playwright, using the Bible as a guide, consistently mark passages to which he never referred, while drawing 98% of his references from passages he ignored?

Now it's our turn to state the obvious. He didn't. These are not the playwright's marks. Shakespeare's usage of the Bible demonstrates clearly that individually, none of the annotators demonstrates a close interest in his work and even taken together, the overlap is not significant.

If you insist that all the marks in the Folger Bible are made by a single person, there is still no support for the idea that person was the playwright.

On the evidence of the marks, this is not the playwright's Bible.
