Hand of Damocles
Stylometry takes a step forward
In the face of such methodological shortcomings, conflicting opinions, and duelling analyses, what is one to think? An obvious explanation is that today's orthodox scholars, including all the stylometricians here mentioned, are groping blindly in the wrong paradigm, handicapped by the confines of the conventional Shakespearean dating system. (Craig and Kinney are familiar with the Oxfordian argument, and mention it several times, once even citing an article in The Oxfordian.) In addition, very few scholars of any period have given any consideration to the idea of a substantial corpus of Shakespearean juvenilia. We can be sure that Shakespeare did not always write like Shakespeare.
In 2009, Hugh Craig and Arthur F. Kinney published a book called Shakespeare, Computers and the Mystery of Authorship, revised in 2012, which has received surprisingly little attention in the SAQ debate, given its title. Craig is Professor of English at the University of Newcastle, Australia, a contributor to the Early Modern Literary Studies site, and the author of a very good study of Jonson which looks at the problems thrown up by lexical analysis of his work.
Arthur F. Kinney is the Thomas W. Copeland Professor of Literary History at the University of Massachusetts at Amherst. He is a founding editor of English Literary Renaissance and has also written on SAQ-relevant issues in Shakespeare's Web (2004) and Shakespeare and Cognition (2006). The book drew on the work of two doctoral students at Amherst: Philip Palmer and Timothy Irish Watt.
All sorts of bells should be ringing by now. However, the only reviews I can find in the field comprise a solitary Amazon review and a very positive notice from Linda Theil. I doubt she read it or even looked inside. There is also a mystifying article on the book in the Shakespeare Oxford Newsletter in which Ramon Jiménez proves he has definitely read some of its contents but, judging from his conclusions, he has either understood nothing or is presenting his summary of an entirely different book.
Comparing Shakespeare's work systematically to those of half a dozen contemporaries is very useful. Being able to assess it against nearly all surviving literary texts from the period, as the Chadwyck-Healey database already makes possible, or against all printed books from the period, as the Early English Books Online project promises, will allow a truer estimate of its distinctive style.
C&K, Introduction
More uncertainty?
Before looking at Craig & Kinney's results and conclusions, we should look at their method, as it differs from previous stylometric methods in two important ways. The first is that their results are always derived from the largest possible data sets, not subsections tailored to available computing resources. The second is that the method is entirely algorithmic. Their tests consist only of algorithms operating on data. This has advantages and disadvantages.
The biggest advantage derives from the very large amount of high-quality digital source material newly available. Once their test algorithms are complete, they can take advantage of huge increases in processing power, and concomitant huge decreases in the cost of memory and mass storage, to run all of their tests (and even partial tests) against all of the available data, every time. If subtracting part of it increases validity, such as removing plays from after 1603 if the question concerns Elizabethan drama, or removing the play texts on which conclusions are to be drawn, the procedures are relatively simple and do not require a change of method. They can effortlessly pull Coriolanus and Hengist out of the test bank if they want to make deductions about where the plays would fit when they put them back again.
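To see how simple that 'pull a play out, put it back' step is once everything is data, here is a minimal sketch. The corpus structure, field names and dates are my illustrative assumptions, not C&K's actual data format:

```python
corpus = [
    {"title": "Coriolanus", "author": "Shakespeare", "year": 1608, "text": "..."},
    {"title": "Hengist",    "author": "Middleton",   "year": 1620, "text": "..."},
    # ... the rest of the 165-play databank
]

def test_bank(plays, exclude_titles=(), before=None):
    """The reference set minus any plays under examination,
    optionally restricted by date (e.g. Elizabethan drama only)."""
    bank = [p for p in plays if p["title"] not in exclude_titles]
    if before is not None:
        bank = [p for p in bank if p["year"] < before]
    return bank

# hold the disputed plays out, restrict to pre-1603 drama, run every test
reference = test_bank(corpus, exclude_titles={"Coriolanus", "Hengist"}, before=1603)
```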
The amount of data addressed by Craig & Kinney would have been unmanageably large 10 years ago and unthinkable during the Shakespeare Wars of the 1990s, when stylometry was the tool of the moment. Yet today, even on desktop computers (new ones!), this entire array can be addressed for each and every test.
Early Modern English Plays: 165
Words of dialogue: 3,250,000
Plays from 1590-1619: 138
Plays of undisputed single authorship: 112
Complete works: WS, TM, CM, JW, BJ
Four or more complete plays: JL, RG, GP, TD, TH, JF, JF
Three plays: RW, GC, JS
Usage dictionary: Online OED
Endless 'what-ifs?' are possible with this method. The tests are capable of refining themselves. Take all of Middleton's undisputed work: the lexical and function word tests can be iterated until they divide segments of Middleton from segments of undisputed Shakespeare. Any size of segment will do, and the efficiency of different segment boundaries can be tested too. Using the same parameters (and billions of calculations), unattributed segments of the same size can be placed on the same chart (see below).
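For the curious, this is roughly what the segmentation step amounts to in code. A minimal sketch assuming crude whitespace tokenisation; C&K's actual preprocessing is more careful:

```python
def segments(text, size=2000, offset=0):
    """Cut dialogue into consecutive word segments of a fixed size."""
    words = text.split()   # crude tokenisation, purely for illustration
    return [words[i:i + size] for i in range(offset, len(words) - size + 1, size)]

# Varying `offset` or `size` re-runs the whole pipeline on different
# segment boundaries, which is how boundary efficiency can be compared.
```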
Sensitivity analysis, difficult with previous testing methods, is now a snap. The process of experimenting with lots of different assumptions, even silly ones, and letting the algorithms take care of the output, gradually makes the system robust. C&K did, in fact, refine their tests in prolonged comparisons with uncontested attributions and other stylometric results. Once reliable results are being produced consistently, anomalies can be addressed by refining the questions rather than tinkering with the machine that produces the answers.
The end result is a mathematician's design and has that unholy smell of the clinical laboratory, but if you want to support argument with metrics, you need metrics that are started and finished by mathematical procedures.
And that's what is new.
The biggest disadvantage is that subjectivity is pretty much out, completely. Questions that cannot be expressed in machine-readable language cannot be asked. So until the database of early modern texts is refined to the point where individual words and verse features can take attributes, enabling adjectives and pronouns to be separated, trochees and spondees treated differently, and imagery categorised, all propositions have to be based on quantitative analysis of the data as it is currently digitised. There is no method of analysing a stylistic trend, such as the growth of feminine endings, without a distinguishing attribute in the base data. Although C&K have adjusted usage to accommodate headwords, this is just the tip of the subjective iceberg, so we still need Caroline Spurgeon. More disappointingly, the method will not allow the drawing of conclusions from alterations in stylistic development over a period, the way Elliott & Valenza can, having measured and recorded some of these attributes manually.
For the conclusions to be worth anything, the method has to be valid.
It is.
Signature vocabulary - lexical analysis
Identifying lists of words that one dramatist uses more than another has been a key technique in attribution since analysis began. Without computer assistance, the method has relied on identifying rarer words and spellings, such as Shakespeare's unique use of 'scilens' in 2 Henry IV and Hand D. However, if a process can run on the entire works of an individual dramatist and produce sets of words they use frequently and words they use rarely, the words themselves can be ignored, as the set should be unique. No one else will have an identical pattern of usage based on a lifetime's work. C&K spend quite a lot of time explaining why this is a valid form of discrimination and why the top and bottom segments of such a set, the 500 most and least favoured words, can be used to discriminate accurately between writers.
When conducting comparisons, C&K introduce a further stage with a procedure that requires a bit of serious computing power. The efficiency of vocabulary comparison can be greatly increased by comparing the words most likely to be used by Shakespeare with the words least likely to be shared by whoever or whatever is being compared, turning a measurable characteristic into an active coefficient of discrepancy. C&K call these Shakespeare and non-Shakespeare markers. Markers can be built for any playwright, though the amount of available work will affect the overall accuracy.
The markers they use to test Coriolanus are created from all the Shakespeare segments (minus Coriolanus) which contain the favoured words and all the segments from the entire database of non-Shakespearean drama which do not contain them. This creates a very distinctive Shakespeare/non-Shakespeare map on which plays under consideration can then be placed to test for authorship. The demonstration builds a similar map for Middleton's Hengist, then uses PCA to combine them.
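For readers who want the flavour of the marker-building step, here is a hedged sketch in the spirit of Craig's published Zeta test: score every word by how often it appears in Shakespeare segments plus how often it is absent from the comparison segments, then take the top and bottom 500. The exact scoring C&K use may differ in detail:

```python
from collections import Counter

def doc_freq(segs):
    """Fraction of segments in which each word appears (presence, not count)."""
    df = Counter()
    for seg in segs:
        for w in set(seg):
            df[w] += 1
    n = len(segs)
    return {w: c / n for w, c in df.items()}

def markers(shax_segs, other_segs, top=500):
    df_s = doc_freq(shax_segs)          # Shakespeare segments
    df_o = doc_freq(other_segs)         # everyone else's segments
    vocab = set(df_s) | set(df_o)
    # zeta-style score: appears-in-Shakespeare rate + absent-from-others rate
    zeta = {w: df_s.get(w, 0.0) + (1.0 - df_o.get(w, 0.0)) for w in vocab}
    ranked = sorted(vocab, key=zeta.get, reverse=True)
    return ranked[:top], ranked[-top:]  # Shakespeare / non-Shakespeare markers
```

A test segment's position on the lexical axis is then simply how many of its word types fall in each marker list.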
Since the method of creating these datasets is algorithmic, it's possible to vary the size of the set according to dates or sets of work, and make comparisons based on the whole of Early Modern English literature. Or any other way. And it's possible to run extensive sanity tests to discover whether the results are meaningful.
Function word analysis
Vocabulary is a conscious choice. A writer's use of function words, 'the', 'if', 'and' and so on, can reveal an unconscious signature. Again, this is not a new idea, but tackling it from an entirely algorithmic point of view is new. Once more, C&K spend a great deal of time explaining the validity of this form of analysis and their method, which strays into the sort of mathematical rocket science used by hedge fund analysts and economic forecasters. Standard deviation, Principal Component Analysis and other horribly difficult procedures are employed to look, once again, at the entire output of a dramatist in search of a stylistic signature. The great advantage here is that the process is entirely separate from lexical analysis, resulting in two separate tests.
Two tests are better than one, especially since the second test's method, the data it generates and the resulting algorithm duplicate nothing in the first.
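To make that independence concrete, here is a minimal sketch of the function word side: count a fixed list of very common words per segment, standardise, and project onto principal components. The ten-word list is a placeholder assumption; C&K use a much longer list:

```python
from collections import Counter
import numpy as np

# placeholder list, purely for illustration
FUNCTION_WORDS = ["the", "and", "if", "of", "to", "a", "in", "that", "but", "not"]

def fw_profile(seg):
    """Relative frequency of each function word in one segment."""
    counts = Counter(seg)
    return [counts[w] / len(seg) for w in FUNCTION_WORDS]

def pca_scores(segs, n_components=2):
    X = np.array([fw_profile(s) for s in segs])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)  # standardise each word
    _, _, vt = np.linalg.svd(X, full_matrices=False)   # principal axes
    return X @ vt[:n_components].T                     # each segment's coordinates
```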
The tests
Of course, for some dramatists, there isn't much output. And then there is the question of influence. Can the method discriminate between Marlowe and passages where Shakespeare is imitating Marlowe? Between Pamela and Shamela? The function word test, unsurprisingly, is a great deal better at this, but C&K's lexical test turns out to be up to the task as well.
To cut a long story short and get down to the business of looking at what new light C&K have shed, we'll proceed with just one caveat: large as the datasets are, the small amount of data available for many dramatists is a far greater handicap than the shortcomings of the method of discrimination.
The method is shown to work.
However, there are still questions that can't be answered, and there are uncertainties which lie beyond the scope of what can be resolved. Resolution is a measure of how much detail can be discerned in an image. Stylometric tools are still, at this early stage in their development, low-resolution tools. The most promising feature of C&K's approach is the idea that large improvements in resolution might be possible with further refinement, enabling more accurate tests to be carried out on smaller amounts of data.
So, as with all stylometric analysis there is room for doubt. Old doubt and even newly-created doubt. The existence of (not many) false positives needs explanation. Gaps exist for the Oxfordian coach and horses to aim at. Room for doubt - but is there room for Doubters?
Well, the big news, contrary to Jiménez's 'conclusions', is that none of the gaps look viable any more. Hand D, for example, is well within its scope.
A result
Lots of new horizons open up once we can test reliably using an algorithmic process. For example, it can reduce the many assumptions which have to be made before tests are designed using other methods. C&K are not bound, in the way E&V were, to test like with like. They can judge the importance and weight of different 'likes'. They can measure the difference between the vocabularies of Shakespeare's prose and verse in a meaningful way (Oxford's too, if they wished), testing whether the difference requires separate analysis for successful discrimination. Maybe, maybe not, is the answer. With Shakespeare, anyway. There's so much in each category that properly implemented mathematical tests can overcome any problems. That's starting to look a bit risky in the case of the Earl, now. After all, we have a lot of his prose.
Patience. We're getting there.
Let's have a look at Timon. This is a bald summary of the process.
First C&K take Timon out of the test data. They now have a databank which contains the 27 plays Shakespeare wrote without collaboration and, since Middleton is the favoured collaborator, all of Middleton's undisputed work.
The process builds two signature sets of favourite/least favourite words, calculating which words each dramatist is most and least likely to use: a bespoke Shakespeare/Middleton profile. Then they carry out a function word test on each dramatist. Using some more fancy math and lots of CPU time, they tie the two test sets together. In this test, they compare every 2,000-word segment (with random boundaries), having first checked whether running the tests along scene boundaries would improve accuracy. There is not a great deal of difference in the results, suggesting the undisputed attributions hold up either way. The chart shows the result for scene boundaries and 2,000-word segments.
Now they use the same analysis technique on same-size segments, scenes in this case, of the play(s) under examination. Having subjected them to the same analysis which created the author profiles, they then overlay the sections of the play on to the segment profiles.
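Before the chart itself, a sketch of what the overlay amounts to: every segment gets an (x, y) coordinate, one from each test, and the held-out play's segments are dropped onto the same axes. The plotting details and marker choices here are mine, not C&K's:

```python
import matplotlib.pyplot as plt

def plot_overlay(ref_points, ref_labels, test_points):
    """ref_points: (x, y) per reference segment; test_points: the held-out play."""
    for (x, y), who in zip(ref_points, ref_labels):
        if who == "Shakespeare":
            plt.scatter(x, y, marker="D", facecolors="none", edgecolors="black")
        else:
            plt.scatter(x, y, marker="o", color="black")
    for x, y in test_points:   # the play under examination, dropped back in
        plt.scatter(x, y, marker="x", s=80, color="red")
    plt.xlabel("lexical-marker score")
    plt.ylabel("function-word score (first principal component)")
    plt.show()
```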
Here's the result:
Timon of Athens Shakespeare and Middleton segments
Reproduced with the authors' permission
Shakespeare's diamond-shaped segments represent the analysis of all his undisputed work, excluding Timon. Middleton's black circles represent all his work except his supposed contributions to Timon. There is almost no overlap between the work of the two dramatists. A line can be drawn which almost completely isolates one from the other, since only two of Shakespeare's segments sit on the Middleton side of the chart. This is not an inaccuracy or a weakness in the system. It merely says that very occasionally, in less than 2% of cases, Shakespeare can sound a bit like Middleton. That's impressive consistency, not weakness. In 98% of the segments, he's speaking in his own clear voice.
Analysis of the play drops each of the four segments of Timon onto the chart. Three fall into Shakespeare's segment area, one very close to what C&K call a Shakespeare centroid§, and the Middleton segment drops smack bang on one of the Middleton centroids itself.
Go on, tell me that's not pretty…
For it to be meaningful, results for other dramatists should place the Timon play segments well outside the area occupied by most of their work. And they do. There are hundreds of charts which demonstrate their segment-mapping attribution techniques, tested on a range of work and dramatists. Sadly, to enjoy all their results, you are going to have to spring for the book.
I've enjoyed their book far too much to give away all their exciting goodies. It's easy to follow, as the authors are very, very good at explaining what the math does without dwelling too long on how. The long chapter on H6, which puts the method through its paces, is excellent. The chapter on Edmund Ironside demonstrates its limitations, just as the chapter on the Lear quartos and emendation demonstrates its potential. It's £18 from Google Books and it's the best money I've spent on a SAQ book.
Tell us what we want to know!!!
And Oxford? How about his claims? Well, he gets a dishonourable mention or two. Stylometry was once the great white hope of Oxfordians. There's no reason why C&K's work couldn't be used on his prose and verse for yet another emphatic elimination. E&V, of course, made just such a categorical statement. The Oxfordian response has been limited to ignoring them or pretending the whole business of computerised analysis is worthless. However, though his followers have earned him a place in the index, C&K ignore Oxford's work entirely.
Do they perhaps have a different kind of elimination in mind? Judge for yourselves.
Oxfordians who don't like E&V's elimination of Oxford from the authorship stakes do enjoy airily criticising the data, the method, the maths and almost everything they ever did. All their work, in fact, apart from one single instance in which everything unaccountably went right: E&V do not agree that Hand D is Shakespeare's. Their analysis says not. Oxfordians rejoice. In our feature article on Hand D, we have included their analysis, Mac Jackson's analysis of their analysis, and E&V's subsequent analysis of Mac Jackson's analysis of their analysis (we defer to none when it comes to the Rules of Style here).
C&K go one better. Or maybe three or four, or possibly even ten better. Not only does their own analysis place Hand D squarely in Shakespeare's lap, but they also re-run E&V's negative-result-producing tests using their much larger datasets, and now that test produces a positive result as well, vindicating themselves, us, Mac Jackson, E&V and Shakespeare at a single stroke. Have a root beer, guys, we're buying.
Another neat trick enables them to isolate lexical and function word vocabulary sets for individual plays. The vocabulary set Hand D most resembles is Othello, which would suggest the arguments favouring a later date around 1603 are also correct.
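The book doesn't spell out the resemblance measure in a form I can reproduce here, but cosine similarity between vocabulary profiles is the standard stand-in for this kind of 'which play is nearest?' question and gives the flavour:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count profiles (Counters)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

def nearest_play(sample_words, plays):
    """plays: {title: list of word tokens}. Titles ranked best-match-first."""
    prof = Counter(sample_words)
    return sorted(plays, key=lambda t: cosine(prof, Counter(plays[t])), reverse=True)
```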
What then, would be the point of running tests on the Earl of Oxford's work to eliminate him, stylometrically, all over again? Hugh Craig's email reply to an online Oxfordian question hints at what their response might be to a van load of misguided Oxfordian scepticism. The answer, so far, is no point at all. What's to be gained from kicking a dead horse?
In fact, 'Wat nedeth sermone more?' as Chaucer said.
It is apparent that we are still near the beginning of the process of refining stylometry into an attribution science. This book is a big step forward. In the right direction. It is equally apparent that every step forward makes the Oxfordian case more improbable.
In fact, they will soon have enough improbability to power Zaphod Beeblebrox's Infinite Improbability Drive all by themselves.
§ A centroid is the centre of a cluster of datapoints: its coordinates are the averages of the cluster's points on each axis. I could tell you were going to ask.
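In code, exactly as the footnote defines it, that's a one-liner:

```python
import numpy as np

cluster = np.array([[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]])
centroid = cluster.mean(axis=0)   # the average on each axis -> array([2., 3.])
```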