Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI

Barber, Ros. 2021. Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI. Digital Scholarship in the Humanities, 36(3), pp. 542-564. ISSN 0268-1145 [Article]

Text
2020-05-01_GRO_Big-Data-Not-Enough.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.

Official URL: https://academic.oup.com/dsh/advance-article-abstr...

Abstract or Description

In 2016, the editors of the New Oxford Shakespeare announced that certain Shakespeare plays could be attributed to co-authors, and certain anonymous plays to Shakespeare, on the basis of non-traditional attribution methods known collectively as computational stylistics, or stylometry. This article investigates the efficacy of a key algorithm used to attribute parts of the Henry VI plays to Christopher Marlowe, the Zeta method invented by John Burrows and adapted by Hugh Craig. Zeta, a test widely used in computational stylistics, is described by Gabriel Egan as ‘by some way the most powerful general-purpose authorship tool currently available’. This article offers extensive independent testing of Zeta. Following criticism of the existing method of Zeta analysis, this article introduces a new, statistically sound method for analysing Zeta results. It investigates a claim that the test is 99.9% reliable in differentiating Shakespeare’s style from Marlowe’s. Examining the conditions under which certain authors were ruled in or out of co-authorship of the Henry VI plays, it determines the effect of disparity in data set size on Zeta’s reliability, as well as the effect of small data sets. Several test results confirm that Zeta is unduly influenced by genre. The article concludes that in the light of this study, the small canons of most Early Modern dramatists, particularly where they are genre-skewed like Marlowe’s, do not provide enough data for Zeta to be reliable.

Item Type:

Article

Identification Number (DOI):

https://doi.org/10.1093/llc/fqaa041

Additional Information:

This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review. The version of record, 'Barber, Ros (2020). Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI. Digital Scholarship in the Humanities', is available online at: https://doi.org/10.1093/llc/fqaa041

The database of all the playtexts used in these tests, the code used to run the tests, and instructions for how to run them, can be found at this data repository: https://doi.org/10.25602/GOLD.00028390

Keywords:

Shakespeare, Marlowe, authorship attribution, computational stylistics, stylometry, Zeta

Related URLs: