Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI

Barber, Ros. 2021. Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI. Digital Scholarship in the Humanities, 36(3), pp. 542-564. ISSN 0268-1145 [Article]

2020-05-01_GRO_Big-Data-Not-Enough.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.

Abstract or Description

In 2016, the editors of the New Oxford Shakespeare announced that certain Shakespeare plays could be attributed to co-authors, and certain anonymous plays to Shakespeare, on the basis of non-traditional attribution methods known collectively as computational stylistics, or stylometry. This article investigates the efficacy of a key algorithm used to attribute parts of the Henry VI plays to Christopher Marlowe: the Zeta method invented by John Burrows and adapted by Hugh Craig. Zeta, a test widely used in computational stylistics, is described by Gabriel Egan as ‘by some way the most powerful general-purpose authorship tool currently available’. This article offers extensive independent testing of Zeta. Following criticism of the existing method of Zeta analysis, it introduces a new, statistically sound method for analysing Zeta results. It investigates a claim that the test is 99.9% reliable in differentiating Shakespeare’s style from Marlowe’s. Examining the conditions under which certain authors were ruled in or out of co-authorship of the Henry VI plays, it determines the effect of disparity in data set size on Zeta’s reliability, as well as the effect of small data sets. Several test results confirm that Zeta is unduly influenced by genre. The article concludes that, in the light of this study, the small canons of most Early Modern dramatists, particularly where they are genre-skewed like Marlowe’s, do not provide enough data for Zeta to be reliable.

Item Type:

Article

Additional Information:

This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review. The version of record, 'Barber, Ros (2020). Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI. Digital Scholarship in the Humanities', is available online at:

The database of all the playtexts used in these tests, the code used to run the tests, and instructions for how to run them, can be found at this data repository:


Keywords:

Shakespeare, Marlowe, authorship attribution, computational stylistics, stylometry, Zeta

Departments, Centres and Research Units:

English and Comparative Literature


8 June 2020: Accepted
7 October 2020: Published Online
September 2021: Published

Date Deposited:

05 May 2020 15:28

Last Modified:

07 Oct 2022 01:26

Peer Reviewed:

Yes, this version has been peer-reviewed.

