Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI

Barber, Ros. 2020. Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI. Digital Scholarship in the Humanities, ISSN 0268-1145 [Article] (In Press)

No full text available
2020-05-01_GRO_Big-Data-Not-Enough.pdf - Accepted Version
Permissions: Administrator Access Only until 7 October 2022.
Available under License Creative Commons Attribution Non-commercial.

Abstract or Description

In 2016, the editors of the New Oxford Shakespeare announced that certain Shakespeare plays could be attributed to co-authors, and certain anonymous plays to Shakespeare, on the basis of non-traditional attribution methods known collectively as computational stylistics, or stylometry. This article investigates the efficacy of a key algorithm used to attribute parts of the Henry VI plays to Christopher Marlowe, the Zeta method invented by John Burrows and adapted by Hugh Craig. Zeta, a test widely used in computational stylistics, is described by Gabriel Egan as ‘by some way the most powerful general-purpose authorship tool currently available’. This article offers extensive independent testing of Zeta. Following criticism of the existing method of Zeta analysis, this article introduces a new, statistically sound method for analysing Zeta results. It investigates a claim that the test is 99.9% reliable in differentiating Shakespeare’s style from Marlowe’s. Examining the conditions under which certain authors were ruled in or out of co-authorship of the Henry VI plays, it determines the effect of disparity in data set size on Zeta’s reliability, as well as the effect of small data sets. Several test results confirm that Zeta is unduly influenced by genre. The article concludes that in the light of this study, the small canons of most Early Modern dramatists, particularly where they are genre-skewed like Marlowe’s, do not provide enough data for Zeta to be reliable.

Item Type:

Article

Identification Number (DOI):

https://doi.org/10.1093/llc/fqaa041

Additional Information:

This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review. The version of record, 'Barber, Ros (2020). Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI. Digital Scholarship in the Humanities', is available online at: https://doi.org/10.1093/llc/fqaa041

The database of all the playtexts used in these tests, the code used to run the tests, and instructions for how to run them, can be found at this data repository: https://doi.org/10.25602/GOLD.00028390

Keywords:

Shakespeare, Marlowe, authorship attribution, computational stylistics, stylometry, Zeta

Departments, Centres and Research Units:

English and Comparative Literature

Dates:

Date - Event
8 June 2020 - Accepted
7 October 2020 - Published Online

Item ID:

28363

Date Deposited:

05 May 2020 15:28

Last Modified:

17 Nov 2020 06:36

Peer Reviewed:

Yes, this version has been peer-reviewed.

URI:

https://research.gold.ac.uk/id/eprint/28363
