Research Online

Goldsmiths - University of London

Live versus Archive: Comparing a Web Archive to a Population of Webpages

Hale, Scott A.; Blank, Grant and Alexander, Victoria D. 2017. Live versus Archive: Comparing a Web Archive to a Population of Webpages. In: Niels Brügger and Ralph Schroeder, eds. The Web as History. London: UCL Press, pp. 45-61. ISBN 978–1–911307–42–6 [Book Section]

Live versus Archive Chapter Final Typescript.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract or Description

With its seemingly limitless scope, the World Wide Web promises enormous advantages, along with enormous problems, to researchers who seek to use it as a source of data. Websites change continually and a high level of flux makes it challenging to capture a snapshot of the web, or even a cross-section of a small subset of the web. A web archive, such as those at the Internet Archive, promises to store and deliver repeated cross-sections of the entire web, and it also offers the potential for longitudinal analysis. Whether this potential is realized depends on the extent to which the archive has truly captured the web. Therefore, a crucial question for Internet researchers is: ‘How good are the archival data?’
We ask if there are systematic biases in the Internet Archive, using a case study to address this question. Specifically, we are interested in whether biases exist in the British websites stored in the Internet Archive data. We find that the Internet Archive contains a surprisingly small subset, about 24%, of the webpages of the website that we use for our case study (the travel site, TripAdvisor). Furthermore, the subset of data we found in the Internet Archive appears to be biased and is not a random sample of the webpages on the site. The archived data we examine has a bias toward prominent webpages. This bias could create serious problems for research using archived websites.

Item Type:

Book Section

Identification Number (DOI):

https://doi.org/10.14324/111.9781911307563

Departments, Centres and Research Units:

Institute for Cultural and Creative Entrepreneurship (ICCE)

Dates:

Published: 1 March 2017

Item ID:

20698

Date Deposited:

12 Jul 2017 11:52

Last Modified:

11 Jul 2018 11:38

URI:

http://research.gold.ac.uk/id/eprint/20698
