
Goldsmiths - University of London

Improving Record Linkage Accuracy with Hierarchical Feature Level Information and Parsed Data

Zhou, Yun; Wang, Minlue; Haberland, Valeriia; Howroyd, John; Danicic, Sebastian and Bishop, Mark. 2017. Improving Record Linkage Accuracy with Hierarchical Feature Level Information and Parsed Data. New Generation Computing, 35(1), pp. 87-104. ISSN 0288-3635 [Article]

Yun_AMBN_2015_journal.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract or Description

Probabilistic record linkage is a well-established topic in the literature. Fellegi-Sunter probabilistic record linkage and its enhanced versions are commonly used methods; they calculate match and non-match weights for each pair of records. Bayesian network classifiers, namely the naive Bayes classifier and the tree-augmented naive Bayes (TAN) classifier, have also been used successfully for this task. Recently, an extended version of TAN (called ETAN) has been developed and shown to be superior in classification accuracy to conventional TAN. However, no previous work has applied ETAN to record linkage or investigated the benefits of using the hierarchical feature level information that naturally exists in the datasets, or of parsing their fields. In this work, we extend the naive Bayes classifier with such hierarchical feature level information. Finally, we illustrate the benefits of our method over previously proposed methods on 4 datasets in terms of linkage performance (F1 score). We also evaluate the benefit of additionally parsing the fields of these datasets and show that it further improves the results.
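The abstract notes that Fellegi-Sunter methods calculate match and non-match weights for each record pair. As a rough illustration only, the Python sketch below shows the textbook form of those weights, assuming binary agree/disagree comparisons per field and already-known m-probabilities (P(agree | match)) and u-probabilities (P(agree | non-match)). The function names, field parameters and decision thresholds are hypothetical, and this is not the authors' hierarchical naive Bayes method.

```python
import math


def field_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agreement, disagreement) weights in bits for one field.

    m: P(field agrees | records are a true match)     (assumed known)
    u: P(field agrees | records are a non-match)      (assumed known)
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def pair_weight(agreements: list[bool], params: list[tuple[float, float]]) -> float:
    """Sum per-field weights for one record pair.

    Summing log ratios assumes fields are conditionally independent
    given the match status of the pair.
    """
    if len(agreements) != len(params):
        raise ValueError("one (m, u) pair is needed per compared field")
    total = 0.0
    for agrees, (m, u) in zip(agreements, params):
        agree_w, disagree_w = field_weights(m, u)
        total += agree_w if agrees else disagree_w
    return total


def classify(weight: float, upper: float, lower: float) -> str:
    """Apply hypothetical upper/lower thresholds to a total pair weight."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


# Hypothetical example: two fields (e.g. surname, postcode).
params = [(0.9, 0.1), (0.8, 0.2)]
for pattern in ([True, True], [True, False], [False, False]):
    w = pair_weight(pattern, params)
    print(pattern, round(w, 3), classify(w, upper=4.0, lower=0.0))
```

Summing per-field log ratios relies on the same conditional independence assumption that the naive Bayes classifier makes; TAN-style classifiers relax it by allowing dependencies between features.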

Item Type: Article

Identification Number (DOI):

10.1007/s00354-016-0008-5

Departments, Centres and Research Units:

Computing

Dates:

Published: 10 January 2017
Accepted: 18 March 2016

Item ID:

17342

Date Deposited:

22 Mar 2016 09:03

Last Modified:

08 Aug 2017 09:21

URI: http://research.gold.ac.uk/id/eprint/17342

