Extending Naive Bayes Classifier with Hierarchy Feature Level Information for Record Linkage
Zhou, Y.; Howroyd, John; Danicic, Sebastian and Bishop, Mark (J. M.). 2015. 'Extending Naive Bayes Classifier with Hierarchy Feature Level Information for Record Linkage'. In: AMBN 2015: the second workshop on Advanced Methodologies for Bayesian Network. Yokohama, Japan. [Conference or Workshop Item]
Abstract or Description
Probabilistic record linkage has been well investigated in recent years. The Fellegi-Sunter probabilistic record linkage method and its enhanced version are commonly used; they calculate match and non-match weights for each pair of corresponding fields of record-pairs. Bayesian network classifiers, namely the naive Bayes classifier and TAN, have also been successfully used here. Very recently, an extended version of TAN (called ETAN) has been developed and shown to be superior in classification accuracy to conventional TAN. However, no previous work has applied ETAN to record linkage or investigated the benefits of using naturally existing hierarchy feature level information. In this work, we extend the naive Bayes classifier with such information. Finally, we apply all the methods to four datasets and estimate the F1 scores.
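The match and non-match weights mentioned in the abstract can be illustrated with a minimal sketch of the standard Fellegi-Sunter formulation. This is not the authors' implementation: the function names, the field names, and the m- and u-probabilities below are all hypothetical, chosen only to show how per-field weights combine into a score for one record-pair.

```python
import math


def field_weights(m, u):
    """Return (agreement_weight, disagreement_weight) for one field.

    m: P(field agrees | record-pair is a match)
    u: P(field agrees | record-pair is a non-match)
    Both are assumed to lie strictly between 0 and 1.
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def pair_score(agreements, m_probs, u_probs):
    """Sum the per-field weights for one record-pair.

    agreements maps each field name to True (values agree) or False.
    """
    total = 0.0
    for field, agrees in agreements.items():
        agree_w, disagree_w = field_weights(m_probs[field], u_probs[field])
        total += agree_w if agrees else disagree_w
    return total


# Hypothetical probabilities, for illustration only.
M_PROBS = {"surname": 0.95, "forename": 0.90, "postcode": 0.85}
U_PROBS = {"surname": 0.01, "forename": 0.05, "postcode": 0.02}

all_agree = pair_score(
    {"surname": True, "forename": True, "postcode": True}, M_PROBS, U_PROBS
)
all_disagree = pair_score(
    {"surname": False, "forename": False, "postcode": False}, M_PROBS, U_PROBS
)
```

A high total score suggests a match and a low one a non-match; in practice the decision thresholds and the probabilities themselves would be estimated from data rather than fixed by hand as here.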