Duplicate detection in facsimile scans of early printed music
Rhodes, Christophe; Crawford, Tim and d'Inverno, Mark. 2014. 'Duplicate detection in facsimile scans of early printed music'. In: European Conference on Data Analysis. Bremen, Germany 2 - 4 July 2014. [Conference or Workshop Item] (Submitted)
Official URL: http://ecda2014.eu/
Abstract or Description
There is a growing number of collections of readily available scanned musical documents, whether generated and managed by libraries, research projects or volunteer efforts. They are typically digital images; for computational musicology we also need the musical data in machine-readable form. Optical Music Recognition (OMR) can be used on printed music, but is prone to error, depending on document condition and on the quality of intermediate stages in the digitization process, such as archival photographs. In performing OMR on the British Library’s Early Music Online collection (Pugin and Crawford, 2013) of 16th-century volumes, we must deal with the problem of images that are rescans of the same pages. These images are not precise digital duplicates of each other, and so must be detected through some approximate means. As well as duplicate scans, there are other forms of similarity present in the collection, such as musical relatedness and movable type reuse. We present our work on developing and combining image-based near-duplicate detection, based on SIFT features (Lowe, 1999), with OMR-based musical content near-duplicate detection. We evaluate an order-statistic-based method for finding duplicate scans of pages, and additionally identify a number of distinct kinds of approximate similarity from our distance measures: substantial reuse of graphical material; musical quotation; and title page detection.
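The abstract does not specify how the order statistic is computed, so the following is only a minimal sketch of one way such a page-level distance could look, not the authors' method. It assumes each page has already been reduced to a list of local feature descriptors (for example SIFT vectors, here stood in for by short tuples), and that the page distance is a chosen quantile of the nearest-neighbour descriptor distances. The function names, the `quantile` parameter and the `threshold` value are all hypothetical.

```python
import math


def nearest_distances(desc_a, desc_b):
    """For each descriptor in desc_a, return the Euclidean distance
    to its nearest neighbour in desc_b."""
    return [min(math.dist(a, b) for b in desc_b) for a in desc_a]


def order_statistic_distance(desc_a, desc_b, quantile=0.5):
    """Hypothetical page distance: the order statistic at the given
    quantile of the nearest-neighbour distances from page A to page B.
    Using an order statistic rather than the mean makes the score
    insensitive to a minority of unmatched features."""
    if not desc_a or not desc_b:
        raise ValueError("both pages need at least one descriptor")
    dists = sorted(nearest_distances(desc_a, desc_b))
    k = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[k]


def is_probable_rescan(desc_a, desc_b, threshold=0.2, quantile=0.5):
    """Flag a pair of pages as a likely rescan when the order-statistic
    distance falls at or below an assumed, collection-specific threshold."""
    return order_statistic_distance(desc_a, desc_b, quantile) <= threshold


# Toy 2-D "descriptors": a page, a slightly perturbed rescan with one
# spurious feature, and an unrelated page.
page = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rescan = [(0.05, 0.0), (1.0, 0.05), (0.0, 0.95), (5.0, 5.0)]
other = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

print(is_probable_rescan(page, rescan))
print(is_probable_rescan(page, other))
```

In this toy data the rescan shares three of four features with the original page, so the median nearest-neighbour distance stays small even though one feature has no counterpart. In practice the threshold would have to be tuned against pairs known to be rescans.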