Searching Page-Images of Early Music Scanned with OMR: A Scalable Solution Using Minimal Absent Words

Crawford, Tim; Badkobeh, Golnaz and Lewis, David. 2018. 'Searching Page-Images of Early Music Scanned with OMR: A Scalable Solution Using Minimal Absent Words'. In: 19th International Society for Music Information Retrieval Conference. Paris, France 24-27 September 2018. [Conference or Workshop Item]

Paper_final_with_copyright.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract or Description

We define three retrieval tasks requiring efficient search of the musical content of a collection of ~32k page-images of 16th-century music to find: duplicates; pages with the same musical content; pages of related music.
The images are subjected to Optical Music Recognition (OMR), which inevitably introduces errors. We encode pages as strings of diatonic pitch intervals, ignoring rests, to reduce the effect of such errors. We extract indices comprising lists of two kinds of ‘word’. Approximate matching is done by counting the number of words a query page has in common with each page in the collection.
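The encoding and matching steps just described can be illustrated with a small sketch. It assumes a hypothetical note representation, `(step letter, octave)` with `None` for a rest, and no accidentals, so the intervals are purely diatonic. The helper names (`diatonic_intervals`, `ngrams`, `common_word_count`) are illustrative and not taken from the paper. Here "common words" are counted as distinct shared ngrams (a set intersection); the paper's exact counting rule may differ.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical note representation: (step letter, octave), or None for a rest.
# Accidentals are not represented, so every interval is diatonic by construction.
Note = Optional[Tuple[str, int]]

STEPS = "CDEFGAB"  # diatonic steps within one octave


def diatonic_intervals(notes: Sequence[Note]) -> List[int]:
    """Encode a note sequence as diatonic intervals, dropping rests."""
    positions = []
    for note in notes:
        if note is None:  # rests are skipped before intervals are computed
            continue
        step, octave = note
        positions.append(octave * 7 + STEPS.index(step))
    # Signed interval, in diatonic steps, between consecutive sounding notes
    return [b - a for a, b in zip(positions, positions[1:])]


def ngrams(seq: Sequence[int], n: int) -> Set[Tuple[int, ...]]:
    """The set of distinct length-n windows ('words') in an interval sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def common_word_count(query: Sequence[int], page: Sequence[int], n: int = 4) -> int:
    """Approximate-match score: number of distinct ngrams shared by two pages."""
    return len(ngrams(query, n) & ngrams(page, n))


# G4 A4 (rest) B4 G4 -> up a step, up a step, down a third
print(diatonic_intervals([("G", 4), ("A", 4), None, ("B", 4), ("G", 4)]))  # [1, 1, -2]
print(common_word_count([1, 1, -2, 1, 1], [1, 1, -2, 3], n=3))  # 1
```

Because rests are dropped before intervals are taken, an OMR error that inserts or deletes a rest leaves the interval string unchanged.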
The two word-types are (a) normal ngrams and (b) minimal absent words (MAWs). The latter have three important properties for our purpose: they can be built and searched in linear time, fewer MAWs than ngrams tend to be generated, and they preserve the structure and order of the text, obviating the need for expensive sorting operations.
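To make the MAW definition concrete: a word is a minimal absent word of a string if it does not occur in the string, but every proper factor of it does. The sketch below is a deliberately naive, roughly cubic-time brute force that only illustrates the definition; it is *not* the linear-time construction the abstract refers to. It assumes the alphabet is the set of symbols occurring in the string and that interval symbols have already been mapped to single characters; the function names are hypothetical.

```python
from typing import Set


def factors(s: str) -> Set[str]:
    """All non-empty substrings (factors) of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def minimal_absent_words(s: str) -> Set[str]:
    """Brute-force MAWs of s over the alphabet of symbols occurring in s.

    w is a MAW if w does not occur in s but every proper factor of w does.
    With this alphabet every MAW has length >= 2 and can be written
    a + u + b; its proper factors all lie inside a + u or u + b, so it is
    enough to check that those two occur and a + u + b does not.
    """
    facts = factors(s)
    alphabet = sorted(set(s))
    maws = set()
    for u in facts | {""}:  # the empty middle u yields the length-2 MAWs
        for a in alphabet:
            if a + u not in facts:
                continue
            for b in alphabet:
                if u + b in facts and a + u + b not in facts:
                    maws.add(a + u + b)
    return maws


print(sorted(minimal_absent_words("abaab")))  # ['aaa', 'aaba', 'bab', 'bb']
```

Note that the MAWs of `abaab` have different lengths (2, 3 and 4), which is why an index of MAWs is naturally of mixed length.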
We show that retrieval performance of MAWs is comparable with ngrams, but with a marked speed improvement. We also show the effect of word length on retrieval. Our results suggest that an index of MAWs of mixed length provides a good method for these tasks which is scalable to larger collections.

Item Type:

Conference or Workshop Item (Paper)

Departments, Centres and Research Units:

Computing > Intelligent Sound and Music Systems

Dates:

Date | Event
4 October 2018 | Published

Event Location:

Paris, France

Date range:

24-27 September 2018

Item ID:

24502

Date Deposited:

05 Oct 2018 09:14

Last Modified:

21 Dec 2022 14:30

URI:

https://research.gold.ac.uk/id/eprint/24502