Searching Page-Images of Early Music Scanned with OMR: A Scalable Solution Using Minimal Absent Words

Crawford, Tim; Badkobeh, Golnaz and Lewis, David. 2018. 'Searching Page-Images of Early Music Scanned with OMR: A Scalable Solution Using Minimal Absent Words'. In: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018. Paris, France, 23 – 27 September 2018. [Conference or Workshop Item]

Crawford, T., Badkobeh, G., Lewis, D. (2020) Searching Page-Images of Early Music Scanned with OMR- A Scalable Solution Using Minimal Absent Words.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract or Description

We define three retrieval tasks requiring efficient search of the musical content of a collection of ~32k page images of 16th-century music to find: duplicates; pages with the same musical content; pages of related music. The images are subjected to Optical Music Recognition (OMR), introducing inevitable errors. We encode pages as strings of diatonic pitch intervals, ignoring rests, to reduce the effect of such errors. We extract indices comprising lists of two kinds of ‘word’. Approximate matching is done by counting the number of common words between a query page and those in the collection. The two word-types are (a) normal ngrams and (b) minimal absent words (MAWs). The latter have three important properties for our purpose: they can be built and searched in linear time, the number of MAWs generated tends to be smaller, and they preserve the structure and order of the text, obviating the need for expensive sorting operations. We show that retrieval performance of MAWs is comparable with ngrams, but with a marked speed improvement. We also show the effect of word length on retrieval. Our results suggest that an index of MAWs of mixed length provides a good method for these tasks which is scalable to larger collections.
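
The matching scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-character-per-interval encoding, and the `max_len` cap are all assumptions made for illustration, and the MAW enumeration is a naive one rather than the linear-time construction the abstract refers to. A word is treated as a minimal absent word of a string if it does not occur in the string while both its longest proper prefix and its longest proper suffix do.

```python
def encode_intervals(steps):
    """Encode consecutive diatonic step numbers as one character per interval.

    Hypothetical encoding: 'a', 'b', ... for rising intervals,
    'A', 'B', ... for falling ones, '-' for a repeated note.
    """
    out = []
    for prev, cur in zip(steps, steps[1:]):
        d = cur - prev
        if d == 0:
            out.append("-")
        else:
            base = "a" if d > 0 else "A"
            out.append(chr(ord(base) + min(abs(d), 26) - 1))
    return "".join(out)


def ngrams(text, n):
    """Set of all substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def minimal_absent_words(text, alphabet, max_len):
    """Naive enumeration of MAWs of length 2..max_len (not linear time)."""
    # Every substring of length 1..max_len that occurs in the text.
    present = set()
    for n in range(1, max_len + 1):
        present |= ngrams(text, n)
    maws = set()
    for left in present:
        if len(left) >= max_len:
            continue
        for ch in alphabet:
            cand = left + ch
            # cand[:-1] == left occurs; require cand[1:] to occur and cand not to.
            if cand not in present and cand[1:] in present:
                maws.add(cand)
    return maws


def rank_pages(query_words, index):
    """Order page names by the number of words shared with the query."""
    return sorted(index, key=lambda name: (-len(query_words & index[name]), name))
```

For example, `minimal_absent_words("abaab", "ab", 4)` yields the four words `bb`, `aaa`, `bab` and `aaba`. An index in this sketch is simply a dictionary from page name to a set of words (n-grams or MAWs), and `rank_pages` orders pages by the size of the set intersection with the query.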

Item Type:

Conference or Workshop Item (Paper)

Departments, Centres and Research Units:

Computing

Dates:

Date    Event
2018    Accepted
2018    Published

Event Location:

Paris, France

Date range:

23 – 27 September 2018

Item ID:

29105

Date Deposited:

30 Jul 2020 13:57

Last Modified:

30 Apr 2021 14:16

URI:

https://research.gold.ac.uk/id/eprint/29105
