Analysis of Minimum Distances in High-Dimensional Musical Spaces

Casey, Michael A.; Rhodes, Christophe and Slaney, Malcolm. 2008. Analysis of Minimum Distances in High-Dimensional Musical Spaces. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), pp. 1015-1028. ISSN 1558-7916 [Article]

No full text available

Abstract or Description

We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommended systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neigbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features-analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH)-allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.

Item Type:

Article

Identification Number (DOI):

https://doi.org/10.1109/TASL.2008.925883

Departments, Centres and Research Units:

Computing
Research Office > REF2014

Dates:

DateEvent
2008Published

Item ID:

6826

Date Deposited:

16 Apr 2012 12:33

Last Modified:

20 Jun 2017 11:52

Peer Reviewed:

Yes, this version has been peer-reviewed.

URI:

https://research.gold.ac.uk/id/eprint/6826

Edit Record Edit Record (login required)