Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths

Murtagh, Fionn and Contreras, Pedro. 2016. Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths. In: Adalbert F.X. Wilhelm and Hans A. Kestler, eds. Analysis of Large and Complex Data. Springer, pp. 43-52. ISBN ISBN-10: 3319252240 ISBN-13: 978-3319252247 [Book Section]

No full text available
[img] Text (Linear Storage and Potentially Constant Time Hierarchical Clustering Using the Baire Metric and Random Spanning Paths)
MurtaghContreras_v5.pdf - Published Version
Permissions: Administrator Access Only

Download (514kB)

Abstract or Description

We study how random projections can be used with large data sets in order (i) to cluster the data using a fast, binning approach which is characterized in terms of direct inducing of a hierarchy through use of the Baire metric; and (ii) based on clusters found, selecting subsets of the original data for further analysis. In this work, we focus on random projection that is used for processing high dimensional data. A random projection, outputting a random permutation of the observation set, provides a random spanning path. We show how a spanning path relates to contiguity- or adjacency-constrained clustering. We study performance properties of hierarchical clustering constructed from random spanning paths, and we introduce a novel visualization of the results.

Item Type:

Book Section

Identification Number (DOI):

https://doi.org/10.1007/978-3-319-25226-1_4

Keywords:

Big Data, data analytics, random projection, spanning path, hierarchical clustering, divisive clustering, visualization

Departments, Centres and Research Units:

Computing

Dates:

DateEvent
1 September 2016Published
13 July 2016Accepted

Item ID:

18980

Date Deposited:

26 Sep 2016 11:35

Last Modified:

29 Apr 2020 16:20

URI:

https://research.gold.ac.uk/id/eprint/18980

View statistics for this item...

Edit Record Edit Record (login required)