Research Online

Goldsmiths - University of London

Extracting Clusters of Specialist Terms from Unstructured Text

Gerow, Aaron. 2014. 'Extracting Clusters of Specialist Terms from Unstructured Text'. In: 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP '14). Doha, Qatar, October 25-29, 2014. [Conference or Workshop Item]

Text (Extracting Clusters of Specialist Terms from Unstructured Text)
gerow_clusters.pdf - Accepted Version
Available under License Creative Commons Attribution No Derivatives.

Abstract or Description

Automatically identifying related specialist terms is a difficult and important task required to understand the lexical structure of language. This paper develops a corpus-based method of extracting coherent clusters of satellite terminology (terms on the edge of the lexicon) using co-occurrence networks of unstructured text. Term clusters are identified by extracting communities in the co-occurrence graph, after which the largest community is discarded and the remaining words are ranked by centrality within their community. The method is tractable on large corpora and requires no document structure and only minimal normalization. The results suggest that the model is able to extract coherent groups of satellite terms in corpora of varying size, content and structure. The findings also confirm that language consists of a densely connected core (observed in dictionaries) and systematic, semantically coherent groups of terms at the edges of the lexicon.
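
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not name the community-detection algorithm or the centrality measure, so the sketch assumes a simple deterministic label propagation and weighted within-community degree as stand-ins. The function names and the `window` parameter are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(sentences, window=4):
    """Build an undirected weighted graph where graph[a][b] counts how often
    tokens a and b appear within `window` positions of each other."""
    graph = defaultdict(Counter)
    for tokens in sentences:
        for i, a in enumerate(tokens):
            for b in tokens[i + 1:i + 1 + window]:
                if a != b:
                    graph[a][b] += 1
                    graph[b][a] += 1
    return graph


def label_propagation(graph, max_iters=50):
    """Assumed stand-in for community detection: each node repeatedly adopts
    the label with the highest total edge weight among its neighbours.
    Ties are broken alphabetically so the result is deterministic."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            votes = Counter()
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            top = max(votes.values())
            if votes[labels[node]] == top:
                continue  # current label is already among the best
            labels[node] = min(l for l, v in votes.items() if v == top)
            changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())


def satellite_clusters(graph):
    """Discard the largest community (the assumed 'core') and rank the words
    of each remaining community by weighted degree inside that community."""
    communities = sorted(label_propagation(graph), key=len, reverse=True)
    ranked = []
    for community in communities[1:]:
        score = {
            n: sum(w for nbr, w in graph[n].items() if nbr in community)
            for n in community
        }
        ranked.append(sorted(community, key=lambda n: (-score[n], n)))
    return ranked


if __name__ == "__main__":
    toy_corpus = [
        ["the", "cat", "sat", "down", "here"],   # dense "core" vocabulary
        ["quark", "gluon"],                      # specialist terms
        ["quark", "boson"],
        ["quark", "lepton"],
    ]
    print(satellite_clusters(cooccurrence_graph(toy_corpus)))
```

On this toy corpus the five general-vocabulary words form the largest community and are dropped, leaving the specialist terms with the most connected one ranked first. On a real corpus a more robust community-detection method would likely be needed, since label propagation is sensitive to update order.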

Item Type:

Conference or Workshop Item (Paper)

Identification Number (DOI):

https://doi.org/10.3115/v1/D14-1149

Departments, Centres and Research Units:

Computing

Dates:

14 October 2014 (Published)

Event Location:

Doha, Qatar

Date range:

October 25-29, 2014

Item ID:

22653

Date Deposited:

09 Jan 2018 12:43

Last Modified:

09 Jul 2018 15:05

URI:

http://research.gold.ac.uk/id/eprint/22653
