Information retrieval system and method that generates weighted comparison results to analyze the degree of dissimilarity between a reference corpus and a candidate document

Wyard, Peter J; Russell-Rose, Tony; British Telecommunications PLC 2000. Information retrieval system and method that generates weighted comparison results to analyze the degree of dissimilarity between a reference corpus and a candidate document. US6167398.

US6167398.pdf - Published Version

Abstract or Description

An internet information agent accepts a reference document and analyzes it in accordance with metrics defined by its analysis algorithm, obtaining respective lists (word, character-level n-gram, word-level n-gram). It derives weights corresponding to the metrics, applies the metrics to a candidate document to obtain respective returned values, then applies the weights to the returned values and sums the results to obtain a Document Dissimilarity (DD) value. This DD is compared with a Dissimilarity Threshold (DT), and the candidate document is stored if the DD is less than the DT. A user can apply relevance values to the search results, and the agent modifies the weights accordingly. The agent can be used to improve a language model for use in speech recognition applications and the like.
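
The weighted-sum comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how each metric is computed, so the "unseen fraction" metric, the n-gram sizes (3 characters, 2 words), the example weights, and all function names below are assumptions chosen only for illustration.

```python
from typing import Dict, Set


def word_ngrams(text: str, n: int) -> Set[str]:
    """Set of word-level n-grams (n=1 gives the plain word list)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def char_ngrams(text: str, n: int) -> Set[str]:
    """Set of character-level n-grams."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_lists(text: str) -> Dict[str, Set[str]]:
    """The three lists named in the abstract; n-gram sizes are assumed."""
    return {
        "word": word_ngrams(text, 1),
        "char_ngram": char_ngrams(text, 3),
        "word_ngram": word_ngrams(text, 2),
    }


def unseen_fraction(candidate: Set[str], reference: Set[str]) -> float:
    """Assumed metric: share of candidate items absent from the reference list."""
    if not candidate:
        return 1.0
    return len(candidate - reference) / len(candidate)


def document_dissimilarity(reference: str, candidate: str,
                           weights: Dict[str, float]) -> float:
    """DD = sum over metrics of weight * returned value."""
    ref_lists = build_lists(reference)
    cand_lists = build_lists(candidate)
    return sum(
        weights[name] * unseen_fraction(cand_lists[name], ref_lists[name])
        for name in ref_lists
    )


def should_store(dd: float, dt: float) -> bool:
    """The candidate is stored only if DD is less than DT."""
    return dd < dt


# Hypothetical weights and threshold, for illustration only.
weights = {"word": 0.5, "char_ngram": 0.25, "word_ngram": 0.25}
dd = document_dissimilarity("the cat sat on the mat", "the cat sat on a rug", weights)
print(dd, should_store(dd, dt=0.4))
```

With these assumed weights, which sum to 1, a candidate identical to the reference scores a DD of 0 and a candidate sharing no words or n-grams with it scores 1. In the agent described above, user relevance feedback would then adjust the entries of `weights`; the abstract does not specify the update rule, so none is sketched here.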

Item Type:

Patent

Identification Number (DOI):

US6167398

Additional Information:

US Patent 6,167,398

Departments, Centres and Research Units:

Computing

Date:

26 December 2000

Item ID:

29763

Date Deposited:

25 Mar 2021 16:27

Last Modified:

25 Mar 2021 16:34

URI:

https://research.gold.ac.uk/id/eprint/29763
