A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance

Xu Li, Chunlong Yao, Fenglong Fan and Xiaoqiang Yu
Volume: 13, No: 4, Page: 863 ~ 875, Year: 2017
DOI: 10.3745/JIPS.02.0067
Keywords: Natural Language Processing, Semantic Relevance, Singular Value Decomposition, Text Representation, Text Similarity Measurement

Abstract
Traditional text similarity measures based on word-frequency vectors ignore the semantic relationships between words. Together with the high dimensionality and sparsity of document vectors, this is an obstacle to text similarity calculation. To address these problems, an improved singular value decomposition is used to reduce the dimensionality of the text representation model and to remove noise from it. The optimal number of singular values is analyzed, and the semantic relevance between words is calculated in the constructed semantic space. An inverted index construction algorithm and definitions of similarity between vectors are proposed to calculate the similarity between two documents at the semantic level. Experimental results on a benchmark corpus show that the proposed method improves the F-measure.
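
The general idea of comparing texts in a reduced semantic space can be illustrated with a minimal sketch. It uses a plain truncated SVD (as in latent semantic analysis), not the improved decomposition, the inverted index algorithm, or the similarity definitions proposed in the paper. The toy term-by-document matrix, the choice of k = 2, and the helper names `truncated_svd` and `cosine` are assumptions made only for illustration.

```python
import numpy as np

def truncated_svd(term_doc, k):
    """Return the rank-k SVD factors of a term-by-document matrix."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy counts (rows: terms, columns: documents).
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [2, 0, 1, 0],
    [0, 2, 1, 0],
    [1, 1, 2, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 1],
], dtype=float)

k = 2  # number of singular values kept (an assumed value, not tuned)
u_k, s_k, vt_k = truncated_svd(term_doc, k)
term_vecs = u_k * s_k    # each row: one term in the k-dimensional space
doc_vecs = vt_k.T * s_k  # each row: one document in the same space

# Documents 0 and 1 use different words ("car" vs "automobile").
raw_sim = cosine(term_doc[:, 0], term_doc[:, 1])
latent_sim = cosine(doc_vecs[0], doc_vecs[1])
unrelated_sim = cosine(doc_vecs[0], doc_vecs[3])

# Word-level relevance in the reduced space.
word_sim = cosine(term_vecs[0], term_vecs[1])  # "car" vs "automobile"

print(f"raw={raw_sim:.3f} latent={latent_sim:.3f} "
      f"unrelated={unrelated_sim:.3f} word={word_sim:.3f}")
```

In this toy example the raw word-frequency cosine between documents 0 and 1 is 0.2, because they share only the term "engine". In the rank-2 space their similarity is close to 1, while the document about flowers stays near 0. How many singular values to keep is exactly the kind of choice the paper analyzes; the value here is arbitrary.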


Cite this article
IEEE Style
Xu Li, Chunlong Yao, Fenglong Fan, and Xiaoqiang Yu, "A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance," Journal of Information Processing Systems, vol. 13, no. 4, pp. 863~875, 2017. DOI: 10.3745/JIPS.02.0067.

ACM Style
Xu Li, Chunlong Yao, Fenglong Fan, and Xiaoqiang Yu, "A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance," Journal of Information Processing Systems, 13, 4, (2017), 863~875. DOI: 10.3745/JIPS.02.0067.