Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization

Seung-Shik Kang
Volume: 11, No: 4, Page: 573 ~ 582, Year: 2015
DOI: 10.3745/JIPS.04.0018
Keywords: Consonant Normalization, Edit Distance, Korean Character, Normalization Factor

Abstract
Edit distance metrics are widely used in applications such as string comparison and spelling error correction. Hamming distance is a metric for two equal-length strings, and Damerau-Levenshtein distance is a well-known metric for making spelling corrections through string-to-string comparison. Previous distance metrics seem to be appropriate for alphabetic languages such as English and other European languages. However, the conventional edit distance criterion is not the best method for agglutinative languages like Korean. The reason is that two or more letter units make up a Korean character, which is called a syllable. This mechanism of syllable-based word construction in the Korean language causes edit distance calculation to be inefficient. As such, we have explored a new edit distance method that uses consonant normalization and a normalization factor.
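
To illustrate why syllable-based construction matters for edit distance, the following is a minimal sketch in Python, not the method proposed in the paper. It uses plain Levenshtein distance (without the transposition operation of Damerau-Levenshtein) and compares two Korean strings first as syllables and then as letter units. The helper names `levenshtein` and `to_letter_units` are hypothetical, and the decomposition relies on Unicode NFD normalization, which is an assumption about how letter units might be obtained. Consonant normalization and the normalization factor are not modeled here, since the abstract does not define them.

```python
import unicodedata


def levenshtein(a, b):
    """Plain Levenshtein distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def to_letter_units(word):
    """Decompose precomposed Hangul syllables into conjoining letter units (jamo).

    Assumption: Unicode NFD is used as the decomposition; the paper may
    use a different representation.
    """
    return unicodedata.normalize("NFD", word)


# Compare the same pair of one-syllable strings at both granularities.
syllable_distance = levenshtein("가", "난")
letter_distance = levenshtein(to_letter_units("가"), to_letter_units("난"))
print(syllable_distance, letter_distance)
```

At the syllable level the two strings differ by a single substitution, while at the letter-unit level the distance is larger because one syllable carries a final consonant that the other lacks. This gap between the two granularities is the kind of effect the abstract attributes to syllable-based word construction.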

Cite this article
IEEE Style
Seung-Shik Kang, "Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization," Journal of Information Processing Systems, vol. 11, no. 4, pp. 573~582, 2015. DOI: 10.3745/JIPS.04.0018.

ACM Style
Seung-Shik Kang, "Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization," Journal of Information Processing Systems, 11, 4, (2015), 573~582. DOI: 10.3745/JIPS.04.0018.