Impact of Instance Selection on kNN-Based Text Categorization

Fatiha Barigou
Volume: 14, No: 2, Page: 418 ~ 434, Year: 2018
Keywords: Classification Accuracy, Classification Efficiency, Data Reduction, Instance Selection, k-Nearest Neighbors, Text Categorization

With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for this task. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN has limitations, notably the high computational cost of classifying new instances, since each new instance must be compared against the stored training instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous work has evaluated these approaches only on structured datasets; their performance has not been examined in the text categorization domain, where both the dimensionality and the size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
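
To make the idea concrete, the following is a minimal sketch of instance selection in front of a kNN text classifier. It is an illustration under assumptions, not the paper's method: the abstract does not name the selection techniques evaluated, so Hart's condensed nearest neighbor rule is used here as one well-known example. The function names (vectorize, cosine, knn_predict, condense) and the toy documents are hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=1):
    """Majority vote among the k training instances most similar to query.

    train is a list of (vector, label) pairs. Every stored instance is
    scanned, which is the cost that data reduction aims to lower.
    """
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def condense(train):
    """Condensed nearest neighbor (Hart's rule), used here as an example.

    Starts from the first instance and adds any instance that the current
    subset misclassifies with 1-NN, repeating until a full pass adds nothing.
    """
    selected = [0]
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(train):
            if i in selected:
                continue
            subset = [train[j] for j in selected]
            if knn_predict(subset, vec, k=1) != label:
                selected.append(i)
                changed = True
    return [train[j] for j in selected]


# Toy corpus (made up for illustration only).
docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap pills online", "spam"),
    ("cheap offer buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("monday meeting notes attached", "ham"),
    ("agenda and notes for meeting", "ham"),
]
train = [(vectorize(text), label) for text, label in docs]
reduced = condense(train)

print(len(train), "training instances reduced to", len(reduced))
print(knn_predict(reduced, vectorize("buy pills now"), k=1))
```

On this toy corpus the reduced set still classifies every training document correctly while storing fewer instances, which is the accuracy versus efficiency trade-off the abstract refers to. Real text collections would need proper term weighting and a much larger evaluation.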

Cite this article
IEEE Style
F. Barigou, "Impact of Instance Selection on kNN-Based Text Categorization," Journal of Information Processing Systems, vol. 14, no. 2, pp. 418~434, 2018. DOI: 10.3745/JIPS.02.0080.

ACM Style
Fatiha Barigou. 2018. Impact of Instance Selection on kNN-Based Text Categorization. Journal of Information Processing Systems, 14, 2, (2018), 418~434. DOI: 10.3745/JIPS.02.0080.