SVD-LDA: A Combined Model for Text Classification

Nguyen Cao Truong Hai, Kyung-Im Kim and Hyuk-Ro Park
Volume: 5, No: 1, Page: 5 ~ 10, Year: 2009
10.3745/JIPS.2009.5.1.005
Keywords: Latent Dirichlet Allocation, Singular Value Decomposition, Input Filtering, Text Classification, Data Preprocessing.

Abstract
Text data has always accounted for a major portion of the world's information. As the volume of information increases exponentially, the volume of text data also increases significantly. Text classification therefore remains an important area of research. Latent Dirichlet Allocation (LDA) is a recent probabilistic model that has been used in many applications across many fields. For text data, LDA also has many applications, to which various enhancements have been applied. However, none of these applications appears to address the input given to LDA. In this paper, we suggest a way to map the input space to a reduced space, which may avoid the unreliability, ambiguity and redundancy of individual terms as descriptors. The purpose of this paper is to show that LDA can be applied effectively in a "clean and clear" space. Experiments are conducted on the 20 Newsgroups data set. The results show that the proposed method can improve classification results when an appropriate rank is chosen for the reduced space.
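
The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea behind a rank-reduced input space: a truncated SVD of a term-document matrix. The function name `reduce_rank`, the toy matrix `X`, and the choice `k=2` are illustrative assumptions and are not taken from the paper; how the reduced representation is then passed to LDA is not shown here.

```python
import numpy as np

def reduce_rank(term_doc, k):
    """Truncated SVD of a term-document matrix (rows = terms, columns = documents).

    Returns the rank-k approximation of the matrix and the k-dimensional
    coordinates of each document in the reduced space.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    approx = U_k @ np.diag(s_k) @ Vt_k      # best rank-k approximation (Frobenius norm)
    doc_coords = (np.diag(s_k) @ Vt_k).T    # one row per document, k columns
    return approx, doc_coords

# Hypothetical toy counts: two groups of terms used by two groups of documents.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 3]], dtype=float)

approx, docs = reduce_rank(X, k=2)
```

The rank `k` plays the role of the "rank of the reduced space" mentioned above: a smaller `k` discards more of the term-level variation, so it would normally be tuned on held-out data.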


Cite this article
IEEE Style
Nguyen Cao Truong Hai, Kyung-Im Kim and Hyuk-Ro Park, "SVD-LDA: A Combined Model for Text Classification," Journal of Information Processing Systems, vol. 5, no. 1, pp. 5~10, 2009. DOI: 10.3745/JIPS.2009.5.1.005.

ACM Style
Nguyen Cao Truong Hai, Kyung-Im Kim and Hyuk-Ro Park, "SVD-LDA: A Combined Model for Text Classification," Journal of Information Processing Systems, 5, 1, (2009), 5~10. DOI: 10.3745/JIPS.2009.5.1.005.