Word-Level Embedding to Improve Performance of Representative Spatio-temporal Document Classification


Byoungwook Kim, Hong-Jun Jang, Journal of Information Processing Systems Vol. 19, No. 6, pp. 830-841, Dec. 2023  

DOI: 10.3745/JIPS.04.0296
Keywords: Spatio-temporal Document Classification, Tokenization, Word-Level Embedding

Abstract

Tokenization is the process of segmenting input text into smaller units, a preprocessing step performed mainly to improve the efficiency of machine learning. Various tokenization methods have been proposed in the field of natural language processing, but studies have focused primarily on segmenting text efficiently. Few studies have examined which tokenization methods are suitable for document classification tasks in Korean. This paper presents an exploratory study to find the most suitable tokenization method for improving the performance of a representative spatio-temporal document classifier in Korean. The experiments used a convolutional neural network model, and the final performance comparison used document classification tasks whose performance depends largely on the tokenization method. The tokenization methods compared were the commonly used Jamo, character, and word units. The experiments confirmed that word-unit tokenization showed excellent performance on the representative spatio-temporal document classification task, where the semantic embedding ability of the token itself is important.
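To illustrate the three granularities compared in the abstract, the following is a minimal Python sketch, not the authors' implementation. The function names `jamo_tokens`, `char_tokens`, and `word_tokens` are hypothetical. Splitting words on whitespace is a simplifying assumption, since the abstract does not say how word units were obtained. The Jamo split relies on the standard Unicode arithmetic for precomposed Hangul syllables.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
#   0xAC00 + (lead * 21 + vowel) * 28 + tail
# with 19 leading consonants, 21 vowels, and 27 optional trailing consonants.
HANGUL_BASE = 0xAC00
HANGUL_COUNT = 19 * 21 * 28  # 11172 syllables

LEADS = [chr(0x1100 + i) for i in range(19)]
VOWELS = [chr(0x1161 + i) for i in range(21)]
TAILS = [""] + [chr(0x11A8 + i) for i in range(27)]  # index 0 means "no tail"


def jamo_tokens(text):
    """Split each Hangul syllable into its conjoining Jamo; drop whitespace."""
    tokens = []
    for ch in text:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < HANGUL_COUNT:
            lead, rest = divmod(offset, 21 * 28)
            vowel, tail = divmod(rest, 28)
            tokens.append(LEADS[lead])
            tokens.append(VOWELS[vowel])
            if tail:
                tokens.append(TAILS[tail])
        elif not ch.isspace():
            tokens.append(ch)  # non-Hangul characters pass through unchanged
    return tokens


def char_tokens(text):
    """One token per non-whitespace character (one Hangul syllable each)."""
    return [ch for ch in text if not ch.isspace()]


def word_tokens(text):
    """Whitespace-delimited units (an assumption made for this sketch)."""
    return text.split()


if __name__ == "__main__":
    sample = "서울 날씨"  # hypothetical example text: "Seoul weather"
    for tokenize in (jamo_tokens, char_tokens, word_tokens):
        print(tokenize.__name__, len(tokenize(sample)))
```

The same text yields progressively fewer tokens as the unit grows from Jamo to character to word. Each word token carries more meaning on its own, which fits the abstract's observation that word units suit tasks where the semantic embedding ability of the token matters.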


Cite this article
[APA Style]
Kim, B. & Jang, H. (2023). Word-Level Embedding to Improve Performance of Representative Spatio-temporal Document Classification. Journal of Information Processing Systems, 19(6), 830-841. DOI: 10.3745/JIPS.04.0296.

[IEEE Style]
B. Kim and H. Jang, "Word-Level Embedding to Improve Performance of Representative Spatio-temporal Document Classification," Journal of Information Processing Systems, vol. 19, no. 6, pp. 830-841, 2023. DOI: 10.3745/JIPS.04.0296.

[ACM Style]
Byoungwook Kim and Hong-Jun Jang. 2023. Word-Level Embedding to Improve Performance of Representative Spatio-temporal Document Classification. Journal of Information Processing Systems, 19, 6, (2023), 830-841. DOI: 10.3745/JIPS.04.0296.