Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words


Tae-Seok Lee, Hyun-Young Lee, Seung-Shik Kang, Journal of Information Processing Systems Vol. 18, No. 3, pp. 344-358, Jun. 2022  

10.3745/JIPS.02.0172
Keywords: BERT, Deep Learning, Generative Summarization, Selective OOV Copy Model, Unknown Words

Abstract

Text summarization is the task of producing a shorter version of a long document while accurately preserving the main content of the original text. Abstractive summarization generates novel words and phrases using a language generation method through text transformation and prior-embedded word information. However, newly coined words and out-of-vocabulary words degrade the performance of automatic summarization because they are not pre-trained in the machine learning process. In this study, we demonstrated an improvement in summarization quality through the contextualized embedding of BERT with out-of-vocabulary masking. In addition, by explicitly providing precise pointing and an optional copy instruction along with the BERT embedding, we achieved higher accuracy than the baseline model. The recall-based word-generation metric ROUGE-1 score was 55.11 and the word-order-based ROUGE-L score was 39.65.
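
As a rough illustration of the ideas named in the abstract, the following Python sketch shows one way out-of-vocabulary tokens could be replaced by a mask symbol, together with simplified recall-style ROUGE-1 and ROUGE-L computations. It is not the authors' implementation: the function names, the `[MASK]` placeholder, the whitespace-level token lists, and the toy vocabulary are all assumptions made for this example, and the ROUGE-L variant shown is a plain LCS-based recall rather than the full F-measure used by standard toolkits.

```python
from collections import Counter

MASK = "[MASK]"  # assumed placeholder symbol; hypothetical for this sketch


def mask_oov(tokens, vocab):
    """Replace every token that is not in `vocab` with the mask symbol."""
    return [t if t in vocab else MASK for t in tokens]


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the reference length."""
    if not reference:
        return 0.0
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / len(reference)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """Simplified ROUGE-L: LCS length divided by the reference length."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


if __name__ == "__main__":
    vocab = {"the", "cat", "sat"}  # toy vocabulary, assumed for illustration
    print(mask_oov(["the", "zorp", "cat"], vocab))
    print(rouge1_recall(["the", "cat", "sat"], ["the", "cat", "is", "here"]))
    print(rouge_l_recall(["a", "b", "c"], ["a", "c", "d", "e"]))
```

These functions return fractions between 0 and 1; multiplying by 100 gives values on the scale on which summarization scores are commonly reported.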


Cite this article
[APA Style]
Lee, T., Lee, H., & Kang, S. (2022). Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words. Journal of Information Processing Systems, 18(3), 344-358. DOI: 10.3745/JIPS.02.0172.

[IEEE Style]
T. Lee, H. Lee, S. Kang, "Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words," Journal of Information Processing Systems, vol. 18, no. 3, pp. 344-358, 2022. DOI: 10.3745/JIPS.02.0172.

[ACM Style]
Tae-Seok Lee, Hyun-Young Lee, and Seung-Shik Kang. 2022. Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words. Journal of Information Processing Systems, 18, 3, (2022), 344-358. DOI: 10.3745/JIPS.02.0172.