Video Captioning with Visual and Semantic Features


Sujin Lee, Incheol Kim, Journal of Information Processing Systems Vol. 14, No. 6, pp. 1318-1330, Dec. 2018  

https://doi.org/10.3745/JIPS.02.0098
Keywords: Attention-Based Caption Generation, Deep Neural Networks, Semantic Feature, Video Captioning

Abstract

Video captioning refers to the process of extracting features from a video and generating video captions using the extracted features. This paper introduces a deep neural network model and its learning method for effective video captioning. In this study, semantic features, which effectively express the video, are used in addition to visual features. The visual features of the video are extracted using convolutional neural networks, such as C3D and ResNet, while the semantic features are extracted using a semantic feature extraction network proposed in this paper. Further, an attention-based caption generation network is proposed for effective generation of video captions using the extracted features. The performance and effectiveness of the proposed model are verified through various experiments using two large-scale video benchmarks, the Microsoft Video Description (MSVD) and the Microsoft Research Video-To-Text (MSR-VTT).
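
As a rough illustration of the attention idea mentioned in the abstract, the following minimal sketch computes soft attention weights over per-frame feature vectors and combines them into a single context vector. It is not the network proposed in the paper: the dot-product scoring, the function names `softmax` and `attend`, and the toy feature values are all assumptions chosen for illustration, and in a real captioning model the query would typically come from the decoder state.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(frame_features, query):
    # Hypothetical dot-product scoring (an assumption, not the paper's method):
    # each frame is scored by its similarity to the query vector.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # The context vector is the attention-weighted sum of the frame features.
    context = [sum(w * f[d] for w, f in zip(weights, frame_features)) for d in range(dim)]
    return weights, context

# Toy data: three frames with 2-dimensional features (illustrative values only).
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attend(frame_features, query)
```

Frames whose features align with the query receive larger weights, so the context vector emphasizes the parts of the video most relevant to the word being generated.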


Cite this article
[APA Style]
Lee, S. & Kim, I. (2018). Video Captioning with Visual and Semantic Features. Journal of Information Processing Systems, 14(6), 1318-1330. DOI: 10.3745/JIPS.02.0098.

[IEEE Style]
S. Lee and I. Kim, "Video Captioning with Visual and Semantic Features," Journal of Information Processing Systems, vol. 14, no. 6, pp. 1318-1330, 2018. DOI: 10.3745/JIPS.02.0098.

[ACM Style]
Sujin Lee and Incheol Kim. 2018. Video Captioning with Visual and Semantic Features. Journal of Information Processing Systems, 14, 6, (2018), 1318-1330. DOI: 10.3745/JIPS.02.0098.