Image Understanding for Visual Dialog

Yeongsu Cho and Incheol Kim
Volume: 15, No: 5, Page: 1171 ~ 1178, Year: 2019
10.3745/JIPS.04.0141
Keywords: Attribute Recognition, Image Understanding, Visual Dialog

Abstract
This study proposes a deep neural network model based on an encoder–decoder structure for visual dialog. In visual dialog, where a sequence of questions and answers unfolds about an image, ongoing linguistic understanding of the dialog history and context is important for generating correct answers. In many cases, however, a visual understanding that can identify the scene or object attributes contained in the image is also needed. Hence, at the encoding stage the proposed model employs a separate person detector and attribute recognizer in addition to the visual features extracted from the entire input image by a convolutional neural network; this allows the model to emphasize attributes of the people in the image, such as gender, age, and dress concept, and use them to generate answers. Experiments conducted on VisDial v0.9, a large benchmark dataset, confirmed that the proposed model performs well.
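The encoding step described above can be illustrated with a minimal sketch. The fusion below simply concatenates a global CNN image feature with per-person attribute features; the actual fusion mechanism, feature dimensions, and attribute set used in the paper are not specified here, so all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not state these.
IMG_DIM = 512   # global CNN feature size (assumption)
ATTR_DIM = 64   # per-attribute feature size (assumption)
ATTRS = ["gender", "age", "dress"]  # attributes named in the abstract


def encode(img_feat, attr_feats):
    """Fuse the global image feature with person-attribute features
    by concatenation -- one plausible fusion, not the paper's exact one."""
    return np.concatenate([img_feat] + attr_feats)


img_feat = rng.standard_normal(IMG_DIM)                      # whole-image CNN feature
attr_feats = [rng.standard_normal(ATTR_DIM) for _ in ATTRS]  # attribute recognizer outputs
joint = encode(img_feat, attr_feats)
print(joint.shape)  # (512 + 3 * 64,) = (704,)
```

In the full model this joint representation would be passed, together with an encoding of the question and dialog history, to the decoder that generates the answer.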



Cite this article
IEEE Style
Y. Cho and I. Kim, "Image Understanding for Visual Dialog," Journal of Information Processing Systems, vol. 15, no. 5, pp. 1171-1178, 2019. DOI: 10.3745/JIPS.04.0141.

ACM Style
Yeongsu Cho and Incheol Kim. 2019. Image Understanding for Visual Dialog. Journal of Information Processing Systems 15, 5 (2019), 1171-1178. DOI: 10.3745/JIPS.04.0141.