Multimodal Context Embedding for Scene Graph Generation


Gayoung Jung, Incheol Kim, Journal of Information Processing Systems Vol. 16, No. 6, pp. 1250-1260, Dec. 2020  

DOI: 10.3745/JIPS.02.0147
Keywords: deep neural network, multimodal context, relationship detection, scene graph generation

Abstract

This study proposes a novel deep neural network model that accurately detects objects and their relationships in an image and represents them as a scene graph. To detect objects and relationships accurately, the proposed model exploits several multimodal features, including linguistic features and visual context features. In addition, the model embeds context features using a graph neural network, so that the context feature vector of each object reflects its dependencies on the objects related to it. The effectiveness of the proposed model is demonstrated through comparative experiments on the Visual Genome benchmark dataset.
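The abstract outlines the core idea: per-object features from multiple modalities are fused, and a graph neural network propagates information between related objects so that each context vector encodes its dependencies. The PyTorch sketch below illustrates one plausible reading of this scheme; the module names, feature dimensions, and message-passing details are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of multimodal context embedding with a GNN, assuming
# fused visual + linguistic object features and a binary relatedness
# graph. All dimensions and module choices here are hypothetical.
import torch
import torch.nn as nn

class MultimodalContextEmbedding(nn.Module):
    def __init__(self, visual_dim=512, linguistic_dim=300,
                 hidden_dim=256, num_steps=2):
        super().__init__()
        # Fuse visual appearance and linguistic (label-embedding) features.
        self.fuse = nn.Linear(visual_dim + linguistic_dim, hidden_dim)
        # Shared message function; messages are averaged over neighbors.
        self.message = nn.Linear(hidden_dim, hidden_dim)
        # A GRU cell updates each node state from its aggregated messages.
        self.update = nn.GRUCell(hidden_dim, hidden_dim)
        self.num_steps = num_steps

    def forward(self, visual_feats, linguistic_feats, adjacency):
        # visual_feats:     (N, visual_dim)     per-object appearance features
        # linguistic_feats: (N, linguistic_dim) per-object label embeddings
        # adjacency:        (N, N) binary; 1 where two objects are related
        h = torch.relu(self.fuse(
            torch.cat([visual_feats, linguistic_feats], dim=-1)))
        deg = adjacency.sum(dim=-1, keepdim=True).clamp(min=1)
        for _ in range(self.num_steps):
            msgs = adjacency @ self.message(h) / deg  # average neighbor messages
            h = self.update(msgs, h)                  # refresh node states
        return h  # (N, hidden_dim) context-embedded object features

# Usage: 5 detected objects with a fully connected candidate graph.
if __name__ == "__main__":
    N = 5
    model = MultimodalContextEmbedding()
    adj = torch.ones(N, N) - torch.eye(N)  # all pairs related, no self-loops
    ctx = model(torch.randn(N, 512), torch.randn(N, 300), adj)
    print(ctx.shape)  # torch.Size([5, 256])
```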



Cite this article
[APA Style]
Jung, G., & Kim, I. (2020). Multimodal Context Embedding for Scene Graph Generation. Journal of Information Processing Systems, 16(6), 1250-1260. DOI: 10.3745/JIPS.02.0147.

[IEEE Style]
G. Jung and I. Kim, "Multimodal Context Embedding for Scene Graph Generation," Journal of Information Processing Systems, vol. 16, no. 6, pp. 1250-1260, 2020. DOI: 10.3745/JIPS.02.0147.

[ACM Style]
Gayoung Jung and Incheol Kim. 2020. Multimodal Context Embedding for Scene Graph Generation. Journal of Information Processing Systems 16, 6 (2020), 1250-1260. DOI: 10.3745/JIPS.02.0147.