Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism


Min Liu, Jun Tang, Journal of Information Processing Systems Vol. 17, No. 4, pp. 754-771, Aug. 2021  

10.3745/JIPS.02.0161
Keywords: AlexNet Networks, Attention Mechanism, Concordance Correlation Coefficient, Deep Learning, Feature Layer Fusion, Multimodal Emotion Recognition, Social Networks

Abstract

In continuous dimensional emotion recognition, the parts of a signal that highlight emotional expression differ across modalities, and different modalities also influence the perceived emotional state to different degrees. This paper therefore studies the fusion of the two most important modalities in emotion recognition, voice and facial expression, and proposes a bimodal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression features with the audio features, and an improved loss function is used to mitigate the missing-modality problem, thereby improving the robustness of the model and the performance of emotion recognition. Experimental results show that the concordance correlation coefficients (CCC) of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to those of several comparative algorithms.
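As an illustration of the fusion and evaluation steps summarized above, the following Python/PyTorch sketch shows one way an attention mechanism could weight audio and facial-expression features before regressing arousal and valence, together with a CCC-based loss of the kind the reported metric relies on. The class name, feature dimensions, and layer sizes (AttentionFusion, audio_dim, video_dim, hidden_dim) are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch, not the paper's code: attention-weighted fusion of
# audio and facial-expression feature vectors, plus a CCC-based loss.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuses two modality feature vectors using learned attention weights."""
    def __init__(self, audio_dim=128, video_dim=256, hidden_dim=128):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)
        self.head = nn.Linear(hidden_dim, 2)  # outputs: arousal, valence

    def forward(self, audio_feat, video_feat):
        # Project both modalities into a shared space: (batch, 2, hidden).
        h = torch.stack([self.audio_proj(audio_feat),
                         self.video_proj(video_feat)], dim=1)
        # Attention weights over the two modalities: (batch, 2, 1).
        alpha = torch.softmax(self.score(torch.tanh(h)), dim=1)
        # Weighted sum of modality features: (batch, hidden).
        fused = (alpha * h).sum(dim=1)
        return self.head(fused)

def ccc_loss(pred, gold, eps=1e-8):
    """Returns 1 - CCC, averaged over the arousal and valence dimensions."""
    pred_mean, gold_mean = pred.mean(dim=0), gold.mean(dim=0)
    pred_var = pred.var(dim=0, unbiased=False)
    gold_var = gold.var(dim=0, unbiased=False)
    cov = ((pred - pred_mean) * (gold - gold_mean)).mean(dim=0)
    ccc = 2 * cov / (pred_var + gold_var + (pred_mean - gold_mean) ** 2 + eps)
    return (1 - ccc).mean()

In use, precomputed audio features and AlexNet-derived facial features for each clip would be passed through AttentionFusion, and ccc_loss would be minimized against the continuous arousal/valence annotations.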



Cite this article
[APA Style]
Liu, M. & Tang, J. (2021). Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism. Journal of Information Processing Systems, 17(4), 754-771. DOI: 10.3745/JIPS.02.0161.

[IEEE Style]
M. Liu and J. Tang, "Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism," Journal of Information Processing Systems, vol. 17, no. 4, pp. 754-771, 2021. DOI: 10.3745/JIPS.02.0161.

[ACM Style]
Min Liu and Jun Tang. 2021. Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism. Journal of Information Processing Systems, 17, 4, (2021), 754-771. DOI: 10.3745/JIPS.02.0161.