Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion


Xuan Zhou, Journal of Information Processing Systems Vol. 17, No. 2, pp. 337-351, Apr. 2021  

DOI: 10.3745/JIPS.01.0067
Keywords: Double Layer Cascade Structure, Facial Expression Recognition, Feature Fusion, Image Detection, Spatiotemporal Recursive Neural Network

Abstract

Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To address this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. First, the video is preprocessed. Then, a double-layer cascade structure is used to detect faces in the video images. Next, two deep convolutional neural networks are used to extract the temporal and spatial facial features in the video: the spatial convolutional neural network extracts spatial information features from each static expression frame, and the temporal convolutional neural network extracts dynamic information features from the optical flow computed over multiple expression frames. The spatiotemporal features learned by the two networks are then combined by multiplicative fusion. Finally, the fused features are input to a support vector machine to perform facial expression classification. Experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show that the recognition rates obtained by the proposed method are as high as 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.
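
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that the multiplication fusion is an element-wise product of a spatial feature vector and a temporal feature vector of equal length; the abstract does not specify the exact operation, the feature dimensions, or any normalization, so the function name `multiplicative_fusion` and the toy values below are hypothetical and not the author's implementation.

```python
from typing import List, Sequence


def multiplicative_fusion(
    spatial: Sequence[float], temporal: Sequence[float]
) -> List[float]:
    """Fuse two feature vectors by element-wise multiplication.

    Assumption: both vectors have the same length, e.g. the outputs of
    same-sized final layers of the spatial and temporal networks.
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal features must have equal length")
    # Each fused component is the product of the matching components,
    # so a feature is emphasized only when both streams respond to it.
    return [s * t for s, t in zip(spatial, temporal)]


# Toy feature vectors (hypothetical values, for illustration only).
spatial_features = [1.0, 2.0, 3.0]
temporal_features = [0.5, 0.0, 2.0]
fused = multiplicative_fusion(spatial_features, temporal_features)
print(fused)
```

In the pipeline the abstract describes, a fused vector of this kind would be computed per video sample and then passed to a support vector machine for classification; that classifier is omitted here to keep the sketch dependency-free.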


Cite this article
[APA Style]
Zhou, X. (2021). Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion. Journal of Information Processing Systems, 17(2), 337-351. DOI: 10.3745/JIPS.01.0067.

[IEEE Style]
X. Zhou, "Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion," Journal of Information Processing Systems, vol. 17, no. 2, pp. 337-351, 2021. DOI: 10.3745/JIPS.01.0067.

[ACM Style]
Xuan Zhou. 2021. Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion. Journal of Information Processing Systems, 17, 2, (2021), 337-351. DOI: 10.3745/JIPS.01.0067.