Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning


Sayan Maity, Mohamed Abdel-Mottaleb, Shihab S. Asfour, Journal of Information Processing Systems Vol. 16, No. 1, pp. 6-29, Feb. 2020

DOI: 10.3745/JIPS.02.0129
Keywords: Auto-Encoder, Deep Learning, Multimodal Biometrics, Sparse Classification

Abstract

Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Using the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers that perform the multimodal recognition. Moreover, the proposed technique proved robust when some of these modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a network of supervised denoising sparse auto-encoders to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, nonplanar movement, and pose variations present in the video clips, even when modalities are missing.
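
The abstract does not state the exact fusion rule, so the following is only a minimal sketch of how score-level fusion over the available modalities might look. It assumes each modality-specific classifier returns one similarity score per enrolled subject (higher means a better match), that a missing modality is represented by `None`, and that scores are min-max normalized and then averaged. The modality keys and the function names `min_max_normalize`, `fuse_scores`, and `rank1_identity` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical type: modality name -> per-subject scores, or None if missing.
ModalityScores = Dict[str, Optional[List[float]]]


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so modalities are comparable (assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant score vector carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(per_modality: ModalityScores) -> List[float]:
    """Average normalized scores over the modalities that are present."""
    available = [
        min_max_normalize(s) for s in per_modality.values() if s is not None
    ]
    if not available:
        raise ValueError("at least one modality must be available")
    n_subjects = len(available[0])
    if any(len(s) != n_subjects for s in available):
        raise ValueError("all modalities must score the same set of subjects")
    # Dividing by the number of available modalities keeps the fused score
    # on the same scale regardless of how many modalities are missing.
    return [
        sum(s[i] for s in available) / len(available)
        for i in range(n_subjects)
    ]


def rank1_identity(per_modality: ModalityScores) -> int:
    """Return the index of the subject with the highest fused score."""
    fused = fuse_scores(per_modality)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative, made-up scores for three enrolled subjects; both ear/profile
# modalities on one side are missing from this hypothetical test clip.
example_scores: ModalityScores = {
    "left_ear": None,
    "left_profile_face": [0.2, 0.7, 0.1],
    "frontal_face": [0.1, 0.9, 0.3],
    "right_profile_face": None,
    "right_ear": [0.4, 0.5, 0.3],
}
predicted_subject = rank1_identity(example_scores)
```

Averaging only over the modalities that are present is one simple way to keep the decision rule unchanged when a detector fails to find an ear or a profile face; a weighted sum or a different normalization would fit the same structure.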


Cite this article
[APA Style]
Sayan Maity, Mohamed Abdel-Mottaleb, & Shihab S. Asfour (2020). Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning. Journal of Information Processing Systems, 16(1), 6-29. DOI: 10.3745/JIPS.02.0129.

[IEEE Style]
S. Maity, M. Abdel-Mottaleb and S. S. Asfour, "Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning," Journal of Information Processing Systems, vol. 16, no. 1, pp. 6-29, 2020. DOI: 10.3745/JIPS.02.0129.

[ACM Style]
Sayan Maity, Mohamed Abdel-Mottaleb, and Shihab S. Asfour. 2020. Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning. Journal of Information Processing Systems, 16, 1, (2020), 6-29. DOI: 10.3745/JIPS.02.0129.