One Model to Rule Them all: A Universal Transformer for Biometric Matching

dc.contributor.author: Madina Abdrakhmanova
dc.contributor.author: Assel Yermekova
dc.contributor.author: Yuliya Barko
dc.contributor.author: Vladislav Ryspayev
dc.contributor.author: Medet Jumadildayev
dc.contributor.author: Hüseyin Atakan Varol
dc.date.accessioned: 2025-08-26T08:37:46Z
dc.date.available: 2025-08-26T08:37:46Z
dc.date.issued: 2024-01-01
dc.description.abstract: This study introduces the first single-branch network designed to tackle a spectrum of biometric matching scenarios, including unimodal, multimodal, cross-modal, and missing-modality situations. Our method adapts the prototypical network loss to concurrently train on audio, visual, and thermal data within a unified multimodal framework. By converting all three data types into image format, we employ the Vision Transformer (ViT) architecture with shared model parameters, enabling the encoder to map all input modalities into a unified vector space. The multimodal prototypical network loss ensures that vector representations of the same speaker are proximate regardless of their original modalities. Evaluation on the SpeakingFaces and VoxCeleb datasets encompasses a wide range of scenarios, demonstrating the effectiveness of our approach. The trimodal model achieves an Equal Error Rate (EER) of 0.27% on the SpeakingFaces test split, surpassing all previously reported results. Moreover, with a single training run, it exhibits performance comparable to unimodal and bimodal counterparts, including unimodal audio, visual, and thermal, as well as audio-visual, audio-thermal, and visual-thermal configurations. In cross-modal evaluation on the VoxCeleb1 test set (audio versus visual), our approach yields an EER of 24.1%, again outperforming state-of-the-art models. This underscores the effectiveness of our unified model in addressing diverse scenarios for biometric verification.
dc.identifier.citation: Abdrakhmanova Madina, Yermekova Assel, Barko Yuliya, Ryspayev Vladislav, Jumadildayev Medet, Varol Huseyin Atakan. (2024). One Model to Rule Them all: A Universal Transformer for Biometric Matching. IEEE Access. https://doi.org/10.1109/access.2024.3426602
dc.identifier.doi: 10.1109/access.2024.3426602
dc.identifier.uri: https://doi.org/10.1109/access.2024.3426602
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/10066
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof: IEEE Access
dc.rights: Open access
dc.source: IEEE Access (2024)
dc.subject: Biometrics
dc.subject: Computer science
dc.subject: Transformer
dc.subject: Matching (statistics)
dc.subject: Artificial intelligence
dc.subject: Mathematics
dc.subject: Voltage
dc.subject: Engineering
dc.subject: Electrical engineering
dc.subject: Statistics
dc.subject: type of access: open access
dc.title: One Model to Rule Them all: A Universal Transformer for Biometric Matching
dc.type: article
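The abstract describes training a shared encoder with a prototypical network loss so that embeddings of the same speaker, from any modality, cluster around a common prototype. The sketch below illustrates that idea only; it is not the authors' implementation, and the pooled-prototype and cosine-logit choices (and all function names) are assumptions for illustration:

```python
import numpy as np

def l2_normalize(x, eps=1e-9):
    """Scale each row vector to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def multimodal_prototypical_loss(support, queries, query_labels, temperature=10.0):
    """Prototypical-network loss with prototypes pooled across modalities.

    support: dict speaker_id -> array (n_modalities, d) of support embeddings
    queries: array (q, d) of query embeddings (any modality)
    query_labels: length-q sequence of speaker ids for the queries
    """
    ids = sorted(support)
    # One prototype per speaker: mean over that speaker's embeddings
    # from ALL modalities, so audio/visual/thermal share the same target.
    protos = l2_normalize(np.stack([support[i].mean(axis=0) for i in ids]))
    q = l2_normalize(np.asarray(queries, dtype=float))
    # Cosine similarity as logits; temperature sharpens the softmax.
    logits = temperature * (q @ protos.T)
    # Numerically stable log-softmax over speaker prototypes.
    logits -= logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.array([ids.index(l) for l in query_labels])
    # Negative log-likelihood of the correct speaker, averaged over queries.
    return float(-log_p[np.arange(len(idx)), idx].mean())
```

Because prototypes average embeddings across modalities, a query from any single modality is pulled toward the same per-speaker point, which is what enables unimodal, cross-modal, and missing-modality matching with one encoder.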

Files

Original bundle

Name: 10.1109_ACCESS.2024.3426602.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format
