SPEAKINGFACES: A LARGE-SCALE MULTIMODAL DATASET OF VOICE COMMANDS WITH VISUAL AND THERMAL VIDEO STREAMS

dc.contributor.author Abdrakhmanova, Madina
dc.contributor.author Kuzdeuov, Askat
dc.contributor.author Jarju, Sheikh
dc.contributor.author Khassanov, Yerbolat
dc.contributor.author Lewis, Michael
dc.contributor.author Varol, Huseyin Atakan
dc.date.accessioned 2021-08-27T10:23:26Z
dc.date.available 2021-08-27T10:23:26Z
dc.date.issued 2021-05-16
dc.identifier.citation Abdrakhmanova, M., Kuzdeuov, A., Jarju, S., Khassanov, Y., Lewis, M., & Varol, H. A. (2021). SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams. Sensors, 21(10), 3465. https://doi.org/10.3390/s21103465 en_US
dc.identifier.issn 1424-8220
dc.identifier.uri https://www.mdpi.com/1424-8220/21/10/3465
dc.identifier.uri https://doi.org/10.3390/s21103465
dc.identifier.uri http://nur.nu.edu.kz/handle/123456789/5731
dc.description.abstract We present SpeakingFaces, a publicly available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition. SpeakingFaces comprises aligned high-resolution thermal and visual spectra image streams of fully framed faces, synchronized with audio recordings of each subject speaking approximately 100 imperative phrases. Data were collected from 142 subjects, yielding over 13,000 instances of synchronized data (∼3.8 TB). For technical validation, we demonstrate two baseline examples. The first shows gender classification using different combinations of the three data streams in both clean and noisy environments. The second is thermal-to-visual facial image translation, an instance of domain transfer. en_US
dc.language.iso en en_US
dc.publisher MDPI AG en_US
dc.relation.ispartofseries Sensors;2021, 21(10), 3465; https://doi.org/10.3390/s21103465
dc.rights Attribution-NonCommercial-ShareAlike 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/us/ *
dc.subject Computer vision en_US
dc.subject Datasets en_US
dc.subject Domain transfer en_US
dc.subject Human–computer interaction en_US
dc.subject Multimodal learning en_US
dc.subject Thermal imaging en_US
dc.subject Type of access: Open Access en_US
dc.title SPEAKINGFACES: A LARGE-SCALE MULTIMODAL DATASET OF VOICE COMMANDS WITH VISUAL AND THERMAL VIDEO STREAMS en_US
dc.type Article en_US
workflow.import.source science


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States.