SPEAKINGFACES: A LARGE-SCALE MULTIMODAL DATASET OF VOICE COMMANDS WITH VISUAL AND THERMAL VIDEO STREAMS

dc.contributor.author: Abdrakhmanova, Madina
dc.contributor.author: Kuzdeuov, Askat
dc.contributor.author: Jarju, Sheikh
dc.contributor.author: Khassanov, Yerbolat
dc.contributor.author: Lewis, Michael
dc.contributor.author: Varol, Huseyin Atakan
dc.date.accessioned: 2021-09-16T09:20:05Z
dc.date.available: 2021-09-16T09:20:05Z
dc.date.issued: 2021-05-16
dc.description.abstract: We present SpeakingFaces as a publicly available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition. SpeakingFaces consists of aligned high-resolution thermal and visual spectra image streams of fully-framed faces synchronized with audio recordings of each subject speaking approximately 100 imperative phrases. Data were collected from 142 subjects, yielding over 13,000 instances of synchronized data (∼3.8 TB). For technical validation, we demonstrate two baseline examples. The first baseline shows classification by gender, utilizing different combinations of the three data streams in both clean and noisy environments. The second example consists of thermal-to-visual facial image translation, as an instance of domain transfer.
dc.identifier.citation: Abdrakhmanova, M., Kuzdeuov, A., Jarju, S., Khassanov, Y., Lewis, M., & Varol, H. A. (2021). SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams. Sensors, 21(10), 3465. https://doi.org/10.3390/s21103465
dc.identifier.issn: 1424-8220
dc.identifier.uri: https://www.mdpi.com/1424-8220/21/10/3465
dc.identifier.uri: https://doi.org/10.3390/s21103465
dc.identifier.uri: http://nur.nu.edu.kz/handle/123456789/5797
dc.language.iso: en
dc.publisher: MDPI AG
dc.relation.ispartofseries: Sensors; 2021, 21(10), 3465; https://doi.org/10.3390/s21103465
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/
dc.subject: Type of access: Open Access
dc.subject: Computer vision
dc.subject: Datasets
dc.subject: Domain transfer
dc.subject: Human–computer interaction
dc.subject: Multimodal learning
dc.subject: Thermal imaging
dc.subject: Research Subject Categories::TECHNOLOGY
dc.title: SPEAKINGFACES: A LARGE-SCALE MULTIMODAL DATASET OF VOICE COMMANDS WITH VISUAL AND THERMAL VIDEO STREAMS
dc.type: Article
workflow.import.source: science

Files

Original bundle

Name: Article 33.pdf
Size: 9.19 MB
Format: Adobe Portable Document Format
Description: Article
