PSYCHOACOUSTIC OPTIMIZATION OF THE VQ-VAE AND TRANSFORMER ARCHITECTURES FOR HUMAN-LIKE AUDITORY PERCEPTION IN MUSIC INFORMATION RETRIEVAL AND GENERATION TASKS

dc.contributor.authorRakhmatullin, Elnur
dc.date.accessioned2023-06-16T10:22:47Z
dc.date.available2023-06-16T10:22:47Z
dc.date.issued2023
dc.description.abstractDespite remarkable advances in the use of learning-based (AI) architectures in the natural language and image domains, their applicability to music has remained limited. In fact, state-of-the-art Automated Music Transcription (AMT) systems have seen only marginal improvements from novel AI architectures. Moreover, the importance of psychoacoustic perception and its incorporation into MIR systems has remained mostly unaddressed, leading to shortcomings in current approaches. This thesis provides an overview of music processing and novel neural architectures, investigates the reasons behind their subpar performance in music information retrieval (MIR) tasks, and proposes several ways of adjusting both the data-related music pre-processing pipelines and a psychoacoustically adjusted transformer-based model to improve performance on MIR and AMT tasks. In particular, a new music transformer architecture is proposed, and various music pre-processing algorithms for psychoacoustic optimization are implemented, along with several adaptive models that address the missing factor of modeling human music perception. The preliminary results are promising and warrant continued investigation of transformer architectures for music information retrieval applications. Several insights uncovered during the research are presented and discussed. The thesis concludes by outlining a set of future research directions for music information retrieval and generation using the proposed architectures.en_US
dc.identifier.citationRakhmatullin, E. (2023). Psychoacoustic Optimization of the VQ-VAE and Transformer Architectures for Human-like Auditory Perception in Music Information Retrieval and Generation Tasks. School of Engineering and Digital Sciencesen_US
dc.identifier.urihttp://nur.nu.edu.kz/handle/123456789/7233
dc.language.isoenen_US
dc.publisherSchool of Engineering and Digital Sciencesen_US
dc.rightsAttribution-NonCommercial-ShareAlike 3.0 United States*
dc.rights.urihttp://creativecommons.org/licenses/by-nc-sa/3.0/us/*
dc.subjecttype of access: open accessen_US
dc.subjectHuman-like Auditory Perceptionen_US
dc.subjectVQ-VAE and Transformer Architecturesen_US
dc.titlePSYCHOACOUSTIC OPTIMIZATION OF THE VQ-VAE AND TRANSFORMER ARCHITECTURES FOR HUMAN-LIKE AUDITORY PERCEPTION IN MUSIC INFORMATION RETRIEVAL AND GENERATION TASKSen_US
dc.typeMaster's thesisen_US
workflow.import.sourcescience

Files

Original bundle

Name:
5412_Elnur_Rakhmatullin_Master_Thesis_PDF_234213_2109999911.pdf
Size:
782.13 KB
Format:
Adobe Portable Document Format
Description:
thesis