MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation

Full metadata record
Metadata | Description | Language
Author(s): dc.contributor | Universidad Industrial de Santander | -
Author(s): dc.contributor | Universidade Estadual Paulista (UNESP) | -
Author(s): dc.creator | Vasquez-Serrano, P. | -
Author(s): dc.creator | Reyes-Moreno, J. | -
Author(s): dc.creator | Guido, Rodrigo Capobianco | -
Author(s): dc.creator | Sepúlveda-Sepúlveda, Alexander | -
Date of acceptance: dc.date.accessioned | 2025-08-21T21:44:27Z | -
Date made available: dc.date.available | 2025-08-21T21:44:27Z | -
Date of submission: dc.date.issued | 2025-04-29 | -
Date of submission: dc.date.issued | 2022-12-31 | -
Full source of the material: dc.identifier | http://dx.doi.org/10.1016/j.jvoice.2023.05.012 | -
Full source of the material: dc.identifier | https://hdl.handle.net/11449/298915 | -
Source: dc.identifier.uri | http://educapes.capes.gov.br/handle/11449/298915 | -
Description: dc.description | The relationship between formant frequencies and vocal tract length (VTL) has been studied intensively over the years. By contrast, the connection between VTL and mel-frequency cepstral coefficients (MFCCs), which concisely encode the overall shape of a speaker's spectral envelope in just a few cepstral coefficients, has been analyzed only modestly and merits further investigation. Using different statistical models, this article therefore explores the advantages and disadvantages of the relatively novel MFCC-based approach compared with the more traditional formant-based one. In addition, VTL is usually assumed to be a static, inherent characteristic of a speaker, so a single length parameter is typically estimated per speaker. In this paper, we instead treat VTL estimation dynamically, using modern real-time Magnetic Resonance Imaging (rtMRI) to measure VTL in parallel with the audio signal. The experiments use data from the USC-TIMIT magnetic resonance videos, which allow 2D real-time analysis of the articulators in motion. We observed that MFCCs perform better in speaker-dependent modeling; in cross-speaker modeling, where different speakers' data are used for training and evaluation, their performance is not significantly different from that obtained with formants. We also note that MFCC-based estimation is robust, has acceptable computational time complexity, and is consistent with the traditional approach. | -
Description: dc.description | Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) | -
Description: dc.description | Escuela de Ing. Eléctrica Electrónica y de Telecomunicaciones (E3T) Universidad Industrial de Santander | -
Description: dc.description | Instituto de Biociências Letras e Ciências Exatas Unesp – Univ Estadual Paulista (São Paulo State University), Rua Cristóvão Colombo 2265, Jd Nazareth, SP | -
Language: dc.language | en | -
Relation: dc.relation | Journal of Voice | -
Source: dc.source | Scopus | -
Keywords: dc.subject | Acoustic-to-articulatory inversion | -
Keywords: dc.subject | Formants | -
Keywords: dc.subject | MFCCs | -
Keywords: dc.subject | Vocal tract length | -
Title: dc.title | MFCC Parameters of the Speech Signal: An Alternative to Formant-Based Instantaneous Vocal Tract Length Estimation | -
File type: dc.type | digital book | -
Appears in collections: Repositório Institucional - Unesp

There are no files associated with this item.