Guangxi Sciences

Cite this article:

XU Zhengli, XIAO Sufang, JIAN Min, YANG Minghao. Tongue Shapes Modeling from Small Data Using Two-Stage Autoencoder[J]. Guangxi Sciences, 2023, 30(4): 745-753.

Two-Stage Tongue Position Modeling Based on Small-Sample Data Statistics
XU Zhengli¹, XIAO Sufang¹, JIAN Min¹, YANG Minghao²
(1. Guilin University of Electronic Technology, Guilin, Guangxi 541004; 2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190)

Abstract:

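The tongue is an important human articulator, and dimensionality reduction analysis of its shape during articulation can effectively help linguists analyze human pronunciation patterns. Principal Component Analysis (PCA) is currently the most commonly used method for dimensionality reduction of tongue contours. In recent years, deep-learning-based autoencoders have been shown to outperform PCA at dimensionality reduction. However, the tongue is hidden inside the oral cavity, so large amounts of relevant data are difficult to obtain, which prevents conventional autoencoders from being applied directly to tongue contour modeling. To address this, this paper proposes a two-stage autoencoder dimensionality reduction method for small-sample tongue motion contour data. First, the method uses an Active Shape Model (ASM) to generate a large amount of physiological deformation data for tongue contours and builds a general contour reconstruction model. Next, a dimensionality reduction layer is added to the first-stage model to compress and analyze tongue contour data. The experiments use 240 vowel tongue shapes obtained from X-ray films of human speech and compare the proposed method with the conventional PCA method. The results show that, compared with conventional PCA, the vowel tongue position map produced by the proposed method is better separated on the two-dimensional plane, and that the method offers better tongue shape dimensionality reduction and reconstruction.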

Key words: deep neural network; autoencoder; principal component analysis; tongue contour; hidden units

DOI: 10.13656/j.cnki.gxkx.20230928.014

Received: 2023-02-15; Revised: 2023-04-25

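Funding: Supported by the National Natural Science Foundation of China (71463010, 22180155466), the Guangxi Science and Technology Program (2021GXNSFBA220048, Guike AB21220038), and the Guilin Science and Technology Program (2023010123).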

Tongue Shapes Modeling from Small Data Using Two-Stage Autoencoder

XU Zhengli¹, XIAO Sufang¹, JIAN Min¹, YANG Minghao²

(1. Guilin University of Electronic Technology, Guilin, Guangxi, 541004, China; 2. Institute of Automation of the Chinese Academy of Sciences, Beijing, 100190, China)

Abstract:

The tongue plays a crucial role in human speech production. Dimensionality reduction analysis of tongue shapes during pronunciation can effectively assist linguists in analyzing human pronunciation patterns. Traditional methods for tongue contour compression often rely on Principal Component Analysis (PCA) for dimensionality reduction. In recent years, deep-learning-based autoencoders have been widely used for data compression. However, they require a large number of samples and cannot be applied directly and effectively to research on tongue motion patterns. Moreover, obtaining a substantial volume of tongue movement data is challenging because the tongue is located inside the oral cavity. To address these limitations, this paper introduces a two-stage autoencoder dimensionality reduction method designed for small-sample tongue motion contour data. First, an Active Shape Model (ASM) is used to generate a large amount of physiological deformation data for tongue contours, and a general tongue contour reconstruction model is constructed based on a conventional autoencoder. Second, an additional network layer is added to the first-stage autoencoder to compress and analyze the tongue position data. In the experiments, 240 vowel tongue shapes obtained from X-ray films of human speech were selected, and the proposed tongue position model was compared with the traditional PCA method. The results show that the vowel tongue position map obtained by the proposed method exhibits better discrimination on the two-dimensional plane and achieves better tongue shape reconstruction performance.

Key words: deep neural network; autoencoder; Principal Component Analysis (PCA); tongue contour; hidden units
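
The data-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that contours are flattened (x, y) point vectors, that the ASM shape model is the usual point distribution model (mean shape plus principal modes), and that shape parameters are clipped to three standard deviations per mode. The function names, the 20-point contour size, the 5000-sample count, and the random placeholder data are all hypothetical, and the two autoencoder stages are omitted. Only the synthetic-shape sampling and the 2D PCA baseline are shown.

```python
import numpy as np

def fit_shape_model(shapes, var_kept=0.98):
    """Fit a point distribution model: mean shape plus principal modes."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data yields the principal modes directly.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s**2 / (len(shapes) - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of modes whose cumulative variance reaches var_kept.
    k = int(np.searchsorted(ratio, var_kept) + 1)
    return mean, vt[:k].T, eigvals[:k]

def sample_shapes(mean, modes, eigvals, n, rng, limit=3.0):
    """Draw n synthetic contours x = mean + P b, with |b_i| <= limit * sqrt(lambda_i)."""
    b = rng.normal(size=(n, len(eigvals))) * np.sqrt(eigvals)
    bound = limit * np.sqrt(eigvals)
    b = np.clip(b, -bound, bound)
    return mean + b @ modes.T

def pca_project(shapes, mean, modes, dims=2):
    """Baseline: project contours onto the first `dims` principal modes."""
    return (shapes - mean) @ modes[:, :dims]

rng = np.random.default_rng(0)
# Placeholder data only: 240 contours of 20 (x, y) points, flattened to 40 values.
real = rng.normal(size=(240, 40))
mean, modes, eigvals = fit_shape_model(real)
synthetic = sample_shapes(mean, modes, eigvals, n=5000, rng=rng)
coords = pca_project(real, mean, modes)
print(synthetic.shape, coords.shape)
```

In a pipeline of the kind the abstract outlines, an array like `synthetic` would serve as training data for the first-stage reconstruction autoencoder, while `coords` corresponds to the 2D PCA map used as the comparison baseline.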
