End-to-End Mandarin Speech Reconstruction Based on Ultrasound Tongue Images Using Deep Learning

Record details

Resource type:
WOS system:

Indexed in: SCIE

Affiliations: [1]Beihang Univ, Sch Biol Sci & Med Engn, Beijing 100191, Peoples R China [2]Nagoya Univ, Grad Sch Informat, Nagoya 4640823, Japan [3]Capital Med Univ, Beijing Tongren Hosp, Dept Otolaryngol Head & Neck Surg, Beijing 100730, Peoples R China [4]Nagoya Univ, Informat Technol Ctr, Nagoya 4640823, Japan
Source:
ISSN:

Keywords: Tongue; Ultrasonic imaging; Image reconstruction; Feature extraction; Hidden Markov models; Training; Speech enhancement; Autoencoders; Vocoders; Transducers; Ultrasound tongue image; speech reconstruction; end-to-end; generative adversarial networks (GANs); Mandarin speech

Abstract:
The loss of speech function following a laryngectomy usually causes severe physiological and psychological distress for laryngectomees. In clinical practice, most laryngectomees retain intact upper vocal tract articulators, which makes speech rehabilitation based on articulatory motion information a promising way to restore speech. This study proposed a deep learning-based, end-to-end method for reconstructing speech from ultrasound tongue images. First, ultrasound tongue images and speech were recorded simultaneously using a purpose-designed Mandarin corpus. Next, a speech reconstruction model was built on generative adversarial networks. The model comprises a pretrained feature extractor that processes the ultrasound images, an upsampling block that generates speech, and discriminators that ensure the similarity and fidelity of the reconstructed speech. Finally, the reconstructed speech was evaluated both objectively and subjectively. The reconstructed speech showed high intelligibility for both Mandarin phonemes and tones: the character error rate (CER) of phonemes in automatic speech recognition was 0.2605, and the tone error rate obtained from dictation tests was 0.1784. Objective measures showed high similarity between the reconstructed and ground-truth speech, and subjective perception tests indicated an acceptable level of naturalness. These results demonstrate that the proposed method can reconstruct tonal Mandarin speech from ultrasound tongue images. Future research should address the specific conditions of laryngectomees in order to further improve and optimize model performance, by enlarging the training dataset, investigating the effect of ultrasound tongue imaging parameters, and further refining the method.
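The abstract reports intelligibility as a character error rate (0.2605) and a tone error rate (0.1784). The record does not say how the authors tokenized or scored transcripts, so the following is only a minimal sketch of the conventional definition, error rate = (substitutions + deletions + insertions) / number of reference tokens, computed with a Levenshtein edit distance. The function names, the toneless-pinyin tokenization, and the idea of scoring tones as a separate sequence of tone numbers are illustrative assumptions, not the study's evaluation pipeline.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the first i-1 reference tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (cost 0 if tokens match)
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(S + D + I) / N, where N is the number of reference tokens."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: toneless pinyin syllables of a reference sentence
# versus an ASR transcript of the reconstructed speech.
reference = ["ni", "hao", "shi", "jie"]
hypothesis = ["ni", "hao", "si", "jie"]
print(error_rate(reference, hypothesis))  # 0.25

# Hypothetical tone scoring: compare tone numbers only (assumed scheme).
ref_tones = ["3", "3", "4", "4"]
hyp_tones = ["3", "2", "4", "4"]
print(error_rate(ref_tones, hyp_tones))  # 0.25
```

Splitting syllables and tones into two sequences mirrors the abstract's separate reporting of phoneme and tone errors, but an actual evaluation might instead align tones per syllable or use a standard scoring tool.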

Funding:
Language:
Times cited:
WOS:
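CAS (Chinese Academy of Sciences) journal ranking:
Year-of-publication [2025] edition:
Major category | Zone 2 Medicine
Subcategory | Zone 1 Rehabilitation Medicine; Zone 2 Engineering: Biomedical
Latest [2025] edition:
Major category | Zone 2 Medicine
Subcategory | Zone 1 Rehabilitation Medicine; Zone 2 Engineering: Biomedical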
JCR quartile:
Year-of-publication [2023] edition:
Q1 REHABILITATION; Q2 ENGINEERING, BIOMEDICAL
Latest [2023] edition:
Q1 REHABILITATION; Q2 ENGINEERING, BIOMEDICAL

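Impact factor: Latest [2023 edition] | Latest five-year average | Year of publication [2023 edition] | Year-of-publication five-year average | Year before publication [2022 edition]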

First author:
First author affiliation: [1]Beihang Univ, Sch Biol Sci & Med Engn, Beijing 100191, Peoples R China
Corresponding author:
Recommended citation (GB/T 7714):
APA:
MLA:

Last updated: 2025-04-01

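Copyright © 2020 Beijing Tongren Hospital, Capital Medical University. Technical support: Chongqing Juhe Technology Co., Ltd. Address: No. 1 Dongjiaominxiang, Dongcheng District, Beijing (100730)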