Detail page - Beijing Tongren Hospital, Capital Medical University Knowledge Base

Mandarin speech reconstruction from surface electromyography based on generative adversarial networks

Publication details

Resource type:

WOS classification:

Indexed in: ESCI

Authors:

Affiliations:
[1] Beihang Univ, Sch Biol Sci & Med Engn, Beijing 100083, Peoples R China
[2] Nagoya Univ, Grad Sch Informat, Nagoya 4640823, Japan
[3] Capital Med Univ, Beijing TongRen Hosp, Dept Otolaryngol Head & Neck Surg, Beijing 100730, Peoples R China
[4] Nagoya Univ, Informat Technol Ctr, Nagoya 4640823, Japan

Source:

DOI:

Keywords: Surface electromyography speech reconstruction; Generative adversarial networks; Mandarin speech

Abstract:

The loss of speech function due to conditions such as laryngectomy and vocal cord paralysis significantly impacts patients' quality of life. Achieving effective communication for these patients is a goal pursued by researchers. This study explores a method for reconstructing Mandarin speech from voice-related neck and facial surface electromyography (sEMG). Neck and facial sEMG signals and speech waveforms were collected synchronously during normal speech production. A Mandarin speech reconstruction model, based on multi-scale feature extraction from EMG and a generative adversarial network (GAN), was developed. Both subjective and objective evaluations were conducted to assess the model's speech reconstruction performance. The evaluation results indicate that the model effectively reconstructs speech from neck and facial sEMG signals. The reconstructed speech closely matches the original in terms of spectrogram and fundamental frequency, with a mel-cepstral distortion of 8.45 dB, a log F0 RMSE of 0.40, an F0 correlation coefficient of 0.71, and an F0 voiced/unvoiced estimation accuracy of 0.80. The character error rate of the reconstructed speech is 0.32, and the tone error rate is 0.26. The subjective listening test shows that the naturalness of the reconstructed speech is acceptable, with a mean opinion score greater than 3. This study demonstrates the potential of deep learning techniques for reconstructing Mandarin speech from sEMG.
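The objective measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the exclusion of the 0th cepstral coefficient, the use of the natural logarithm for log F0, the choice to compute the correlation on F0 in Hz, and the convention that an F0 of 0 marks an unvoiced frame are all assumptions made here for illustration. The inputs are assumed to be already time-aligned frame by frame.

```python
import numpy as np


def mel_cepstral_distortion(ref, rec):
    """Mean mel-cepstral distortion in dB for aligned arrays of shape (frames, dims).

    Assumption: the 0th coefficient (energy) is excluded, a common convention.
    """
    diff = ref[:, 1:] - rec[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def f0_metrics(f0_ref, f0_rec):
    """Return (log F0 RMSE, F0 correlation, voiced/unvoiced accuracy).

    Assumption: F0 is in Hz and a value of 0 marks an unvoiced frame.
    RMSE and correlation use only frames voiced in both signals.
    """
    voiced_ref = f0_ref > 0
    voiced_rec = f0_rec > 0
    vuv_accuracy = float(np.mean(voiced_ref == voiced_rec))
    both = voiced_ref & voiced_rec
    log_diff = np.log(f0_ref[both]) - np.log(f0_rec[both])
    rmse = float(np.sqrt(np.mean(log_diff ** 2)))
    corr = float(np.corrcoef(f0_ref[both], f0_rec[both])[0, 1])
    return rmse, corr, vuv_accuracy


def character_error_rate(reference, hypothesis):
    """Levenshtein edit distance between two strings, divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)
```

A tone error rate could be computed with the same edit-distance routine applied to sequences of tone labels instead of characters, again as an assumption about how such a metric is typically defined.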

Funding:

Language:

WOS:

First author:

First author affiliation: [1] Beihang Univ, Sch Biol Sci & Med Engn, Beijing 100083, Peoples R China

Corresponding author:

Recommended citation (GB/T 7714):

APA:

MLA:

Related publications

[1] End-to-End Mandarin Speech Reconstruction Based on Ultrasound Tongue Images Using Deep Learning
[2] A preliminary exploration of the application of artificial intelligence generative adversarial networks in clinical ophthalmology teaching
[3] Retinal image enhancement with artifact reduction and structure retention
[4] Mandarin Speech Test Materials (MSTMs)
[5] Development of Mandarin monosyllabic speech test materials in China
[6] Evaluation of neuromuscular activity in patients with obstructive sleep apnea using chin surface electromyography of polysomnography
[7] Interaction between speech variations and background noise on speech intelligibility by Mandarin-speaking cochlear implant patients
[8] Mandarin Speech Perception in Combined Electric and Acoustic Stimulation
[9] Speech recognition outcomes in Mandarin-speaking cochlear implant users with fine structure processing
[10] Development and evaluation of Mandarin disyllabic materials for speech audiometry in China