Bai Shun,Yan Xihong,Zhang Shengping,et al.Voice Activity Detection Based on Mel Frequency Cepstrum Coefficient and Short Time Energy in Low SNR[J].Journal of Nanjing Normal University(Natural Science Edition),2021,44(02):117-120.[doi:10.3969/j.issn.1001-4616.2021.02.016]

Voice Activity Detection Based on Mel Frequency Cepstrum Coefficient and Short Time Energy in Low SNR

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume: 44
Issue: 2021, No. 02
Pages: 117-120
Section: Computer Science and Technology
Publication date: 2021-06-30

Article Information

Title: Voice Activity Detection Based on Mel Frequency Cepstrum Coefficient and Short Time Energy in Low SNR
Article ID: 1001-4616(2021)02-0117-04
Author(s): Bai Shun¹, Yan Xihong², Zhang Shengping², Chen Jianfei¹, Zhang Sheng¹
(1. College of Electronic and Optical Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
(2. Nanjing Wutong Microelectronics Technology Co., Ltd., Nanjing 210023, China)
Keywords: VAD; MFCC; short-term energy; fuzzy C-means clustering; low SNR
CLC number: O429, TP391.9
DOI: 10.3969/j.issn.1001-4616.2021.02.016
Document code: A
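Abstract:
Voice activity detection (VAD) in low signal-to-noise ratio (SNR) environments is important for speech recognition and communication, yet current low-SNR endpoint detection methods still suffer from low efficiency and limited accuracy. Building on an analysis of how Mel frequency cepstrum coefficients (MFCC) and short-time energy are used in endpoint detection, this paper sums the first three components of the MFCC (MFCCa) and divides this sum by the short-time energy. The resulting Mel energy ratio serves as the speech feature parameter for endpoint detection. The fuzzy C-means clustering algorithm then adaptively determines the two thresholds of a double-threshold detector. Experiments on 50 utterances from the TIMIT speech corpus show that at SNRs of 5 dB, 0 dB and -5 dB, the proposed algorithm detects endpoints more accurately than the energy-to-zero-crossing ratio and spectral entropy algorithms. The improvement is about 30% at -5 dB.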
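The Mel energy ratio feature described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. The following are all assumptions:

- the frame length (400 samples, i.e. 25 ms at TIMIT's 16 kHz sampling rate);
- the hop size (160 samples);
- the 512-point FFT;
- the 26-filter mel bank;
- reading "the first three components" as cepstral coefficients c1 to c3;
- the direction of the ratio (MFCCa divided by short-time energy).

The helper names `frame_signal`, `mel_filterbank`, `dct_ii` and `mel_energy_ratio` are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of triangle i
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge, peak 1 at the centre bin
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    return x @ basis.T


def mel_energy_ratio(x, sr=16000, frame_len=400, hop=160, n_fft=512,
                     n_filters=26, coeff_idx=(1, 2, 3), eps=1e-10):
    """Return per-frame (ratio, MFCCa, short-time energy); parameters are assumptions."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)  # short-time energy of each raw frame
    windowed = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + eps)
    mfcc = dct_ii(log_mel, max(coeff_idx) + 1)
    # ASSUMPTION: "first three components" read as c1..c3 (c0 excluded).
    mfcca = mfcc[:, list(coeff_idx)].sum(axis=1)
    # ASSUMPTION: ratio direction is MFCCa divided by short-time energy.
    ratio = mfcca / (energy + eps)
    return ratio, mfcca, energy
```

On a one-second, 16 kHz signal, `mel_energy_ratio(x)` returns 98 values of each quantity, one per frame. The abstract does not say whether c0 counts among the first three components, so `coeff_idx` is left as a parameter.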
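The adaptive double-threshold step can be sketched similarly. The abstract states only that fuzzy C-means (FCM) clustering sets the two thresholds. The sketch therefore makes the following assumptions:

- A two-cluster FCM with fuzzifier m = 2 is run on the per-frame feature values.
- The low and high thresholds sit at fractions `alpha` and `beta` of the distance between the noise and speech cluster centres. Both parameters, like the function names `fcm_1d` and `double_threshold_vad`, are hypothetical.
- Speech frames take larger feature values than noise frames. A feature that behaves the other way should be negated first.

```python
import numpy as np


def fcm_1d(x, n_clusters=2, m=2.0, n_iter=100, tol=1e-8):
    """Fuzzy C-means on scalar samples; returns cluster centres sorted ascending."""
    x = np.asarray(x, dtype=float)
    # Deterministic start: centres at evenly spaced percentiles of the data.
    centers = np.percentile(x, np.linspace(10, 90, n_clusters))
    for _ in range(n_iter):
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0)  # memberships; each column sums to 1
        um = u ** m
        new_centers = (um @ x) / um.sum(axis=1)  # membership-weighted means
        converged = np.max(np.abs(new_centers - centers)) < tol
        centers = new_centers
        if converged:
            break
    return np.sort(centers)


def double_threshold_vad(feature, alpha=0.25, beta=0.5):
    """Return (segments, (t_low, t_high)); segments are inclusive (start, end) frame indices.

    ASSUMPTION: speech frames take larger feature values than noise frames.
    ASSUMPTION: alpha and beta are hypothetical placement factors between the FCM centres.
    """
    f = np.asarray(feature, dtype=float)
    c_noise, c_speech = fcm_1d(f, n_clusters=2)
    t_low = c_noise + alpha * (c_speech - c_noise)
    t_high = c_noise + beta * (c_speech - c_noise)
    segments, start = [], None
    # Append a sentinel so a run that reaches the last frame is still closed.
    for i, v in enumerate(np.append(f, -np.inf)):
        if v > t_low and start is None:
            start = i  # candidate onset: feature rises above the low threshold
        elif v <= t_low and start is not None:
            # Keep the candidate only if it also crossed the high threshold.
            if f[start:i].max() > t_high:
                segments.append((start, i - 1))
            start = None
    return segments, (t_low, t_high)
```

In the example below, a short bump that exceeds the low threshold but never the high one is rejected as noise. This is the behaviour the double-threshold scheme is meant to provide.

```python
feat = np.concatenate([np.zeros(20), np.full(30, 10.0), np.zeros(10),
                       np.full(3, 3.0), np.zeros(10)])
segments, (t_low, t_high) = double_threshold_vad(feat)
print(segments)  # [(20, 49)]
```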

Memo:
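Received: 2020-04-29.
Funding: National Natural Science Foundation of China (61601237).
Corresponding author: Zhang Sheng, PhD, professor; research interests: signal detection, embedded applications, intelligent information processing. E-mail: zhangsheng@njupt.edu.cn
Last Update: 2021-06-30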