
Voice Activity Detection Based on Mel Frequency Cepstrum Coefficient and Short Time Energy in Low SNR

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2021, Issue 02
Page:
117-120
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Voice Activity Detection Based on Mel Frequency Cepstrum Coefficient and Short Time Energy in Low SNR
Author(s):
Bai Shun (1), Yan Xihong (2), Zhang Shengping (2), Chen Jianfei (1), Zhang Sheng (1)
(1. College of Electronic and Optical Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China) (2. Nanjing Wutong Microelectronics Technology Co., Ltd, Nanjing 210023, China)
Keywords:
VAD; MFCC; short-term energy; fuzzy C-means clustering; low SNR
PACS:
O429,TP391.9
DOI:
10.3969/j.issn.1001-4616.2021.02.016
Abstract:
Voice activity detection (VAD) in low-SNR environments is important for speech recognition and communication, yet existing methods still suffer from low efficiency and low recognition rates. Based on an analysis of how Mel Frequency Cepstrum Coefficients (MFCC) and short-time energy are used in VAD, this paper proposes a speech endpoint detection method that sums the first three MFCC components (MFCCa) and takes the ratio of this sum and the short-time energy (the Mel Energy Ratio) as the speech feature parameter. A fuzzy C-means clustering algorithm then adaptively determines the thresholds of the double-threshold method used for VAD. Experiments use 50 speech signals selected from the TIMIT speech database. The results show that, in noise at SNRs of 5 dB, 0 dB and -5 dB, the algorithm is more accurate than the energy-zero-ratio and spectral-entropy algorithms; at -5 dB in particular, accuracy improves by about 30%.
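The adaptive double-threshold step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain short-time energy stands in for the Mel Energy Ratio, the signal is synthetic, and the rule that places the two thresholds between the cluster centers (the factors 0.2 and 0.5) is a hypothetical choice made only for illustration. All function names are likewise assumptions.

```python
import math
import random

def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def fuzzy_c_means_1d(values, m=2.0, iters=100, tol=1e-9):
    """Two-cluster fuzzy C-means on scalars; returns (low, high) centers."""
    lo, hi = min(values), max(values)  # deterministic initial centers
    if lo == hi:
        return lo, hi
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        num, den = [0.0, 0.0], [0.0, 0.0]
        for v in values:
            d0, d1 = abs(v - lo), abs(v - hi)
            if d0 == 0.0:
                u = (1.0, 0.0)
            elif d1 == 0.0:
                u = (0.0, 1.0)
            else:
                # Standard FCM membership: inverse-distance weights, normalized.
                w0, w1 = d0 ** -p, d1 ** -p
                u = (w0 / (w0 + w1), w1 / (w0 + w1))
            for k in range(2):
                num[k] += (u[k] ** m) * v
                den[k] += u[k] ** m
        new_lo, new_hi = num[0] / den[0], num[1] / den[1]
        moved = abs(new_lo - lo) + abs(new_hi - hi)
        lo, hi = new_lo, new_hi
        if moved < tol:
            break
    return lo, hi

def double_threshold(feature, low, high):
    """Frames >= high seed speech; seeds grow over neighbors >= low."""
    flags = [f >= high for f in feature]
    for i in range(1, len(feature)):            # grow forward in time
        if not flags[i] and flags[i - 1] and feature[i] >= low:
            flags[i] = True
    for i in range(len(feature) - 2, -1, -1):   # grow backward in time
        if not flags[i] and flags[i + 1] and feature[i] >= low:
            flags[i] = True
    return flags

# Synthetic test signal (assumption): weak noise with a tone at samples 600-999.
rng = random.Random(0)
signal = [rng.uniform(-0.05, 0.05) for _ in range(1600)]
for n in range(600, 1000):
    signal[n] += math.sin(2.0 * math.pi * 0.05 * n)

energy = short_time_energy(signal, frame_len=100, hop=100)
noise_c, speech_c = fuzzy_c_means_1d(energy)

# Hypothetical placement of the thresholds between the two cluster centers.
low = noise_c + 0.2 * (speech_c - noise_c)
high = noise_c + 0.5 * (speech_c - noise_c)

flags = double_threshold(energy, low, high)
print([i for i, is_speech in enumerate(flags) if is_speech])
```

Because the thresholds are derived from the cluster centers of the observed feature values rather than fixed in advance, they move with the noise floor of each recording, which is the adaptive behavior the abstract attributes to the clustering step.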

Memo

Last Update: 2021-06-30