Journal of Xidian University ›› 2021, Vol. 48 ›› Issue (4): 168-175. doi: 10.19665/j.issn1001-2400.2021.04.022

• Computer Science and Technology & Cyberspace Security •

Detection of voice transformation spoofing using the dense convolutional neural network

WANG Yong (王泳)1, SU Zhuoyi (苏卓艺)1, ZHU Zhengyu (朱铮宇)1,2

  1. School of Cyberspace Security, Guangdong Polytechnic Normal University, Guangzhou 510665, China
  2. Audio, Speech and Vision Processing Laboratory, South China University of Technology, Guangzhou 510641, China
  • Received: 2020-06-01 Online: 2021-08-30 Published: 2021-08-31
  • Contact: ZHU Zhengyu
  • About the authors: WANG Yong (born 1976), male, associate professor, E-mail: isswy@mail.sysu.edu.cn | SU Zhuoyi (born 1995), male, master's student at Guangdong Polytechnic Normal University, E-mail: 364085901@qq.com
  • Funding:
    National Natural Science Foundation of China (61672173); Young Innovative Talents Project of Guangdong Provincial Universities (2018KQNCX140)

Abstract:

Voice transformation (VT) spoofing refers to operations that hide a speaker's identity by altering the speaker's acoustic features with speech processing algorithms, causing automatic speaker recognition (ASR) systems to produce extremely high false rejection rates. VT spoofing is cheap to implement and is already integrated into many audio editing tools, so it poses a serious threat to social security. However, research on VT spoofing detection remains insufficient. This paper therefore proposes a VT spoofing detection method based on a dense convolutional neural network (DenseNet) to distinguish spoofed voices from genuine ones. The proposed network has 135 layers in total. By maximizing the number of short-path (skip) connections between layers, it strengthens data transmission and allows both deep and shallow edge features to be used for classification, which suppresses the degradation problem and further improves detection accuracy. Experimental results show that the detection accuracy for spoofed voices produced with different spoofing factors exceeds 98%.

Key words: voice transformation spoofing, detection, security, neural network

CLC Number:

  • TP39
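
The dense connectivity mentioned in the abstract, where each layer receives the outputs of all earlier layers so that shallow and deep features both reach the classifier, can be illustrated with a small sketch. This is not the authors' 135-layer network: the abstract does not give the input features, layer types, or hyperparameters, so the names `make_layer` and `dense_block`, the growth rate, and the toy pointwise layers below are all assumptions chosen only to show the concatenation pattern.

```python
import random


def make_layer(in_channels, growth_rate, rng):
    """Build a toy layer (hypothetical): a pointwise linear map plus ReLU."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_channels)]
               for _ in range(growth_rate)]

    def layer(features):
        # features is a list of channels; each channel is a list of frame values.
        n_frames = len(features[0])
        out = []
        for w in weights:
            channel = [
                max(0.0, sum(w[c] * features[c][t] for c in range(in_channels)))
                for t in range(n_frames)
            ]
            out.append(channel)
        return out

    return layer


def dense_block(x, num_layers, growth_rate, seed=0):
    """Apply num_layers toy layers with dense (concatenating) connections."""
    rng = random.Random(seed)
    features = list(x)  # running concatenation of every feature map so far
    for _ in range(num_layers):
        layer = make_layer(len(features), growth_rate, rng)
        new_maps = layer(features)      # the layer sees all earlier outputs
        features = features + new_maps  # channel-wise concatenation
    return features


# Assumed toy input: 2 channels, 4 frames each.
x = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
y = dense_block(x, num_layers=3, growth_rate=4)
print(len(y), len(y[0]))  # prints "14 4": 2 input channels + 3 layers * 4 new channels
```

Because the block only ever appends channels, the input channels are still present unchanged at the output, which is the short path that lets shallow features reach later layers directly.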