Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (6): 9-16. doi: 10.19665/j.issn1001-2400.2019.06.002


Urban sound event classification model based on N-DenseNet

曹毅1,2,黄子龙1,2,张威1,2,刘晨1,2,李巍3   

    1. 江南大学机械工程学院,江苏 无锡 214122
    2. 江苏省食品先进制造装备技术重点实验室,江苏 无锡 214122
    3. 苏州工业职业技术学院,江苏 苏州 215104
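  • Received: 2019-05-15 Publication date: 2019-12-20 Release date: 2019-12-21
  • About the author: CAO Yi (1974-), male, professor, Ph.D., E-mail: caoyi@jiangnan.edu.cn
  • Funding:
    Jiangsu Province "Six Talent Peaks" Project (ZBZZ-012); Programme of Introducing Talents of Discipline to Universities (B18027); Jiangsu Province Graduate Innovation Program (KYCX18_0630); Jiangsu Province Graduate Innovation Program (KYCX18_1846); Jiangnan University Graduate Research and Practice Innovation Program (JNKY19_048); Jiangnan University Graduate Research and Practice Innovation Program (JNSJ19_005)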

Urban sound event classification with the N-order dense convolutional network

CAO Yi1,2,HUANG Zilong1,2,ZHANG Wei1,2,LIU Chen1,2,LI Wei3   

    1. School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
    2. Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi 214122, China
    3. Suzhou Institute of Industrial Technology, Suzhou 215104, China
  • Received: 2019-05-15 Online: 2019-12-20 Published: 2019-12-21

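Abstract (from the Chinese original):

To address the low classification accuracy and weak generalization of existing models for urban sound event classification, an urban sound event classification model based on the N-order dense convolutional neural network is proposed. First, the structure of the dense convolutional neural network is introduced. Second, based on the N-order Markov model, the dense connection is modified into an N-order dependent connection. Then, combining the two, a model better suited to audio classification, the N-order dense convolutional neural network, is proposed. While still avoiding vanishing gradients, the model reduces the connections between feature-map layers in a targeted and regular way and fuses the information of the preceding N feature-map layers more efficiently, so that it converges faster. Finally, to validate the model, the first-order and second-order sub-models of the N-order dense convolutional neural network are used for urban sound event classification on the UrbanSound8K and DCASE2016 datasets. The results show accuracies of 83.63% and 81.03%, respectively, which verifies that the model has good classification accuracy and generalization ability.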

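Keywords (from the Chinese original): sound event classification, dense convolutional neural network, N-order Markov model, N-order dense convolutional neural network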

Abstract:

An urban sound event classification model based on the N-order Dense Convolutional Network (N-DenseNet) is proposed to address the insufficient classification accuracy and generalization ability of existing models. First, the network structure of the DenseNet is briefly introduced. Then, the dense connection in the DenseNet is replaced by an N-order state-dependent connection based on the N-order Markov model. Combining the advantages of the DenseNet and the N-order Markov model yields a novel network architecture, the N-DenseNet. In theory, the N-DenseNet still alleviates the vanishing-gradient problem while integrating feature information from the preceding N layers more efficiently and converging faster. Finally, to validate the new model, the 1-DenseNet and 2-DenseNet are applied to urban sound event classification on the UrbanSound8K and DCASE2016 datasets. Experimental results show that the two models reach accuracies of 83.63% and 81.03%, respectively, indicating that the N-DenseNet offers good classification accuracy and generalization performance.
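The difference between dense and N-order connectivity described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that an "N-order" connection means each layer receives only the feature maps of its N most recent predecessors, by analogy with an N-order Markov chain, and the function names (`dense_inputs`, `n_order_inputs`, `connection_count`) are hypothetical.

```python
from __future__ import annotations


def dense_inputs(layer: int) -> list[int]:
    """Feature maps feeding `layer` under full dense connectivity.

    Index 0 is the block input; layer l produces feature map l.
    """
    return list(range(layer))


def n_order_inputs(layer: int, n: int) -> list[int]:
    """Feature maps feeding `layer` when only the n most recent are kept.

    Assumption: this is one reading of the "N-order" connection,
    not a detail confirmed by the abstract.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return list(range(max(0, layer - n), layer))


def connection_count(num_layers: int, n: int | None = None) -> int:
    """Total connections in a block; n=None means full dense connectivity."""
    total = 0
    for layer in range(1, num_layers + 1):
        inputs = dense_inputs(layer) if n is None else n_order_inputs(layer, n)
        total += len(inputs)
    return total


if __name__ == "__main__":
    print(dense_inputs(5))       # [0, 1, 2, 3, 4]
    print(n_order_inputs(5, 2))  # [3, 4]
    # Six-layer block: dense, second-order, first-order
    print(connection_count(6), connection_count(6, 2), connection_count(6, 1))  # 21 11 6
```

Under this reading, the number of connections grows linearly with depth for a fixed N rather than quadratically, which is consistent with the abstract's statement that connections between feature-map layers are reduced in a regular way.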

Key words: sound event classification, dense convolutional network, N-order Markov model, N-order dense convolutional network

CLC number:

  • TP391.42