Electronic Science and Technology, 2025, Vol. 38, Issue (1): 88-94. doi: 10.16180/j.cnki.issn1007-7820.2025.01.012

Depression Detection Based on Cross User Audio Domain Adaptation Network

WU Wei1, MA Longhua1,2, ZHAO Xianghong2

  1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China
  2. School of Information Science and Engineering, Ningbo Tech University, Ningbo 315100, China
  • Received: 2023-06-25 Revised: 2023-07-16 Online: 2025-01-15 Published: 2025-01-06
  • Corresponding author: ZHAO Xianghong (1981-), male, PhD, associate professor. E-mail: wuwei_ah@163.com. Research interests: machine learning, biomedical signal processing.
  • About the authors: WU Wei (1999-), male, master's student. Research interests: machine learning, embedded development.
    MA Longhua (1965-), male, PhD, professor. Research interests: integrated modeling of complex systems.
  • Supported by:
    National Natural Science Foundation of China (61972350); National Natural Science Foundation of China (32073028); Ningbo Natural Science Foundation (2022J165)

Abstract:

Because conventional methods of detecting depression are largely subjective, diagnosing depression from a user's speech has emerged as a promising auxiliary approach. However, speech signals vary from user to user. This study proposes a Cross User Audio Domain Adaptation Network (CUADAN) for depression detection. Mel spectrograms are extracted from the speech as a visual representation, and the feature extractor of the CUADAN model derives deeper depression-related features from them. Because the source domain and the target domain contain speech features from different healthy and depressed users, the domain classifier of the CUADAN model performs domain adaptation across the data of different users, so that an existing classifier can be used to detect depression in unseen users. Experimental results show that the CUADAN model achieves higher depression detection accuracy, with an average accuracy of 81.0±2.4%. The CUADAN model can therefore effectively reduce the differences between different users' speech and improve the accuracy of cross-user depression detection.

Key words: domain adaptation, depression detection, CUADAN, audio, cross-user, Mel spectrogram, feature extraction, weakening differences
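
The Mel spectrogram step named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the frame length, hop size, number of Mel bands, the HTK-style Mel formula, the Hann window, and the synthetic 440 Hz test tone are all assumptions chosen to keep the example small, and a real system would normally rely on an optimized audio library rather than a naive DFT.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 DFT bins."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge points equally spaced on the mel scale, mapped back to Hz.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            # Triangle: zero outside [lo, hi], peak of 1 at mid.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT (fine for tiny frames)."""
    n = len(frame)
    # A Hann window reduces spectral leakage at the frame edges.
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-mel energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # A small floor avoids log(0) on silent frames.
        frames.append([math.log(e + 1e-10) for e in mel])
    return frames


# Hypothetical input: a 440 Hz tone sampled at 8 kHz (not data from the paper).
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(512)]
spec = log_mel_spectrogram(tone, SAMPLE_RATE)
```

Each entry of `spec` is one time frame, so stacking the frames side by side gives the time-frequency image that a convolutional feature extractor could take as input. The parameter values here are illustrative only; the abstract does not state the settings used for CUADAN.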

CLC Number:

  • TP181