Journal of Xidian University, 2016, Vol. 43, Issue (2): 102-107. doi: 10.3969/j.issn.1001-2400.2016.02.018

• Research Paper •

Regularized discriminative segmental feature transform method

CHEN Bin; ZHANG Lianhai; QU Dan; LI Bicheng

  1. (Institute of Information System Engineering, PLA Information Engineering Univ., Zhengzhou, Henan  450001, China)
  • Received: 2014-12-04 Online: 2016-04-20 Published: 2016-05-27
  • Contact: CHEN Bin
  • About the author: CHEN Bin (born 1987), male, Ph.D. candidate at PLA Information Engineering University. E-mail: chenbin873335@163.com
  • Supported by:

    National Natural Science Foundation of China (61175017, 61403415); National 863 Program of China (2012AA011603)

Abstract:

To address the insufficient stability of frame-based feature transforms, a segment-based discriminative feature transform method is proposed, in which the feature transform matrix of each speech segment is determined by regularization. The method treats feature transformation as a parameter selection problem under limited data. In the training stage, region dependent linear transform (RDLT) matrices are trained with state tying, and all of the transform matrices together form an over-complete dictionary. In the testing stage, the speech is segmented by forced alignment, a regularization term is added to the likelihood objective function, and the problem is solved with the fast iterative shrinkage-thresholding algorithm; solving it selects the best subset of transform matrices from the dictionary together with their combination coefficients. Experimental results show that, with L1 and L2 regularization combined and relative to the tied-state RDLT method, the recognition rate improves by 1.30% when the acoustic model is trained with the maximum likelihood criterion, and by 1.66% after discriminative training of the model.

Key words: feature transform, speech recognition, region dependent, regularization, discriminative training
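The test-stage selection step described in the abstract (choosing a sparse combination of dictionary transforms under combined L1 and L2 regularization with the fast iterative shrinkage-thresholding algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract regularizes a likelihood objective, whereas the sketch substitutes a least-squares surrogate so that it stays self-contained. The names `fista_elastic_net`, `soft_threshold`, `objective`, the toy dictionary, and the values of `lam1` and `lam2` are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def objective(A, b, c, lam1, lam2):
    """Least-squares surrogate plus L2 and L1 penalties (assumed objective)."""
    r = A @ c - b
    return 0.5 * (r @ r) + 0.5 * lam2 * (c @ c) + lam1 * np.abs(c).sum()


def fista_elastic_net(A, b, lam1, lam2, n_iter=500):
    """Minimize 0.5*||A c - b||^2 + 0.5*lam2*||c||^2 + lam1*||c||_1 with FISTA."""
    # Lipschitz constant of the gradient of the smooth part.
    lipschitz = np.linalg.norm(A, 2) ** 2 + lam2
    c = np.zeros(A.shape[1])
    y = c.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b) + lam2 * y
        c_next = soft_threshold(y - grad / lipschitz, lam1 / lipschitz)
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = c_next + ((t - 1.0) / t_next) * (c_next - c)
        c, t = c_next, t_next
    return c


# Toy setup (hypothetical): K transform matrices of size d x d form the
# dictionary; one speech segment has T frames of d-dimensional features.
rng = np.random.default_rng(0)
K, d, T = 20, 4, 30
dictionary = rng.standard_normal((K, d, d))
X = rng.standard_normal((T, d))

# Column k of A is the segment transformed by dictionary entry k, flattened.
A = np.stack([(X @ W.T).ravel() for W in dictionary], axis=1)

# Synthetic target built from two dictionary entries only.
c_true = np.zeros(K)
c_true[[3, 11]] = [1.0, -0.5]
b = A @ c_true

c_hat = fista_elastic_net(A, b, lam1=0.1, lam2=0.01)
segment_transform = np.tensordot(c_hat, dictionary, axes=1)  # d x d matrix
```

In this sketch the L1 term drives most combination coefficients to zero, which plays the role of selecting a subset of transforms, while the L2 term keeps the problem strongly convex. A likelihood-based objective would need its own gradient and step-size rule in place of the quadratic term used here.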

CLC Number:

  • TN912.3