2017, Vol. 30, Issue (9): 20-.

• Article

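Research on Sentence and Morpheme Boundary Segmentation Based on Deep Learning

Toleu Galymzhan, WU Chunxue

  1. (School of Optical-Electronic and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093)
  • Publication date: 2017-09-15 Release date: 2017-11-03
  • About the authors: Toleu Galymzhan (1988-), male, master's student. Research interest: natural language processing. WU Chunxue (1961-), male, professor. Research interests: computer network applications, etc.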

Deep Learning for Sentence and Token Boundaries Detection

Toleu Galymzhan, WU Chunxue   

  1. (School of Optical-Electronic and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
  • Online: 2017-09-15 Published: 2017-11-03

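Abstract (translated from the Chinese):

To address the detection of sentence, word, and morpheme boundaries in Kazakh, this paper proposes a deep-learning-based boundary detection method, the CNN-TSS model. By treating boundary detection as a sequence labeling task, the detection of sentence, word, and morpheme boundaries is merged into a single task. Optimal hyperparameters were selected for the CNN-TSS model, and the model was tested on different languages. The experimental results show that, without using any additional features, the model outperforms boundary detection systems based on traditional methods.

Keywords: sentence boundary detection; morpheme boundary detection; agglutinative language; deep learning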

Abstract:

Sentence and token boundary detection is an important task in natural language processing. To avoid task-specific feature engineering, we propose a character-level neural network model for token and sentence segmentation (CNN-TSS). To share information between the two tasks, we treat them as a single combined task. The experimental results show that CNN-TSS achieves high accuracy without using any external features.

Key words: sentence boundary detection; token boundary detection; agglutinative language; deep learning
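
The combined task described in the abstract can be pictured as assigning one label to every character, so that a single label sequence carries both token and sentence boundaries. The sketch below only illustrates that encoding idea; it is not the CNN-TSS model. The tag names ("O", "T", "S") and the helper functions `encode` and `decode` are assumptions made for this example, since this page does not give the paper's actual label scheme. A morpheme-boundary tag could presumably be added in the same way.

```python
from typing import List

# Hypothetical tag set (an assumption, not taken from the paper):
#   "O" = character does not end a token
#   "T" = character ends a token
#   "S" = character ends a token that also ends a sentence


def encode(text: str, sentences: List[List[str]]) -> List[str]:
    """Turn a gold segmentation into one tag per character of `text`.

    Assumes the tokens, read in order, cover every non-whitespace
    character of `text`.
    """
    tags = ["O"] * len(text)
    pos = 0
    for sentence in sentences:
        for i, token in enumerate(sentence):
            start = text.index(token, pos)  # next occurrence at or after pos
            end = start + len(token)
            is_last = i == len(sentence) - 1
            tags[end - 1] = "S" if is_last else "T"
            pos = end
    return tags


def decode(text: str, tags: List[str]) -> List[List[str]]:
    """Rebuild sentences and tokens from per-character tags."""
    sentences: List[List[str]] = []
    tokens: List[str] = []
    chars: List[str] = []
    for ch, tag in zip(text, tags):
        if ch.isspace() and not chars:
            continue  # whitespace between tokens carries no content
        chars.append(ch)
        if tag in ("T", "S"):
            tokens.append("".join(chars))
            chars = []
            if tag == "S":
                sentences.append(tokens)
                tokens = []
    # Flush anything left over if the text does not end on a boundary tag.
    if chars:
        tokens.append("".join(chars))
    if tokens:
        sentences.append(tokens)
    return sentences


text = "Dr. Smith arrived. He sat."
sentences = [["Dr.", "Smith", "arrived", "."], ["He", "sat", "."]]
tags = encode(text, sentences)
print("".join(tags))
print(decode(text, tags))
```

In this framing a character-level tagger would be trained to predict `tags` from `text`, and `decode` would turn its predictions back into sentences and tokens, so one model serves both segmentation tasks.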

CLC number (Chinese Library Classification):

  • TP391.1