Journal of Xidian University ›› 2021, Vol. 48 ›› Issue (6): 75-83. doi: 10.19665/j.issn1001-2400.2021.06.010

• Special Column on Key Technologies of Intelligent Embedded System Architecture and Software •

Novel and efficient algorithm for entity relation extraction with the corpus knowledge graph

HU Daiwang, JIAO Yiyuan, LI Yanni

  1. School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
  • Received: 2021-06-30 Online: 2021-12-20 Published: 2022-02-24
  • Corresponding author: LI Yanni
  • About the authors: HU Daiwang (born 1997), male, master's student at Xidian University, E-mail: hudaiwang@stu.xidian.edu.cn | JIAO Yiyuan (born 1996), male, doctoral student at Xidian University, E-mail: yiyuan_jiao@stu.xidian.edu.cn
  • Funding:
    General Program of the National Natural Science Foundation of China (61472296)

Abstract:

Entity relation extraction aims to extract the semantic relation between two entities in a given sentence, and is a basic and important task in information extraction and natural language processing. Although a number of effective deep learning algorithms for entity relation extraction have been proposed in recent years, making full use of corpus information and effectively extracting the semantic relations between entities in a sentence, so as to further improve model accuracy, remains a serious challenge. In this paper, a new entity semantic relation graph is first constructed from the training corpus; it can be extended continuously as testing proceeds. The graph is used to globally capture the semantic relations between entities across all sentences in the corpus. Next, the large number of “other” relations present in the corpus are selected as negative samples for training, in order to improve classification performance. Finally, combining the lightweight pre-trained framework ALBERT, a graph convolutional network, and a triplet loss with negative sample learning, a new entity relation extraction algorithm is proposed. Because the algorithm continuously aggregates and refines the knowledge relevant to the entity pair whose relation is to be extracted, it can effectively improve the accuracy of entity relation extraction. Extensive comparative experiments on the SemEval-2010 Task 8 and TACRED benchmarks show that the proposed algorithm outperforms representative deep learning baselines for entity relation extraction.

Key words: entity relation extraction, natural language processing, graph neural network
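
The abstract mentions a triplet loss trained with “other” relations as negative samples, but it does not give the formula. As a rough illustration only, the sketch below uses the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), and assumes that the anchor and positive are sentence embeddings sharing a relation label while the negative is a sentence labelled “other”. The function names, the Euclidean distance, the margin value, and the toy 2-d vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    Assumption (not from the paper): the negative is the embedding of a
    sentence whose entity pair is labelled "other".
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical 2-d embeddings, for illustration only.
anchor = [0.0, 0.0]         # sentence whose entity pair has relation r
positive = [0.0, 1.0]       # another sentence labelled with the same relation r
far_negative = [3.0, 4.0]   # "other" sentence already far from the anchor
near_negative = [0.0, 1.5]  # "other" sentence still too close to the anchor

easy = triplet_loss(anchor, positive, far_negative)   # margin satisfied, no penalty
hard = triplet_loss(anchor, positive, near_negative)  # margin violated, positive loss
```

In this toy setup the far negative contributes no loss, while the near negative is penalised until it is pushed at least `margin` further from the anchor than the positive. In practice the embeddings would come from the encoder (ALBERT and the graph convolutional network in the abstract), which this sketch does not model.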

CLC Number:

  • TP183